Tuesday, August 06, 2013

DECOMPOSE this: when what you see is not exactly what you think you see

Always check out the original article at http://www.oraclequirks.com for latest comments, fixes and updates.

I was trying to call function DECOMPOSE while investigating a strange problem I had with some Unicode strings that could not be properly indexed by Oracle Text and, amazingly, I found out that most pages where usage examples of such DECOMPOSE function are given, most likely never attempted to go beyond its default form as given by SQL Reference and I'd dare to say that they didn't even try to execute this function once.

Even more amazing is the fact that the syntax diagrams of DECOMPOSE in *all* versions up until 11.2 are wrong.

But let's start from the beginning.
As per SQL reference example, if you call this

select DECOMPOSE('Crónicas') d from dual;

you should get this result:

D
--------- 
Cro´nicas

Unfortunately what you see in SQL Developer  (or Apex SQL Workshop) is instead:

D
-------- 
Crónicas

So, first of all, apparently either this function doesn't work as advertised (my db is AL32UTF8) or the functional description lacks some important detail.
After various attempts I realized that the "client" software must be playing a hidden role in this messy situation.
If, instead of displaying the resulting string, I get the length of the original string and its decomposed equivalent, I see that the latter is one character longer.


select length('Crónicas') l0
     , length(DECOMPOSE('Crónicas')) l1
  from dual; 

        L0         L1
---------- ----------
         8          9 


Indeed, if I get a hex dump, I can spot the difference:

select rawtohex('Crónicas') s0
     , rawtohex(DECOMPOSE('Crónicas')) s1
  from dual;
 
S0                 S1                 
------------------ --------------------
4372C3B36E69636173 43726FCC816E69636173 

The decomposed string contains two characters:
  1. 6F is "o"
  2. CC81 is a special Unicode character called "Combining acute accent"
So, it turns out that the result displayed in the Oracle SQL Reference is fictitious because the operating system may kick in and recombine the two distinct characters in order to display them according to the Unicode standard and you can't visually spot the difference.

Unfortunately, Oracle Text in 10.2 doesn't seem to cope with these combined accented characters, so you might be looking at two apparently identical strings on your monitor that are the result of distinct unicode character combinations which makes somewhat difficult to understand why one was found by a search query, while the latter wasn't.

But this was just the first part of the intriguing story...

As I already said, DECOMPOSE's syntax diagrams are dead wrong.
According to them, you should be able to call this functions also in the following forms:

select DECOMPOSE('Crónicas' CANONICAL) d from dual;

but all you get is the infamous ORA-00907:

ORA-00907: missing right parenthesis

According to the book "PLSQL Programming", you should be able to call DECOMPOSE as follows:

select DECOMPOSE('Crónicas', CANONICAL) from dual; -- added the comma

but even in this case you hit another error:

ORA-00904: "CANONICAL": invalid identifier

It turns out that the correct form is instead:

select DECOMPOSE('Crónicas','CANONICAL') from dual;
select DECOMPOSE('Crónicas','COMPATIBILITY') from dual;

I was unable to quickly find an example returning different results for the two modes, may be I'll find them later.

As a final note, if you specify a wrong literal as the second parameter, you'll get the following error:

ORA-12702: invalid NLS parameter string used in SQL function

No comments:

yes you can!

Two great ways to help us out with a minimal effort. Click on the Google Plus +1 button above or...
We appreciate your support!

latest articles