I was trying to call function DECOMPOSE while investigating a strange problem I had with some Unicode strings that could not be properly indexed by Oracle Text and, amazingly, I found out that most pages where usage examples of such DECOMPOSE function are given, most likely never attempted to go beyond its default form as given by SQL Reference and I'd dare to say that they didn't even try to execute this function once.
Even more amazing is the fact that the syntax diagrams of DECOMPOSE in *all* versions up until 11.2 are wrong.
But let's start from the beginning.
As per SQL reference example, if you call this
select DECOMPOSE('Crónicas') d from dual;
you should get this result:
Unfortunately what you see in SQL Developer (or Apex SQL Workshop) is instead:
So, first of all, apparently either this function doesn't work as advertised (my db is AL32UTF8) or the functional description lacks some important detail.
After various attempts I realized that the "client" software must be playing a hidden role in this messy situation.
If, instead of displaying the resulting string, I get the length of the original string and its decomposed equivalent, I see that the latter is one character longer.
select length('Crónicas') l0
, length(DECOMPOSE('Crónicas')) l1
L0 L1 ---------- ---------- 8 9
Indeed, if I get a hex dump, I can spot the difference:
select rawtohex('Crónicas') s0 , rawtohex(DECOMPOSE('Crónicas')) s1
S0 S1 ------------------ -------------------- 4372C3B36E69636173 43726FCC816E69636173
The decomposed string contains two characters:
- 6F is "o"
- CC81 is a special Unicode character called "Combining acute accent"
Unfortunately, Oracle Text in 10.2 doesn't seem to cope with these combined accented characters, so you might be looking at two apparently identical strings on your monitor that are the result of distinct unicode character combinations which makes somewhat difficult to understand why one was found by a search query, while the latter wasn't.
But this was just the first part of the intriguing story...
As I already said, DECOMPOSE's syntax diagrams are dead wrong.
According to them, you should be able to call this functions also in the following forms:
select DECOMPOSE('Crónicas' CANONICAL) d from dual;
but all you get is the infamous ORA-00907:
ORA-00907: missing right parenthesis
According to the book "PLSQL Programming", you should be able to call DECOMPOSE as follows:
select DECOMPOSE('Crónicas', CANONICAL) from dual; -- added the comma
but even in this case you hit another error:
ORA-00904: "CANONICAL": invalid identifier
It turns out that the correct form is instead:
select DECOMPOSE('Crónicas','CANONICAL') from dual;
select DECOMPOSE('Crónicas','COMPATIBILITY') from dual;
I was unable to quickly find an example returning different results for the two modes, may be I'll find them later.
As a final note, if you specify a wrong literal as the second parameter, you'll get the following error:
ORA-12702: invalid NLS parameter string used in SQL function