Thursday, December 21, 2006

The strange case of oracle text query rewrite templates: ORA-29902

I don't really expect that you have ever used an oracle text query rewrite template during your life as a developer.
The more i delve into this matter, the more i feel like a missionaire.

I post questions in the forums and i get virtually no answers.
I search for solutions on metalink and i can't find anything significant.

Am i really the only dude who ever attempted to use query templates most advanced features since they exist?

End of rant.

And now for the freshly baked quirk of the day.

I created a query rewrite template in order to expand a text search to include stemming and prefixes. I understand that at this point i've started talking to a few fellows, so i won't get upset if you leave the discussion now ;-)

You can see my original postings on Oracle Text Forum.
After a while i realized that my original xml query template was flawed by a syntax error, there was a missing < sign before the second closing SEQ element.
It wasn't just a typo, i cut and pasted the original query snippet taken from my SQL Command window, so i had actually executed that piece of xml.

Or at least i thought.

I suddenly felt excited by this finding because i expected that some unexpected failures that i was experiencing when adding more rewrite rules could be caused by the invalid syntax, so i set out and repeated all my previous attempts after correcting the problem.

But i got puzzling results. Sometimes i got syntax errors like:

ORA-29902: error in executing ODCIIndexStart() routine ORA-20000: Oracle Text error:

Sometimes not.
Sometimes i got less rows than expected.

I soon realized that the position of the invalid SEQ block is important.
If the offending SEQ block is in the last position inside the PROGRESSION block, then an error is raised. If it is followed by a valid SEQ block, not only the syntax error is not raised, but the rewrite rule is partially executed, that is the first TRANSFORM rule is applied, but the following one is ignored. If a third one is present, it will be executed as well.
And don't think it was easy to spot this.

So, in the end, i don't know how Oracle Text is crunching this xml fragment. Probably it's just giving a small bite, spitting what tastes bad.
Initially i thought it was parsing the string using the internal XML parser, but now i think it is not, otherwise the invalid syntax should be detected at all times.

Perhaps oracle text is parsing this string with some other homebrew routine that accepts a fake XML fragment for performance reasons. In the end there would be some overhead if the full fledged XML parser were invoked just for parsing a short xml fragment.

But i don't really expect anyone to confirm or deny this.

Happy parsing.

ORA-29902: errore durante l'esecuzione della routine ODCIIndexStart()
ORA-29902: error al ejecutar la rutina ODCIIndexStart()
ORA-29902: error en executar la rutina ODCIIndexStart()
ORA-29902: erreur d'exécution de la routine ODCIIndexStart()
ORA-29902: Fehler bei der Ausführung von Routine ODCIIndexStart()
ORA-29902: σφάλμα στην εκτέλεση της ρουτίνας ODCIIndexStart()
ORA-29902: fejl under udførelse af ODCIIndexStart()-rutine
ORA-29902: ett fel inträffade när rutinen ODCIIndexStart() kördes
ORA-29902: feil ved utføring av ODCIIndexStart()-rutine
ORA-29902: virhe suoritettaessa ODCIIndexStart()-rutiinia
ORA-29902: hiba történt az ODCIIndexStart() rutin végrehajtása során
ORA-29902: eroare la execuţia rutinei ODCIIndexStart()
ORA-29902: Fout bij uitvoering van ODCIIndexStart()-routine.
ORA-29902: erro na execução da rotina ODCIIndexStart()
ORA-29902: erro ao executar a rotina ODCIIndexStart()
ORA-29902: ошибка в выполнении программы ODCIIndexStart()
ORA-29902: při vykonávání rutiny ODCIIndexStart() se vyskytla chyba
ORA-29902: chyba pri vykonávaní rutiny ODCIIndexStart()
ORA-29902: błąd podczas wykonywania podprogramu ODCIIndexStart()
ORA-29902: ODCIIndexStart() programını yürütme hatası


Gabriel Molina said...

I have another strange case with text query templates.There is no documentation (or I can't find it anywhere) about query template syntax.
When writing something like this:
transform((TOKENS, "fuzzy({", "},70,150,W)", "ACCUM"))
I get the described error.
But a subtle change, (a whitespace after ACCUM works:
transform((TOKENS, "fuzzy({", "},70,150,W)", "ACCUM "))
I think it is an error of whatever it uses to parse the pseudo-xml.
But its not the only strange behaviour. Performance problems arise when using a template and score() function. I have over a million documents indexed, and when using a query template and order by score() the query executes in about 2 minutes (or more) while when writing the query (as I guess the rewrite function should be doing it) it only last at most 2 seconds.

Byte64 said...

Hola Gabriel,
good to know.
I guess that all these nice features sometimes show odd behaviors because there isn't such a big audience for these advanced use cases, so bugs are more likely to occur as you start going fancy with all the possibilities that the Oracle Text module offers.

Hasta luego

Gabriel Molina said...

Thanks for the reply. I'm glad that you speak a little Spanish.
I don't think that query templates are so fancy. It would be very useful if worked as expected. Maybe people are using another technologies instead of Oracle Text (Apache Lucene, Solr, etc) that are more configurable.
By the way, it's very helpful to know that this is a known error and that I'm not the only one who is using this module.


Byte64 said...

if you want me to speak some more Spanish, we'd be better off talking about mexican food, that's the specialty of the house.

Que tengas un buen día.

yes you can!

Two great ways to help us out with a minimal effort. Click on the Google Plus +1 button above or...
We appreciate your support!

latest articles