Friday, March 02, 2007

ORA-31011 and ORA-19202


ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00200: could not convert from encoding UTF-8 to UCS2
Error at line 1931


I must admit that when I saw this exception for the first time, I thought it would take a long time to sort out.
But fortunately, things went much better than expected!

For some reason, Oracle's XML parser doesn't like some character in the file. However, line 1931 (or whatever number appears in the message you got) usually doesn't match the line number you see when you open the XML file in an editor like UltraEdit or XMLSpy.

Even if you don't know precisely where the offending character is, you can be sure it will look like some weird glyph. In my case it was a square, and when I opened the XML file in UltraEdit's hex viewer, I could see it was some kind of junk character whose hex code was FDFF.
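Since the line number in the Oracle message may not help, a small script can do the hunting instead. Below is a minimal Python sketch (not a tool used in this post): it decodes the file as UTF-8, keeping invalid bytes instead of failing, and reports the line and column of every U+FFFD replacement character (which, as the March 6 update below explains, is what "FDFF" really was) and of every byte that is not valid UTF-8. The helper name `find_suspect_chars` and the sample data are hypothetical; columns are counted in characters, not bytes.

```python
# Minimal sketch: locate U+FFFD and invalid UTF-8 bytes in an XML file.
# find_suspect_chars is a hypothetical helper, not part of any library.

def find_suspect_chars(data: bytes) -> list[tuple[int, int, str]]:
    """Return 1-based (line, column, description) tuples for each suspect character."""
    # errors="surrogateescape" maps each invalid byte b (0x80-0xFF) to the
    # lone surrogate U+DC00 + b instead of raising, so it can still be located.
    text = data.decode("utf-8", errors="surrogateescape")
    hits = []
    # Split on "\n" only, so line numbers follow the file's own newlines.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch == "\ufffd":
                hits.append((line_no, col_no, "U+FFFD replacement character"))
            elif "\udc80" <= ch <= "\udcff":
                hits.append((line_no, col_no, f"invalid UTF-8 byte 0x{ord(ch) - 0xDC00:02X}"))
    return hits


if __name__ == "__main__":
    # Hypothetical sample: one U+FFFD on line 3, one stray 0xFF byte on line 5.
    sample = "<root>\n  <elem>ok</elem>\n  <elem>bad \ufffd here</elem>\n</root>\n".encode("utf-8")
    sample += b"<!-- stray byte: \xff -->\n"
    for line_no, col_no, what in find_suspect_chars(sample):
        print(f"line {line_no}, column {col_no}: {what}")
```

With the sample above it reports line 3, column 13 (the U+FFFD) and line 5, column 18 (the 0xFF byte). For a real file, pass something like `pathlib.Path("export.xml").read_bytes()`, where the file name is a placeholder.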

I don't really know why Oracle rejected it, given that the file was meant to be UTF-8; I'll investigate the problem later if I have the time.

I tried to think of a way to detect this kind of situation up front. With the help of an XSL transformation one can probably get rid of these characters or replace them with other symbols. However, for some reason, my old XMLSpy version seems unable to cope with a translate function containing characters written as hexadecimal character references like &#xFDFF; (or at least its Evaluate XPath menu function is).

My idea was to look for elements matching the following expression:

//elem[contains(translate(.,'&#xFDFF;','¿'),'¿')]

But XMLSpy failed to find any element until I copied and pasted the offending character from the XML file directly in place of &#xFDFF;.
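Outside XMLSpy, the same check can be scripted. This is a Python sketch using the standard library's ElementTree, assuming the suspect character is U+FFFD and the element is named `elem` as in the expression above; the helper name `elems_containing` is hypothetical. The XPath string value of an element is the concatenation of all its descendant text, which is what `itertext()` yields. The `translate()` step in the expression only maps the target character to '¿' before testing for '¿' (so it would also flag elements that already contain a genuine '¿'); the sketch tests for the character directly.

```python
import xml.etree.ElementTree as ET

def elems_containing(xml_text: str, tag: str, ch: str = "\ufffd") -> list[ET.Element]:
    """Rough equivalent of //tag[contains(., ch)]: elements whose string value contains ch."""
    root = ET.fromstring(xml_text)
    # root.iter(tag) walks the whole tree in document order, root included, like //tag.
    # "".join(el.itertext()) is the element's XPath string value.
    return [el for el in root.iter(tag) if ch in "".join(el.itertext())]


if __name__ == "__main__":
    doc = ("<root><elem>ok</elem><elem>bad \ufffd</elem>"
           "<group><elem><b>nested \ufffd</b></elem></group></root>")
    for el in elems_containing(doc, "elem"):
        print(repr("".join(el.itertext())))
```

With this sample, the second and third `elem` elements are returned; the nested one matches because its text sits in a `b` child, just as `contains(., ...)` would see it.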

Later I'll try with a real transformation, and if XMLSpy fails, I'll stick to good ole Saxon.


In the meantime, happy searching!



Updated March 6, 2007

PS: Well, if you are unfamiliar with Unicode, UTF-8, UCS-2 and other character encoding issues, I bet you'll find this article very helpful and also very entertaining!
For instance, I finally understand one of the issues: FDFF must be read the other way around, as FFFD (big-endian byte order), and it is the so-called replacement character, a placeholder used in place of an incoming character whose value is unknown or cannot be represented in Unicode.
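A quick way to see the byte-order point is to encode U+FFFD in a few encodings. A minimal Python sketch, assuming UltraEdit's hex view was presenting the text as UTF-16 little-endian (which would explain why the bytes appeared as FD FF); in the UTF-8 file itself the same character takes three bytes, EF BF BD. The last lines show one common way the character ends up in a file in the first place: a decoder meets bytes that are not valid UTF-8 and is told to replace them.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# The same code point serialized three ways.
utf16_le = REPLACEMENT.encode("utf-16-le")  # low byte first:  FD FF
utf16_be = REPLACEMENT.encode("utf-16-be")  # high byte first: FF FD
utf8 = REPLACEMENT.encode("utf-8")          # three bytes:     EF BF BD

for name, raw in (("UTF-16LE", utf16_le), ("UTF-16BE", utf16_be), ("UTF-8", utf8)):
    print(f"{name:<9}{raw.hex(' ').upper()}")

# Decoding with errors="replace" substitutes U+FFFD for invalid input:
# a lone 0xE9 (Latin-1 "é") is not a complete UTF-8 sequence.
decoded = b"caf\xe9".decode("utf-8", errors="replace")
print(decoded == "caf" + REPLACEMENT)  # True
```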

What I am still not sure about is whether Oracle should accept this character or not.
I posted a message in the XML DB forum; let's see if the Oracle gurus come up with an answer.

But at any rate, now I know exactly what to look for in the files.

Updated March 7, 2007

I downloaded Saxon-B version 8.9 and "upgraded" my original XSLT from 1.0 to 2.0. Now, before outputting the text nodes of my elements, I use the XPath 2.0 replace() function to turn any unwanted character into a more readable string:

<xsl:value-of select="replace(., '&#xFFFD;', '** U+FFFD **')"/>
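Without an XSLT 2.0 processor at hand, the same substitution can be approximated in Python with ElementTree. A sketch, assuming the goal matches the replace() call above: every U+FFFD in character data becomes the marker string, while attribute values are left alone (the XSLT also only touches text nodes). The function name `mark_replacement_chars` is hypothetical.

```python
import xml.etree.ElementTree as ET

MARKER = "** U+FFFD **"  # same readable marker as the XSLT replace() call

def mark_replacement_chars(xml_text: str, marker: str = MARKER) -> str:
    """Return the document with every U+FFFD in character data replaced by marker."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # ElementTree keeps character data in .text (before the first child)
        # and .tail (after the element's end tag, inside its parent).
        if el.text:
            el.text = el.text.replace("\ufffd", marker)
        if el.tail:
            el.tail = el.tail.replace("\ufffd", marker)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(mark_replacement_chars("<root><elem>bad \ufffd here</elem></root>"))
    # <root><elem>bad ** U+FFFD ** here</elem></root>
```

Note that `ET.fromstring` drops comments and processing instructions, so this is only suitable when the output is the cleaned data, not a faithful copy of the whole file.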

Translations of the ORA-31011 and ORA-19202 messages:



ORA-31011: Analisi XML non riuscita
ORA-19202: Errore durante XML processing

ORA-31011: fallo en el análisis de XML
ORA-19202: Se ha producido un error en el procesamiento de XML

ORA-31011: Ha fallat l'anàlisi XML
ORA-19202: S'ha produït un error en el processament XML

ORA-31011: Echec d'analyse XML
ORA-19202: Une erreur s'est produite lors du traitement la fonction XML ()

ORA-31011: XML-Parsing nicht erfolgreich
ORA-19202: Fehler bei XML-Verarbeitung aufgetreten

ORA-31011: Η ανάλυση XML απέτυχε
ORA-19202: Παρουσιάστηκε σφάλμα στην επεξεργασία XML

ORA-31011: XML-analyse fejlede
ORA-19202: Fejl opstod ved XML-behandling

ORA-31011: XML-analys misslyckades
ORA-19202: Ett fel uppstod vid XML-bearbetningen

ORA-31011: XML-analysen mislyktes
ORA-19202: Det oppstod en feil i XML-behandlingen

ORA-31011: XML-jäsennys epäonnistui
ORA-19202: Virhe XML-käsittelyssä

ORA-31011: Az XML-elemzés nem sikerült
ORA-19202: Hiba lépett fel az XML-feldolgozás során:

ORA-31011: Nu s-a reuşit analizarea XML
ORA-19202: Eroare la procesarea XML

ORA-31011: Ontleden van XML is mislukt.
ORA-19202: Fout in XML-verwerking ().

ORA-31011: falha na análise XML
ORA-19202: Ocorreu um erro no processamento XML

ORA-31011: Falha na análise de XML
ORA-19202: Ocorrência de erro no processamento de XML

ORA-31011: сбой разбора XML
ORA-19202: Возникла ошибка при обработке XML

ORA-31011: selhala analýza XML
ORA-19202: Vyskytla se chyba při zpracování XML

ORA-31011: Syntaktická analýza XML zlyhala
ORA-19202: Pri spracovaní XML sa vyskytla chyba

ORA-31011: Niepowodzenie analizy składniowej XML
ORA-19202: Wystąpił błąd podczas przetwarzania XML

ORA-31011: XML ayrıştırılamadı
ORA-19202: XML işlenirken hata ortaya çıktı

7 comments:

hadeath82 said...

I have the same problem when I use updateXML with these characters ('éàè...').

Have you found a solution?

Thanks

Byte64 said...

Jacques,
do you mean the horizontal ellipsis character (the three dots) or any of those accented characters?

Anonymous said...

I also have a problem with XML and accented letters.
The à should be converted, but it is not.

Byte64 said...

Anonymous,
the only time I had problems with accented characters in an XML file was when the database character set was not AL32UTF8 (it was WE8ISO8859P1).
At that time every "à" was converted into a two-character string.
Also the euro symbol was a major problem until the database was migrated to AL32UTF8.

Does this scenario look like yours?
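The "two-character string" effect is what happens when UTF-8 bytes are interpreted as single-byte ISO-8859-1 characters. A minimal Python illustration of the mechanism only (it does not reproduce Oracle's own character set conversion, and the helper names are made up): "à" is two bytes in UTF-8, C3 A0, and read as ISO-8859-1 each byte becomes its own character, "Ã" followed by a no-break space. The euro sign, on the other hand, has no code point in ISO-8859-1 at all.

```python
def utf8_read_as_latin1(s: str) -> str:
    """Encode s as UTF-8, then (wrongly) decode those bytes as ISO-8859-1."""
    return s.encode("utf-8").decode("iso-8859-1")


def fits_latin1(s: str) -> bool:
    """True if every character of s exists in ISO-8859-1."""
    try:
        s.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    garbled = utf8_read_as_latin1("à")
    print(len(garbled), [hex(ord(c)) for c in garbled])  # 2 ['0xc3', '0xa0']
    print(fits_latin1("à"), fits_latin1("€"))              # True False
```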

Anonymous said...

It's the : character that is giving me a problem. Please let me know if you have a solution.

Thanks
-Pradip (pradipc@gmail.com)

Byte64 said...

Pradip,
do you mean you are getting this error because your XML file contains a standard ASCII "colon" character?

Anonymous said...

I think so, because in my code I just wrote
xslprocessor.selectNodes(xmldom.makeNode(l_doc),'/soapenv:Envelope');
and this gives me the error.
