Wednesday, May 31, 2023

Handling of non-ASCII characters in JSON objects in a non-AL32UTF8 database

Always check out the original article at http://www.oraclequirks.com for latest comments, fixes and updates.
declare
 js   json_object_t;
begin
 js := json_object_t.parse('
{
  "name"       : "whatever",
  "values" : [
              {"name" : "x", "type" : "number", "value" : 2023},
              {"name" : "y", "type" : "string", "value" : "some non-ASCII char àùì"}
             ]
}
');
end;
/

Error report -
ORA-40441: JSON syntax error
ORA-06512: at "SYS.JDOM_T", line 4
ORA-06512: at "SYS.JSON_OBJECT_T", line 86
ORA-06512: at line 4
40441. 00000 -  "JSON syntax error"
*Cause:    The provided JavaScript Object Notation (JSON) data had invalid
           syntax and could not be parsed.
*Action:   Provide JSON data with the correct syntax.

 

Dealing with non-ASCII characters inside JSON objects in a non-AL32UTF8 database requires definitely some extra work, depending also on the final use of the information.

 For instance, if JSON's contents are consumed by a browser, the following escape method works well:

declare
js json_object_t;
begin
js := json_object_t.parse(REGEXP_REPLACE(ASCIISTR('
{
"name" : "whatever",
"values" : [
{"name" : "x", "type" : "number", "value" : 2023},
{"name" : "y", "type" : "string", "value" : "some non-ASCII char àùì"}
]
}
'), '\\([0-9A-F]{4})', '&#x\1;'));
end;
/

Escaping the characters with ASCIISTR only doesn't work because the resulting presence of the "\" characters also breaks the parser as "\" is normally used to escape JSON "reserved" characters, such as \ and ".

If the data need to be converted back from the JSON then the reverse process must be applied:

select
unistr(regexp_replace('some non-ASCII char àùì',
'&#x([0-9A-F]+);',
'\\\1',1,0)) original
from dual;

 

If a lossy character translation is acceptable then CONVERT function could be used.

declare
js json_object_t;
begin
js := json_object_t.parse(CONVERT('
{
"name" : "whatever",
"values" : [
{"name" : "x", "type" : "number", "value" : 2023},
{"name" : "y", "type" : "string", "value" : "some non-ASCII char àùì"}
]
}
','US7ASCII'));
end;
/
-- equivalent to:
declare
 js   json_object_t;
begin
 js := json_object_t.parse('
{
  "name"       : "whatever",
  "values" : [
              {"name" : "x", "type" : "number", "value" : 2023},
              {"name" : "y", "type" : "string", "value" : "some non-ASCII char aui"}
             ]
}
');
end;
/

Please note that in a database with AL32UTF8 character set, the first PL/SQL block would have been processed just fine without the need of escaping anything.

No comments:

yes you can!

Two great ways to help us out with a minimal effort. Click on the Google Plus +1 button above or...
We appreciate your support!

latest articles