If you are looking for a concise yet funny guide to the differences between UTF-8, UTF-16, UCS-2, UCS-4 and so on, you'd better off checking out "the Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets".
So, this morning i needed to find out the Unicode value for the special punctuation character "…".
This character is catalogued in the Unicode character index by name as horizontal ellipsis and its hex code point is 2026.
As the database character set is defined as "AL32UTF8", my first temptation was to convert this hex number into its decimal equivalent:
select chr(8230) c from dual;But what I've got here is the ASCII character 38.
C
-
&
At first one could think that this caused by a wrong NLS setting or by the fact that perhaps function CHR does not return multibyte characters.
Nothing of the kind.
This result is easily explained trying out the following:
select chr(mod(8230,256)) cMoreover, if the db character set is AL32UTF8:
from dual;
C
-
&
select chr(14844070) cSo, CHR can return multibyte characters, indeed it returns any character in the database character set.
from dual;
C
-
…
But what has to do 14844070 with 8230?
My short answer is nothing.
I presume 14844070 is the numeric code in the internal UCS2 representation of this Unicode character that I've got from:
select ascii('…') nAnyway, while it could be interesting to know where this 14844070 comes from in the end, there is an easy way to get a character from its unicode code point:
from dual;
N
-
14844070
select unistr('\2026') cThen, if for some reason you need to know the numeric code of a character starting from its unicode code point, the easiest way is to run:
from dual;
C
-
…
select ascii(unistr('\2026')) nAgain, the numeric value returned may vary with the database character set, although i could not try this out because i couldn't find any database in a different character set.
from dual;
N
-
14844070
Another interesting function is ASCIISTR that is converting a string into a plain ASCII string where non-ASCII characters are represented by their UTF-16 code unit.
select asciistr('hello world…') sThis function can be especially useful if you need to dump data into a plain ASCII file (i.e. not UTF-8 or UTF-16 encoded).
from dual;
S
-
hello world\2026
For an Oracle official document discussing the topic, especially for the maximum string length implications for database columns and PL/SQL variables, see the Oracle Globalization Support Guide.
PS: on Windows another way of getting a unicode code point (or copying the character to the clipboard directly for pasting into a SQL source...) is to search for the desired character using the accessories/system/character map.
4 comments:
Thanks for the helpful information! :=
The value that Oracle shows for ASCII('…') (Unicode ellipsis character, U+2026) is 14844070. In hexadecimal, this is 0xE280A6. This corresponds to the UTF-8 encoding of this character taken as a big-endian number.
Cheers, Adrian
Thanks for the information Adrian.
Database1: UTF8
------------------------
select chr(233) from dual --> null
select chr(50089) from dual --> é
select ascii('é') from dual --> 50089
select ascii(tablename.fieldname) from tablename --> 50089
Database2: AL16UTF16
-----------------------
select chr(233) from dual --> null
select chr(50089) from dual --> é
select ascii('é') from dual --> 50089
select ascii(substr(product_description,2,1)) from product_master_data where product_code = '6' --> 233
why only in case of query its showing 233 I don't understand.
Post a Comment