UNISTR

Syntax

UNISTR
(
  str  IN text
)
RETURNS text;

Overview

The UNISTR function replaces Unicode escape sequences in the input text with the characters they represent and returns the resulting text.

The supported escape formats are as follows:

  • \XXXX : 4 hexadecimal digits

  • \uXXXX : 4 hexadecimal digits prefixed with 'u'

  • \+XXXXXX : 6 hexadecimal digits prefixed with '+'

  • \UXXXXXXXX : 8 hexadecimal digits prefixed with 'U'

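A UTF-16 surrogate pair written as two consecutive escapes is combined into the single code point it encodes. An unpaired or mismatched surrogate raises an "invalid Unicode surrogate pair" error, and an escape whose value lies outside the Unicode range (for example \+2FFFFF) raises an "invalid Unicode escape value" error.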
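As a rough model of the escape handling described above, the following Python sketch decodes the four formats and applies the surrogate checks. It is not the extension's implementation: the name `unistr_sketch` is hypothetical, the error messages simply mirror the ones shown in the examples on this page, and copying a backslash that starts no recognized escape through unchanged is an assumption that may differ from the real function.

```python
import re

# The four escape formats listed above. The prefixed forms are tried
# before the bare 4-digit form.
ESCAPE = re.compile(
    r"\\(?:U([0-9A-Fa-f]{8})|\+([0-9A-Fa-f]{6})"
    r"|u([0-9A-Fa-f]{4})|([0-9A-Fa-f]{4}))"
)


def unistr_sketch(s: str) -> str:
    """Hypothetical model of UNISTR: decode escapes, validate surrogates."""
    out = []
    pending_high = None  # high surrogate waiting for its low partner
    i = 0
    while i < len(s):
        m = ESCAPE.match(s, i)
        if m is None:
            # Ordinary character (assumption: a stray backslash is kept as-is).
            if pending_high is not None:
                raise ValueError("invalid Unicode surrogate pair")
            out.append(s[i])
            i += 1
            continue
        # Exactly one group matched; parse its hex digits.
        cp = int(next(g for g in m.groups() if g is not None), 16)
        i = m.end()
        if cp > 0x10FFFF:
            raise ValueError("invalid Unicode escape value")
        if pending_high is not None:
            # A high surrogate must be followed directly by a low surrogate.
            if not 0xDC00 <= cp <= 0xDFFF:
                raise ValueError("invalid Unicode surrogate pair")
            cp = 0x10000 + ((pending_high - 0xD800) << 10) + (cp - 0xDC00)
            pending_high = None
        elif 0xD800 <= cp <= 0xDBFF:
            pending_high = cp  # wait for the low half
            continue
        elif 0xDC00 <= cp <= 0xDFFF:
            raise ValueError("invalid Unicode surrogate pair")
        out.append(chr(cp))
    if pending_high is not None:
        raise ValueError("invalid Unicode surrogate pair")
    return "".join(out)


# Raw strings keep the backslashes intact, as a SQL string literal would.
print(unistr_sketch(r"\0441\043B\043E\043D"))
```

Because each format has a fixed digit count, an input such as `\+00420042` decodes as the code point U+4200 followed by the literal characters `42`.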

Parameter

Parameter    Description
str          text type; Target text to convert. The Unicode escape sequences within this string are replaced with actual Unicode characters.

Example

-- Example using 4-digit hexadecimal Unicode escape (e.g., \0041 is converted to ‘A’)
SELECT oracle.UNISTR('\0041\0042\0043');
-- result: 'ABC'

 unistr 
--------
 ABC
(1 row)

-- Example including escape sequences in various formats
-- (\+004200 consumes exactly six digits, so the trailing "42" stays literal)
SELECT oracle.UNISTR('\u0041 \+00420042 \U00000041');

  unistr  
----------
 A 䈀42 A
(1 row)


-- Example with Cyrillic characters
SELECT oracle.unistr('\0441\043B\043E\043D');

 unistr 
--------
 слон
(1 row)

-- Example mixing literal characters with escapes
SELECT oracle.unistr('d\u0061t\U00000061');

 unistr 
--------
 data
(1 row)

-- Examples of invalid input: unpaired surrogates and out-of-range values
SELECT oracle.unistr('wrong: \db99');
ERROR:  invalid Unicode surrogate pair

SELECT oracle.unistr('wrong: \db99\0061');
ERROR:  invalid Unicode surrogate pair

SELECT oracle.unistr('wrong: \+00db99\+000061');
ERROR:  invalid Unicode surrogate pair

SELECT oracle.unistr('wrong: \+2FFFFF');
ERROR:  invalid Unicode escape value

SELECT oracle.unistr('wrong: \udb99\u0061');
ERROR:  invalid Unicode surrogate pair

SELECT oracle.unistr('wrong: \U0000db99\U00000061');
ERROR:  invalid Unicode surrogate pair

SELECT oracle.unistr('wrong: \U002FFFFF');
ERROR:  invalid Unicode escape value
