Re: [icu-support] Unicode property

Markus Scherer

Mar 10, 2025, 2:41:38 AM
to icu-support, Georges MURR
Oh, and please subscribe to, and use, the icu-support mailing list on unicode.org. We have moved off of the SourceForge lists.

On Sun, Mar 9, 2025 at 11:37 PM Markus Scherer <marku...@gmail.com> wrote:
Hi Georges,

On Sun, Mar 9, 2025 at 5:34 PM Georges MURR via icu-support <icu-s...@lists.sourceforge.net> wrote:
Is there a way to find out in which Unicode version the Unicode property of a character was defined? For instance, I would like to know when the Unicode property of the surrogate pair \uD804\uDFB8 was defined.

First, this is not really an ICU (library) question but a question about the Unicode encoding standard.

Second, you won't find much by looking for the escaped UTF-16 notation of a character. The data files and documentation use code points. In this case, U+113B8.
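As a rough illustration of how the escaped pair maps to that code point, here is a minimal Python sketch of the standard UTF-16 surrogate arithmetic. The function name surrogates_to_code_point is made up for this example; it is not an ICU or Python library API.

```python
def surrogates_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one supplementary code point.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high (lead) surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low (trail) surrogate")
    # Each surrogate carries 10 bits; supplementary code points start at U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = surrogates_to_code_point(0xD804, 0xDFB8)
print(f"U+{cp:04X}")  # U+113B8
```

With the code point in hand, searches in the Unicode data files and documentation become much more productive than searching for the escape sequence.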

Third, there are something like 100 Unicode properties. Which one are you looking for?

FYI: Just a simple web search for "113B8" gives me some useful results; in this case, the second one is https://codepoints.net/U+113B8?lang=en which says right in the result snippet:
“U+113B8 TULU-TIGALARI VOWEL SIGN AA: 𑎸 – Unicode U+113B8 was added in Unicode version 16.0 in 2024. ...”

In the ICU repo, you can find a file with a significant subset of the Unicode character data in reformatted form:

Search for 113B8 -->
cp;113B8;gc=Mc;GCB=EX;-Gr_Base;Gr_Ext;-IDS;InCB=Extend;InPC=Right;InSC=Vowel_Dependent;lb=CM;na=TULU-TIGALARI VOWEL SIGN AA;NFC_QC=M;NFKC_QC=M;SB=EX;WB=Extend;-XIDS

This is specific to ICU but uses Unicode short property names and short property value names.
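To make the line above easier to read, here is a small Python sketch that splits it into a dictionary of property values. It assumes that name=value fields carry enumerated or string values, that a bare name means a binary property is true, and that a leading "-" means it is false. The function name parse_cp_line is hypothetical, and the sketch only handles lines for a single code point, not ranges or other line types.

```python
LINE = (
    "cp;113B8;gc=Mc;GCB=EX;-Gr_Base;Gr_Ext;-IDS;InCB=Extend;InPC=Right;"
    "InSC=Vowel_Dependent;lb=CM;na=TULU-TIGALARI VOWEL SIGN AA;"
    "NFC_QC=M;NFKC_QC=M;SB=EX;WB=Extend;-XIDS"
)


def parse_cp_line(line: str) -> tuple[int, dict[str, object]]:
    """Parse one 'cp;XXXX;...' line into (code point, properties).

    Hypothetical helper; the field conventions are assumptions stated above.
    """
    fields = line.strip().split(";")
    if len(fields) < 2 or fields[0] != "cp":
        raise ValueError("expected a line starting with 'cp;<code point>'")
    code_point = int(fields[1], 16)
    props: dict[str, object] = {}
    for field in fields[2:]:
        if "=" in field:
            # Enumerated or string property, e.g. gc=Mc or na=<name>.
            name, value = field.split("=", 1)
            props[name] = value
        elif field.startswith("-"):
            # Binary property that is false, e.g. -Gr_Base.
            props[field[1:]] = False
        else:
            # Binary property that is true, e.g. Gr_Ext.
            props[field] = True
    return code_point, props


code_point, props = parse_cp_line(LINE)
print(f"U+{code_point:04X}", props["gc"], props["na"])
```

Checking for "=" before the "-" prefix matters here, because the character name itself contains a hyphen.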

I hope this helps,
markus