Re: [icu-support] Unicode property

Markus Scherer

Mar 10, 2025, 2:41:38 AM

to icu-support, Georges MURR

Oh, and please subscribe to, and use, the icu-support mailing list on unicode.org. We have moved off of the SourceForge lists.

On Sun, Mar 9, 2025 at 11:37 PM Markus Scherer <marku...@gmail.com> wrote:

Hi Georges,

On Sun, Mar 9, 2025 at 5:34 PM Georges MURR via icu-support <icu-s...@lists.sourceforge.net> wrote:
Is there a way to find in which Unicode version the Unicode property of a character was defined? For instance, I would like to know when the Unicode property of the surrogate pair \uD804\uDFB8 was defined.

First, this is not really an ICU (library) question but a question about the Unicode encoding standard.
https://www.unicode.org/consortium/distlist.html

Second, you won't find much by looking for the escaped UTF-16 notation of a character. The data files and documentation use code points. In this case, U+113B8.
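
For reference, the conversion from a UTF-16 surrogate pair to a code point is simple arithmetic. Here is a minimal Python sketch; the function name surrogate_pair_to_code_point is made up for this illustration and is not an ICU API:

```python
def surrogate_pair_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 lead (high) and trail (low) surrogate into a code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a lead surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a trail surrogate: {low:#06x}")
    # Each surrogate carries 10 bits; supplementary code points start at U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = surrogate_pair_to_code_point(0xD804, 0xDFB8)
print(f"U+{code_point:04X}")  # prints "U+113B8"
```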

Third, there are something like 100 Unicode properties. Which one are you looking for?

FYI: Just a simple web search for "113B8" gives me some useful results; in this case, the second one is https://codepoints.net/U+113B8?lang=en which says right in the result snippet:
“U+113B8 TULU-TIGALARI VOWEL SIGN AA: 𑎸 – Unicode U+113B8 was added in Unicode version 16.0 in 2024. ...”

In the ICU repo, you can find a file with a significant subset of the Unicode character data in reformatted form:
https://raw.githubusercontent.com/unicode-org/icu/refs/heads/main/icu4c/source/data/unidata/ppucd.txt

Search for 113B8 -->
cp;113B8;gc=Mc;GCB=EX;-Gr_Base;Gr_Ext;-IDS;InCB=Extend;InPC=Right;InSC=Vowel_Dependent;lb=CM;na=TULU-TIGALARI VOWEL SIGN AA;NFC_QC=M;NFKC_QC=M;SB=EX;WB=Extend;-XIDS

This is specific to ICU but uses Unicode short property names and short property value names.
See here for the gory details: https://unicode-org.github.io/icu/design/props/ppucd
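
If you want to read such a line programmatically, something like the following Python sketch may do. It assumes the field conventions visible in the line above: semicolon-separated fields, "name=value" for properties with values, a bare name for a binary property that is true, and a leading "-" for one that is false. The function name parse_ppucd_line is hypothetical, and the sketch looks at one line in isolation, so it does not pick up any values that the file may define elsewhere (for example defaults or block-level lines); please check the design doc for that.

```python
def parse_ppucd_line(line: str) -> tuple[str, str, dict[str, object]]:
    """Split one ppucd.txt-style line into (line type, target, properties).

    Assumed conventions, taken from the sample line only:
      name=value -> property with a value
      name       -> binary property, true
      -name      -> binary property, false
    """
    fields = line.strip().split(";")
    line_type, target = fields[0], fields[1]
    props: dict[str, object] = {}
    for field in fields[2:]:
        if "=" in field:
            name, value = field.split("=", 1)
            props[name] = value
        elif field.startswith("-"):
            props[field[1:]] = False
        else:
            props[field] = True
    return line_type, target, props


sample = (
    "cp;113B8;gc=Mc;GCB=EX;-Gr_Base;Gr_Ext;-IDS;InCB=Extend;InPC=Right;"
    "InSC=Vowel_Dependent;lb=CM;na=TULU-TIGALARI VOWEL SIGN AA;"
    "NFC_QC=M;NFKC_QC=M;SB=EX;WB=Extend;-XIDS"
)
line_type, target, props = parse_ppucd_line(sample)
print(line_type, target, props["gc"], props["na"])
```

In the resulting dictionary, props["Gr_Ext"] is True and props["IDS"] is False, while props["gc"] holds the short General_Category value "Mc".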

I hope this helps,
markus
