icu4c: is there an API for NamedSequences.txt or DoNotEmit.txt?

David Mandelberg

May 11, 2025, 6:01:08 PM

to icu-s...@unicode.org

Hi,

I'm working on a new input method that semi-automatically generates a
map from mnemonics to output text using combining characters. E.g.,
given the configuration that l is Latin, e is the letter e, ^ is
circumflex, and ' is acute, it translates le^' into ế without any
additional configuration.
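
A minimal sketch of the mnemonic-to-text idea described above, in Python rather than icu4c, and not the actual input method: the `SCRIPTS` and `MARKS` tables and the `translate` function are hypothetical names invented for illustration. It assumes each mnemonic is one script key, one letter key, then zero or more mark keys, and relies on NFC normalization to pick a precomposed form when one exists.

```python
import unicodedata

# Hypothetical configuration mirroring the example in the post.
SCRIPTS = {"l": {"e": "e"}}             # script key -> {letter key -> base letter}
MARKS = {"^": "\u0302", "'": "\u0301"}  # mark key -> combining character


def translate(mnemonic: str) -> str:
    """Translate a mnemonic such as "le^'" into NFC-normalized output text."""
    if len(mnemonic) < 2:
        raise ValueError("mnemonic needs a script key and a letter key")
    script_key, letter_key, mark_keys = mnemonic[0], mnemonic[1], mnemonic[2:]
    base = SCRIPTS[script_key][letter_key]
    combining = "".join(MARKS[key] for key in mark_keys)
    # NFC composes base + marks into a precomposed code point when one
    # exists; otherwise the marks remain as separate combining characters.
    return unicodedata.normalize("NFC", base + combining)


print(translate("le^'"))  # ế (U+1EBF)
```

Letters without a single precomposed code point come out of the same function as a base letter followed by combining marks, which is the case the named-sequence data would help enumerate.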

To generate letters that don't have single code point precomposed forms,
I'm currently using CLDR exemplar sets to try to get a list of all
letters from all languages. I'd like to use
https://www.unicode.org/Public/UNIDATA/NamedSequences.txt (and
NamedSequencesProv.txt too) to supplement that, but I don't see any API.
Am I missing some way to get a list of all named sequences with icu4c?

To prevent generating discouraged sequences, I'm currently testing for
the Deprecated property. I'd like to also check
https://www.unicode.org/Public/UNIDATA/DoNotEmit.txt but I don't see an
API for that either. Same question as above: am I missing something?

Markus Scherer

May 11, 2025, 8:18:09 PM

to David Mandelberg, icu-s...@unicode.org

We do not currently have APIs for these data files, nor have plans for them.

It sounds like you would need these for offline building of IME models. I suggest you just parse these simple text files directly.
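
As a rough illustration of what parsing these files directly could look like (a sketch in Python, not ICU code): NamedSequences.txt uses lines of the form `NAME;CODE POINTS`, with space-separated hexadecimal code points and `#` starting a comment. The function names `parse_ucd_lines` and `parse_named_sequences` are hypothetical, and the sample lines below are written in the file's format for demonstration rather than copied from a specific release.

```python
from typing import Iterable, Iterator


def parse_ucd_lines(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield the stripped, semicolon-separated fields of each data line."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if data:  # skip blank and comment-only lines
            yield [field.strip() for field in data.split(";")]


def parse_named_sequences(lines: Iterable[str]) -> dict[str, str]:
    """Map each sequence name to the string spelled by its code points."""
    sequences = {}
    for fields in parse_ucd_lines(lines):
        if len(fields) < 2:
            raise ValueError(f"expected 'name;code points', got {fields!r}")
        name, code_points = fields[0], fields[1]
        sequences[name] = "".join(chr(int(cp, 16)) for cp in code_points.split())
    return sequences


# Illustrative sample in the NamedSequences.txt line format.
SAMPLE = """\
# Name of Sequence; Code Point Sequence
KEYCAP NUMBER SIGN;0023 FE0F 20E3
LATIN SMALL LETTER I WITH MACRON AND GRAVE;012B 0300
"""

seqs = parse_named_sequences(SAMPLE.splitlines())
print(len(seqs))  # 2
```

With a downloaded copy of the file, an open file object can be passed in place of `SAMPLE.splitlines()`. The `parse_ucd_lines` helper might also be reusable for DoNotEmit.txt, assuming that file follows the same semicolon-separated, `#`-commented conventions; its field layout would need to be checked against the header of the file itself.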

Best regards,

markus

David Mandelberg

May 12, 2025, 2:18:42 PM

to Markus Scherer, icu-s...@unicode.org

On 2025-05-11 at 20:17, Markus Scherer wrote:

> We do not currently have APIs for these data files, nor have plans for them.

Oh well, thanks for confirming that.

> It sounds like you would need these for offline building of IME models.
> I suggest you just parse these simple text files directly.

Currently it's offline, but it would be the same online too: either way,
I'd need some way to know when the letter is done.

For now it's not worth writing a custom parser* for this, but I'll look
into that later if I ever really need those files.

* Even a simple one: I literally just found and fixed a segfault in an
unrelated "simple" parser.
