I’m finalising my personNames implementation for the Elixir language, and some tests are failing in the `zh` locale when taking the initials of a name. The underlying issue is that the Unicode utilities break tester (
https://util.unicode.org/UnicodeJsps/breaks.jsp) shows no word break in “德威” (i.e. it’s treated as one word) but it does find a word break in “东升” (treated as two words).
As best I can tell, `common/segments/root.xml` is the CLDR source of the Unicode Segmentation algorithm (UAX 29), and I can’t see a rule that would place a break in “德威”. More perplexing is that the `root.xml` content for word breaks says specifically:
<!-- Otherwise, break everywhere (including around ideographs). -->
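To make my reading of that comment concrete, here is a minimal sketch (in Python purely for illustration; the function names are hypothetical and this is not CLDR's or ICU's actual implementation). It assumes a literal interpretation of "break everywhere" in which every Han ideograph becomes its own segment, and it only checks the basic CJK Unified Ideographs block:

```python
def is_han(ch: str) -> bool:
    # Simplifying assumption: only the basic CJK Unified Ideographs block
    # (U+4E00..U+9FFF) counts as an ideograph here.
    return "\u4e00" <= ch <= "\u9fff"


def literal_word_segments(text: str) -> list[str]:
    """Split text so that a break falls before and after every ideograph.

    Runs of non-ideographic characters are kept together. This is a
    literal reading of the quoted comment, not the full UAX 29 rule set.
    """
    segments: list[str] = []
    for ch in text:
        if is_han(ch) or not segments or is_han(segments[-1][-1]):
            # Start a new segment: at the start of the text, at an
            # ideograph, or right after an ideograph.
            segments.append(ch)
        else:
            # Extend the current run of non-ideographic characters.
            segments[-1] += ch
    return segments
```

Under this reading both “德威” and “东升” would come out as two segments each, which is not what the break tester shows for “德威”.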
How do I work out what rules CLDR is applying when word segmenting text, specifically Hans/Hant text?
Thanks for any help or pointers.