
German collation routines for YottaDB UTF-8 mode


K.S. Bhaskar

Nov 21, 2021, 3:20:45 PM
Unicode code point order is often not the linguistically or culturally correct sort order. For example, from YottaDB in UTF-8 mode:

YDB>set sz="ß",SZ="ẞ" write $ascii(sz)," ",$ascii(SZ)
223 7838
YDB>set umch="äëïöüÿÄËÏÖÜŸ" for i=1:1:$length(umch) write $ascii($extract(umch,i))," "
228 235 239 246 252 255 196 203 207 214 220 376
YDB>write "Öhman"]"Pfaff"," ","Ohman"]"Pfaff"
1 0
YDB>write "Öhman"]]"Pfaff"," ","Ohman"]]"Pfaff"
1 0
YDB>

Has anyone developed collation routines (https://docs.yottadb.com/ProgrammersGuide/internatn.html#creating-the-alternate-collation-routines) so that YottaDB can correctly display German words and names? Thank you very much.

Regards
– Bhaskar

ed de moel

Nov 22, 2021, 3:13:17 PM
I don't have any code for this "on the shelf", but I'd start by going through the strings, and replacing all the compound characters with their components, i.e. translate "ä" into "ae", "ß" into "sz", etc., and then comparing them in the "old-fashioned" way.
(That would work for most cases. My German isn't too good, but I am aware that "ß" should sometimes become "sz" and sometimes "ss".)
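
A minimal sketch of the expand-then-compare idea described above. It is written in Python purely to illustrate the logic and is not a YottaDB collation routine. The names `EXPANSIONS`, `expand` and `follows` are hypothetical, and the expansion table is an assumption rather than an official standard. Because "ß" may map to "ss" or "sz", the replacement is left as a parameter.

```python
# Illustrative expansion table (an assumption, not an official standard).
EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def expand(text, eszett="ss"):
    """Replace umlauts and ß with their plain-letter components."""
    table = dict(EXPANSIONS)
    table["ß"] = eszett          # lowercase sharp s
    table["ẞ"] = eszett.upper()  # capital sharp s (U+1E9E)
    return "".join(table.get(ch, ch) for ch in text)


def follows(a, b, eszett="ss"):
    """Return True if a sorts after b once both strings are expanded."""
    return expand(a, eszett) > expand(b, eszett)


print("Öhman" > "Pfaff")          # raw code point comparison
print(follows("Öhman", "Pfaff"))  # comparison after expansion
print(expand("Straße"), expand("Straße", eszett="sz"))
```

With this table, "Öhman" is compared as "Oehman" and therefore sorts before "Pfaff", unlike the raw code point comparison shown in the first post.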

Hope this works as a starting point,
Ed

K.S. Bhaskar

Nov 22, 2021, 3:39:34 PM
Thanks Ed. That's a good suggestion, but for performance reasons, the database engine doesn't quite work that way. It requires a forward transformation, which should be fairly straightforward (e.g., ä→ae), but the reverse transformation is not always clear (e.g., should all occurrences of ae in subscripts be converted to ä?). But this gives me something to think about.
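
The ambiguity of the reverse transformation can be illustrated with a small Python sketch. It only demonstrates the logic and is not a YottaDB collation routine. The names `FORWARD`, `xform` and `naive_xback` are hypothetical, and the mapping table is an assumption.

```python
# Illustrative forward mapping (an assumption, not an official table).
FORWARD = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def xform(text):
    """Forward transformation: expand each special character."""
    return "".join(FORWARD.get(ch, ch) for ch in text)


def naive_xback(key):
    """Naive reverse: turn every digraph back into a special character."""
    for ch, digraph in FORWARD.items():
        key = key.replace(digraph, ch)
    return key


# Two different spellings collapse to the same key ...
print(xform("Mäder"), xform("Maeder"))

# ... so the reverse cannot tell them apart, and it damages words
# that contained a plain "ae" or "ss" to begin with.
print(naive_xback(xform("Michael")))
print(naive_xback(xform("Busse")))
```

Since "Mäder" and "Maeder" both map to "Maeder", no reverse function working from the transformed key alone can recover both originals.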

Regards
– Bhaskar

Jens

Nov 23, 2021, 8:13:10 AM
I'm German, but I wasn't sure about the correct sort order.
It seems that there are two options:

1. ä=a, ö=o, ü=u, ß=ss. Example: Bäcker->Bader->Bäder->Busse->Buße
2. ä=ae, ö=oe, ü=ue, ß=ss. Example: Bader->Bäcker->Bäder->Busse->Buße

Both versions are in use. MS Word uses option 1; phone books are sorted like option 2.
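
The two options can be compared with a short Python sketch that builds a sort key per word. It only illustrates the logic and is not a YottaDB collation routine. The names `OPTION_1`, `OPTION_2`, `sort_key` and `german_sorted` are hypothetical. Breaking ties on the original spelling, so that "Bader" precedes "Bäder", is an assumption chosen to reproduce the example orderings above.

```python
# Option 1: umlauts sort like their base letters.
OPTION_1 = {
    "ä": "a", "ö": "o", "ü": "u", "ß": "ss",
    "Ä": "A", "Ö": "O", "Ü": "U",
}

# Option 2: umlauts sort like base letter + "e".
OPTION_2 = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def sort_key(word, table):
    """Build a (primary, tie-break) key for one word."""
    primary = "".join(table.get(ch, ch) for ch in word)
    # Assumption: when two words have the same primary key, fall back
    # to the original spelling so the plain-letter form comes first.
    return (primary, word)


def german_sorted(words, table):
    """Sort words using the given substitution table."""
    return sorted(words, key=lambda w: sort_key(w, table))


words = ["Buße", "Bäder", "Busse", "Bader", "Bäcker"]
print(german_sorted(words, OPTION_1))
print(german_sorted(words, OPTION_2))
```

The comparison is case-sensitive, so mixed-case data would need additional handling.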

Hope this helps.

Jens


K.S. Bhaskar

Nov 23, 2021, 10:16:16 AM
Thanks Jens. I was trying to help a German friend who uses YottaDB to index some text as a personal, post-retirement project. It seems that things are not as simple as I thought! Out of curiosity, what sorting order do dictionaries use?

Regards
– Bhaskar

Jens

Nov 23, 2021, 10:30:33 AM
I just looked into a German/English dictionary, and it is sorted like option 1.

Regards Jens

PS: If I can help your friend in any way, I'd be glad to. I still like coding in M.
PPS: I'm just working on the Visual Studio Code extension to check correct NEWing of M local variables. :-)

K.S. Bhaskar

Nov 23, 2021, 2:48:09 PM
Jens –

My friend would be glad of any assistance. Would you please send your e-mail address to me: bhaskar at yottadb dot com? Thank you very much in advance.

Regards
– Bhaskar