Let's talk Unicode normalization

Nathan Smith

Mar 7, 2013, 7:24:06 PM

to openscr...@googlegroups.com

. . . specifically in Python:
http://docs.python.org/2/library/unicodedata.html#unicodedata.normalize

My feeling is that Normal Form C (NFC) is most appropriate for Biblical
texts. I see that is what Weston used in the initial API reference
implementation code.

Is there any compelling reason to use NFKC instead? More info here:
http://www.unicode.org/faq/normalization.html
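
For anyone comparing the two forms, here is a minimal sketch of how they differ in practice. It is written for Python 3 (the link above is to the Python 2 docs, but unicodedata.normalize takes the same arguments there), and the two sample strings are illustrative picks, not taken from any Biblical text.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: canonically equivalent to U+00E9
decomposed = "e\u0301"
# LATIN SMALL LIGATURE FI: a compatibility character
ligature = "\ufb01"

# Both forms compose canonically equivalent sequences the same way.
nfc_accent = unicodedata.normalize("NFC", decomposed)
nfkc_accent = unicodedata.normalize("NFKC", decomposed)

# Only NFKC also replaces compatibility characters with their plain equivalents.
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)

print(nfc_accent == nfkc_accent)   # the accent is handled identically
print(nfc_ligature == ligature)    # NFC leaves the ligature alone
print(nfkc_ligature)               # NFKC folds it into two letters
```

So the two forms agree on combining marks; they only part ways on compatibility characters such as ligatures and superscripts.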

--
Nathan Smith
http://nathan.smithfam.info/
PGP key ID 0x147aed15

Weston Ruter

Mar 8, 2013, 12:33:21 AM

to Open Scriptures Group

Well, the normalization FAQ says, in comparison with NFC, that NFKC will “lose information and [is] thus most appropriate for a restricted domain such as identifiers”. So on this basis alone, NFC seems better, since storing the text that way means no information would be lost. If an application does need NFKC normalization, it would be trivial to convert the NFC text into that form.
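
A rough sketch of that one-way relationship, assuming Python 3 and a made-up sample string (a superscript verse number followed by a Greek word typed with a combining accent):

```python
import unicodedata

# Hypothetical sample: SUPERSCRIPT ONE, a space, then "logos" in Greek
# with the accent entered as a separate COMBINING ACUTE ACCENT.
sample = "\u00b9 \u03bb\u03bf\u0301\u03b3\u03bf\u03c2"

# Store the text as NFC: the accent is composed, the superscript survives.
nfc_text = unicodedata.normalize("NFC", sample)

# An application that wants NFKC can derive it from the stored NFC text.
nfkc_from_nfc = unicodedata.normalize("NFKC", nfc_text)
nfkc_direct = unicodedata.normalize("NFKC", sample)

print(nfkc_from_nfc == nfkc_direct)  # same result either way
print("\u00b9" in nfc_text)          # superscript still present in NFC
print("\u00b9" in nfkc_direct)       # replaced by a plain "1" in NFKC
```

Going the other direction does not work: once the superscript has been folded to a plain digit, normalizing back to NFC cannot restore it.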

Weston Ruter
http://weston.ruter.net/