UTF-8B support for libiconv

Ben Wiley Sittler

Apr 2, 2006, 11:51:38 PM
[ this is in response to a truly ancient linux-utf8 thread ]

i wrote a patch that provides UTF-8 + binary in one codec with no
hand-waving, using Markus Kuhn's brilliant proposal to encode invalid
bytes 0xyz using unpaired surrogates U+DCyz. this means there need not
be a text/binary distinction for UTF-8-using programs. legal UTF-8
decodes/encodes correctly, and other bytes are handled as "opaque"
U+DCxx on input and correctly serialized on output. so one can once
again consider editing a binary format with a "notepad"-type editor
without sacrificing internationalization support.
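
A minimal sketch of the round trip described above, in Python rather than C. This is not the libiconv patch: it relies on Python's built-in "surrogateescape" error handler, which applies the same mapping (an undecodable byte 0xyz in the range 0x80..0xFF becomes the lone surrogate U+DCyz on decode and is turned back into the original byte on encode). The function names decode_utf8b and encode_utf8b are hypothetical, chosen only for illustration.

```python
def decode_utf8b(data: bytes) -> str:
    """Decode bytes as UTF-8; any invalid byte 0xyz becomes U+DCyz.

    Assumption: Python's "surrogateescape" handler stands in for the
    UTF-8B decoder described in the post.
    """
    return data.decode("utf-8", errors="surrogateescape")


def encode_utf8b(text: str) -> bytes:
    """Encode text as UTF-8; lone surrogates U+DC80..U+DCFF become raw bytes again."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # Valid UTF-8 ("caf\u00e9") mixed with two bytes that are never valid in UTF-8.
    raw = b"caf\xc3\xa9 \xff\xfe binary"
    text = decode_utf8b(raw)

    # The valid part decodes normally; the stray bytes are carried as U+DCFF, U+DCFE.
    print([hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF])

    # Serializing gives back exactly the original bytes.
    print(encode_utf8b(text) == raw)
```

Because every input byte either decodes as valid UTF-8 or maps to its own U+DCxx code point, the byte sequence survives a decode/encode cycle unchanged, which is what makes editing binary data in a text editor plausible.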

Markus Kuhn's description of the idea: (search for "option d")

http://mail.nl.linux.org/linux-utf8/2000-07/msg00040.html

the patch:

http://xent.com/~bsittler/libiconv-1.9.1-utf-8b.diff

enjoy! (not sure how/whether this fits into the official distro, but i
hope it gets used)

-ben

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/

