Right after I finish the locale data verification, I'll be working on this
stuff full time with the alias information. More help is always
appreciated :-)
The <uconv_class> tag makes a very important distinction between MBCS and
DBCS. DBCS has an implied state table unless it is overridden. See the
User's Guide for details.
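As a rough illustration of that distinction, here is a minimal Python sketch that reads the header of a UCM file and reports whether the state table would be explicit or implied. It is not taken from any of the actual tools. The function names, the sample data, and the "<icu:state>" tag name used to detect an override are assumptions made for the example; check the User's Guide for the real rules.

```python
import re

# Matches UCM header lines such as: <uconv_class> "MBCS"
HEADER_RE = re.compile(r'^<([^>]+)>\s*(.*)$')

def read_ucm_header(text):
    """Collect the header tags that appear before the CHARMAP line."""
    tags = {}
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        if line == 'CHARMAP':
            break  # the mappings start here; the header is finished
        m = HEADER_RE.match(line)
        if m:
            value = m.group(2).strip().strip('"')
            tags.setdefault(m.group(1), []).append(value)
    return tags

def state_table_source(tags):
    """Return (uconv_class, 'explicit' | 'implied' | 'unspecified')."""
    uconv_class = tags.get('uconv_class', ['?'])[0]
    if 'icu:state' in tags:       # assumed name of the override tag
        return uconv_class, 'explicit'
    if uconv_class == 'DBCS':     # DBCS falls back to the implied table
        return uconv_class, 'implied'
    return uconv_class, 'unspecified'

# Hypothetical header, made up for this example.
SAMPLE = r'''
<code_set_name> "example-dbcs"
<uconv_class> "DBCS"
CHARMAP
<U0041> \x00\x41 |0
END CHARMAP
'''

print(state_table_source(read_ucm_header(SAMPLE)))
```

The sketch only classifies the header. It does not build or validate a state table.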
All the ibm-* tables are generated directly from the official IBM Unicode
charset table repository. The other tables are generated by various tools
that collect the data directly from the platform. Most of the tools have
small flaws that I keep stumbling over, and I try to fix each bug when I
encounter it. Unfortunately, those tools also create some tables that I
can't put into the CVS repository because the data I get is really weird.
For example, none of the tools can handle iso-2022, EBCDIC stateful, or
any other platform converter that has bizarre behavior.
Generating the correct state table does take some time. Part of the
problem is that there are so many tools to do this, and they all do it
differently. Some try to figure out the state table; others parse the
convrtrs.txt file to find the alias for a converter and then put in a
precanned state table. The various tools need to be integrated a bit
more. The previous authors made the very wrong assumption that the tools
would only be used once.
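To make the alias-based approach concrete, here is a minimal Python sketch of the general idea: read a convrtrs.txt-style alias list, resolve an alias to its converter name, and look up a precanned state table for it. This is not the code of any existing tool. The converter names, aliases, and table contents are hypothetical placeholders, and the parser assumes a simplified form of the file (one converter per line followed by its aliases, "#" comments, "{ ... }" tags, and indented continuation lines).

```python
import re

def parse_aliases(text):
    """Map each lowercased name or alias to its converter name."""
    alias_to_name = {}
    current = None
    for raw in text.splitlines():
        line = raw.split('#', 1)[0]            # drop comments
        line = re.sub(r'\{[^}]*\}', ' ', line)  # drop { STANDARD } tags
        tokens = line.split()
        if not tokens:
            continue
        if not raw[0].isspace():
            current = tokens[0]   # an unindented line starts a new converter
        if current is None:
            continue              # continuation line with nothing to continue
        for tok in tokens:
            alias_to_name[tok.lower()] = current
    return alias_to_name

def precanned_state_table(alias, alias_to_name, tables):
    """Return the precanned table for an alias, or None if there is none."""
    return tables.get(alias_to_name.get(alias.lower()))

# Hypothetical alias data and placeholder table, made up for this example.
SAMPLE = """
# name                  aliases
example-sjis { DEMO }   x-sjis-demo
    sjis-demo-alt
example-latin           latin-demo
"""
TABLES = {'example-sjis': ['<placeholder state row 0>',
                           '<placeholder state row 1>']}

aliases = parse_aliases(SAMPLE)
print(precanned_state_table('SJIS-Demo-Alt', aliases, TABLES))
print(precanned_state_table('latin-demo', aliases, TABLES))
```

A tool built this way is only as good as its precanned tables, which is one reason the two approaches give different results.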
The <lead_bytes> tag is ignored. It was put in accidentally by a previous
author's UCM generation tool for Windows. Oops.
Generating the reverse fallbacks from Windows is still very difficult,
especially since Windows is inconsistent between its C and COM APIs. I
tried to file a bug report with Microsoft, but no one there would listen,
so I gave up on it.
There are many other issues with creating correct information for each
platform. If you have any other issues with my tools, you can ask me
directly.
George Rhoten
IBM Globalization Center of Competency/ICU San Jose, CA, USA
Yves Arrouye <yv...@realnames.com>
02/11/2002 02:53 PM
To: George Rhoten/Cupertino/IBM@IBMUS
cc: "'icu-ch...@oss.software.ibm.com'" <icu-ch...@www-126.southbury.usf.ibm.com>
Subject: RE: Charset roundtrip info