Paul, I investigated this some more. It turns out that we do have a
defective mapping table file in ICU.
The file that a different team generated for us for ICU 1.7 (I think)
incorrectly marks the mapping of U+3000 to A1A1 as a "substitution
mapping". ICU ignores such mappings because substitution is handled
through user-customizable callbacks instead.
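To make the distinction concrete, here is a minimal Python sketch of how a converter builder might sort .ucm mapping lines by their precision indicator. It is a simplified model, not ICU's actual table builder. It assumes the common .ucm line shape `<Uxxxx> \xHH\xHH |n`, treats `|0` as roundtrip, `|1` as a from-Unicode-only fallback and `|3` as a to-Unicode-only reverse fallback, and skips `|2` substitution entries, mirroring the behavior described above. It also assumes the defective entry carried `|2`; the function name `build_tables` is hypothetical.

```python
import re

# Matches simplified .ucm mapping lines such as: <U3000> \xA1\xA1 |0
LINE_RE = re.compile(r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|([0-3])")


def build_tables(ucm_lines):
    """Split mapping lines into to-Unicode and from-Unicode tables (toy model)."""
    to_unicode = {}    # codepage bytes -> code point
    from_unicode = {}  # code point -> codepage bytes
    for line in ucm_lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # header, comment, or a line shape this sketch does not handle
        cp = int(m.group(1), 16)
        raw = bytes(int(h, 16) for h in m.group(2).split("\\x")[1:])
        precision = int(m.group(3))
        if precision in (0, 1):
            from_unicode[cp] = raw  # usable when converting from Unicode
        if precision in (0, 3):
            to_unicode[raw] = cp    # usable when converting to Unicode
        # Precision 2 (substitution mapping) is deliberately skipped:
        # in this model, substitution is left to the conversion callback.
    return to_unicode, from_unicode


# Hypothetical defective entry vs. the corrected roundtrip entry.
print(build_tables([r"<U3000> \xA1\xA1 |2"]))  # ({}, {})
print(build_tables([r"<U3000> \xA1\xA1 |0"]))  # ({b'\xa1\xa1': 12288}, {12288: b'\xa1\xa1'})
```

With `|2`, the pair ends up in neither table, so nothing converts A1A1 back to U+3000.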
The file in our mapping table repository has this pair marked properly as
a roundtrip mapping. Please download the correct file from
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/ucm/ibm-1383_P110-2000.ucm?rev=1.1&content-type=text/plain
and replace the ibm-1383.ucm file in your ICU build.
This does not show up when converting from Unicode to GB 2312: with the
default substitution callback, you still get A1A1 as output for U+3000,
because A1A1 happens to be the substitution character.
The problem appears when converting from GB 2312 to Unicode, where A1A1 is
treated as unassigned and the default callback writes the Unicode
substitution character U+FFFD.
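The asymmetry can be reproduced with a toy model in Python. This is not ICU code: the tables, the functions `encode` and `decode`, and the simplification that the codepage input contains only double-byte sequences are all assumptions. The model mimics the default substitute callbacks, which write the codepage substitution bytes (A1A1, as stated above) when converting from Unicode and U+FFFD when converting to Unicode.

```python
SUBCHAR = b"\xa1\xa1"   # codepage substitution character, per the note above
UNICODE_SUB = "\ufffd"  # Unicode substitution character


def encode(text, from_unicode):
    """Unicode -> codepage; unmapped code points get SUBCHAR (toy default callback)."""
    out = bytearray()
    for ch in text:
        out += from_unicode.get(ord(ch), SUBCHAR)
    return bytes(out)


def decode(data, to_unicode):
    """Codepage -> Unicode, double-byte input only; unmapped pairs become U+FFFD."""
    out = []
    for i in range(0, len(data), 2):
        cp = to_unicode.get(data[i:i + 2])
        out.append(chr(cp) if cp is not None else UNICODE_SUB)
    return "".join(out)


# Defective table: U+3000 <-> A1A1 is missing in both directions.
broken_to, broken_from = {}, {}
# Corrected table: U+3000 <-> A1A1 is a roundtrip mapping.
fixed_to, fixed_from = {b"\xa1\xa1": 0x3000}, {0x3000: b"\xa1\xa1"}

print(encode("\u3000", broken_from))               # b'\xa1\xa1' (right only by coincidence)
print(decode(b"\xa1\xa1", broken_to) == "\ufffd")  # True: the bug shows up here
print(encode("\u3000", fixed_from))                # b'\xa1\xa1'
print(decode(b"\xa1\xa1", fixed_to) == "\u3000")   # True
```

The from-Unicode direction hides the defect because the substitution bytes and the correct mapping are identical; only the to-Unicode direction exposes it.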
I am going to submit a bug report, and we will fix the mapping table for
ICU 2.1.
Thank you very much for bringing this to our attention!
2002-01-30 02:03 PM
To: Markus Scherer/Cupertino/IBM@IBMUS, George Rhoten/Cupertino/IBM@IBMUS
cc: <icu-ch...@www-124.southbury.usf.ibm.com>
Subject: RE: Troube with GB2312
#### zh.html has been removed from this note on January 30 2002 by Markus
Scherer