IBM codepoint to Unicode conversion question

BGut...@xenos.com

Mar 25, 2002, 1:45:38 PM
to icu-ch...@www-126.southbury.usf.ibm.com

Hello, we have been successfully using the ICU tables for converting AFP DBCS data into Unicode.  However, we've hit a glitch with the font X0M24T, whose codepages have a GraphicCharSetGID of 0 - so we cannot find the matching ICU table.

All charsets have the name C0TB00##, where ## is the section code (first hex byte of DBCS)
All codepages have the name T1T000##, where ## is the section code (first hex byte of DBCS)
The Typeface string is always: TRD CHINESE MING 22X22 #24X24#
The glyph names are IK0####0, where #### "appears" to be the hex DBCS code point.

There are 13792 glyph names:

IK041410
IK041420
IK041430......
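For reference, the naming pattern above can be checked mechanically. The following is a minimal sketch that assumes, as the post only suspects, that the four characters between "IK0" and the trailing "0" are the hex DBCS code point. The function name glyph_name_to_dbcs is hypothetical.

```python
import re

# Assumed pattern: "IK0" + four hex digits (DBCS code point) + "0".
GLYPH_NAME = re.compile(r"^IK0([0-9A-F]{4})0$")


def glyph_name_to_dbcs(name):
    """Return (lead_byte, trail_byte) for a glyph name, or None if it does not fit."""
    match = GLYPH_NAME.match(name)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code >> 8, code & 0xFF


if __name__ == "__main__":
    for name in ("IK041410", "IK041420", "IK041430"):
        print(name, glyph_name_to_dbcs(name))
```

Running this over all 13792 names and counting the None results would show whether the pattern really holds for the whole font.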

Can you offer any advice on translating text that uses this font into Unicode?


Billy Guthrie
Chief Product Advisor
xenos | the data to e-content company(tm)
Main:   +1 972 857 0776
Fax:    +1 972 857 0979
Email:  mailto:bgut...@xenos.com
Web:    http://www.xenos.com/

Markus Scherer

Mar 25, 2002, 6:55:26 PM
to BGut...@xenos.com, icu-ch...@www-126.southbury.usf.ibm.com
Hello,

Unfortunately, we are not very familiar with font encodings, i.e., with
mappings from some character encoding to a glyph encoding (mapping
characters to glyph IDs). We mostly deal with mapping tables between
character encodings.
You may want to contact the maker of this particular font, or font
companies like Agfa/Monotype and Adobe.

Some fonts have several mapping tables. In addition to the table that you
are seeing, there may also be a Unicode CMAP (a table that maps from
Unicode code points to glyph IDs) for the same font. You could then
compare the resulting glyph IDs between the encoding table and the Unicode
CMAP and construct an encoding<->Unicode mapping table.
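
The glyph-ID comparison could be sketched roughly as follows. This is only an illustration and assumes both tables have already been extracted from the font into plain dictionaries. The function name and the sample values are hypothetical placeholders, not data from any real font.

```python
def build_encoding_to_unicode(encoding_to_glyph, unicode_to_glyph):
    """Join two glyph tables on glyph ID.

    encoding_to_glyph: dict of encoding code point -> glyph ID
    unicode_to_glyph:  dict of Unicode code point  -> glyph ID
    Returns (mapping, ambiguous): mapping holds code points whose glyph
    is reached from exactly one Unicode code point; ambiguous holds the
    rest so they can be reviewed by hand.
    """
    glyph_to_unicode = {}
    for ucp, glyph_id in unicode_to_glyph.items():
        glyph_to_unicode.setdefault(glyph_id, []).append(ucp)

    mapping, ambiguous = {}, {}
    for code, glyph_id in encoding_to_glyph.items():
        candidates = sorted(glyph_to_unicode.get(glyph_id, []))
        if len(candidates) == 1:
            mapping[code] = candidates[0]
        else:
            # Zero candidates (glyph not in the CMAP) or more than one.
            ambiguous[code] = candidates
    return mapping, ambiguous


if __name__ == "__main__":
    # Placeholder values for illustration only.
    enc = {0x4141: 10, 0x4142: 11, 0x4143: 12}
    uni = {0x3000: 10, 0x3001: 11, 0xFF64: 11}
    print(build_encoding_to_unicode(enc, uni))
```

Glyphs shared by several Unicode code points, or missing from the CMAP, are kept apart rather than guessed at.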

If the character encoding actually uses double-byte sequences starting
from 0x41 0x41, that suggests the encoding is one of the EBCDIC DBCS
family. ASCII-based DBCS encodings typically start at 0x21 0x21 or
0xA1 0xA1.
IBM CCSIDs for Simplified Chinese EBCDIC DBCS include: 837, 4933, 9029,
13125
IBM CCSIDs for Traditional Chinese EBCDIC DBCS include: 835, 4931, 9027
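
As a rough illustration of this rule of thumb, the sketch below classifies an encoding by its lowest double-byte code point and lists the candidate CCSIDs named above. The function and dictionary names are hypothetical, and the check is only a heuristic, not a definitive test.

```python
# Candidate CCSIDs from the lists above, keyed by script.
EBCDIC_DBCS_CCSIDS = {
    "Simplified Chinese": [837, 4933, 9029, 13125],
    "Traditional Chinese": [835, 4931, 9027],
}


def guess_dbcs_family(lowest_pair):
    """Guess the encoding family from the lowest (lead, trail) byte pair in use."""
    if lowest_pair == (0x41, 0x41):
        return "EBCDIC DBCS"
    if lowest_pair in ((0x21, 0x21), (0xA1, 0xA1)):
        return "ASCII-based DBCS"
    return "unknown"


if __name__ == "__main__":
    family = guess_dbcs_family((0x41, 0x41))
    print(family)
    if family == "EBCDIC DBCS":
        # The typeface name mentions Chinese, so both lists are worth trying.
        for script, ccsids in EBCDIC_DBCS_CCSIDS.items():
            print(script, ccsids)
```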

You can find Unicode<->codepage mapping tables for these at
http://oss.software.ibm.com/icu/charset/
It appears that higher numbers indicate expanded repertoires. You could
give this a try and see if you get results as expected.
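
One way to try a table is sketched below. It assumes the table is available in ICU's .ucm text format, where each mapping line looks like "<UXXXX> \xHH\xHH |f". The sample lines are made-up placeholders and not real entries from any CCSID table; the function names are hypothetical.

```python
import re

# One mapping line: Unicode code point, byte sequence, optional fallback flag.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|(\d))?"
)


def load_ucm_mappings(text):
    """Return a dict of byte sequence -> character for roundtrip (|0) entries."""
    table = {}
    for line in text.splitlines():
        match = UCM_LINE.match(line.strip())
        if match is None:
            continue  # header, comment, or blank line
        flag = int(match.group(3)) if match.group(3) else 0
        if flag != 0:
            continue  # skip fallback mappings
        raw = bytes.fromhex(match.group(2).replace("\\x", ""))
        table[raw] = chr(int(match.group(1), 16))
    return table


def decode_dbcs(data, table):
    """Decode pure double-byte data, using U+FFFD for unmapped pairs."""
    pairs = (data[i:i + 2] for i in range(0, len(data), 2))
    return "".join(table.get(pair, "\ufffd") for pair in pairs)


# Placeholder entries for illustration only.
SAMPLE_UCM = r"""
<U4E00> \x4C\x41 |0
<U4E01> \x4C\x42 |0
<U4E02> \x4C\x43 |1
"""

if __name__ == "__main__":
    table = load_ucm_mappings(SAMPLE_UCM)
    print(decode_dbcs(b"\x4c\x41\x4c\x42\x4c\x43", table))
```

Decoding a known sample of the AFP text with each candidate table and counting the U+FFFD replacements gives a quick indication of which table fits best.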

Good luck!
markus

Markus Scherer IBM GCoC-Unicode/ICU San José, CA
markus....@us.ibm.com (also for SameTime)
