ISO 5426:1983 mapping

Markus Schöpflin

Dec 18, 2002, 7:05:11 AM
to icu-ch...@www-126.southbury.usf.ibm.com
Hello,

I would like to ask if you would consider adding support for the ISO
5426:1983 charset. It is an extension to ISO 646:1983 (US-ASCII).

Here is a description from
http://www.niso.org/international/SC4/sc4stand.html

<quote>
ISO 5426:1983

Extension of the Latin alphabet coded character set for bibliographic
information interchange

Contains a set of 76 graphic characters with their coded
representations. It includes a code table and a legend showing each
graphic, its name and use, and explanatory notes. Primarily intended
for information interchange among data processing systems and within
message transmission systems, this character set is designed to handle
information in 39 specified languages, as well as transliterated or
romanized forms of an additional 32 languages. These characters,
together with the characters in the international reference version of
ISO 646 (ISO escape sequence ESC 2/8 4/0), constitute a character set
for the international interchange of bibliographic citations,
including their annotations, in the Latin alphabet
</quote>

TIA, Markus
--
_____________________________________________________________________

GINIT Technology GmbH markus.s...@ginit-technology.com
Markus Schoepflin www.ginit-technology.com
Emmy-Noether-Str. 11 phone: +49-721-96681-0
D-76131 Karlsruhe fax: +49-721-96681-111

Markus Scherer

Dec 20, 2002, 12:56:24 PM
to Markus Schöpflin, icu-ch...@www-126.southbury.usf.ibm.com
Hi Markus (and greetings to Karlsruhe),

As you discovered, supporting this charset is not trivial because its text
encoding model is different from Unicode's. It is possible to add support
for such charsets under ICU's ucnv_* API as we have shown with ISCII, but
it is not easy.

There are many charsets that require more than 1:1 code point mappings,
but each requires different special handling. The problem is that these
charsets are all used much, much less frequently than the ones ICU already
supports, so we are getting into a field of diminishing returns for
increasing efforts.

Our current thinking is to build a new API on top of existing
implementations: combining a converter and a transliterator to perform
complex conversions in two steps but with a single call.
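
To illustrate the two-step idea, here is a minimal sketch in Python rather
than ICU's own C/C++ API. It assumes that the source charset stores a
non-spacing diacritic before its base letter, whereas Unicode stores
combining marks after the base. The byte assignments in PLACEHOLDER_TABLE
and all function names are hypothetical and are not taken from ISO 5426 or
from ICU.

```python
import unicodedata

# Hypothetical byte-to-code-point table for step 1 (the "converter").
# Placeholder assignments for illustration only, not copied from ISO 5426.
PLACEHOLDER_TABLE = {
    0xC1: "\u0300",  # combining grave accent
    0xC2: "\u0301",  # combining acute accent
    0xC8: "\u0308",  # combining diaeresis
}


def convert_one_to_one(data: bytes) -> str:
    """Step 1: map each byte to one code point, keeping the source order."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # ASCII range passes through unchanged
        else:
            out.append(PLACEHOLDER_TABLE[b])  # KeyError for unmapped bytes
    return "".join(out)


def reorder_marks(text: str) -> str:
    """Step 2: move marks that precede a base character to follow it."""
    out = []
    pending = []  # marks seen since the last base character
    for ch in text:
        if unicodedata.combining(ch):
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending.clear()
    out.extend(pending)  # keep stray trailing marks instead of dropping them
    return unicodedata.normalize("NFC", "".join(out))


def decode(data: bytes) -> str:
    """Both steps behind a single call."""
    return reorder_marks(convert_one_to_one(data))
```

With these placeholder assignments, decode(b"Sch\xc8opflin") first yields
"Sch" + U+0308 + "opflin" and then reorders and composes it to "Schöpflin".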

It would be better for everyone to just work with Unicode charsets; that
would give us more time to add more interesting features, like beefing up
our regular expression support :-)

Having said this, I encourage you to do several things:
- You could file an RFE in our Jitterbug system for support of ISO 5426.
  + Note that there is an RFE already to marry an ICU converter with a
    transliterator in an easy API.
- You could use a converter+transliterator yourself in your code, as
George suggested.
- Since ICU is open source, you could implement either a dedicated
converter or the cnv+translit RFE.

Best regards/Merry Christmas/Happy New Year,

markus

Markus Scherer IBM GCoC-Unicode/ICU San José, CA
markus....@us.ibm.com