
ISO-2022-JP to unicode


Neil Dando

Nov 20, 2000, 3:00:00 AM
My aim is to convert messages from the ISO-2022-JP charset into
Unicode. So far I haven't been very successful, apart from locating the Unix
utility iconv.

Does anyone know if there are any Microsoft libraries that have the same
functionality as the iconv converter that is available on Unix?

I have found the GNU-licensed version but would prefer to use a
Microsoft-supplied library if one exists.


Thanks

Neil

Michael (michka) Kaplan

Nov 20, 2000, 3:00:00 AM
Microsoft does have MLang, but others have commented that its conversions may
have problems.

--
MichKa

a new book on internationalization in VB at
http://www.i18nWithVB.com/

"Neil Dando" <shandy...@THISmyrealbox.com> wrote in message
news:enwuVMxUAHA.196@cppssbbsa04...

Neil Dando

Nov 21, 2000, 3:00:00 AM
I've tried using MultiByteToWideChar with code page 50220 and also the MLang
library but without any success.

To give some more detail, I'm trying to convert the text of an email from an
LPSTR, as retrieved by an SDK we are using, into a Unicode format to be passed
on to a handheld device. The email has the usual MIME headers with
charset="ISO-2022-JP", and the content is:

I think you're a big bad ぽとあとろあと. So there.


When I view the email in Outlook Express it recognises the encoding as
Japanese (Autoselect) and displays it correctly as follows (I hope this
displays correctly in the newsgroup post):
I think you're a big bad ???????. So there

N.B. I'm running Windows 2000 with the Japanese IME also loaded.

If I force the encoding being used to be Western European (Windows) the
content is displayed in the raw form.

Will using MultiByteToWideChar or MLang enable me to convert the string into a
Unicode CString, or is there another approach I need to use?

Thanks for any advice

Neil


"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com> wrote
in message news:OgcZljxUAHA.296@cppssbbsa04...

Michael (michka) Kaplan

Nov 21, 2000, 3:00:00 AM
Well, I would continue with the MLang route, since it is the one IE is
using.

--
MichKa

a new book on internationalization in VB at
http://www.i18nWithVB.com/

"Neil Dando" <shandy...@THISmyrealbox.com> wrote in message

news:uVbFjB8...@cppssbbsa02.microsoft.com...

David Williss

Nov 29, 2000, 3:00:00 AM
I would recommend Ken Lunde's book Understanding Japanese Information
Processing from O'Reilly. Last I knew, he was working on a more up-to-date
version of the book, but I don't know if it's out yet. I think he also has
code available through a web site.

In a nutshell, ISO-2022-JP (actually ISO-2022 in general) uses escape
sequences to switch character sets. Text is initially assumed to be in
ASCII (one byte), but the <ESC>$B sequence means that what
follows is in JIS X 0208-1983 (two bytes).
An <ESC>(J switches to JIS-Roman (one byte; the same as ASCII except that
the yen sign and overline replace backslash and tilde), and <ESC>(B
switches back to ASCII.
There are other escape sequences as well.
Once you've figured out how many bytes per character the current charset
uses (and the byte order for 2-byte stuff), it's just a matter of running
them through a lookup table.
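
The escape-sequence switching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample bytes (reconstructed from the message body quoted earlier in the thread) are hypothetical, only the four escape sequences listed in RFC 1468 are handled, and Python's euc_jp codec stands in for a hand-written JIS X 0208 lookup table.

```python
# Escape sequences defined for ISO-2022-JP in RFC 1468.
# The labels on the right are descriptive names chosen for this sketch.
ESCAPES = {
    b"\x1b(B": "ascii",           # ASCII, also the initial state
    b"\x1b(J": "jisx0201-roman",  # JIS X 0201 Roman, one byte per character
    b"\x1b$@": "jisx0208",        # JIS X 0208-1978, two bytes per character
    b"\x1b$B": "jisx0208",        # JIS X 0208-1983, two bytes per character
}


def split_segments(data: bytes):
    """Yield (charset_label, raw_bytes) runs, switching at each escape."""
    state, start, i = "ascii", 0, 0
    while i < len(data):
        esc = data[i:i + 3]
        if esc in ESCAPES:
            if i > start:
                yield state, data[start:i]
            state = ESCAPES[esc]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        yield state, data[start:]


def decode_iso2022jp(data: bytes) -> str:
    """Decode each run according to the charset that was active for it."""
    out = []
    for charset, raw in split_segments(data):
        if charset == "jisx0208":
            # Setting the high bit of every byte yields EUC-JP, so the
            # euc_jp codec plays the role of the lookup table here.
            out.append(bytes(b | 0x80 for b in raw).decode("euc_jp"))
        elif charset == "jisx0201-roman":
            # Same as ASCII except for the yen sign and the overline.
            out.append(raw.decode("ascii").translate({0x5C: 0xA5, 0x7E: 0x203E}))
        else:
            out.append(raw.decode("ascii"))
    return "".join(out)


# Hypothetical sample: the body text from this thread, re-encoded by hand.
sample = b"I think you're a big bad \x1b$B$]$H$\"$H$m$\"$H\x1b(B. So there."
print(decode_iso2022jp(sample))
```

Scanning byte by byte for the escape character is safe here because the two-byte charsets only use bytes in the printable 7-bit range, so an escape byte can never be half of a character.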

Neil Dando

Nov 29, 2000, 3:00:00 AM
Michael and David thanks for the info.

I've now been quite successful using MLang and ConvertToUnicode, with 50220
and 1200 as the source and destination code pages respectively when
initializing the ConvertCharset object.
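
For readers without MLang to hand, the same source and destination pairing can be approximated in Python. This is a rough sketch, not the MLang call itself: it assumes Python's iso2022_jp codec is a close enough stand-in for Windows code page 50220 (the two are not identical), and it relies on code page 1200 being UTF-16 little-endian. The sample bytes are reconstructed from the message body quoted earlier in the thread.

```python
# Hypothetical sample: the ISO-2022-JP bytes of the body text in this thread.
raw = b"I think you're a big bad \x1b$B$]$H$\"$H$m$\"$H\x1b(B. So there."

# Source side: roughly what code page 50220 (ISO-2022-JP) covers.
text = raw.decode("iso2022_jp")

# Destination side: code page 1200 is UTF-16 little-endian, i.e. the
# in-memory layout of a Windows wide-character string (no BOM added here).
utf16 = text.encode("utf-16-le")

print(text)
print(len(raw), "bytes in,", len(utf16), "bytes out")
```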

My next problem is with the mail subject line where the same approach does
not work.

If I look at the source of the message I see
Subject: =?ISO-2022-JP?B?QSBqYXBhbmVzZSBlbWFpbCAbJEIhSiRRJCkkPyQqIUsbKEo=?=

But when viewed in Outlook Express the Japanese characters are displayed
correctly and the charset info is removed.

Does anyone have any ideas how to convert this into Unicode for display on a
Japanese device?

Thanks

Neil

"David Williss" <dwil...@microimages.com> wrote in message
news:Ox9V5.1770$ie4....@nntp3.onemain.com...

Neil Dando

Nov 30, 2000, 3:00:00 AM
Further reading seems to give me the answer.

RFC 1522, along with RFC 1521 and RFC 822, seems to indicate that I need to
parse the string to get the encoded word, decode it using Base64 for the B
encoding or quoted-printable for Q, then perform the charset conversion to
Unicode.

Time to experiment....
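
The three steps above (find the encoded word, undo the B or Q transfer encoding, convert the charset) can be sketched with the Python standard library. This is a minimal illustration, not a full implementation of the RFC: the function names are hypothetical, and details such as dropping the whitespace between adjacent encoded words are left out.

```python
import base64
import quopri
import re

# encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_word(match):
    """Decode one encoded word: transfer encoding first, then charset."""
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # header=True makes the decoder treat "_" as a space, as Q requires.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    return raw.decode(charset)


def decode_header_value(value):
    """Replace every encoded word in a header value with its Unicode text."""
    return ENCODED_WORD.sub(decode_word, value)


subject = "=?ISO-2022-JP?B?QSBqYXBhbmVzZSBlbWFpbCAbJEIhSiRRJCkkPyQqIUsbKEo=?="
print(decode_header_value(subject))
```

Text outside the encoded words passes through unchanged, which matches how a header mixing plain ASCII and encoded words is meant to be read.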

"Neil Dando" <shandy...@THISmyrealbox.com> wrote in message

news:Ob$qUFjWAHA.242@cppssbbsa03...

SM

Dec 5, 2000, 6:23:48 PM
On Wed, 29 Nov 2000 17:43:58 -0000, "Neil Dando"
<shandy...@THISmyrealbox.com> wrote:
>If I look at the source of the message I see
>Subject: =?ISO-2022-JP?B?QSBqYXBhbmVzZSBlbWFpbCAbJEIhSiRRJCkkPyQqIUsbKEo=?=
>
>But when viewed in Outlook Express the Japanese characters are displayed
>correctly and the charset info is removed.

Decode QSBqYXBhbmVzZSBlbWFpbCAbJEIhSiRRJCkkPyQqIUsbKEo= using Base64.
The character set used is ISO-2022-JP.
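
As a cross-check, Python's standard email package performs both steps for you. The sketch below assumes nothing beyond the subject line quoted above; decode_header exposes the Base64-decoded bytes together with the declared charset, and make_header turns them into Unicode text.

```python
from email.header import decode_header, make_header

subject = "=?ISO-2022-JP?B?QSBqYXBhbmVzZSBlbWFpbCAbJEIhSiRRJCkkPyQqIUsbKEo=?="

# Step 1: Base64 is undone, giving (bytes, charset) pairs.
parts = decode_header(subject)
raw, charset = parts[0]
print(charset)   # the charset named in the encoded word
print(raw)       # still ISO-2022-JP bytes, escape sequences included

# Step 2: the charset conversion to Unicode.
decoded = str(make_header(parts))
print(decoded)
```

Printing the intermediate bytes shows the <ESC>$B and <ESC>(J sequences sitting inside the Base64 payload, which is why the charset conversion is still needed after the Base64 step.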

-sm
