MBCS to UTF-8 in VC++

Galen Earl Murdock

Nov 5, 2001, 3:00:00 PM
Hello --

One of our applications written in VC++ uses _MBCS (not _UNICODE) to support
international text. The application generates XML by "hand" in an MFC
CString and then POSTs that XML to an ASP page using WinInet. The ASP page
is hosted on Windows 2000 Server, and now has CODEPAGE=65001 set, which
means it expects to receive its data (notably this XML) in UTF-8 format.
The app supports Win98 and Win2000, and the Microsoft Layer for Unicode
(MSLU) is not currently being used. The plumbing works fine for English (of
course) but now I've been brought on to the team and asked to see that it
works with XML containing data in 17 other languages.

I'm trying to find the best way for them to convert the MBCS CString to
UTF-8 before posting it. I'm thinking:
1) Use the MSXML parser to load the XML in a designated code page, then
extract the XML in UTF-8. One lingering question is: Does MSXML 3.0 or 4.0
require the MSLU to do these translations on Win98?
2) Compile the application with the MSLU and do the conversions manually
(<code page x> to UCS-2 to UTF-8 using MultiByteToWideChar and
WideCharToMultiByte). This option, however, would require significant
re-testing of the application, which would be difficult as they are nearing
deployment.
3) Use a third-party library, like International Components for Unicode
(ICU), to do the conversions.
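
For reference, the second leg of option 2 (UCS-2/UTF-16 to UTF-8) can be sketched in portable C++. This is only an illustration of what that step produces, not a recommendation to hand-roll it: on Windows the two legs would normally be MultiByteToWideChar with the source code page, followed by WideCharToMultiByte with CP_UTF8. The helper name Utf16ToUtf8 is hypothetical, and std::u16string is a modern C++ stand-in for a 16-bit wchar_t buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes UTF-16 code units as UTF-8 bytes.
// Illustrative only; on Windows, WideCharToMultiByte(CP_UTF8, ...)
// performs this step.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];

        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Assumption: an unpaired surrogate becomes U+FFFD.
        if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;

        if (cp < 0x80) {                       // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Plain ASCII passes through unchanged, which is why the English-only path already works; only characters at U+0080 and above expand to multi-byte sequences.
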

Can anyone give me some suggestions?

Galen Murdock

Michael (michka) Kaplan

Nov 5, 2001, 7:34:36 PM
microsoft.public.platformsdk.mslayerforunicode is a better place to ask MSLU
questions, FWIW.

But anyway, MSXML does not use MSLU at all. And since MSXML fully supports
encoding, doing it yourself really does not make good sense and I would
truly recommend rethinking this "roll your own" mentality, unless you are:

a) paid by the hour, and
b) not subject to regular review for productive use of time.

If either of these is NOT true, then sticking to appropriate usage makes a
lot more sense.


--
MichKa

Michael Kaplan
(principal developer of the MSLU)
Trigeminal Software, Inc. -- http://www.trigeminal.com/
the book -- http://www.i18nWithVB.com/


"Galen Earl Murdock" <ga...@veracitysolutions.nospam.com> wrote in message
news:uLmRQRjZBHA.1992@tkmsftngp03...

Galen Earl Murdock

Nov 6, 2001, 3:26:02 PM
Excellent. I'll direct future MSLU questions there.

As for "best" development practices, that's exactly what I'm trying to come
up with. I've been charged with discovering, addressing, and helping to
resolve i18n issues on our team of 16 developers. So far I've written a
30-page document with a "Unicode Essentials" section and a "Best Practices"
section. This latter section recommends against the "roll your own"
mentality for the exact reasons you listed.

So, in my list of options that I originally posted, I'd definitely prefer
#1, especially now that I know MSXML doesn't depend on MSLU on Win98.

So the sample code I'll now write is "How to use the MSXML parser to load
XML from an MBCS CString and extract the XML in UTF-8". I just wanted a
sanity check before proceeding down a dead end. I'm assuming, then, that
this is a recommended approach for this MBCS VC++ app that needs to
communicate in UTF-8 XML?

Galen
--

"Michael (michka) Kaplan" <forme...@spamfree.trigeminal.nospam.com> wrote
in message news:#FsbUzlZBHA.1868@tkmsftngp04...

Michael (michka) Kaplan

Nov 6, 2001, 7:25:57 PM
Well, it's fair to say that anything that needs to use MSLU will ship it, so
if everything is installed by its suggested methods then it will be present?
:-)

For best practices, non-Unicode apps are their own form of dead end, at this
point -- may be worth considering this? <g>


--
MichKa

Michael Kaplan
(principal developer of the MSLU)
Trigeminal Software, Inc. -- http://www.trigeminal.com/
the book -- http://www.i18nWithVB.com/


"Galen Earl Murdock" <ga...@veracitysolutions.nospam.com> wrote in message
news:#E#tfEwZBHA.1704@tkmsftngp07...
