MBCS vs UNICODE

--== Alain ==--

Oct 2, 2006, 12:51:39 PM

Hi,

If I understood correctly, MBCS is also used for Eastern languages like
Chinese, Japanese and so on, just as UNICODE is.

So which one is the better format?
I see a lot of applications developed with UNICODE.

What are the pros and cons of UNICODE vs MBCS (apart from the 2-byte
character encoding)?

Until now I have written applications only in ANSI. I want my
application to work with Eastern languages (e.g. Asian characters), so
what should I consider the best solution?

thanks a lot,

Alain

Bruno van Dooren [MVP VC++]

Oct 2, 2006, 1:02:22 PM

I haven't used MBCS, but considering that the whole of .NET uses Unicode,
I'd go with that.

--

Kind regards,
Bruno van Dooren
bruno_nos_pa...@hotmail.com
Remove only "_nos_pam"

Alex Blekhman

Oct 2, 2006, 1:38:54 PM

"Bruno van Dooren [MVP VC++]" wrote:
>
> I haven't used MBCS, but considering that the whole of
> .NET uses Unicode, I'd go with that.

Actually, _MBCS is what most people call an "ANSI build".
Strictly speaking, an ANSI build doesn't exist in the sense
that each character of a language gets exactly one `char' for
its representation. Once you define _MBCS for a project, the
same `char*' string can be multibyte. However, nobody pays
attention to that, so we call them ANSI strings. If you're
planning to use your application with Asian multibyte
languages, then you will need to call the _mbXXX family of
functions when working with strings.
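
To make the multibyte point concrete, here is a minimal sketch that counts
characters rather than bytes in a Shift-JIS (code page 932) string. The
helpers `is_sjis_lead_byte` and `sjis_char_count` are hypothetical names
written for this illustration, not CRT functions; in a real _MBCS build the
CRT's `_mbslen` does this job for the current code page.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `b` starts a two-byte character in
// Shift-JIS. Lead bytes are 0x81-0x9F and 0xE0-0xFC.
bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: counts characters, not bytes, in a Shift-JIS string.
// A lead byte with no following byte is counted as one (broken) character.
std::size_t sjis_char_count(const std::string& s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        bool two_bytes = is_sjis_lead_byte(static_cast<unsigned char>(s[i]))
                         && i + 1 < s.size();
        i += two_bytes ? 2 : 1;  // step over the whole character
        ++count;
    }
    return count;
}
```

For the four bytes 93 FA 96 7B (two Japanese characters), `size()` reports 4
while `sjis_char_count` reports 2. Note that the last byte, 0x7B, is also
the ASCII code for '{', which is why byte-wise string code goes wrong.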

Fortunately, due to the prevalence of Win2K/XP nowadays, we can
abandon Win9x/Me (with its multibyte strings) and do Unicode
builds that suit everyone.

Igor Tandetnik

Oct 2, 2006, 1:38:52 PM

--== Alain ==-- <nos...@noemail.com> wrote:
> What are the pros and cons of UNICODE vs MBCS (apart from the 2-byte
> character encoding)?

http://joelonsoftware.com/articles/Unicode.html

"The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)"

--
With best wishes,
Igor Tandetnik

With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925

--== Alain ==--

Oct 2, 2006, 2:13:36 PM

Thanks, Alex.

So I will migrate everything to UNICODE.

Heinz Ozwirk

Oct 2, 2006, 4:13:37 PM

"--== Alain ==--" <nos...@noemail.com> wrote in message
news:uCEHHMk5...@TK2MSFTNGP03.phx.gbl...

> Hi,
>
> If I understood correctly, MBCS is also used for Eastern languages like
> Chinese, Japanese and so on, just as UNICODE is.

A major difference between Unicode and MBCS is that there is only one
Unicode (relevant to the Windows API), but there is a very large number of
multi-byte character sets. And usually a single MBCS is only useful for a
single language. Of course you can use any MBCS available in Windows to
represent (American) English, but you cannot use a single MBCS to represent
both Japanese and Korean, or even Greek and Russian. With Unicode, however,
you can represent many languages with a single character set, and you don't
have to worry about code pages as you do with MBCS. And if you do it right,
and if VC 2020 finally implements full Unicode (UTF-32) support, your
Unicode app might even be able to display Klingon text.
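
As a rough illustration of the code page problem, the sketch below decodes
the same byte under two Windows code pages, 1251 (Cyrillic) and 1253
(Greek). The function names are hypothetical and the tables are
deliberately partial: only ASCII and the main letter ranges are modelled,
and everything else is returned as U+FFFD.

```cpp
// Hypothetical helper: decode one byte as Windows-1251 (Cyrillic).
// Only ASCII and the letter range 0xC0-0xFF are modelled in this sketch.
char32_t decode_cp1251(unsigned char b) {
    if (b < 0x80) return b;                                  // ASCII
    if (b >= 0xC0) return static_cast<char32_t>(0x0410 + (b - 0xC0));
    return 0xFFFD;                                           // not modelled
}

// Hypothetical helper: decode one byte as Windows-1253 (Greek).
// Only ASCII and the letter ranges 0xC1-0xD1 and 0xD3-0xFE are modelled.
char32_t decode_cp1253(unsigned char b) {
    if (b < 0x80) return b;                                  // ASCII
    if ((b >= 0xC1 && b <= 0xD1) || (b >= 0xD3 && b <= 0xFE))
        return static_cast<char32_t>(0x0391 + (b - 0xC1));
    return 0xFFFD;                                           // not modelled
}
```

The byte 0xE1 comes out as U+0431 (a Cyrillic letter) under code page 1251
and as U+03B1 (Greek alpha) under code page 1253, so a `char*' string means
nothing until you know which code page it was written in. The ASCII range
decodes identically in both, which is why English text survives everywhere.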

> So which one is the better format?
> I see a lot of applications developed with UNICODE.
>
> What are the pros and cons of UNICODE vs MBCS (apart from the 2-byte
> character encoding)?

Unicode is much easier to handle than MBCS. All characters have the same
size (currently 2 bytes) and you can use such characters and strings as easily
as single-byte character sets like ASCII or ANSI.

> Until now I have written applications only in ANSI. I want my application
> to work with Eastern languages (e.g. Asian characters), so what should I
> consider the best solution?

Unicode is probably easier to use if you want to target multiple languages
simultaneously or if you don't know the language in advance. A well-written
MBCS application might work with whatever code page the user of your app
has selected, but you have to be very careful, and many string functions
don't work really well with true multi-byte character sets.
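
One way a byte-oriented string function can misbehave is sketched below. In
Shift-JIS, some two-byte characters have 0x5C as their second byte, which
is also the ASCII backslash, so a plain byte search for a path separator
can stop in the middle of a character. `is_sjis_lead_byte` and
`sjis_find_char` are hypothetical names for this illustration; in an _MBCS
build the CRT's `_mbschr` is the lead-byte-aware counterpart of `strchr`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `b` starts a two-byte Shift-JIS character.
bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Hypothetical helper: index of the first single-byte character equal to
// `ch`, or std::string::npos. Trail bytes are never compared against `ch`.
std::size_t sjis_find_char(const std::string& s, char ch) {
    std::size_t i = 0;
    while (i < s.size()) {
        if (is_sjis_lead_byte(static_cast<unsigned char>(s[i]))
            && i + 1 < s.size()) {
            i += 2;  // skip the lead byte and its trail byte together
            continue;
        }
        if (s[i] == ch) return i;
        ++i;
    }
    return std::string::npos;
}
```

For the bytes 95 5C 5C 61 (one two-byte character, a real backslash, then
'a'), `std::string::find('\\')` returns index 1, inside the first
character, while `sjis_find_char` returns index 2, the actual separator.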

Also, API calls expecting or returning strings are slightly faster with
Unicode than they are with MBCS or even plain ANSI. Internally, Windows
NT/2000/XP uses only Unicode, so all MBCS strings must be translated into
Unicode before Windows can actually use them.

HTH
Heinz

David Wilkinson

Oct 3, 2006, 12:15:01 PM

Heinz Ozwirk wrote:

[snip]

>
> Unicode is much easier to handle than MBCS. All characters have the same
> size (currently 2 bytes) and you can use such characters and strings as easily
> as single-byte character sets like ASCII or ANSI.
>
>

[snip]

Unfortunately this is not true. Microsoft "Unicode" as implemented in
Windows XP and later is UTF-16, which contains multi-byte characters
(perhaps not in common usage for most people, but it does).

The real advantage of "Unicode" is that it can handle characters from
all languages in a single universal encoding, so that we no longer have
the nightmare of different code pages for different languages. But this
advantage is shared by UTF-8 (an 8-bit encoding of Unicode) as well as
UTF-16 (a 16-bit encoding) and UTF-32 (a 32-bit encoding). Only in
UTF-32 is every character the same length.
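
The size rules behind that statement can be sketched in a few lines. The
function names below are made up for this example, and the code assumes its
input is a valid Unicode scalar value (at most U+10FFFF, not a surrogate).

```cpp
#include <cstddef>

// Number of 8-bit code units (bytes) UTF-8 needs for one code point.
std::size_t utf8_units(char32_t cp) {
    if (cp < 0x80)    return 1;  // ASCII
    if (cp < 0x800)   return 2;  // e.g. accented Latin, Greek, Cyrillic
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

// Number of 16-bit code units UTF-16 needs: 1, or 2 for a surrogate pair.
std::size_t utf16_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

// UTF-32 always uses exactly one 32-bit code unit per code point.
std::size_t utf32_units(char32_t) {
    return 1;
}
```

So 'A' (U+0041) takes 1 byte in UTF-8, 2 bytes in UTF-16 and 4 bytes in
UTF-32, while a character outside the Basic Multilingual Plane such as
U+1D11E takes 4 bytes in each of the three, as four UTF-8 units, two UTF-16
units or one UTF-32 unit.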

For me, UTF-16 was the worst choice because it wastes space (at least
for Western languages), while still requiring multi-byte characters (for
some languages). UTF-8 would have been much better, IMHO, but it is too
late for MS to change now.

David Wilkinson

Alex Blekhman

Oct 3, 2006, 12:51:39 PM

David Wilkinson wrote:
> [snip]
>> Unicode is much easier to handle than MBCS. All characters have the same
>> size (currently 2 bytes) and you can use such characters and strings as easily
>> as single-byte character sets like ASCII or ANSI.
>

> Unfortunately this is not true. Microsoft "Unicode" as implemented in
> Windows XP and later is UTF-16, which contains multi-byte characters
> (perhaps not in common usage for most people, but it does).

Actually, if you mean surrogate pairs, then they were
introduced with Windows 2000. I don't know of any other
multi-byte Unicode implementation in Windows.
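
For readers who haven't met surrogate pairs, here is a small sketch of how
they work in UTF-16. The function names are hypothetical; the ranges
(0xD800-0xDBFF for high surrogates, 0xDC00-0xDFFF for low surrogates) and
the combining formula are those defined for UTF-16.

```cpp
#include <cstddef>
#include <string>

// Combine a high and a low surrogate into one code point.
// Assumes `high` is in 0xD800-0xDBFF and `low` is in 0xDC00-0xDFFF.
char32_t combine_surrogates(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}

// Count code points in a UTF-16 string. A high surrogate followed by a
// low surrogate counts once; an unpaired surrogate counts as one unit.
std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++count) {
        bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        bool low_next = i + 1 < s.size()
                        && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && low_next) ++i;  // consume the low surrogate as well
    }
    return count;
}
```

A string holding 'A' followed by the pair D834 DD1E has three 16-bit units
but only two characters, and the pair combines to U+1D11E. Code that treats
every `wchar_t' as a whole character gets this case wrong.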

> For me, UTF-16 was the worst choice because it wastes space (at least
> for Western languages), while still requiring multi-byte characters (for
> some languages). UTF-8 would have been much better, IMHO, but it is too
> late for MS to change now.

The problem is that the first Windows NT was designed before
UTF-8 was invented. So, given the choice between UCS-2 and
UCS-4 (which was too wasteful for the early '90s), UCS-2
doesn't look so bad.