Questions About WideCharToMultiByte

Jonathan Wood

Aug 12, 1999, 3:00:00 AM

I'm using WideCharToMultiByte to convert a VB string to an ANSI string, and
I'm having trouble getting clarification on one aspect of this function.

int WideCharToMultiByte(
    UINT CodePage,            // code page
    DWORD dwFlags,            // performance and mapping flags
    LPCWSTR lpWideCharStr,    // address of wide-character string
    int cchWideChar,          // number of characters in string
    LPSTR lpMultiByteStr,     // address of buffer for new string
    int cbMultiByte,          // size of buffer
    LPCSTR lpDefaultChar,     // address of default for unmappable characters
    LPBOOL lpUsedDefaultChar  // address of flag set when default char. used
);

My question is with the cbMultiByte argument, the size of the target buffer.

At first glance, I would have thought both the cchWideChar and cbMultiByte
arguments would be the same. The first one specifies the number of
characters in the source string, and the second one specifies the number of
bytes in the target string. Since ANSI uses one byte per character, they
would be the same.

And in fact, if I set cbMultiByte to 0, the function returns the number of
bytes required, and it returns 3 for the string "abc".

However, in the Platform SDK documentation, it states "If the function
succeeds, and cbMultiByte is nonzero, the return value is the number of
bytes written to the buffer pointed to by lpMultiByteStr. The number
includes the byte for the null terminator." It suggests additional bytes are
required.

To confuse the issue, the API Reference documentation does not include the
sentence about a byte for the null terminator.

To further confuse things, most of the few MSDN examples I could find for
this function tend to set cbMultiByte to Len(s) * 2!!!

My question is: How many bytes are needed for the target buffer and does it
ever differ from 1 byte per character?

--
Jonathan Wood
SoftCircuits Programming
http://www.softcircuits.com

Felix Kasza [MVP]

Aug 12, 1999, 3:00:00 AM

Jon,

> To confuse the issue, the API Reference documentation does not include the
> sentence about a byte for the null terminator.

WCTMB() converts counted buffers, _or_ you can set the source character
count to -1, in which case the function assumes that the source string
is null-terminated (and that the target string should also be). As long
as you supply a count for the source, you must make certain that the
null terminator is included, if you want one.
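A minimal sketch of that difference, assuming CP_ACP and a hypothetical helper named required_bytes (neither comes from the thread). On Windows it asks WideCharToMultiByte for the size by passing cbMultiByte as 0. On other platforms it falls back to the standard C function wcrtomb purely as a stand-in so the sketch still builds; that fallback is not the Win32 API.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: number of bytes needed to convert cch wide
 * characters of s. Passing cch == -1 means "s is null-terminated,
 * convert it including the terminator". Returns 0 on failure. */
int required_bytes(const wchar_t *s, int cch)
{
    assert(s != NULL && cch >= -1);
#ifdef _WIN32
    /* cbMultiByte == 0: nothing is written, only the size is returned. */
    return WideCharToMultiByte(CP_ACP, 0, s, cch, NULL, 0, NULL, NULL);
#else
    /* Stand-in for non-Windows builds: add up the bytes wcrtomb would
     * emit for each wide character in the current C locale. */
    char tmp[MB_LEN_MAX];
    mbstate_t st;
    int total = 0;
    int i;

    memset(&st, 0, sizeof st);
    if (cch == -1)
        cch = (int)wcslen(s) + 1;   /* count the terminator too */
    for (i = 0; i < cch; i++) {
        size_t n = wcrtomb(tmp, s[i], &st);
        if (n == (size_t)-1)
            return 0;               /* character cannot be converted */
        total += (int)n;
    }
    return total;
#endif
}
```

With a single-byte code page, required_bytes(L"abc", 3) gives 3 and required_bytes(L"abc", -1) gives 4. The terminator is counted only when it is part of the input, which is why a counted call on "abc" reports 3.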

> To further confuse things, most of the few MSDN examples I could find for
> this function tend to set cbMultiByte to Len(s) * 2!!!

Easy: len(s) gets the number of characters; one is added for the final
null character; and another character is added because the author trusts
neither his memory, nor the docs, nor his code.

--

Cheers,

Felix.

If you post a reply, kindly refrain from emailing it, too.
Note to spammers: fel...@mvps.org is my real email address.
No anti-spam address here. Just one comment: IN YOUR FACE!

Jonathan Wood

Aug 12, 1999, 3:00:00 AM

Hi Felix,

> WCTMB() converts counted buffers, _or_ you can set the source character
> count to -1, in which case the function assumes that the source string
> is null-terminated (and that the target string should also be). As long
> as you supply a count for the source, you must make certain that the
> null terminator is included, if you want one.

Well it is the count for the destination that I'm confused about.

> > To further confuse things, most of the few MSDN examples I could find for
> > this function tend to set cbMultiByte to Len(s) * 2!!!
>
> Easy: len(s) gets the number of characters; one is added for the final
> null character; and another character is added because the author trusts
> neither his memory, nor the docs, nor his code.

Eh? It was Len(s) TIMES 2 (not plus). In other words, I've seen a lot of
sample code in MSDN that seems to assume the result will require 2 bytes per
character, and I'm wondering whether that is ever possible.

Thanks.

Felix Kasza [MVP]

Aug 12, 1999, 3:00:00 AM

Jon,

> It was Len(s) TIMES 2 (not plus).

Sorry. I misread the text and made the unforgivable error of assuming
ANSI. "Multibyte" characters may or may not have a lead byte. Since you
don't know beforehand, the worst-case assumption is that every character
will require one. If that happens, then strlen(s) * 2 leaves no room for
the terminating null byte ...

Jonathan Wood

Aug 13, 1999, 3:00:00 AM

I think I'm getting it. Looks like the best approach might be to query the
function for the number of bytes needed and just use that with space for the
terminator.

Thanks.

--
Jonathan Wood
SoftCircuits Programming
http://www.softcircuits.com

Ted Miller

Aug 24, 1999, 3:00:00 AM

Jonathan Wood <jw...@softcircuits.com> wrote in message
news:#BWbYSN5#GA...@cppssbbsa02.microsoft.com...

[snip]

> My question is with the cbMultiByte argument, the size of the target
> buffer.
>
> At first glance, I would have thought both the cchWideChar and cbMultiByte
> arguments would be the same. The first one specifies the number of
> characters in the source string, and the second one specifies the number
> of bytes in the target string. Since ANSI uses one byte per character, they
> would be the same.

If the code page you specify for conversion is for a DBCS character set (i.e.,
you specify Japanese/932, or CP_ACP on a system where the ANSI code page is
932, etc.), then you could get multibyte characters that are 2 bytes in
length. Assuming that one Unicode character equals one byte when converted
is broken -- and invites future memory corruption problems that are
notoriously difficult to debug.

In the "worst" case, *each* Unicode character will require 2 bytes of
storage when converted. IF you include the terminating Unicode nul character
in the conversion (either by specifying -1 as the input length, or by
passing a true length that encompasses the terminating Unicode nul), then
you need at most one additional byte to hold the terminating multibyte
nul (which is always one byte in length).

So let's say you will convert Unicode strings whose length is known
beforehand to be limited to say 100 characters, including the nul.

Then you could have a static conversion buffer (or allocate one) that is
(99*2)+1 bytes long and use that for the conversion target with a guarantee
that you will *never* overflow that buffer no matter what you throw at it.
In actuality, even this is not 100% safe -- there are some special cases
such as CP_UTF7 and CP_UTF8, where the multibyte equivalent is maximally 3
or 4 bytes per Unicode character. The *really* bulletproof way to handle
this is to use GetCPInfo and check the MaxCharSize that is returned and use
that as the multiplier in the above expression.
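The worst-case expression can be sketched as a small function. The names worst_case_bytes and worst_case_bytes_for_code_page are hypothetical, and the sketch assumes the character count includes the terminating nul, as in the 100-character example. The Windows-only wrapper reads the multiplier from GetCPInfo; the arithmetic itself is plain C.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: upper bound on the bytes needed for a string of
 * cch_with_nul wide characters (terminator included), when no character
 * converts to more than max_char_size bytes. The terminator itself
 * always converts to exactly one byte. */
size_t worst_case_bytes(size_t cch_with_nul, size_t max_char_size)
{
    assert(cch_with_nul >= 1 && max_char_size >= 1);
    return (cch_with_nul - 1) * max_char_size + 1;
}

#ifdef _WIN32
/* Windows-only wrapper: take the multiplier from the code page's
 * MaxCharSize. Returns 0 if the code page cannot be queried. */
size_t worst_case_bytes_for_code_page(UINT code_page, size_t cch_with_nul)
{
    CPINFO info;

    if (!GetCPInfo(code_page, &info))
        return 0;
    return worst_case_bytes(cch_with_nul, info.MaxCharSize);
}
#endif
```

For 100 characters including the nul and a multiplier of 2, worst_case_bytes(100, 2) gives 199, the same as (99*2)+1.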

Of course, calling WideCharToMultiByte once to get the actual required
length, then ensuring your buffer is large enough, and then calling it again,
can be a fine thing to do also, depending on perf and architecture
requirements.
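A rough sketch of that two-call approach, assuming CP_ACP, a null-terminated source string, and a hypothetical function name wide_to_multibyte_alloc. On non-Windows platforms it substitutes the standard C function wcsrtombs, which supports the same query-then-convert pattern, only so that the sketch builds there; that branch is a stand-in, not the Win32 API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: convert a null-terminated wide string into a
 * newly allocated multibyte string. The caller frees the result.
 * Returns NULL on failure. */
char *wide_to_multibyte_alloc(const wchar_t *src)
{
    char *buf;

    assert(src != NULL);
#ifdef _WIN32
    {
        /* First call: cbMultiByte == 0 returns the required size.
         * Because cchWideChar is -1, that size includes the terminator. */
        int needed = WideCharToMultiByte(CP_ACP, 0, src, -1,
                                         NULL, 0, NULL, NULL);
        if (needed == 0)
            return NULL;
        buf = malloc((size_t)needed);
        if (buf == NULL)
            return NULL;
        /* Second call: convert into a buffer of exactly that size. */
        if (WideCharToMultiByte(CP_ACP, 0, src, -1,
                                buf, needed, NULL, NULL) == 0) {
            free(buf);
            return NULL;
        }
    }
#else
    {
        /* Stand-in: wcsrtombs with a NULL destination returns the byte
         * count WITHOUT the terminator, so one byte is added for it. */
        mbstate_t st;
        const wchar_t *p = src;
        size_t n;

        memset(&st, 0, sizeof st);
        n = wcsrtombs(NULL, &p, 0, &st);
        if (n == (size_t)-1)
            return NULL;
        buf = malloc(n + 1);
        if (buf == NULL)
            return NULL;
        p = src;
        memset(&st, 0, sizeof st);
        if (wcsrtombs(buf, &p, n + 1, &st) == (size_t)-1) {
            free(buf);
            return NULL;
        }
    }
#endif
    return buf;
}
```

The extra call costs a second pass over the string, but the buffer is sized from what the conversion actually needs, so no per-character multiplier has to be guessed.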