"Marco Hung" <marco.h...@gmail.com> wrote in message news:ucAsGC37...@TK2MSFTNGP06.phx.gbl...
If you start now with a new project, there is no reason not to go Unicode.
The only reason for MBCS is to support Win 9x (with a new project?),
to learn about things that will be obsolete in 2-3 years,
or for being a masochist :-)
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
> I've created an MFC project in MBCS. I need to show some special
> characters
> (ASCII code > 128) in a CStatic control. They show correctly on
> English-locale Windows.
> However, all those special characters become "?" on non-English Windows.
> How to solve this problem?
Hi,
This is a classical example of the importance of using *Unicode* to store
characters and strings.
IMHO, you should forget about ANSI (or MBCS) and use *Unicode* as the
type for characters and strings (as modern programming languages such as
Java, Python, and C# do).
Basically, Unicode provides a *unique* number for every character, no matter
what the programming language, the operating system, etc.
I don't know what character you want to display, but suppose, for example,
that you want to display a lower-case Greek "omega" (a kind of "w").
In the Unicode UTF-16 encoding, the "unique number" associated with this
character is 0x03C9 (hex; note that it's 16 bits, not 8 bits as in ANSI).
The C++ code to display that character in a message box looks like this:
// Build a string of Unicode UTF-16 characters:
// "omega" (0x03C9), end-of-string (0x0000)
wchar_t omega[] = { 0x03C9, 0x0000 };
// Display Unicode text (note the W and the L)
MessageBoxW( NULL, omega, L"Unicode Test", MB_OK );
The L before the "Unicode Test" string literal identifies the string as
Unicode rather than ANSI.
The W after MessageBox is a Win32 API naming convention to identify the
Unicode (and not the ANSI) version of MessageBox API.
If you compile in Unicode mode, you can avoid the W and just write
MessageBox; the C/C++ preprocessor will expand MessageBox as MessageBoxW.
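That preprocessor expansion can be illustrated with a small self-contained sketch. Note that ShowTextA/ShowTextW here are hypothetical stand-ins used only to mimic the convention, not the real Win32 API:

```cpp
#include <string>

// Two explicit variants, mirroring how the Win32 headers declare
// MessageBoxA (ANSI) and MessageBoxW (Unicode):
inline std::string  ShowTextA(const char* s)    { return std::string("A:") + s; }
inline std::wstring ShowTextW(const wchar_t* s) { return std::wstring(L"W:") + s; }

// The generic name is a preprocessor macro that expands to one of the
// two variants, exactly as MessageBox expands to MessageBoxA or MessageBoxW:
#ifdef UNICODE
#define ShowText ShowTextW
#else
#define ShowText ShowTextA
#endif
```

With UNICODE defined (a Unicode build) the generic name resolves to the wide variant; without it, to the ANSI one.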
You might find the Unicode FAQ http://unicode.org/faq/ and Mihai's blog
http://www.mihai-nita.net/ interesting.
Giovanni
TCHAR stringToShow[] = { 129, 130, 131, 132, 133, 134, 135, 136, 137 };
or
TCHAR stringToShow[] = _T("\x81\x82\x83\x84\x85\x86\x87\x88\x89");
You should NOT be using GetDlgItem; it should be considered obsolete except in
very rare and exotic situations, of which this is not one. Create member
variables.
Part of the problem is that you are using MBCS, which means that character codes >=128 are
not actual characters, but part of a multibyte encoding, and therefore they are going to
be misinterpreted in all kinds of fascinating ways.
As already pointed out, forget that MBCS exists. It is dead technology. Use Unicode.
There is no real choice these days.
joe
Joseph M. Newcomer [MVP]
email: newc...@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
For the most part I agree with what you say here, the only exception
being... if you are using a lot of strings and doing a lot of string
handling and don't need anything except English, then using MBCS may be a bit
faster to execute, lighter on memory, and quicker for reading and writing
files, since Unicode doubles every character's size whether needed or not. I
wish Windows/MFC/all those good things had better support for other encodings
like UTF-8, which would give results similar to MBCS.
That said, the differences in most cases are not all that significant, and
I've gone to using Unicode all the time.
Tom
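Tom's storage point can be made concrete with a quick portable sketch (no Windows calls, just counting bytes; the helper names are illustrative):

```cpp
#include <cstddef>
#include <string>

// For text limited to ASCII (plain English), UTF-8 stores one byte per
// character while UTF-16 stores two bytes per character; that is, the
// "doubling" happens whether the extra byte is needed or not.
inline std::size_t Utf8Bytes(const std::string& s)     { return s.size() * sizeof(char); }
inline std::size_t Utf16Bytes(const std::u16string& s) { return s.size() * sizeof(char16_t); }
```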
"Mihai N." <nmihai_y...@yahoo.com> wrote in message
news:Xns99A2712...@207.46.248.16...
But much of the inherent speed advantage of MBCS is negated by the native
API in Win2K/XP/Vista being Unicode, so having a Unicode app allows us to
call these API's directly and not go through thunks. But I've not done
speed tests.
-- David
Tom
"David Ching" <d...@remove-this.dcsoft.com> wrote in message
news:8dGDi.1727$Sd4...@nlpi061.nbdc.sbc.com...
Hi Tom,
I completely agree with this analysis by David, at least on the (real)
operating systems like Win2K/XP/Vista, that are Unicode-native.
(Win9x "toys" are a different thing; maybe there ANSI is faster than
Unicode, because they are ANSI/MBCS-native, but the Win9x family is not
interesting to me.)
G
"Tom Serface" <tom.n...@camaswood.com> wrote in message news:uUzAeJ37...@TK2MSFTNGP04.phx.gbl...
I understand that Unicode is the best way to handle strings in a modern
application. However, my application needs to communicate with an "old" system
through some API calls, which will always return strings in "single character"
format. I think MBCS may be the only choice for it.
I've tried to convert the string to Unicode using functions like
"MultiByteToWideChar" and "SetWindowTextW", but got the same output on display.
Is there any way to make the conversion work correctly on all language versions
of Windows?
Marco
"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:amatd3d0d2eu0chtp...@4ax.com...
Marco:
Please don't send HTML mail to the newsgroups. Text only.
--
David Wilkinson
Visual C++ MVP
Marco:
If you know the code page of the 8-bit strings, then
MultiByteToWideChar() should work. If you don't, you are in trouble.
You can't just say "MultiByteToWideChar" since there are critical parameters that you have
omitted telling us about, such as what code page you specified, and whether or not you
have true MBCS (e.g., UTF-7, UTF-8) or just 8-bit characters. Certainly the example you
gave of 128, 129, 130, ...137 is not UTF-8, and in fact these code points are not defined
in most character sets (although 128 is the official Euro symbol in a lot of fonts), so
you have supplied rather incomplete information on what you are doing, trying to do, and
how you are doing it. MBCS is *not* a substitute for ANSI, since there are no APIs that
actually use it. So you need to say a lot more about what is going on here before the
question even begins to make sense.
joe
In an experiment I ran, Unicode is on the average slightly faster than ANSI, for something
as simple as a repeated SetWindowText, although the variance of the samples is high.
joe
My application will call an external DLL, which will return a string as the
result (it should be a list of ASCII codes from 0~255). My application will
then display the result in an edit box.
The result consists only of characters from A~Z plus 2 special characters
( ‡ (0x87) & ¤ (0xA4) ). The edit box displays correctly if I run my
application on English Windows. However, on non-English systems, these 2
characters display as "?".
Here's the exact code in my application.
OnStart(CString strCommand)
{
    CMyLiberaryObject MyLibObj;
    char *strResult = MyLibObj.ProcessCommand( (LPCTSTR) strCommand ); // return type is char*

    BSTR bstr = NULL;
    int nConvertedLen = MultiByteToWideChar(1252, MB_COMPOSITE, strResult, -1, NULL, NULL);
    bstr = ::SysAllocStringLen(NULL, nConvertedLen);
    if (bstr != NULL)
        MultiByteToWideChar(1252, MB_COMPOSITE, (LPCTSTR)strResult, -1, bstr, nConvertedLen);
    SetWindowTextW(GetDlgItem(IDC_ED_CMDRESULT)->GetSafeHwnd(), bstr);
    SysFreeString(bstr);
    MyLibObj.Complete();
}
Rgds,
Marco
"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:novud35mmqurjajnn...@4ax.com...
Then maybe the best thing is to make the whole application Unicode, and
convert back and forth when you communicate with the legacy part.
Marco:
If your characters are all ISO-8859-1 characters (as would seem to be
the case) then, as I said before, you should just be able to copy (not
convert) them into an array of wchar_t, and use SetWindowTextW. This is
because the first 256 code points of Unicode (and the UTF-16 encoding of
it) are the same as ISO-8859-1. Or you could use MultiByteToWideChar()
with the code page always set to English. You do not want to use
MultiByteToWideChar() with the local code page.
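David's copy-not-convert suggestion can be sketched portably like this (only the widening step is shown; the SetWindowTextW call is left out):

```cpp
#include <string>

// Widen an ISO-8859-1 (Latin-1) byte string to a wide string.
// This works because Unicode code points U+0000..U+00FF are identical
// to Latin-1: each byte simply becomes one wide character.
inline std::wstring Latin1ToWide(const char* s)
{
    std::wstring out;
    for (; *s != '\0'; ++s) {
        // Cast through unsigned char first to avoid sign extension for
        // bytes >= 0x80 (e.g. 0xA4 must become U+00A4, not a garbage value).
        out += static_cast<wchar_t>(static_cast<unsigned char>(*s));
    }
    return out;
}
```

The resulting wide string could then be passed straight to SetWindowTextW.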
Actually, I am confused by your code. The only purpose of using TCHAR,
LPCTSTR, etc., is to have an app that will compile as both ANSI and
Unicode. This surely cannot be the case for you, as it would mean that
your legacy CMyLiberaryObject::ProcessCommand() would have to accept a
const wchar_t* and return a char*.
I think you would do best to write your whole app in Unicode and convert
to and from 8-bit strings only when using your legacy library.
Yes, Joe! The key point is the conversion ANSI -> Unicode made internally
by Windows, as you pointed out.
Giovanni
>Sorry for my misleading question. Let me explain my problem in more detail.
>
>My application will call an external DLL, which will return a string as the
>result (it should be a list of ASCII codes from 0~255). My application will
>then display the result in an edit box.
>
>The result consists only of characters from A~Z plus 2 special characters
>( ‡ (0x87) & ¤ (0xA4) ). The edit box displays correctly if I run my
>application on English Windows. However, on non-English systems, these 2
>characters display as "?".
>
>Here's my exact coding in my application.
>
>OnStart(CString strCommand)
>{
> CMyLiberaryObject MyLibObj;
> char *strResult = MyLibObj.ProcessCommand( (LPCTSTR) strCommand ); //
>return type is char*
****
There's a problem here. What is the parameter of the function ProcessCommand? Is it
really LPCTSTR (8-bit or Unicode depending on compilation mode)? Or is it 8-bit? The
LPCTSTR cast would be dangerous in a Unicode build if the function takes char *.
Given that it returns a char *, who frees it? Returning a pointer to a fixed
buffer is inherently dangerous, so it should really return a CStringA, or at
the very least a char * on the heap which the caller must free.
Code that returns a pointer to a fixed buffer is not thread-safe, and should be considered
*dangerously obsolete* at this point (think Unicode, think multithreading, ALWAYS)
***
>
> BSTR bstr = NULL;
****
Why are you allocating a BSTR here? Why not an LPWSTR? BSTRs carry additional
overhead (a length prefix, and allocation through the SysAllocString family),
and since you are not using any of that, an LPWSTR would be fine.
****
> int nConvertedLen = MultiByteToWideChar(1252, MB_COMPOSITE, strResult
>, -1, NULL, NULL);
****
This tells it to convert the string using code page 1252 (Windows-1252, a
superset of ISO-8859-1/Latin-1). Given that you have said you only use A-Z and
two special characters, MB_COMPOSITE has no meaning here and should be omitted.
****
> bstr = ::SysAllocStringLen(NULL, nConvertedLen);
****
LPWSTR bstr = new WCHAR[nConvertedLen];
there was no need to declare and initialize a pointer before it is used, and
there is certainly no need for a BSTR, so get rid of it
****
> if (bstr != NULL)
> MultiByteToWideChar(1252, MB_COMPOSITE, (LPCTSTR)strResult , -1,
>bstr, nConvertedLen);
****
Get rid of the MB_COMPOSITE
****
> SetWindowTextW(GetDlgItem(IDC_ED_CMDRESULT)->GetSafeHwnd(), bstr);
****
Create a control variable. Generally, assume that if you have written GetDlgItem, except
in EXTREMELY RARE CIRCUMSTANCES (of which this is not one) you have made a fundamental
design error. Because you are trying to write a Unicode string in an ANSI app, you would
need to write
::SetWindowTextW(c_Result.m_hWnd, bstr);
although it would make much more sense to compile this as a Unicode app (beware the
parameter issue already mentioned!) and just write
c_Result.SetWindowText(bstr);
****
> SysFreeString(bstr);
*****
delete [] bstr;
why use something as complicated as a BSTR for such a trivial purpose?
Now you've got some other issues here. For example, what font is loaded into the edit
control? Is the result of the MultiByteToWideChar correct, or does it already have the
erroneous '?' in it? There are too many variables here and you have not isolated the
problem adequately.
****
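Pulling these corrections together (no BSTR, no MB_COMPOSITE, a control member variable), the conversion step might look roughly as below. The 1252-to-Unicode mapping is hand-rolled here so the sketch stays portable and covers only the characters the poster mentioned; in the real app you would call MultiByteToWideChar(1252, 0, ...) instead. All names here are illustrative:

```cpp
#include <string>

// Convert a Windows-1252 result string to a wide string.
// Sketched only for the range the poster uses: ASCII plus 0x87 and 0xA4.
inline std::wstring Cp1252ToWide(const char* s)
{
    std::wstring out;
    for (; *s != '\0'; ++s) {
        unsigned char c = static_cast<unsigned char>(*s);
        if (c == 0x87)
            out += static_cast<wchar_t>(0x2021); // 0x87 is the double dagger in cp1252
        else
            out += static_cast<wchar_t>(c);      // 0x00-0x7F and 0xA0-0xFF match Unicode
    }
    return out;
}

// Outline of the corrected handler (Win32/MFC parts shown as comments):
//   std::wstring wide = Cp1252ToWide(MyLibObj.ProcessCommand(strCommand));
//   c_Result.SetWindowText(wide.c_str()); // c_Result: a DDX control member, not GetDlgItem
```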
Marco:
I see you are already converting using code page 1252 (I didn't notice
that before). This should work if you do it correctly, but I'm not sure
you are (see Joe's reply).
Tom
"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:q90vd3picmebnvab9...@4ax.com...
You can get this to work so long as you know the code page you need for the
language or you are running only on the machine where that language is
installed and the correct region is set. We tried this for years and could
never get it to work right since our software was installed in so many
configurations so we finally went to Unicode and we just convert the
external strings and files to Unicode to use them rather than trying to go
the other way. So far this approach has worked well. So to answer your
question, yes you can theoretically get it to work, but the number of
parameters involved is often difficult to control.
Tom
"Marco Hung" <marco.h...@gmail.com> wrote in message
news:OQmEQVC8...@TK2MSFTNGP03.phx.gbl...
Tom
"Giovanni Dicanio" <giovanni...@invalid.it> wrote in message
news:eJ81NXB8...@TK2MSFTNGP05.phx.gbl...
Huh?
> (a bad name choice).
Yes "ANSI" is a bad name choice, but the meaning is the same as MBCS.
> MBCS uses sequences of 8-bit characters to represent characters,
Yes.
> and as far as I know, there are no API calls that take MBCS strings.
The ones that end in "A" take MBCS strings. Most of them work by converting
to Unicode before calling NT internal routines and converting back to MBCS
before returning to the caller. Some such as WTSQuerySessionInformationA
don't work. (ANSI applications have to call WTSQuerySessionInformationW
explicitly, including the W, and do the conversions themselves.)
> They take either ANSI or Unicode.
Yes. The ones that end in A take "ANSI" i.e. MBCS, and the ones that end in
W take Unicode i.e. UTF-16.
"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:novud35mmqurjajnn...@4ax.com...
ASCII codes are 0~127.
If you're having code page problems it's because you're dealing with ANSI
code pages other than ASCII. Some code pages (mostly European) are 0~255.
Some (Asian) are basically 0~65535, but of course some portions of that
range can't be used, so they use 0~127 and part of 32768~65535.
If a value isn't a valid character in your code page (for example number 529
in code page 1252 or number 129 in code page 932) then of course you get
garbage.
"Marco Hung" <marco.h...@gmail.com> wrote in message
news:%23u9Vf5E...@TK2MSFTNGP02.phx.gbl...
Suppose I have 10MB of 'characters'. In ANSI, these would take 10MB of RAM; in
Unicode they would take 20MB of RAM. On a typical end-user machine with 1GB of
memory, that means 8-bit strings would occupy about 1% of physical RAM, or 0.5%
of my 2GB virtual address space, and Unicode a whopping 2% of physical RAM and
1% of my virtual address space. I somehow cannot get excited about this
problem, given all the additional problems of complex code, possibility of
error, cost of development and debugging, etc., that using 8-bit characters
would bring.
joe
On Fri, 7 Sep 2007 00:04:10 +0200, "Giovanni Dicanio" <giovanni...@invalid.it> wrote:
>
>"Tom Serface" <tom.n...@camaswood.com> wrote in message
>news:Oxv$i6J8HH...@TK2MSFTNGP03.phx.gbl...
>
>> but it's difficult to quantify the difference and I suspect it is
>> negligible so Unicode seems a better way to go in my opinion.
>
>Hi Tom,
>
>I agree with you.
>
>And maybe if memory space saving is the main target, UTF-8 could be used as
>the encoding for Unicode, instead of UTF-16.
>But maybe for historical reasons, it seems that internal Windows format for
>Unicode is UTF-16 :(
>On the other hand, IIRC Mac OS X and Linux tend to use UTF-8, but I may be
>mistaken...
>
>
>> If you really need to minimize memory space (like you're trying to run an
>> MFC application on your watch or something) then perhaps, but ...
>
>IIRC, Windows CE (which should be suited to embedded platforms and platforms
>with memory limits, not like the "huge" 1-2 GB of RAMs in current desktop
>PCs) uses Unicode (UTF-16) and not ANSI :)
>
>Giovanni
Tom
"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:d6n1e3938ejv9bb7j...@4ax.com...
Almost.
ANSI can be SBCS or MBCS. But it is one of them.
The system has one ANSI code page and only one at a certain time
(the system code page), and changing it requires a reboot.
932 (Shift-JIS), 950 (Big5), etc, are all MBCS.
Any one of them can be the ANSI code page in a certain session.
But not all of them.
Then you have other code pages, like EUC-JP or GBK, that are DBCS,
but cannot be ANSI (they can never be used as system locale).
But this is just lingo.
For a programmer using Dev Studio the lingo means something else.
If you go in Dev Studio you only have 3 options for Character set
1. Not set (nothing defined)
2. Multi-Byte Character Set (_MBCS defined)
3. Unicode Character Set (_UNICODE and UNICODE defined)
In most cases there is no difference between 1. and 2.
If you use MessageBox, in cases 1. and 2. it will become MessageBoxA,
and in case 3. it will become MessageBoxW.
But look at things like _tcsclen.
In case 1. it will become strlen, in case 2. it becomes _mbslen,
and in case 3. it becomes wcslen.
This is why you sometimes have to be very careful what you use
when you convert to generic text handling. Will you replace strlen
with _tcslen, or with _tcsclen?
(in most cases the answer is _tcslen, but there are exceptions)
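The byte-count versus character-count difference behind strlen/_mbslen/_tcsclen can be illustrated portably with a multibyte string. _mbslen itself is MSVC-specific and works on the active code page; here UTF-8 stands in as the multibyte encoding, and the helper is illustrative:

```cpp
#include <cstddef>
#include <cstring>

// Count characters (not bytes) in a UTF-8 string: a byte of the form
// 10xxxxxx is a continuation byte and does not start a new character.
inline std::size_t Utf8CharCount(const char* s)
{
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++n; // lead byte (or plain ASCII): starts a character
    }
    return n;
}
```

For the two-byte sequence 0xCF 0x89 (omega in UTF-8), strlen reports 2 bytes while the character count is 1; that mismatch is exactly why replacing strlen with the wrong generic-text macro bites.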
As a general rule: UTF-16 for processing, UTF-8 for transfer/storage
(and like any general rule it has exceptions, but you have to know when
to do that)
Mac OS X string APIs use UTF-16. Same for Apache Xerces
(the XML parsing library), ICU (IBM's International Components for Unicode),
Qt, and Java.
Here is a good read: http://unicode.org/notes/tn12/tn12-1.html
> As a general rule: UTF-16 for processing, UTF-8 for transfer/storage
> (and like any general rule it has exceptions, but you have to know when
> to do that)
Yes.
> Mac OS X string APIs use UTF-16. Same for Apache Xerces
> (the XML parsing library), ICU (IBM's International Components for Unicode),
> Qt, and Java.
> Here is a good read: http://unicode.org/notes/tn12/tn12-1.html
Thank you for having corrected my wrong information about Mac OS X.
I'm going to read the web page you linked.
Giovanni
> I somehow cannot get excited about this problem, given all the
> additional problems of complex code, possibility of error, cost of
> development and
> debugging, etc. that it would cost to use 8-bit characters.
I believe that both you and me (and others, of course) use Unicode for
strings.
My point was about UTF-16 vs UTF-8 (both *Unicode*, not ANSI 8 bits).
Giovanni
Hi Tom,
Yes, ANSI is kind of computer archaeology these days :)
G.
Of course any program I've ever done has either MBCS or UNICODE defined so
perhaps that where I'm getting it.
Tom
"Mihai N." <nmihai_y...@yahoo.com> wrote in message
news:Xns99A4420...@207.46.248.16...
Tom
"Giovanni Dicanio" <giovanni...@invalid.it> wrote in message
news:u3sgcLT8...@TK2MSFTNGP05.phx.gbl...
Hi Tom,
VC6 has no problem with Unicode...
http://www.mihai-nita.net/article.php?artID=20060723a
...Am I missing something?
G
Tom
"Giovanni Dicanio" <giovanni...@invalid.it> wrote in message
news:uPRnQIW8...@TK2MSFTNGP06.phx.gbl...
Is there a way to override the system regional code page setting to force a
VB 6 application to use "English (United States)"?
On Fri, 7 Sep 2007 09:46:06 -0700, PackAddict <PackA...@discussions.microsoft.com>
wrote:
This is still true for VS 2003.
VS 2005 was the first one to switch (and it is still buggy at that).
I would agree that SBCS is just a subset of DBCS,
and DBCS a subset of MBCS.
ANSI is the MBCS that is currently the system code page :-)
The MS lingo in this area is a mess, so one should be pretty
flexible with the definitions here :-)
For a programmer the only important part is: what are the implications
of defining _MBCS / UNICODE / _UNICODE / nothing?
So, in VC6 or VS2003 Unicode-built app we can't have e.g. a string-table
resource with Japanese characters in Unicode?
Is there any workaround?
Should we use external custom file encoded e.g. in UTF-8 and read it and
convert it dynamically to UTF-16?
Thanks in advance,
Giovanni
I think people have more trouble updating from VC6 to VS.NET than they do
updating to any other version since then. I think it would make sense for
Microsoft to make a really easy upgrade path from VC6/VS6 to VS 2008 to
encourage people to move up.
Tom
"Mihai N." <nmihai_y...@yahoo.com> wrote in message
news:Xns99A4E42A...@207.46.248.16...
Tom
"Mihai N." <nmihai_y...@yahoo.com> wrote in message
news:Xns99A4E54C...@207.46.248.16...
In 2003 you can have a Unicode RC file, but it is initially created in MBCS
and you just have to open the .RC file in Notepad then save it back as
Unicode. The IDE will use it after that. I think 2005 creates them as
Unicode in the first place.
In VC6 and 2003 (using ANSI) it relies on the code page and fonts to display
the correct characters, so you can have Japanese, but it wouldn't be Unicode.
I think there are some characters that MBCS can't handle, but I don't know
what they are offhand.
Tom
"Giovanni Dicanio" <giovanni...@invalid.it> wrote in message
news:eRG6Umf8...@TK2MSFTNGP03.phx.gbl...
IDS_MU L"Gray Cats say \x03BC!"
will produce the right result. The problem is that I'm no longer sure how to produce the
L" form of the string short of hand-editing, and if you just type in \x it converts it to
\\x. But it works, and the correct result is displayed provided the font you use has
the Greek letter 'mu' in it.
joe
The compiled resource files are always Unicode.
The source resource files (.rc) can be Unicode, but you cannot edit them with
the resource editor in VS 6/2002/2003
OK, in the VS 2005 editor, if you know about some of the bugs:
- The RichEdit controls in dialogs are always ANSI
(http://www.mihai-nita.net/article.php?artID=20050709b)
- The .rc is not Unicode unless you ask for it
(http://www.mihai-nita.net/article.php?artID=20051030a)
- The DLGINIT used for combo boxes in MFC is always ANSI
(I have reported it for Orcas; marked as fixed)
And only UTF-16LE is supported (no UTF-8!)
> Is there any workaround?
> Should we use external custom file encoded e.g. in UTF-8 and read it and
> convert it dynamically to UTF-16?
Set the system locale to Japanese and reboot.
(http://www.mihai-nita.net/article.php?artID=20050611a)
It is the best option, because you need WYSIWYG for proper resizing.
This might also come in handy:
http://www.mihai-nita.net/article.php?artID=20070503a
Giovanni
----
"Mihai N." <nmihai_y...@yahoo.com> wrote in message
news:Xns99A5C87C...@207.46.248.16...
It's the other way around. In VC6 or VS2003 Unicode-built apps, we can't
have e.g. a string-table resource with any NON-JAPANESE characters in
Unicode.
> Is there any workaround?
Use Notepad to edit the RC file. (Facts are funnier than jokes, eh?)
You'd be surprised how many times I do this sort of thing. The problem is
with pre-2005 versions if you edit the wrong resource by mistake the RC
editor would trash all of your other resources (yielding ???) unless you
were in the correct region (locale) while editing. Fortunately, this
doesn't seem to be a problem with Unicode RC files. Still, I use Notepad to
make some changes since the search and replace works so much nicer :o)
Tom
"Norman Diamond" <ndia...@community.nospam> wrote in message
news:OfFqHQ08...@TK2MSFTNGP03.phx.gbl...
A better description might be that there are Asc and Chr function calls
throughout the code, not duplication of algorithms throughout the code.
Those Asc and Chr function calls cause problems when a Chinese code page is
set as the default language. Each time we hit a hex value with no
corresponding ASCII value in the code page, we get "?" returned.
Needless to say, that causes some significant discrepancies when
encrypting/decrypting a string of data.
I figured that I was going to have to move to byte arrays, but thought I'd
take a stab in the dark at a solution that would allow me to just override
the code page.
> You'd be surprised how many times I do this sort of thing.
Well, the graphics/image-editing capabilities of Visual Studio are not great
either, so for editing images too it is good to go to external "ad hoc"
programs...
G
On Mon, 10 Sep 2007 08:52:14 -0700, PackAddict <PackA...@discussions.microsoft.com>