
ANSI string from UNICODE app.


William GS

Jul 28, 2006, 5:31:01 PM
Hello everybody.
I have to read a string from the registry. The string is ANSI. When I read it
with the ANSI version of my app, all is fine, but if I read it with the UNICODE
version of my app, the string obtained is wrong. How can I read an ANSI
string (from the registry) from a UNICODE app?

Thanks in advance,
William GS

David Lowndes

Jul 28, 2006, 7:23:54 PM
>I have to read a string from registry, this string is ANSI, when I read it
>with ANSI version of my app, all is fine, but if I read it with UNICODE
>version of my app, the string obtained is wrong, so, how can I read a ANSI
>string (from registry) from a UNICODE app?.

William,

A REG_SZ value will be read as Unicode or ANSI depending on whether
you use the Unicode or ANSI version of the API - so the quick answer
to your question is to explicitly use the ANSI API - RegQueryValueExA.


Dave
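
To illustrate the point about the two API flavors: in the Windows headers the unsuffixed name is a macro that resolves to the A or the W function depending on whether UNICODE is defined. What follows is only a toy model of that mechanism, not the Win32 API itself. QueryValueA, QueryValueW, QueryValue and ReadAsAnsiAlways are hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for the two entry points (hypothetical, not the Win32 API).
inline std::string  QueryValueA() { return "Hello"; }    // 8-bit characters
inline std::wstring QueryValueW() { return L"Hello"; }   // wide characters

// The generic name is only a macro that picks one of them at compile time.
#ifdef UNICODE
#define QueryValue QueryValueW
#else
#define QueryValue QueryValueA
#endif

// Calling the suffixed function directly bypasses the macro, so the result
// has 8-bit characters whether or not UNICODE is defined for the build.
inline std::string ReadAsAnsiAlways()
{
    return QueryValueA();
}
```

The same idea applies to the real API: naming RegQueryValueExA explicitly gives the ANSI behavior in a Unicode build, at the cost of having to use a char buffer for that one call.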

Mihai N.

Jul 29, 2006, 4:09:25 AM
> I have to read a string from registry, this string is ANSI, when I read it
> with ANSI version of my app, all is fine, but if I read it with UNICODE
> version of my app, the string obtained is wrong, so, how can I read a ANSI
> string (from registry) from a UNICODE app?.
If the string is wrong when you read it with the Unicode app, then something is
wrong with the calls in the application.
The strings in the registry are Unicode. Reading them as ANSI from a Unicode
application might be a work-around, but it is better if you manage to find the
real problem causing this.


--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email

William GS

Jul 29, 2006, 11:34:01 AM
"Mihai N." wrote:

> If when you read it with the Unicode app and is wrong, then something is
> wrong with the calls in the application.
> The strings in registry are Unicode. Reading it as ANSI from an Unicode
> application might be a work-around, but is better if you manage to find the
> real problem causing this.
>
>
> --
> Mihai Nita [Microsoft MVP, Windows - SDK]
> http://www.mihai-nita.net
> ------------------------------------------
> Replace _year_ with _ to get the real email
>


Maybe my code for reading the string is not OK. This is the code:

lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, NULL, NULL, &dwSize);
if (lReturn != ERROR_SUCCESS)
return FALSE;
char *pszChain = (char*)malloc(dwSize + 1);
if (!pszChain)
return FALSE;
dwSize++;
lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, &dwType, (LPBYTE)pszChain,
&dwSize);
if (lReturn != ERROR_SUCCESS)
{
free(pszChain);
return FALSE;
}
//...
//...
free(pszChain);

Thanks in advance,
William GS

David Lowndes

Jul 29, 2006, 12:26:03 PM
>Maybe my code for read the string is not ok. This is the code:
>
>lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, NULL, NULL, &dwSize);
> if (lReturn != ERROR_SUCCESS)
> return FALSE;
> char *pszChain = (char*)malloc(dwSize + 1);

To handle both an ANSI and Unicode build, you ought to have a TCHAR *
(or LPTSTR) for pszChain - unless you specifically call
RegQueryValueExA. The docs say the null terminator is already included
in the length.

Dave

Vipin

Jul 29, 2006, 5:14:48 PM
Besides the other comments you are getting, why don't you use neater
code like this:

lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, NULL, NULL, &dwSize);
if (lReturn != ERROR_SUCCESS)
return FALSE;

CString str;


lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, &dwType,
(LPBYTE)str.GetBuffer(dwSize), &dwSize);
str.ReleaseBuffer();


if (lReturn != ERROR_SUCCESS)
return FALSE;

--
Vipin Aravind
http://www.explorewindows.com


"William GS" <William G...@discussions.microsoft.com> wrote in message
news:6F96E7CB-E5DF-42CC...@microsoft.com...

Joseph M. Newcomer

Jul 29, 2006, 11:13:08 PM
Strings are stored in the Registry as Unicode. Thus, when a Unicode app retrieves a
string value, it sees the Unicode string. When an ANSI app retrieves the string value, it
is downconverted to an ANSI string (with possible data lossage if it can't map to 8-bit).

When a string is set in the Registry, it is converted to Unicode.

Note that if you store the string as a Byte type (REG_BINARY) then it will fail, and the
simple answer is 'don't do that'.

I have apps where I have both Unicode and ANSI versions of the app and they work perfectly
fine storing things in the Registry. Either app can store or retrieve. Can you explain
what symptoms you are seeing that convinces you the string is "wrong"?
joe

On Fri, 28 Jul 2006 14:31:01 -0700, William GS <Will...@discussions.microsoft.com>
wrote:

Joseph M. Newcomer [MVP]
email: newc...@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

Joseph M. Newcomer

Jul 29, 2006, 11:21:09 PM
See below...
On Sat, 29 Jul 2006 08:34:01 -0700, William GS <William G...@discussions.microsoft.com>
wrote:

>"Mihai N." wrote:
>
>> If when you read it with the Unicode app and is wrong, then something is
>> wrong with the calls in the application.
>> The strings in registry are Unicode. Reading it as ANSI from an Unicode
>> application might be a work-around, but is better if you manage to find the
>> real problem causing this.
>>
>>
>> --
>> Mihai Nita [Microsoft MVP, Windows - SDK]
>> http://www.mihai-nita.net
>> ------------------------------------------
>> Replace _year_ with _ to get the real email
>>
>
>
>Maybe my code for read the string is not ok. This is the code:
>
>lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, NULL, NULL, &dwSize);
> if (lReturn != ERROR_SUCCESS)
> return FALSE;
> char *pszChain = (char*)malloc(dwSize + 1);

*****
RIGHT HERE YOU ARE IN TROUBLE! Why are you using a char * variable in a Unicode app???

FORGET that char and char * exist except in very rare and exotic circumstances, and this
is not one of them. Never forget to multiply or divide appropriately by sizeof(TCHAR).
Note when you write the Unicode string, you must write
string_length * sizeof(TCHAR)
(for your string_length value in characters) as the size, or it will be stored
incorrectly, and hence retrieved as a bogus value.

LPTSTR pszChain = new TCHAR[ (dwSize + sizeof(TCHAR)) / sizeof(TCHAR)];
*****


> if (!pszChain)
> return FALSE;
> dwSize++;

*****
Why are you incrementing this here? You are only adding one byte, and Unicode would
require 2 bytes
dwSize += sizeof(TCHAR);
would make sense, but ++ does not
*****


> lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, &dwType, (LPBYTE)pszChain,
>&dwSize);
> if (lReturn != ERROR_SUCCESS)
> {
> free(pszChain);
> return FALSE;
> }
> //...
> //...
> free(pszChain);
>
>Thanks in advance,
>William GS
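
The byte-versus-character arithmetic discussed above can be sketched without any Windows headers. This is a minimal illustration under stated assumptions: WideChar is a stand-in for WCHAR (assumed to be a 2-byte UTF-16 code unit), and ElementsForBytes and MakeValueBuffer are hypothetical helper names, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for WCHAR in a Unicode build (assumption: 2-byte UTF-16 units).
using WideChar = char16_t;

// Number of WideChar elements needed to hold byteCount bytes of data,
// rounded up, plus one spare element so the buffer can always be terminated.
inline std::size_t ElementsForBytes(std::size_t byteCount)
{
    return (byteCount + sizeof(WideChar) - 1) / sizeof(WideChar) + 1;
}

// Allocates a zero-filled buffer sized in elements, not in bytes.
inline std::vector<WideChar> MakeValueBuffer(std::size_t byteCount)
{
    return std::vector<WideChar>(ElementsForBytes(byteCount), WideChar(0));
}
```

For a reported size of 12 bytes this yields 7 elements: 6 for the data (terminator included) and 1 spare. Adding 1 to the byte count, as in the original code, gives 13 bytes, which is not a whole number of wide characters.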

Mihai N.

Jul 30, 2006, 4:24:51 AM
> Maybe my code for read the string is not ok. This is the code:
The code is ok.

> char *pszChain = (char*)malloc(dwSize+1);
...
> dwSize++;
The function returns the size needed including the zero, so there is no need to increment.
But there is no harm in it :-)

Basically the function is not designed to return strings, but binary data.
Now, if the dwType is REG_SZ, the bytes are in fact "Null-terminated string.
It will be a Unicode or ANSI string, depending on whether you use the Unicode
or ANSI functions." (the doc).

So, since you are calling the Unicode API, those bytes can be cast to TCHAR*.
Or, even easier, take your code, as it is, and change


char *pszChain = (char*)malloc(dwSize + 1);

to
TCHAR *pszChain = (TCHAR*)malloc(dwSize);

You can also remove dwSize++, and all is fine (I have actually tried it :-)

William GS

Jul 30, 2006, 10:10:02 AM
"Joseph M. Newcomer" wrote:

> Strings are stored in the Registry as Unicode. Thus, when a Unicode app retrieves a
> string value, it sees the Unicode string. When an ANSI app retrieves the string value, it
> is downconverted to an ANSI string (with possible data lossage if it can't map to 8-bit).
>
> When a string is set in the Registry, it is converted to Unicode.
>
> Note that if you store the string as a Byte type (REG_BINARY) then it will fail, and the
> simple answer is 'don't do that'.
>
> I have apps where I have both Unicode and ANSI versions of the app and they work perfectly
> fine storing things in the Registry. Either app can store or retrieve. Can you explain
> what symptoms you are seeing that convinces you the string is "wrong"?
> joe
>

Thanks to everyone. The symptom is: if I read it with the ANSI version of my
app, all is fine; if I read it with the UNICODE version, the string has strange
characters and the length is wrong (it is greater). I will replace char with
TCHAR and use the appropriate sizeof.

William GS

Joseph M. Newcomer

Jul 30, 2006, 5:29:38 PM
The "length" should be twice the length in the ANSI mode, because it is returning a
Unicode string which is twice as long. When reporting problems, don't use vague terms
like "is greater". Say something like "The string length expected was 4, and I saw 8".
That is useful and informative. It also says that your string is the correct length and
the call is working correctly. Explain why you think it has "strange characters". What
did you use to display the characters? Did you look at the string in the debugger? Did
you look at the string with RegEdit? All of this is useful information, and without it we
are depending upon ethereal vibrations to carry this information to us, and since the
ether doesn't exist, we find this quite challenging to deduce what is going on from such
vague descriptions.
joe

On Sun, 30 Jul 2006 07:10:02 -0700, William GS <Will...@discussions.microsoft.com>
wrote:

>"Joseph M. Newcomer" wrote:

Joseph M. Newcomer [MVP]
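
The "twice the length" remark can be checked with a small portable sketch. It assumes char16_t as a stand-in for WCHAR (2 bytes per code unit); the names kAnsiText, kWideText, AnsiBytes, WideBytes and LengthSeenThroughCharPointer are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The same five-character text in both encodings.
const char     kAnsiText[] = "Hello";
const char16_t kWideText[] = u"Hello";

// Byte counts including the terminator, which is what a size query reports.
inline std::size_t AnsiBytes() { return sizeof(kAnsiText); }
inline std::size_t WideBytes() { return sizeof(kWideText); }

// What 8-bit string code sees when handed the UTF-16 bytes: for ASCII text
// one byte of the first code unit is zero, so the "string" ends almost at once.
inline std::size_t LengthSeenThroughCharPointer()
{
    return std::strlen(reinterpret_cast<const char*>(kWideText));
}
```

With 2-byte code units the wide form occupies 12 bytes against 6 for the ANSI form, so a larger reported size is expected rather than a sign of failure. Passing those bytes to char-based code, on the other hand, gives a truncated or garbled result.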

Mihai N.

Jul 31, 2006, 4:53:46 AM
Joseph M. Newcomer <newc...@flounder.com> wrote in
news:b19qc21s4daa8vffp...@4ax.com:

> The "length" should be twice the length in the ANSI mode, because it is
> returning a Unicode string which is twice as long.

Not in this case. The api does not return a string, but a bunch of bytes.
Sometimes "it might happen" that the bytes are a valid Unicode string.

Mihai N.

Jul 31, 2006, 4:54:37 AM
> Thanks to everyone. The symptom is: if I read it with ANSI version of my
> app, all is fine, if I read it with UNICODE version, the string has strange
> characters and the length is wrong (is greater). I will replace char with
> TCHAR and appropiate sizeof.
Please read my post and make the changes recommended there. It works; it is
tested.

Norman Diamond

Jul 31, 2006, 6:49:33 AM
"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:bv8oc2totdioj35vr...@4ax.com...

> On Sat, 29 Jul 2006 08:34:01 -0700, William GS <William
> G...@discussions.microsoft.com> wrote:

>>lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, NULL, NULL, &dwSize);
>> if (lReturn != ERROR_SUCCESS)
>> return FALSE;
>> char *pszChain = (char*)malloc(dwSize + 1);
> *****
> RIGHT HERE YOU ARE IN TROUBLE! Why are you using a char * variable in a
> Unicode app???

Because RegQueryValueEx reports data length in bytes no matter what the
datatype and no matter whether the compilation environment is Unicode or
not. Of course adding 1 byte to the length is still wrong.

I would have set dwSize to 0 before the first call to RegQueryValueEx but
I'm not quite sure if that's necessary. The first call sets dwSize to the
number of bytes needed.
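
The two-call pattern (ask for the size in bytes, allocate, then fetch) might be sketched as follows. To keep the sketch portable, the registry call is abstracted as a callable; FakeValue is a hypothetical test double that imitates the size contract described in this thread, and ReadWideString is an invented helper name. Neither is part of the Win32 API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical fake value source, used only to exercise the helper below.
// Sizes are always in bytes and already include the terminator.
struct FakeValue {
    std::u16string stored;

    long operator()(unsigned char* data, std::uint32_t* sizeInBytes) const
    {
        const std::uint32_t needed =
            static_cast<std::uint32_t>((stored.size() + 1) * sizeof(char16_t));
        if (data != nullptr) {
            if (*sizeInBytes < needed) {
                *sizeInBytes = needed;
                return 1;                  // nonzero: caller's buffer is too small
            }
            std::memcpy(data, stored.c_str(), needed);
        }
        *sizeInBytes = needed;
        return 0;                          // zero: success
    }
};

// Size query first, then the real read into a buffer sized in elements.
template <typename Query>
bool ReadWideString(Query query, std::u16string& out)
{
    std::uint32_t bytes = 0;               // initialized before the size query
    if (query(nullptr, &bytes) != 0)
        return false;

    // Bytes to elements, rounded up, plus one spare element that stays zero
    // so the buffer is terminated even if the stored data is not.
    std::vector<char16_t> buffer(
        (bytes + sizeof(char16_t) - 1) / sizeof(char16_t) + 1, u'\0');

    if (query(reinterpret_cast<unsigned char*>(buffer.data()), &bytes) != 0)
        return false;

    out.assign(buffer.data());             // copies up to the first terminator
    return true;
}
```

On Windows the callable would presumably wrap ::RegQueryValueExW for a given key and value name, and real code should also check the returned type before treating the bytes as text.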

Joseph M. Newcomer

Jul 31, 2006, 4:32:39 PM
See below...

On Mon, 31 Jul 2006 19:49:33 +0900, "Norman Diamond" <ndia...@community.nospam> wrote:

>"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
>news:bv8oc2totdioj35vr...@4ax.com...
>> On Sat, 29 Jul 2006 08:34:01 -0700, William GS <William
>> G...@discussions.microsoft.com> wrote:
>
>>>lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, NULL, NULL, &dwSize);
>>> if (lReturn != ERROR_SUCCESS)
>>> return FALSE;
>>> char *pszChain = (char*)malloc(dwSize + 1);
>> *****
>> RIGHT HERE YOU ARE IN TROUBLE! Why are you using a char * variable in a
>> Unicode app???
>
>Because RegQueryValueEx reports data length in bytes no matter what the
>datatype and no matter whether the compilation environment is Unicode or
>not. Of course adding 1 byte to the length is still wrong.

****
Oh. Then why are you using char when you should be using BYTE? Why are you using malloc
of bytes when you could be using new (this IS C++, you know; a good touchstone is if you
write malloc you've made a mistake).

LPTSTR pszChain = new TCHAR[dwSize / sizeof(TCHAR)];
does the correct allocation and returns an object of the correct type to be used as a
string buffer.
joe
****


>
>I would have set dwSize to 0 before the first call to RegQueryValueEx but
>I'm not quite sure if that's necessary. The first call sets dwSize to the
>number of bytes needed.

Joseph M. Newcomer

Jul 31, 2006, 4:34:56 PM
If the type is REG_SZ, REG_MULTI_SZ, or REG_EXPAND_SZ, the bytes returned represent
characters. So there is no point to interpreting what the values are until it is known
that the type is one of the acceptable string types. Typically, if the program is
expecting REG_SZ, any other type should generate an error that the Registry value is
invalid by whatever error reporting mechanism is appropriate for the application. I had
presumed that the allegation that the string contained garbage was based on a well-behaved
program that knew the value was REG_SZ.
joe

On Mon, 31 Jul 2006 01:53:46 -0700, "Mihai N." <nmihai_y...@yahoo.com> wrote:

>Joseph M. Newcomer <newc...@flounder.com> wrote in
>news:b19qc21s4daa8vffp...@4ax.com:
>
>> The "length" should be twice the length in the ANSI mode, because it is
>> returning a Unicode string which is twice as long.
>Not in this case. The api does not return a string, but a bunch of bytes.
>Sometimes "it might happen" that the bytes are a valid Unicode string.

Norman Diamond

Jul 31, 2006, 9:34:38 PM
"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
news:k3qsc259f12qchdkk...@4ax.com...

> On Mon, 31 Jul 2006 19:49:33 +0900, "Norman Diamond"
> <ndia...@community.nospam> wrote:
>>"Joseph M. Newcomer" <newc...@flounder.com> wrote in message
>>news:bv8oc2totdioj35vr...@4ax.com...
>>> On Sat, 29 Jul 2006 08:34:01 -0700, William GS <William
>>> G...@discussions.microsoft.com> wrote:
>>>
>>>>lReturn = ::RegQueryValueEx (hKey, pszKey, NULL, NULL, NULL, &dwSize);
>>>> if (lReturn != ERROR_SUCCESS)
>>>> return FALSE;
>>>> char *pszChain = (char*)malloc(dwSize + 1);
>>> *****
>>> RIGHT HERE YOU ARE IN TROUBLE! Why are you using a char * variable in a
>>> Unicode app???
>>
>>Because RegQueryValueEx reports data length in bytes no matter what the
>>datatype and no matter whether the compilation environment is Unicode or
>>not. Of course adding 1 byte to the length is still wrong.
>
> Oh. Then why are you using char when you should be using BYTE?

The original poster will have to answer that. Though for myself there have
been occasions when I've declared things as char for either of at least two
possible reasons:
(1) force of habit, since the C language had type char before it (in some
implementations) had type BYTE, and char does mean something the size of a
byte (not necessarily the size of a character) though we can't make any
assumptions about the signedness of char.
(2) specs in the standard and/or MSDN and/or SDK header files.

> Why are you using malloc of bytes when yoou could be using new

The original poster might have an answer for this too, but in this case I do
too. I still call realloc() a lot. The C++ language still lacks renew.

Mihai N.

Aug 1, 2006, 2:04:26 AM
> If the type is REG_SZ, REG_MULTI_SZ, or REG_EXPAND_SZ, the bytes returned
> represent characters. So there is no point to interpreting what the values
> are until it is known that the type is one of the acceptable string types.
True. << Sometimes "it might happen" >> was just a figure of speech.
Maybe not the best idea though. Between "English as a second language" for me
and for others reading the answer, it can be quite confusing.

Joseph M. Newcomer

Aug 1, 2006, 10:58:03 AM
I didn't see any ESL error here. It is true that in general, RegQueryValueEx returns a
sequence of bytes, but if it is a REG_SZ, then it will be a string.

I encountered at least one situation in which the programmer didn't check the return type,
and was expecting a REG_DWORD value. However, the user had hand-edited the value, and
made it a string. Since the number was more than four digits, he got four bytes which he
then interpreted as a REG_DWORD (even though it came back as a REG_SZ), and the program
promptly proceeded to do immense destruction to his data. This is why I consider it
mandatory to check for the type. But if the type is one of the string types, you can
safely assume the bytes are a string.
joe
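
A type check of the kind described here might look like the sketch below. It is portable and deliberately avoids the real registry constants: ValueType, RawValue and AsDword are hypothetical names, with the enumerators standing in for REG_SZ, REG_DWORD and REG_BINARY.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-ins for REG_SZ, REG_DWORD and REG_BINARY (hypothetical enum).
enum class ValueType { String, Dword, Binary };

// A value as it comes back from a query: a type tag plus raw bytes.
struct RawValue {
    ValueType type;
    std::vector<unsigned char> bytes;
};

// Interprets the bytes as a 32-bit number only if the type tag and the size
// both agree; otherwise reports failure and leaves 'out' untouched.
inline bool AsDword(const RawValue& value, std::uint32_t& out)
{
    if (value.type != ValueType::Dword ||
        value.bytes.size() != sizeof(std::uint32_t))
        return false;
    std::memcpy(&out, value.bytes.data(), sizeof(out));
    return true;
}
```

A hand-edited value that arrives as a string is then rejected instead of being reinterpreted as a number.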


On Mon, 31 Jul 2006 23:04:26 -0700, "Mihai N." <nmihai_y...@yahoo.com> wrote:

>> If the type is REG_SZ, REG_MULTI_SZ, or REG_EXPAND_SZ, the bytes returned
>> represent characters. So there is no point to interpreting what the values
>> are until it is known that the type is one of the acceptable string types.
>True. << Sometimes "it might happen" >> was just a figure of speach.
>Maybe not the best idea though. Between "English as a second language" for me
>and for others reading the answer, it can be quite confusing.

Joseph M. Newcomer

Aug 1, 2006, 11:21:27 AM
My standard line is "assume char does not exist except in rare and exotic situations. When
you encounter one of these, you will recognize it".

I've been nuked, and seen others nuked, so many times by using 'char' when they
meant 'unsigned char' (aka BYTE) that I've really learned to avoid it. When I want byte,
I use byte. I also tend to use CByteArray or CArray<BYTE, BYTE> rather than do any form
of explicit allocation if all I need is storage within a limited scope, e.g.,
CByteArray data;
data.SetSize(n);
...
is a better approach than
LPBYTE data = new BYTE[n];
...
delete [] data;

because you don't have to worry about any deallocation issue; the data type deletes the
space in its destructor.
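
The same scoped-buffer idea can be shown outside MFC with std::vector, which is swapped in here as a portable analog of CByteArray. SumOfFirstBytes is a hypothetical function that exists only to give the buffer something to do.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Fills a scoped byte buffer with 1, 2, 3, ... and returns the sum.
// No explicit deallocation appears on any path: the vector's destructor
// releases the storage, including on the early return.
inline unsigned SumOfFirstBytes(std::size_t n)
{
    std::vector<unsigned char> data(n);    // zero-filled, owned by this scope
    if (data.empty())
        return 0;                          // early return needs no cleanup
    std::iota(data.begin(), data.end(), static_cast<unsigned char>(1));
    return std::accumulate(data.begin(), data.end(), 0u);
}
```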

The only time you would find 'char' in a header file is when you have a situation where
you need an actual 8-bit character string. There are very few times this is ever
required. For example, strcpy uses char, but you wouldn't write strcpy in Unicode-aware
code (you might use _tcscpy, or one of the safe functions, such as StringCchCopy, which are
Unicode-aware). So there are very few situations in which you would find this need in
Unicode-aware code [I've delivered Unicode-aware programs on the order of 100K lines, and
had very few problems in doing the conversions; typically one or two places where I forget
the appropriate conversion by sizeof(TCHAR), and I make very few of those errors the
longer I program].

The places where char appears are in handling 8-bit character data files and 8-bit
character network transfers; those qualify as the exotic situations. I convert them to
"native character set" (that is, nothing happens in an ANSI app and they get expanded to
Unicode in a Unicode app) and continue with the rest of the program.
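
Converting at the 8-bit boundary can be sketched portably under one loud assumption: the incoming bytes are ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point. Real code on Windows would more likely call MultiByteToWideChar with the appropriate code page. WidenLatin1 is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Widens Latin-1 bytes to UTF-16 code units (assumption: input is ISO-8859-1).
inline std::u16string WidenLatin1(const std::string& bytes)
{
    std::u16string wide;
    wide.reserve(bytes.size());
    // Iterating as unsigned char matters: a plain (possibly signed) char
    // holding 0xE9 would sign-extend to 0xFFE9 instead of 0x00E9.
    for (unsigned char b : bytes)
        wide.push_back(static_cast<char16_t>(b));
    return wide;
}
```

The loop variable is also a small instance of the char versus unsigned char trap mentioned above.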

Ultimately, except at the exotic interface case to an 8-bit world, I have encountered
almost no situations in which I had a compulsion to use char when TCHAR would do. This
means that when a client says to me, "how long will it take to convert this to Unicode" I
can say "I'll call you back in ten minutes". [This is a bit of a cheat...Unicode is only
a small part of localization, but I tend to, more and more these days, write locale-aware
code as well]. In at least six recent cases, I converted to Unicode and kept coding
without slowing down. Everything worked the first time.

By the way, have you ever studied realloc and discovered how often it ends up doing a
copy? In one application, I found that realloc gave surprisingly poor performance (this
was done by putting instrumentation code under an #ifdef and discovering how appalling the
statistics were). Using CByteArray, for example, you can use Add, and use a second
parameter to SetSize to specify the additional allocation to use when expanding. This
usually gives much better performance than realloc.
joe

Doug Harrison [MVP]

Aug 1, 2006, 4:50:51 PM
On Tue, 01 Aug 2006 11:21:27 -0400, Joseph M. Newcomer
<newc...@flounder.com> wrote:

>By the way, have you ever studied realloc and discovered how often it ends up doing a
>copy? In one application, I found that realloc gave surprisingly poor performance (this
>was done by putting instrumentation code under an #ifdef and discovering how apalling the
>statistics were). Using CByteArray, for example, you can use Add, and use a second
>parameter to SetSize to specify the additional allocation to use when expanding. This
>usually gives much better performance than realloc.

But IIRC, it's still a linear reallocation policy, so serial reallocation
is ultimately O(n^2) in the final size of the array. So maybe you use 1K as
the growth increment, which sounds pretty big, until you consider growing
from nothing to 1MB. If each growth operation requires reallocation and
copying, that's going to be bad. To get amortized constant time, you have
to grow exponentially, say, by a factor of 1.5 each time. I once had to
implement this on top of CSharedFile to get decent performance; here's what
I wrote at the time to describe my HGLOBAL_File:

// This class implements an exponential regrowth policy, because
// CSharedFile is linear and has horrible performance even if the regrowth
// block size is the default 4096 bytes. We're talking half a second vs.
// > 10 sec to write 2 MB.

If you have a hard upper limit on the array size, it's possible to grow the
array but eliminate copying by using the VirtualAlloc API, but often
there's no such limit, and exponential growth is the best general purpose
alternative I know of.
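
The difference between the two policies can be made concrete with a small simulation that only counts element copies. It assumes the worst case described above, namely that every growth step reallocates and copies everything stored so far. The function names are invented for the sketch, and the 1.5 factor is approximated with integer arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Copies performed while appending n elements one at a time when the
// capacity grows by a fixed increment (linear policy).
inline std::uint64_t CopiesWithLinearGrowth(std::uint64_t n, std::uint64_t growBy)
{
    std::uint64_t capacity = 0, copies = 0;
    for (std::uint64_t size = 0; size < n; ++size) {
        if (size == capacity) {            // full: reallocate and copy contents
            copies += size;
            capacity += growBy;
        }
    }
    return copies;
}

// Same count when the capacity grows by roughly a factor of 1.5 each time.
inline std::uint64_t CopiesWithGeometricGrowth(std::uint64_t n)
{
    std::uint64_t capacity = 0, copies = 0;
    for (std::uint64_t size = 0; size < n; ++size) {
        if (size == capacity) {
            copies += size;
            capacity += capacity / 2 + 1;  // about 1.5x, and never stuck at 0
        }
    }
    return copies;
}
```

Growing to 1 MB in 1 KB steps costs on the order of half a billion element copies in this model, while the 1.5x policy stays within a small constant multiple of the final size.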

--
Doug Harrison
Visual C++ MVP

Norman Diamond

Aug 1, 2006, 8:33:54 PM
Some of the Win32 APIs use only ANSI strings, no Unicode version, even under
Windows CE. An example is GetAdaptersInfo.
#ifdef _UNICODE
More calls to WideCharToMultiByte and MultiByteToWideChar
#else
Antique but ISO standard calls to strcpy etc.
#endif

In all cases that I can remember calling realloc, it doesn't matter if
realloc does copying internally, because if I managed the memory manually
then I'd have to do the same copying myself anyway. If the amount of data
waiting to be stored is larger than the original estimate then the storage
space needs to grow.

Of course if I store chunks of strings in a linked list (or store chunks of
arrays in linked lists etc.) then I don't need to do occasional reallocs
with their occasional copies, but then I get to traverse linked lists all
the time instead of doing simple address calculations.

When I call realloc, I tend to grow the amount of memory neither
arithmetically nor exponentially, but geometrically. Some MFC classes are
really convenient to use, so in situations where it's convenient to use MFC,
I do use them with their arithmetical growth and larger numbers of copies.
If performance is a problem THEN I go back and use more development time to
cut down the execution time.

P.S. Yes I know you (Dr. Newcomer) wrote the book on memory allocation. I
own your book on device drivers not the book that includes memory
allocation, and I do mean this with all due respect, but when reasons arise
for respectful disagreement that is what we have.


"Joseph M. Newcomer" <newc...@flounder.com> wrote in message

news:i0ruc25d6pn15e95g...@4ax.com...

Mihai N.

Aug 2, 2006, 4:26:08 AM
> I encountered at least one situation in which the programmer didn't check
> the return type, and was expecting a REG_DWORD value. However, the user
> had hand-edited the value, and made it a string.
LOL! Indeed, this should be the standard "preventive programming" :-)

Mihai N.

Aug 2, 2006, 4:33:37 AM
All very good advice. I would add only two points:


> typically one or two places where I forget
> the appropriate conversion by sizeof(TCHAR)

I have had very good experience defining
#define DIM(s) (sizeof(s)/sizeof(s[0]))
and using it whenever I need a "character count".
You just have to be careful not to use it on pointers :-)
I also had a C++ construct that complained about pointers, but now
VS2005 has added _countof, which does exactly that.


> This is a bit of a cheat...Unicode is only
> a small part of localization,

I guess this is a slip and you mean internationalization, not localization.

Doug Harrison [MVP]

Aug 2, 2006, 1:21:16 PM
On Wed, 02 Aug 2006 01:33:37 -0700, "Mihai N." <nmihai_y...@yahoo.com>
wrote:

>All very good advice. I would add only two points:


>
>
>> typically one or two places where I forget
>> the appropriate conversion by sizeof(TCHAR)
>
>I had very good experience by defining
> #define DIM(s) (sizeof(s)/sizeof(s[0]))
>then I use it whenever I need something for "character count"
>You just have to be carefull not to use it on pointers :-)
>I also had a C++ construct complaining about pointers, but now
>VS2005 added _countof doing exactly that.

Yes, VS2005 is using a little template hackery to exclude pointers and turn
their use into a compile-time error. It's a neat variation on the usual
method of defining a function template that returns the size:

template <typename _CountofType, size_t _SizeOfArray>
char (*__countof_helper(UNALIGNED _CountofType
(&_Array)[_SizeOfArray]))[_SizeOfArray];

#define _countof(_Array) sizeof(*__countof_helper(_Array))

Breaking it down, __countof_helper declares a function whose parameter is
an array reference, so only an array will do here. It returns a pointer to
a char array having the same number of elements as the argument. The macro
_countof dereferences this return value, which becomes the operand to
sizeof, yielding the number of elements in the _countof argument. What's
interesting is that __countof_helper has no body, yet _countof appears to
call the function. The trick is, sizeof doesn't evaluate its argument, so
this is all done purely at compile-time by looking at the types involved.
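
The technique can be tried in isolation with a portable re-creation. This sketch uses its own names (CountOfHelper, COUNT_OF, SevenSlots) and drops the UNALIGNED qualifier, so it is an illustration of the idea rather than the actual VS2005 header.

```cpp
#include <cassert>
#include <cstddef>

// Declared but never defined: it is only ever named inside sizeof, which
// does not evaluate its operand. The parameter is a reference to an array
// of N elements, and the return type is a pointer to char[N].
template <typename T, std::size_t N>
char (*CountOfHelper(T (&)[N]))[N];

// sizeof of the pointed-to char[N] is N, a compile-time constant.
#define COUNT_OF(arr) (sizeof(*CountOfHelper(arr)))

inline std::size_t SevenSlots()
{
    int values[7] = {};
    char scratch[COUNT_OF(values)] = {};   // usable as an array bound
    static_assert(COUNT_OF(values) == 7, "element count, not byte count");
    return sizeof(scratch);
}

// A pointer is rejected at compile time because no array reference can bind:
//   int* p = nullptr;
//   COUNT_OF(p);   // error: no matching function for call to CountOfHelper
```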

Mihai N.

Aug 3, 2006, 2:18:10 AM
> Yes, VS2005 is using a little template hackery to exclude pointers and turn
> their use into a compile-time error. It's a neat variation on the usual
> method of defining a function template that returns the size:
I know, I have looked to see how they did it.
My thing was this:

==============================================
#ifdef __cplusplus

#include <assert.h>
#include <stddef.h>

template <typename T> struct IsPointer {
    inline bool operator()() const { return false; }
};
template <typename T> struct IsPointer<T*> {
    inline bool operator()() const { return true; }
};

// Returns the element count of an array; asserts at run time if handed a pointer.
template <typename T>
inline size_t COUNTOF( const T &t ) {
    assert ( !IsPointer<T>()() );
    return sizeof(t)/sizeof(t[0]);
}

#else // __cplusplus

#define COUNTOF(a) (sizeof(a)/sizeof(a[0]))

#endif // __cplusplus
==============================================

Doug Harrison [MVP]

Aug 5, 2006, 12:59:12 AM
On Wed, 02 Aug 2006 23:18:10 -0700, "Mihai N." <nmihai_y...@yahoo.com>
wrote:

>I know, I have looked to see how they did it.


>My thing was this:
>
>==============================================
>#ifdef __cplusplus
>
> #include <assert.h>
>
> template <typename T> struct IsPointer {
> inline size_t operator()() { return false; };
> };
> template <typename T> struct IsPointer<T*> {
> inline size_t operator()() { return true; };
> };
>
> template <typename T>
> inline size_t COUNTOF( const T &t ) {
> assert ( !IsPointer<T>()() );
> return sizeof(t)/sizeof(t[0]);
> }
>
>#else // __cplusplus
>
> #define COUNTOF(a) (sizeof(a)/sizeof(a[0]))
>
>#endif // __cplusplus
>==============================================

Here's yet another way that exploits array references, which like the MS
approach, rejects pointers without using assert:

template<typename T, size_t n>
inline size_t ArraySizef(T (&array)[n])
{
return n;
}

One difference between this method (and yours) and the MS approach is the
latter yields a true compile-time constant that can be used (say) in an
array declaration as the array size.
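
With compilers that support constexpr (C++11 and later, so not VS2005), the function-template form can yield a compile-time constant as well. A minimal sketch, assuming such a compiler; ArraySize and FiveSlots are names chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Array-reference parameter rejects pointers; constexpr makes the result
// usable wherever a constant expression is required.
template <typename T, std::size_t N>
constexpr std::size_t ArraySize(T (&)[N]) noexcept
{
    return N;
}

inline std::size_t FiveSlots()
{
    int values[5] = {};
    static_assert(ArraySize(values) == 5, "evaluated at compile time");
    char scratch[ArraySize(values)] = {};  // legal array bound
    return sizeof(scratch);
}
```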

Mihai N.

Aug 5, 2006, 4:35:59 AM
> One difference between this method (and yours) and the MS approach is the
> latter yields a true compile-time constant that can be used (say) in an
> array declaration as the array size.
True. Now I use the MS one. This is why recently I have added:
#ifdef _countof
#define COUNTOF _countof
#else
... // the old crap
#endif

Joseph M. Newcomer

Aug 8, 2006, 12:28:59 AM
What used to be called "internationalization" is now formally referred to as
"localization", because in some countries there are multiple official languages and you
have to support the one in the locale in which the app is being used, and also it
encompasses the notion that a Norwegian in Canada might be happier with a Norwegian
interface than a French or English one.
joe

On Wed, 02 Aug 2006 01:33:37 -0700, "Mihai N." <nmihai_y...@yahoo.com> wrote:

>All very good advice. I would add only two points:
>
>
>> typically one or two places where I forget
>> the appropriate conversion by sizeof(TCHAR)
>
>I had very good experience by defining
> #define DIM(s) (sizeof(s)/sizeof(s[0]))
>then I use it whenever I need something for "character count"
>You just have to be carefull not to use it on pointers :-)
>I also had a C++ construct complaining about pointers, but now
>VS2005 added _countof doing exactly that.
>
>
>> This is a bit of a cheat...Unicode is only
>> a small part of localization,
>
>I guess is a slip and you mean internationalization, not localization

Joseph M. Newcomer

Aug 8, 2006, 12:26:15 AM
Indeed, the O(n^2) problem exists, except for the nGrowBy parameter, which avoids
gratuitous reallocation. In the case where someone was having problems where he expected
to allocate between 500K and 1.5M values, I did an experiment setting nGrowBy to 500,000,
and was able to create 15,000,000 elements/second using Add() in a linear fashion. So for
a large number of insertions, I agree that 1000 would be an inappropriate value. It's a
bit wasteful if, for example, you only needed 500,001 elements, but after the array is
created you can actually trim the size and discard the unused space, which is in fact the
situation that applied there. But 500K is only 128 pages, and if the pages aren't used,
eventually they will flow out of the working set and not contribute materially to program
performance.
joe

Mihai N.

Aug 8, 2006, 1:38:59 AM
> What used to be called "internationalization" is now formally referred to
> as "localization", because in some countries there are multiple official
> languages and you have to support the one in the locale in which the app is
> being used, and also it encompasses the notion that a Norwegian in Canada
> might be happier with a Norwegian interface than a French or English one.

Sorry, but internationalization and localization have not changed meaning in
more than 10 years, as far as I know.

Internationalization (I18N) = making sure the app works in any locale,
with any kind of data, using the NLS API and so on.
(This covers what you are saying, and is still called i18n)
"i18n is what developers do"


Localization = translating (localizing) the UI and (maybe) the extras
(help, samples, manuals, etc.)
"l10n is what translators do"


Localizability = the fact that an application can be localized (externalized
strings and dialogs, etc.)


Globalization (G11N) = "g11n is what companies do" (when they "go global"),
meaning market studies, marketing campaigns, legal research, opening branches,
local partners/alliances, certifications, etc.


The MS glossary on globaldev mixes up things a bit:
(http://www.microsoft.com/globaldev/reference/glossary.mspx)

But wait, there is more:
http://blogs.msdn.com/drintl/archive/2005/06/07/426289.aspx
(and this is something between "going overboard" and "engineers
hijacking globalization from the CEOs", which is dangerous :-)


But the truth is that if you Google for "Internationalization"
you get mostly programming (engineer) stuff, Google for "Globalization"
and you get economy, global markets, etc. (CEO stuff :-)
And Google for "Localization" and you get localization companies.

Doug Harrison [MVP]

Aug 8, 2006, 1:44:41 AM
On Tue, 08 Aug 2006 00:26:15 -0400, Joseph M. Newcomer
<newc...@flounder.com> wrote:

>Indeed, the O(n^2) problem exists, except for the nGrowBy parameter, which avoids
>gratuitous reallocation. In the case where someone was having problems where he expected
>to allocate between 500K and 1.5M values, I did an experiment setting nGrowBy to 500,000,
>and was able to create 15,000,000 elements/second using Add() in a linear fashion.

That's the trouble with a linear policy; you must have a *very* good idea
of your allocation needs in order to tune nGrowBy appropriately. With such
a good idea in mind, you may be able to use VirtualAlloc and reallocate in
place, eliminating all copying.

>So for a large number of insertions, I agree that 1000 would be an inappropriate value. It's a
>bit wasteful if, for example, you only needed 500,001 elements, but after the array is
>created you can actually trim the size and discard the unused space, which is in fact the
>situation that applied there. But 500K is only 128 pages, and if the pages aren't used,
>eventually they will flow out of the working set and not contribute materially to program
>performance.

Even CSharedFile's default 4096 bytes is surprisingly bad for relatively
small arrays. If there's one thing std::vector has taught me, it's that
exponential growth policy is a much better default. :) If I had tested my
program that used CSharedFile with (say) 100 KB of data, I would have
thought it fine. It's only because I tested with a much more unusual 2 MB
of data that I noticed the linear policy, and it was the difference between
1/2 sec and 10 sec. The downside of course is that exponential may waste
more memory, and a lot of people prefer 1.5 to 2 as a growth factor to
mitigate this. That said, unless you're growing a lot of these arrays and
keeping them alive at the same time, memory wastage shouldn't be a huge
problem, unless you're getting into really large allocations, in which
case, you probably need to exert more control over the allocations, e.g. by
using std::vector::resize/reserve.
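
To make the difference between the two policies concrete, here is a small sketch that only counts how many already-stored elements must be copied while appending n items one at a time. It is a simplified model, not how CSharedFile or std::vector is actually implemented; the function names copies_linear and copies_geometric are hypothetical, and the model assumes every growth step copies the whole buffer.

```cpp
#include <cassert>
#include <cstdint>

// Total elements copied when a full buffer grows by a fixed increment.
std::uint64_t copies_linear(std::uint64_t n, std::uint64_t grow_by)
{
    assert(grow_by > 0);
    std::uint64_t capacity = 0, size = 0, copied = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        if (size == capacity) {      // buffer full: reallocate and copy everything
            copied += size;
            capacity += grow_by;
        }
        ++size;
    }
    return copied;
}

// Total elements copied when a full buffer grows by a constant factor.
std::uint64_t copies_geometric(std::uint64_t n, double factor)
{
    assert(factor > 1.0);
    std::uint64_t capacity = 0, size = 0, copied = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copied += size;
            std::uint64_t grown = static_cast<std::uint64_t>(capacity * factor);
            capacity = grown > capacity ? grown : capacity + 1;  // always make progress
        }
        ++size;
    }
    return copied;
}
```

In this model, appending 2 MB one byte at a time with a 4096-byte increment copies roughly 536 million bytes in total, while doubling copies about 2 million, which is in line with the kind of gap described above.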

Kurt Grittner

Aug 8, 2006, 10:11:54 AM
This is from a dev. thread on kde.org

http://lists.kde.org/?l=kde-devel&m=95436648512712&w=2

> > l10n = localization (country-specific settings such as how to represent
> > numbers, dates, money, ...)
> > i18n = internationalization (translations)
>
> Btw, does anybody know why they are called by those names?
> I've been wondering for a long time :-)

Basically it's as simple as: both are very long names and people are
lazy.
Localization is 10 characters, internationalization is 18, hence l10n
and i18n.
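
For what it's worth, the number in these abbreviations is the count of letters between the first and the last letter of the word. A minimal sketch of that rule (the function name numeronym is hypothetical):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: first letter, number of letters in between, last letter.
std::string numeronym(const std::string& word)
{
    if (word.size() < 3)
        return word;                 // nothing in between to count
    return word.front() + std::to_string(word.size() - 2) + word.back();
}
```

Applied to "localization", "internationalization" and "globalization", this yields the l10n, i18n and g11n forms used in this thread.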

It looks like at kde.org anyhow they consider the i18n to be the
translator's job.

-Kurt

Joseph M. Newcomer

Aug 8, 2006, 11:19:43 AM
If you click on "internationalization" in the MSDN, you get
=======================
The .NET Framework provides extensive support for the development of world-ready
applications. When developing a world-ready application, it is recommended that you divide
the process up into three steps: globalization, localizability, and localization.

Globalization is the first step in the process of creating a world-ready application. In
this step, the application's executable code is written. A truly global application should
be culture-neutral and language-neutral. Therefore, you should focus attention on creating
an application that will be able to support localized user interfaces and regional data
for all users. Note that although a globalized application has this flexibility, the
globalization process itself does not involve translating the user interface. Instead, you
should strive to create an application with functionality that works equally well for
users from all cultures and regions your application supports.

Before proceeding to localization, you should perform an intermediate check to determine
the application's localizability. If an application is localizable, you have correctly
separated the application's executable code from its resources. If you correctly assess
your application's localizability, the application's source code will not need to be
modified during localization.

The final step in building a world-ready application is localization, where you customize
the application for specific cultures or regions. If the globalization and localizability
steps have been performed correctly, localization should consist primarily of translating
the user interface.
===================
So Microsoft has now adopted the term "Globalization" for the act of creating an app that
is locale-independent, not "internationalization". The word "Internationalization"
doesn't actually appear in the text of the entry indexed by "internationalization".
Without localization, there is no evidence that globalization/internationalization has
been successful. So what I have called "localization" I should have properly referred to
as "localizability".

At a recent Microsoft conference, I was told that "internationalization" has now been
replaced with "localization" because it only matters if it *can* be localized. But I
think they also really mean "localizability".

I think the I18N acronym is one of the most offensive concepts to have come out of this,
at least when used in books (several otherwise good books on this problem have been
rendered unreadable because of it); it says "I'm too lazy to type, and I never heard of a
replace command, so you have to go along with this cutesy-poo acronym so I can save a
nanocent of ink in this book". I hadn't realized the idiots who think this is cute also
extended it to "localization" as well. I don't even like to use these in newsgroup posts.
But books do not need to cater to semiskilled typists and people incapable of using
fundamental word processing tools.

Joseph M. Newcomer

Aug 8, 2006, 11:23:14 AM
The exponential growth algorithm is frequently used for some of the APIs that merely
truncate without telling you; the trick is to allocate n elements in an array; do the API,
and if the result is size n, change the allocation to 2*n, and continue until the actual
number delivered is less than the buffer size.

I've sometimes done this by ignoring the nGrowBy and using an explicit SetSize of the
array, using the exponential method.
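
A minimal sketch of that doubling loop follows. The function truncating_copy is a hypothetical stand-in for an API that silently truncates and reports only how many characters it wrote; read_all is likewise a made-up name. A real API would be called in the same place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a truncating API: copies at most `capacity`
// characters of `source` into `buffer` and returns the number copied.
std::size_t truncating_copy(const std::string& source, char* buffer, std::size_t capacity)
{
    std::size_t n = source.size() < capacity ? source.size() : capacity;
    std::memcpy(buffer, source.data(), n);
    return n;
}

// Call the API with n slots; if it filled all n, the result may have been
// truncated, so double the buffer and try again.
std::string read_all(const std::string& source, std::size_t initial_capacity = 16)
{
    assert(initial_capacity > 0);
    std::vector<char> buffer(initial_capacity);
    for (;;) {
        std::size_t got = truncating_copy(source, buffer.data(), buffer.size());
        if (got < buffer.size())                 // strictly less: nothing was cut off
            return std::string(buffer.data(), got);
        buffer.resize(buffer.size() * 2);        // exponential growth
    }
}
```

The comparison has to be "less than", not "less than or equal": a result that exactly fills the buffer cannot be told apart from a truncated one, so it costs one extra round trip.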
joe

Mihai N.

Aug 9, 2006, 4:53:17 AM
...

> It looks like at kde.org anyhow they consider the i18n to be the
> translator's job.
Then, I am sorry to say, the dev at kde.org did not know what he was talking
about.
Or he was just tired and mixed things up. It can happen to anyone.

Mihai N.

Aug 9, 2006, 5:06:16 AM
> If you click on "internationalization" in the MSDN, you get
>===================
...

>===================
> So Microsoft has now adopted the term "Globalization" for the act of
> creating an app that is locale-independent, not "internationalization".
> The word "Internationalization" doesn't actually appear in the text of
> the entry indexed by "internationalization".

> Without localization, there is no evidence
> that globalization/internationalization has
> been successful. So what I have called "localization" I should have
> properly referred to as "localizability".
> At a recent Microsoft conference, I was told that "internationalization"
> has now been replaced with "localization" because it only matters if it
> *can* be localized. But I think they also really mean "localizability".

I think mixing stuff like this is wrong.
Globalization has meant something else for a long time, and the meaning was
clear. Same for internationalization.


Let's take this statement with my definitions:


> Without localization, there is no evidence that
> globalization/internationalization has been successful.

globalization = I don't do, as a programmer.
that is the ceo/marketing/legal problem :-)

internationalization = my business

I can say if an application is properly internationalized because I can run
it on any system language and process any international data, respecting user
locale settings or data associated language settings.
I might have all the error messages hard-coded, but you cannot tell.

Localization/pseudo-translation proves localizability.
An application can be localizable, can even be localized, without being
properly internationalized (French UI, but US date and time formats, wrong
sorting, bad case conversions, etc.)

The debate might be whether localizability is part of internationalization or
not, and if it is not, what do you call the two together. MS decided to call
that "globalization." Their call. I will try not to follow this, because it is
confusing.


But I can also imagine a situation where the programmer can do both
globalization and localization:
When you declare a global variable, that is globalization, and when you
declare a local variable, that is localization :-)
