C and UNICODE and sizeof(char)

forcer

Oct 16, 1997, 3:00:00 AM

>... and sizeof(char) should be 2 ...
Oops. Sorry, my mistake. Thanks to those who made me aware of that.
Since by definition sizeof(char)==1, UNICODE would only fit into a char
if the implementation uses 16-bit bytes.
But since today's machines are byte-addressable with 8-bit bytes, that's
not THAT Good a Thing to do. (Maybe at some later time ;)
So: char for ASCII.
And most probably wchar_t for UNICODE.
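
The point about sizeof(char) can be checked directly. What follows is only a minimal sketch: the helper names are made up for illustration, and the only things taken from the language itself are sizeof, CHAR_BIT and UCHAR_MAX from <limits.h>.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */
#include <stddef.h>   /* size_t */

/* sizeof(char) is 1 by definition; what varies is how many bits a byte has. */
size_t bytes_per_char(void)
{
    return sizeof(char);
}

/* Number of bits in a byte on this implementation (at least 8). */
int bits_per_byte(void)
{
    return CHAR_BIT;
}

/* Nonzero if every 16-bit code (0..0xFFFF) fits in one unsigned char,
   which can only happen when bytes have 16 or more bits.
   Hypothetical helper, not a standard function. */
int char_holds_16bit_code(void)
{
    return UCHAR_MAX >= 0xFFFFu;
}

/* The two guarantees the argument rests on. */
void check_char_invariants(void)
{
    assert(sizeof(char) == 1);
    assert(CHAR_BIT >= 8);
}
```

On a machine with 8-bit bytes char_holds_16bit_code() reports 0, which is the case described above.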
-forcer

--
"The amount of success of Microsoft is directly related to the ignorance of
J. Random Luser"

/* email: for...@nospam.cyberspace.org */
/* www: http://www.forcer.base.org/ */
/* IRC: forcer@{IRCnet,HoloNet} @#StarWars #LinuxGER #ManOwaR #Cantina */
/* pgppub 1024/56554141 B3 2A 88 53 DB DA 12 20 FD B0 D4 79 4E 5A AB 32 */
/* finger for...@nospam.cyberspace.org for public key */

Scott Mayo

Oct 16, 1997, 3:00:00 AM

In comp.std.c, for...@mynock.org (forcer) wrote:

<>... and sizeof(char) should be 2 ...
<Oops. Sorry, my mistake. Thanks to those who made me aware of that.
<Since by definition sizeof(char)==1, UNICODE would only fit into a char
<if the implementation uses 16-bit bytes.
<But since today's machines are byte-addressable with 8-bit bytes, that's
<not THAT Good a Thing to do. (Maybe at some later time ;)
<So: char for ASCII.
<And most probably wchar_t for UNICODE.

wchar_t can't be used portably for Unicode. The functions that work on
wchar_t already have implementation-defined and locale-defined properties.
Some implementations use it for Unicode and some don't, so the result
is non-portable AFAIK.

Douglas A. Gwyn

Oct 18, 1997, 3:00:00 AM

The problem is that wchar_t might be defined as an 8-bit type,
which is too small to hold Unicode encodings.
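
A program can at least find out how wide wchar_t is where it is compiled. The sketch below assumes the <wchar.h> header from Amendment 1, which defines WCHAR_MAX; the function names are invented for the example.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* wchar_t */
#include <wchar.h>    /* WCHAR_MAX */

/* Size of wchar_t in bits, counting any padding bits. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Nonzero if every 16-bit code (0..0xFFFF) is representable in wchar_t.
   An 8-bit wchar_t fails this test. */
int wchar_holds_16bit_codes(void)
{
    return WCHAR_MAX >= 0xFFFF;
}

/* A program that depends on a wide wchar_t can refuse to continue. */
void require_wide_wchar(void)
{
    assert(wchar_holds_16bit_codes());
}
```

This only reports the range of the type. It says nothing about whether the values stored in it are Unicode codes.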

It would be good practice for an implementation that *does* have
nontrivial I18n support to use Unicode (more generally, ISO 10646)
encoding for its wchar_t, but as has been pointed out, there is no
requirement that this be done.

Alas, there is no standard way to encode any given extended character
in C; the encoding is implementation (and possibly locale) dependent.
(A MISTAKE if you ask me.) C9x may support a form of escape sequence
for writing the intended character in a portable way, although there
still would not be a guarantee that the actual value would be its
Unicode code.
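
For illustration, here is a sketch of how this looks in the form that C99 eventually adopted, assuming a C99 or later compiler: universal character names such as \u00E9, and the __STDC_ISO_10646__ macro, which an implementation defines only when its wchar_t values are ISO 10646 codes. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>   /* wchar_t */

/* U+00E9 (e with acute accent) written as a universal character name. */
wchar_t e_acute(void)
{
    return L'\u00E9';
}

/* Returns 1 if the implementation promises that wchar_t values are
   ISO 10646 code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* The numeric value may only be relied on when the macro is defined. */
void check_e_acute(void)
{
#ifdef __STDC_ISO_10646__
    assert(L'\u00E9' == 0xE9);
#endif
}
```

Without the macro, the character is still written portably in the source, but its value as a wchar_t remains up to the implementation.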

Kaz

Oct 18, 1997, 3:00:00 AM

In article <233a4.sm...@ziplink.net>,
Scott Mayo <sm...@ziplink.net> wrote:
>In comp.std.c, for...@mynock.org (forcer) wrote:
>
><>... and sizeof(char) should be 2 ...
><Oops. Sorry, my mistake. Thanks to those who made me aware of that.
><Since by definition sizeof(char)==1, UNICODE would only fit into a char
><if the implementation uses 16-bit bytes.
><But since today's machines are byte-addressable with 8-bit bytes, that's
><not THAT Good a Thing to do. (Maybe at some later time ;)
><So: char for ASCII.
><And most probably wchar_t for UNICODE.
>
>wchar_t can't be used portably for Unicode. The functions that work on

And neither can char be portably used for ASCII.

>wchar_t already have implementation-defined and locale-defined properties.

So why not define one standard locale which supports Unicode, and which
dictates a well-specified multi-byte encoding for Unicode characters?

This way, implementations could share the multi-byte data provided they
select the appropriate locale for the multi-byte functions. And they could
share Unicode data as well, provided that they take care of issues of external
representation.
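
To make "a well-specified multi-byte encoding" concrete, here is a minimal sketch of one candidate. The post above does not name an encoding; UTF-8 is my assumption, in its modern form of at most 4 bytes per character, and encode_utf8 is a made-up helper, not a library function. It also assumes each output byte holds an 8-bit value.

```c
#include <assert.h>
#include <stddef.h>   /* size_t, NULL */

/* Encode one code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is a surrogate
   (0xD800..0xDFFF) or lies beyond 0x10FFFF. */
size_t encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80UL) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)   /* surrogates are not characters */
        return 0;
    if (cp < 0x10000UL) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {                 /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* out of range */
}
```

Because the byte sequence depends only on the code point and not on the locale or on the width of wchar_t, two implementations that agree on such an encoding can exchange the resulting multi-byte data unchanged.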
--