
UNICODE and network byte order


Wayne Steinhour

Jul 1, 2003, 2:34:16 PM
When sending Unicode strings across a TCP/IP connection
(from an Intel machine), should the byte order of each
character be reversed by using the htons function, as it
would be for numeric data?

Thanks in advance.

Wayne

abc5594def

Jul 1, 2003, 4:34:01 PM
You should prefix your data with the byte order mark FF FE from your Intel
client so that it is interpreted as little-endian Unicode. Otherwise you can
use UTF-8 encoding while sending.
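
A minimal sketch of the byte order mark approach in Python, assuming the receiver decodes with a BOM-aware UTF-16 codec. The function names are hypothetical and exist only for illustration.

```python
import codecs

def encode_with_bom(text: str) -> bytes:
    """Encode text as little-endian UTF-16, prefixed with the FF FE byte order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def decode_with_bom(payload: bytes) -> str:
    """Decode UTF-16 data; the 'utf-16' codec reads the BOM to pick the byte order, then drops it."""
    return payload.decode("utf-16")

payload = encode_with_bom("Hi")
print(payload.hex())             # fffe48006900
print(decode_with_bom(payload))  # Hi
```

The same decoder also accepts data that starts with FE FF (big-endian), which is the point of sending the mark.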


Phil Frisbie, Jr.

Jul 1, 2003, 4:49:00 PM

Most developers I have talked to are using UTF-8.

--
Phil Frisbie, Jr.
Hawk Software
http://www.hawksoft.com

Eugene Gershnik [SDK MVP]

Jul 2, 2003, 5:08:42 AM
Why do you assume that Unicode is even stored using the same number of bytes
on a different machine? A wide character (wchar_t) is 4 bytes on Linux, for
instance :-) Unicode is just an abstract representation that says, for example,
"code point 65 (U+0041) corresponds to Latin capital letter A". All
applications and OSes are free to represent and store Unicode code points as
they see fit. Windows, for example, uses the UCS-2 encoding, and that's what
all these XxxXxxW functions are working with. If you want two different
platforms to exchange Unicode strings, they need to agree on the encoding at
the protocol level. The de facto standard in most network protocols today is
UTF-8, so this is probably the easiest one to use.
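
As an illustration of agreeing on the encoding at the protocol level, here is a small Python sketch. The wire format (a 4-byte length in network byte order followed by UTF-8 bytes) and the function names are assumptions made up for this example, not part of any existing protocol.

```python
import struct

def pack_string(text: str) -> bytes:
    """Frame text as: 4-byte big-endian byte count, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    # "!I" packs an unsigned 32-bit integer in network byte order (like htonl).
    return struct.pack("!I", len(data)) + data

def unpack_string(message: bytes) -> str:
    """Reverse of pack_string for one complete message."""
    (length,) = struct.unpack("!I", message[:4])
    return message[4:4 + length].decode("utf-8")

wire = pack_string("A")
print(wire)                 # b'\x00\x00\x00\x01A'
print(unpack_string(wire))  # A
```

Only the numeric length field needs byte order conversion. UTF-8 is defined as a sequence of bytes, so the text itself is sent unchanged from any platform.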

Eugene

Eugene Gershnik [SDK MVP]

Jul 2, 2003, 5:12:15 AM
Phil Frisbie, Jr. wrote:
> abc5594def wrote:
>> you should mark your data with FF FE from your intel client so
>> that it is interpreted as unicode little endian. otherwise you can
>> use utf-8 encoding while sending.
>
> Most developers I have talked to are using UTF-8.

That's probably because most developers are in the US, and there UTF-8 is just
good ol' ASCII and so is quite efficient for communications. For the Far
East countries UTF-8 is less efficient than other alternatives (typically 3
bytes per character instead of 2 in UTF-16), but who cares
<sigh>
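
The size difference is easy to check. A quick Python sketch, where the helper name is hypothetical and the sample strings are arbitrary:

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes text occupies in a few common encodings."""
    return {name: len(text.encode(name)) for name in ("utf-8", "utf-16-le", "utf-32-le")}

print(encoded_sizes("Hello"))   # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes("日本語"))   # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}
```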

Eugene


Wayne Steinhour

Jul 2, 2003, 8:54:42 AM
Eugene,

You have convinced me that this is more complicated than
I thought. I did not know that anybody uses 4 bytes for
Unicode. Can you direct me to any place with good
documentation on how to share Unicode strings across a
network?

Thanks for your help.

Wayne


Eugene Gershnik [SDK MVP]

Jul 2, 2003, 12:32:15 PM
Well, this is a very good question, because I personally do not know a single
good reference :-( You can try to follow links from www.unicode.org. Also,
there is a book called "CJKV Information Processing" which contains lots of
useful information (though its main topic is a little different). If you
find something else, please post it here.

Eugene

Phil Frisbie, Jr.

Jul 2, 2003, 1:01:53 PM

In this context we are talking about general network transmission. It is up to
the developer to decide if it is better to use UCS-2 or UCS-4 instead of UTF-8.
If using UCS-2 or UCS-4, the byte order can be negotiated at connection time or
hard-coded into the application.
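
A sketch of the hard-coded option in Python, assuming both ends agree on big-endian (network order) 16-bit units. The function name is hypothetical. It mimics calling htons on every 16-bit character of a little-endian buffer and shows that the result matches a plain big-endian UTF-16 encoding.

```python
import struct

def utf16_units_network_order(text: str) -> bytes:
    """Pack each 16-bit code unit big-endian, like htons() on each wide character."""
    native = text.encode("utf-16-le")  # the in-memory layout on an Intel machine
    units = struct.unpack("<%dH" % (len(native) // 2), native)
    return b"".join(struct.pack("!H", unit) for unit in units)

text = "Hi\u00e9"
print(utf16_units_network_order(text).hex())  # 0048006900e9
assert utf16_units_network_order(text) == text.encode("utf-16-be")
```

So for 2-byte characters the answer to the original question is yes, provided the receiver expects big-endian data. The swap is exactly what a big-endian encoder would do.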


Phil Frisbie, Jr.

Jul 2, 2003, 1:03:56 PM
Wayne Steinhour wrote:

> Eugene,
>
> You have convinced me that this is more complicated than
> I thought. I did not know that anybody uses 4 bytes for
> unicode. Can you direct me to anyplace with good
> documentation on how to share unicode strings across a
> network?

I like this reference: http://www.cl.cam.ac.uk/~mgk25/unicode.html

> Thanks for your help.
>
> Wayne


Eugene Gershnik [MVP]

Jul 2, 2003, 2:32:09 PM

"Phil Frisbie, Jr." <ph...@hawksoft.com> wrote in message
news:eyc9luLQ...@tk2msftngp13.phx.gbl...

Both UCS encodings are ill-suited for network transmissions because of the
8-bit bytes and byte order issues. UTF-16 could be a better alternative to
UTF-8 though.
Some interesting reading:
http://www.faqs.org/rfcs/rfc2277.html
http://www.faqs.org/rfcs/rfc2781.html
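
One practical difference between UCS-2 and UTF-16 is that UTF-16 can represent code points above U+FFFF by using two 16-bit units (a surrogate pair). A small Python sketch, with a hypothetical helper name, that lists the 16-bit units of a string:

```python
def utf16_code_units(text: str) -> list:
    """Return the 16-bit UTF-16 code units of text as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

print([hex(u) for u in utf16_code_units("A")])           # ['0x41']
print([hex(u) for u in utf16_code_units("\U0001D11E")])  # ['0xd834', '0xdd1e']
```

A receiver that assumes one 16-bit unit per character would miscount the second string, so both sides need to agree on this as well.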

Eugene

