"Emmanuel" <emma...@erphk.com> wrote in message
news:45a2...@newsgroups.borland.com...
A UTF8String contains a UTF-8 encoded Unicode string. Each character is 1, 2, 3,
or 4 bytes long.
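To illustrate the 1 to 4 byte range, here is a small sketch in Python (not Delphi, just for a quick check). The sample characters and the helper name utf8_length are my own choices for illustration.

```python
# Sample characters chosen for illustration: A, a-umlaut, euro sign, an emoji.
samples = ["A", "\u00e4", "\u20ac", "\U0001F600"]

def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

for ch in samples:
    # Print the code point and the number of UTF-8 bytes it needs.
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

The four samples cover one character from each length class, so the loop should report lengths 1 through 4.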
An AnsiString contains a code page encoded string. Depending on the code page,
each character is either always 1 byte long, or 1 or 2 bytes long. Asian code
pages use multi-byte encodings where each character is either 1 byte or 2
bytes. All Windows code pages have identical first 128 characters. If the
most significant byte of an AnsiString is 1 then the byte is an extended byte
that is either a single character (such as ä or ö in code page 1252) or needs
the following byte to describe the character.
Also, in an AnsiString you will see garbage if the system code page does not
match the code page used in the string, for example when a Japanese (code
page 932) string is used on English Windows.
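The mismatch can be reproduced outside Delphi. Below is a rough sketch in Python, assuming its "cp932" and "cp1252" codecs are close enough to the Windows code pages for this purpose. The sample text and the variable names are mine.

```python
# Sample Japanese text ("nihongo"), written with escapes to stay encoding-neutral.
original = "\u65e5\u672c\u8a9e"

# Bytes as they would be stored in a code page 932 string.
raw = original.encode("cp932")

# Interpreting the same bytes with the Western code page gives garbage.
# errors="replace" guards against bytes that cp1252 leaves undefined.
garbled = raw.decode("cp1252", errors="replace")

# Interpreting them with the matching code page restores the text.
restored = raw.decode("cp932")

print(raw.hex(" "))
print(garbled)
print(restored == original)
```

The bytes never change. Only the code page used to interpret them does, which is why the string itself cannot tell you how to read it.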
Both UTF8String and AnsiString must be read byte by byte to get the real
meaning.
Best regards,
Jaakko Salmenius
www.sisulizer.com
> What do you mean encoded? As far as I see they are both encoded.
>
IMHO!
Utf8String is encoded, but AnsiString is an "as is" string. An AnsiString is a
single-byte string and will be displayed properly only when the proper code
page is used. MBCS strings are multi-byte, and everything you wrote below
applies to them. But an AnsiString is a "plain" single-byte string, and this is
why it will look different under different code pages: there is no code page
info in single-byte data, just the char itself (for some Asian code pages
2 bytes are needed to decode one single char, but there is still no code page
info in the AnsiString). I think you mixed up byte and bit in your post. The
most significant BIT describes the character "range".
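A quick sketch of the most significant bit test, in Python rather than Delphi. The helper name high_bit_set and the sample characters are assumptions for illustration only.

```python
def high_bit_set(byte: int) -> bool:
    """True if the most significant bit (0x80) of the byte is set."""
    return (byte & 0x80) != 0

# Plain ASCII letter: lies in the shared first 128 characters.
ascii_byte = "A".encode("cp1252")[0]

# a-umlaut in code page 1252: one extended byte, one character.
extended_byte = "\u00e4".encode("cp1252")[0]

# A kanji in code page 932: the first of two bytes is a lead byte.
kanji_bytes = "\u65e5".encode("cp932")
lead_byte = kanji_bytes[0]

for name, b in [("ascii", ascii_byte), ("extended", extended_byte), ("lead", lead_byte)]:
    print(f"{name}: 0x{b:02X} high bit set = {high_bit_set(b)}")
```

The bit only says the byte is outside the first 128 characters. Whether it is a complete character or a lead byte still depends on the code page.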
My point is that whenever there is an AnsiString in Delphi you need to know
what is the code page that the string uses. Otherwise you might end up
interpreting the string incorrectly.
You were right about byte and bit. I meant bit but wrote byte.
> My point is that whenever there is an AnsiString in Delphi you need to
> know what is the code page that the string uses. Otherwise you might end
> up interpreting the string incorrectly.
Yes, this is absolutely right.
>
> You were right about byte and bit. I meant bit but wrote byte.
>
OK, no problem, it was just confusing a bit. :)