Does the standard C string library support the UTF-8 character set for
basic operations such as string length, string compare, string copy,
and string search?
Can I use the C conversion functions such as wcstombs and
mbstowcs to convert strings between wide and UTF-8 format?
I need this information for an application that I am
developing for Windows CE using Visual Studio 2005.
Thanks & Regards,
AK
> Does the standard C string library support the UTF-8 character set for
>basic operations such as string length, string compare, string copy,
>and string search?
Nothing extra is needed to support UTF-8 for these operations, since they
do not interpret the bytes they process. The only significant value
is 0, and no byte of a multibyte UTF-8 character is ever 0.
Of course, strlen() gives the length in bytes, not in Unicode characters.
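To make the byte/character distinction concrete, here is a minimal sketch that counts characters rather than bytes. The name utf8_count_chars is hypothetical, not a standard library function, and the code assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Count the characters (Unicode code points) in a NUL-terminated UTF-8
   string. Every character has exactly one byte that is not a continuation
   byte (bit pattern 10xxxxxx), so counting the other bytes counts the
   characters. Sketch only: assumes valid UTF-8 and does not detect
   malformed input. Variables are declared at the top for C89 compilers. */
size_t utf8_count_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    for (; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)   /* not a continuation byte */
            ++count;
    }
    return count;
}
```

For the string "h\xC3\xA9llo" ("héllo"), strlen() returns 6 while utf8_count_chars() returns 5.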
-- Richard
Isn't that the problem, that some operations do require interpreting
the data?
UTF-8, I think, is a series of 8-bit bytes, with 0 to 127 being plain ASCII,
and 128 to 255 being parts of multibyte sequences: a lead byte combined with
the following bytes and decoded to form a single character.
strlen() could then give the wrong result, and probably strcmp(), toupper(),
and so on.
So there is no proper support for UTF-8 beyond operations that require no
interpretation (strcpy, strcmp for equality, etc.).
--
Bart
Correct. It is what the C Standard refers to as a multibyte character set:
the representation of a single character can be one, two, three, or
four bytes long.
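How many bytes a UTF-8 character occupies is encoded in its first (lead) byte. A minimal sketch, using the hypothetical name utf8_seq_len, that reads the sequence length from the lead byte; it does not verify that the following bytes are actually valid continuation bytes.

```c
/* Return the number of bytes (1 to 4) in the UTF-8 sequence that starts
   with lead byte b, or 0 if b can never start a sequence. Sketch only:
   the caller must still check the continuation bytes that follow. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (b < 0xC2) return 0;   /* 10xxxxxx continuation, or C0/C1 (overlong only) */
    if (b < 0xE0) return 2;   /* 110xxxxx */
    if (b < 0xF0) return 3;   /* 1110xxxx */
    if (b < 0xF5) return 4;   /* 11110xxx, up to U+10FFFF */
    return 0;                 /* F5..FF never appear in UTF-8 */
}
```

A scanner can use this to step from one character boundary to the next instead of advancing one byte at a time.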
> Strlen could then give the wrong result, and probably strcmp, toupper, and
> so on.
Correct. That is why the C Standard supports both multibyte
characters and what it calls wide characters: a larger (typically 16- or
32-bit) container, a single one of which is required to hold any character.
There are lots of library functions that convert between multibyte and
wide characters, and a fairly complete library of wide-character string
functions that parallel the traditional single-byte string functions.
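A minimal sketch of the round trip asked about in the original question, using mbstowcs and wcstombs. It assumes a UTF-8 locale is available under one of the names tried by the hypothetical helper use_utf8_locale (locale names are implementation-defined), and it relies on the POSIX and Microsoft behaviour of returning the required length when the destination pointer is null. The Microsoft C runtime of the Visual Studio 2005 era documented that setlocale fails for UTF-8 code pages, so on that platform the Win32 MultiByteToWideChar and WideCharToMultiByte functions with CP_UTF8 are the usual alternative.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: try to select a UTF-8 locale for LC_CTYPE. The
   names tried are assumptions; they are common on glibc and macOS but
   may not exist elsewhere. Returns 1 on success, 0 on failure. */
int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Convert a multibyte (here UTF-8) string to a newly allocated wide
   string. Returns NULL on an invalid sequence or out of memory.
   The caller frees the result. */
wchar_t *utf8_to_wide(const char *s)
{
    wchar_t *w;
    size_t n = mbstowcs(NULL, s, 0);   /* wide chars needed, excluding L'\0' */

    if (n == (size_t)-1)
        return NULL;                    /* not valid in the current locale */
    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, s, n + 1);              /* converts and writes the L'\0' */
    return w;
}

/* Convert a wide string back to a newly allocated multibyte string.
   Returns NULL on an unconvertible character or out of memory. */
char *wide_to_utf8(const wchar_t *w)
{
    char *s;
    size_t n = wcstombs(NULL, w, 0);    /* bytes needed, excluding '\0' */

    if (n == (size_t)-1)
        return NULL;
    s = malloc(n + 1);
    if (s == NULL)
        return NULL;
    wcstombs(s, w, n + 1);              /* converts and writes the '\0' */
    return s;
}
```

Call use_utf8_locale() once at startup; after that, utf8_to_wide("h\xC3\xA9llo") yields a wide string of length 5, which wide_to_utf8() turns back into the original six bytes.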
> So there is no proper support for UTF8 other than operations requiring no
> interpretation (strcpy, strcmp for equality, etc).
Support for multibyte and wide characters is required, but there's no
requirement that UTF-8 be a supported multibyte character set. Of
course, most implementations do support it.
-Larry Jones