I've known C/C++ for years, but only ever used ascii strings. I have a client who wants to know how gcc handles unicode. I've found the functions utf8_mbtowc, utf8_mbstowcs, utf8_wctomb and utf8_wcstombs, but I'm wondering if there are any other libraries or functions which can do things like handle different kinds of encodings?
A proper std:: way is to use the wchar_t and wstring types, which can hold Unicode strings, together with the wide stream classes (fstream -> wfstream, ostream -> wostream, istream -> wistream, etc.). To display characters properly (in a window or console) or to save them to a file, you have to use the locales (regional settings) that are available on your computer.
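For example (to find the name of an available locale: ...):

```cpp
#include <iostream>
#include <locale>
#include <string>
using namespace std;

int main() {
    locale Ger("German_Germany.1252");     // Windows-style name; throws runtime_error if unknown
    wcout.imbue(Ger);                      // attach locale to stream
    wstring ws(L"A german text...");
    wcout << ws << endl;
    locale CurrentLocale = wcout.getloc(); // get the current locale of a stream
}
```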
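Locale names are platform-specific: "German_Germany.1252" is a Windows name, while a glibc system would use something like "de_DE.UTF-8" (installed ones are listed by `locale -a`). A minimal sketch for probing a name before using it, relying only on the fact that std::locale's named constructor throws std::runtime_error for names the runtime doesn't know; `locale_available` is a hypothetical helper, not a library function:

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if this C++ runtime can construct a locale
// with the given name. std::locale(const char*) throws std::runtime_error
// when the name is not valid on this system.
bool locale_available(const std::string& name) {
    try {
        std::locale probe(name.c_str());
        (void)probe;  // only construction matters
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

For example, call `locale_available("German_Germany.1252")` and fall back to `"de_DE.UTF-8"` or `"C"` if it returns false; the "C" locale is always available.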
It also helps to use a text editor that can display and save text in these encodings.
>> A proper std:: way is using wchar_t, wstring types - can handle Unicode strings. (fstream -> wfstream, ostream -> wostream, istream -> wistream, etc)
> By 'Unicode' you mean UTF-16, right?
Not necessarily. While Windows equates "Unicode" with UTF-16, many of the UNIX implementations use a 32-bit wchar_t that holds each Unicode code point directly (UTF-32).
Unfortunately, while the various w-versions of the functions support wide-char strings (presumably in some Unicode encoding), most of the major C++ interfaces don't: for example, the file names taken by the std::fstream constructors and the text returned by std::exception::what() are plain char strings. The standardizers assumed there is some multibyte char encoding that you can use for the system interfaces. That's really stupid, and it's a pain in the butt on systems that don't really have that mapping (like Windows).
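In practice this means converting a wide string to the current multibyte encoding before handing it to a char-only interface. A minimal sketch using the standard C function wcstombs, assuming the C locale has been set appropriately with std::setlocale (at program start it is "C", which only covers ASCII); `to_multibyte` is a hypothetical helper name:

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide string to the multibyte encoding
// of the current C locale (as selected by std::setlocale).
std::string to_multibyte(const std::wstring& ws) {
    // First call with a null destination only computes the byte count.
    std::size_t n = std::wcstombs(nullptr, ws.c_str(), 0);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("string not representable in the current locale");
    std::vector<char> buf(n + 1);                 // room for the terminating '\0'
    std::wcstombs(buf.data(), ws.c_str(), buf.size());
    return std::string(buf.data(), n);
}
```

Usage: `std::ofstream out(to_multibyte(L"report.txt"));` where "report.txt" is just an example name. If a character has no representation in the current locale's charset, the conversion fails, which is exactly the mapping problem described above.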
Unicode is a very large character set in which each character has its own index (its code point); it now contains well over a hundred thousand characters. Unicode is a standard, not a character encoding: encodings such as UTF-8 and UTF-16 define how code points are stored as bytes. There is also a standard named ISO 10646, whose original design allowed for about two billion characters (31 bits). Today ISO 10646 and Unicode are kept synchronized: they assign the same characters to the same code points, and both end at U+10FFFF. The first 65,536 code points form the Basic Multilingual Plane (BMP). The advantage of Unicode/ISO 10646 is that they cover almost every character you would ever need.
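To make "each character has its own index" concrete, here is a small sketch using C++11's char32_t/std::u32string (newer than this thread), where every element is exactly one code point; the function names are illustrative only:

```cpp
#include <string>
#include <vector>

// Each char32_t in a std::u32string is one Unicode code point,
// i.e. the character's index in the Unicode / ISO 10646 table.
std::vector<unsigned long> code_points(const std::u32string& s) {
    std::vector<unsigned long> out;
    for (char32_t c : s)
        out.push_back(static_cast<unsigned long>(c));
    return out;
}

// True if the code point lies in the Basic Multilingual Plane (U+0000..U+FFFF).
bool in_bmp(char32_t c) {
    return c <= 0xFFFF;
}
```

For `U"A\u00ED\u20AC\U0001D11E"` this yields 0x41 (A), 0xED (í), 0x20AC (€) and 0x1D11E (a musical symbol outside the BMP).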
Non-wide characters, represented with char: many charsets (ISO 8859-1, ISO 8859-2, ...) contain at most 256 characters, one byte each, so no single one of them can cover every language. But many applications cannot handle Unicode yet, so they use one of the encodings/character representations available in your OS:

the standardized ISO 8859-x charsets, windows-125x, Mac charsets such as x-mac-ce, etc., or UTF-8.
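The 256-character limit is easy to see with ISO 8859-1 (Latin-1), whose byte values happen to equal the first 256 Unicode code points, so encoding reduces to a range check. A minimal sketch (C++11; `to_latin1` is a hypothetical name):

```cpp
#include <stdexcept>
#include <string>

// ISO 8859-1 maps byte value N to Unicode code point N for N < 256,
// so any character above U+00FF simply cannot be represented.
std::string to_latin1(const std::u32string& s) {
    std::string out;
    for (char32_t c : s) {
        if (c > 0xFF)
            throw std::range_error("character not in ISO 8859-1");
        out.push_back(static_cast<char>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

So "í" becomes the single byte 0xED, while "€" (U+20AC) has no Latin-1 byte at all (windows-1252 and ISO 8859-15 do have one, at different positions).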
UTF-8, too, is stored in ordinary chars, not in wide chars. It is a way to write characters to a file (or any byte stream): ASCII characters take one byte, and all other characters take two to four bytes. Example: í (U+00ED) is encoded as 11000011 10101101 (0xC3 0xAD).
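The UTF-8 bit layout can be sketched as an encoder for a single code point (C++11; `utf8_encode` is an illustrative name, not a standard function). The comments show which bit pattern each branch produces:

```cpp
#include <stdexcept>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                                   // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                           // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                         // 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogates are not characters");
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {                       // 11110xxx + three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("code point above U+10FFFF");
    }
    return out;
}
```

`utf8_encode(0xED)` produces exactly the two bytes 11000011 10101101 from the example above, and the result can be written to a file with any char-based API.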
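UTF-16: characters in the BMP take two bytes (one 16-bit code unit); characters above U+FFFF take four bytes, as a "surrogate pair" of code units from the reserved range U+D800 to U+DFFF, which have this special meaning and are not characters themselves. Example: í (U+00ED) in little-endian byte order is 11101101 00000000 (0xED 0x00).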
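The surrogate-pair rule can be sketched as a per-code-point encoder (C++11 char16_t; `utf16_encode` is an illustrative name). It returns code units; the byte order (little- or big-endian) is chosen only when the units are written out:

```cpp
#include <stdexcept>
#include <vector>

// Encode one code point as UTF-16 code units: one unit inside the BMP,
// otherwise a surrogate pair (high unit D800-DBFF, low unit DC00-DFFF).
std::vector<char16_t> utf16_encode(char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x10000)
        return {static_cast<char16_t>(cp)};
    cp -= 0x10000;                                      // 20 bits remain
    return {static_cast<char16_t>(0xD800 + (cp >> 10)),   // top 10 bits
            static_cast<char16_t>(0xDC00 + (cp & 0x3FF))}; // bottom 10 bits
}
```

For example, U+00ED is the single unit 0x00ED (bytes ED 00 in little-endian order), while U+1D11E becomes the pair 0xD834 0xDD1E.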
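To represent as many languages as possible, use wchar_t (for one character) and wstring (for a string); the "w" stands for "wide". The size of wchar_t is implementation-defined: it is __usually__ 4 bytes, enough for every Unicode code point (e.g. with GCC on Linux), but it can also be 2 bytes (as on Windows, where it holds UTF-16, so characters outside the BMP need two wchar_t). To read or write them you have to use the streams for wide characters; see std::locale and std::locale::facet (in particular the std::codecvt facet). When using the w-classes you have to be sure which encoding/charset is currently in effect.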
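Because wchar_t can be 2 or 4 bytes, even "how many characters are in this wstring" depends on the platform. A minimal sketch that counts code points either way, assuming a 2-byte wchar_t holds UTF-16 (as on Windows); `count_code_points` is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>

// Count Unicode characters (code points) in a std::wstring.
// 4-byte wchar_t: one element per code point.
// 2-byte wchar_t (assumed UTF-16): a character above U+FFFF occupies two
// elements, so low surrogates (DC00-DFFF) are not counted separately.
std::size_t count_code_points(const std::wstring& ws) {
    if (sizeof(wchar_t) >= 4)
        return ws.size();
    std::size_t n = 0;
    for (wchar_t c : ws) {
        unsigned long u = static_cast<unsigned long>(c);
        if (u < 0xDC00 || u > 0xDFFF)
            ++n;
    }
    return n;
}
```

For `L"a\U0001D11Eb"`, `size()` is 3 with a 4-byte wchar_t but 4 with a 2-byte one; `count_code_points` returns 3 in both cases.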
Usually we represent text in programs with chars (and chars are good enough), but sometimes we want to use a different language, one that is not covered by the available 256-character encodings (windows-125x, ISO 8859-x, ...). Inside the program we can handle the text as Unicode (and be happy as well), but in C++ we usually write it to a file in one of the non-Unicode encodings available in our OS, because std:: gives no portable, built-in way to write Unicode files. One way around this is http://www.codeproject.com/vcpp/stl/upgradingstlappstounicode.asp?pri... ; another way is to use the C function fwrite: