On Wed, 13 Jan 2016 00:21:40 +0000, JiiPee wrote:
> If the program uses string (in files, user interface etc), what is best
> to be the default character set type? Unicode? How about if currently
> all users are English, does it change anything? So should I choose string
> or wstring by default in my projects? Does wstring work mostly the same
> as string; do most of the STL functions work the same way as if I used
> string? I mean if I use (only) english letters, does wstring version of
> STL functions (like find and replace, insert, delete, parse) work the
> same way as string versions?
std::string is a typedef for std::basic_string<char>.
std::wstring is a typedef for std::basic_string<wchar_t>.
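Both provide the same member functions (find, replace, insert, erase,
substr, etc.), so code that works on one works on the other. Under the hood,
std::basic_string<T> is more or less std::vector<T> with extra
string-specific methods.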
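As a rough illustration (a sketch, not the only way to do it), one template can be written once and used with both string types, because find and replace are members of std::basic_string itself. The helper name replace_all is hypothetical:

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `from` with `to`.
// It compiles for std::string (basic_string<char>) and std::wstring
// (basic_string<wchar_t>) alike, since both expose the same members.
template <typename CharT>
std::basic_string<CharT> replace_all(std::basic_string<CharT> s,
                                     const std::basic_string<CharT>& from,
                                     const std::basic_string<CharT>& to) {
    if (from.empty()) return s;  // an empty pattern would loop forever
    typename std::basic_string<CharT>::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::basic_string<CharT>::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // resume searching after the inserted text
    }
    return s;
}
```

Called as replace_all<char>("a cat, a hat", "at", "og") or replace_all<wchar_t>(L"a cat, a hat", L"at", L"og"), both give "a cog, a hog"; with English-only text the two versions behave identically.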
The biggest difference is that if you use std::wstring, you will
inevitably find yourself having to convert to std::string and/or char*
occasionally or even frequently. If you use std::string, it's often
possible to never need to use std::wstring or wchar_t* for anything
(although this is somewhat less viable on Windows).
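To give a feel for what that conversion involves, here is a minimal hand-written wide-to-UTF-8 sketch. It assumes wchar_t holds UTF-16 code units when it is 16 bits wide (Windows) and UTF-32 code points otherwise (typical Linux/macOS); the name to_utf8 is made up for this example, and invalid input is replaced with U+FFFD rather than reported:

```cpp
#include <string>

// Hypothetical helper: encode a std::wstring as UTF-8.
// Assumption: 16-bit wchar_t means UTF-16, wider wchar_t means UTF-32.
std::string to_utf8(const std::wstring& in) {
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        // On 16-bit wchar_t, combine a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < in.size()) {
            char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For pure ASCII input the output bytes are identical to the input characters, which is why an all-std::string design can often skip this step entirely.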
OTOH, the Windows OS functions all use wide strings (wchar_t*) as their
native "string" type (filenames, registry keys, etc.). The versions which
take char* are just shallow wrappers that convert their arguments from the
current ANSI code page and then call the wchar_t* functions. If you want to
be able to open any file, regardless of the current locale, you need to use
the wide-string functions (and you'll need to use the non-standard
wchar_t* fstream constructors and open() overloads to open such files as
fstreams).
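One way to keep that platform difference in a single place is sketched below. It assumes MSVC's standard library, whose std::ifstream has a non-standard constructor taking const wchar_t*; the alias native_path and the helper open_input are hypothetical names. (Since C++17, std::filesystem::path is a portable alternative that fstreams also accept.)

```cpp
#include <fstream>
#include <string>

// Hypothetical alias: the filename string type the OS uses natively.
// Assumption: on MSVC, std::ifstream accepts const wchar_t* (a
// non-standard extension reaching the wide Windows file APIs); on other
// platforms, filenames are byte strings, usually UTF-8.
#if defined(_MSC_VER)
using native_path = std::wstring;
#else
using native_path = std::string;
#endif

// Hypothetical helper: open a file for binary reading using whichever
// filename type is native on this platform.
std::ifstream open_input(const native_path& path) {
    return std::ifstream(path.c_str(), std::ios::binary);
}
```

On MSVC you would call open_input(L"...") and elsewhere open_input("..."); the rest of the program can stay on std::string for file contents.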
For file contents, the default choice should be UTF-8 if you actually need
to treat the data as text (e.g. you need to apply <cctype> functions to it,
convert it to wide strings, or similar).
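A small sketch of why UTF-8 plays well with <cctype>: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so classifying only bytes below 0x80 never splits a non-ASCII character. The helper split_ascii_ws is hypothetical; the cast to unsigned char avoids undefined behaviour when char is signed:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: split UTF-8 text on ASCII whitespace.
std::vector<std::string> split_ascii_ws(const std::string& text) {
    std::vector<std::string> words;
    std::string cur;
    for (char c : text) {
        unsigned char u = static_cast<unsigned char>(c);
        // Only classify ASCII bytes; bytes >= 0x80 are parts of
        // multi-byte UTF-8 sequences and always stay inside a word.
        if (u < 0x80 && std::isspace(u)) {
            if (!cur.empty()) {
                words.push_back(cur);
                cur.clear();
            }
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) words.push_back(cur);
    return words;
}
```

For example, "caf\xc3\xa9 au  lait" (UTF-8 for "café au  lait") splits into three words, with the two bytes of "é" kept together.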
If the data is almost entirely ASCII, you need to be able to "deal with"
whatever the user throws at it, and it doesn't matter if non-ASCII
characters aren't handled entirely correctly, then a single-byte encoding
which assigns a character to every byte value, such as ISO-8859-1
(Latin-1), has the advantage that decoding never fails (any sequence of
bytes is valid). So if the program reads a file that's actually in some
other "extended ASCII" encoding, you get a few mojibake characters where
UTF-8 would give you a decoding error.
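The contrast can be sketched in code. Both function names below are hypothetical: latin1_to_utf8 shows that every Latin-1 byte maps to a code point (U+0000 to U+00FF), so it cannot fail, while is_valid_utf8 is a strict check that rejects the same bytes when they are not well-formed UTF-8:

```cpp
#include <string>

// Hypothetical helper: decode ISO-8859-1 and re-encode as UTF-8.
// Every byte value is a valid Latin-1 character, so this never fails.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (char c : in) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out += c;  // ASCII passes through unchanged
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: strict UTF-8 validation. Returns false for bad
// lead bytes, truncated or overlong sequences, surrogates, and code
// points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::string::size_type i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        int len;
        char32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;                      // stray or invalid lead byte
        if (i + len > s.size()) return false;   // truncated sequence
        for (int k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len]) return false;  // overlong encoding
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}
```

For the Latin-1 bytes "caf\xe9", is_valid_utf8 returns false, whereas latin1_to_utf8 yields "caf\xc3\xa9" ("café"). If the file was really Windows-1252, the same Latin-1 decoding still succeeds; only bytes such as 0x80 (the euro sign in Windows-1252) come out as the wrong character.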