I think it boils down to this, though. Pretty much the only thing
the standard library provides is a buffer whose sole
non-implementation-defined property is a terminating zero, and I
think that's not even remotely sufficient to seriously claim "string
support". Never mind that most functions taking "string" arguments
go even lower level with a raw char*...
[...]
> The last point to make is that the poster's exemplar of QString does
> not actually seem to perform the function that he thinks it does.
> You still have to offer QString its input in a codeset it recognises
> (UTF-8 or UTF-16), for obvious reasons; for anything else the user
> has to make her own conversions using fromLatin1(), fromLocal8Bit()
> or use some external conversion function, and if you don't it will
> fail.
Of course. I never claimed QString will magically guess the encoding.
However, sorting out the encoding during construction is a much
cleaner, less error-prone approach than keeping the encoding
implicitly on the side, as with std::string, and working on a raw
byte buffer (where, strictly speaking, the standard doesn't even say
it's a byte). Errors made during construction (e.g. by another
library which, like so many, just dumps the local encoding into
everything) are typically revealed much sooner that way, within the
component near the error, rather than somewhere else entirely, where
they'll make a _real_ mess.
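To illustrate the point, here's a minimal sketch (my own, not Qt's actual implementation) of a string type that, like QString::fromUtf8(), settles the encoding at construction time. A mis-encoded buffer is rejected on the spot, in the component that produced it, instead of surviving as bytes under the "it's probably UTF-8" convention:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical illustration: the constructor decides the encoding up
// front instead of carrying it around as an implicit convention.
// Simplified validator: overlong forms and surrogates are not rejected.
class Utf8String {
public:
    explicit Utf8String(const std::string& bytes) : bytes_(bytes) {
        if (!isValidUtf8(bytes))
            throw std::invalid_argument("input is not valid UTF-8");
    }
    const std::string& bytes() const { return bytes_; }

private:
    static bool isValidUtf8(const std::string& s) {
        std::size_t i = 0;
        while (i < s.size()) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            std::size_t len;
            if (c < 0x80)                len = 1;  // ASCII
            else if ((c & 0xE0) == 0xC0) len = 2;  // 2-byte sequence
            else if ((c & 0xF0) == 0xE0) len = 3;  // 3-byte sequence
            else if ((c & 0xF8) == 0xF0) len = 4;  // 4-byte sequence
            else return false;                     // stray continuation byte
            if (i + len > s.size()) return false;  // truncated sequence
            for (std::size_t j = 1; j < len; ++j)
                if ((static_cast<unsigned char>(s[i + j]) & 0xC0) != 0x80)
                    return false;                  // not a continuation byte
            i += len;
        }
        return true;
    }
    std::string bytes_;
};
```

A library dumping Latin-1 into this (say, the single byte 0xE9 for 'é') throws right there, instead of handing corrupt bytes to code three modules away.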
Conversions usually don't require external tools, BTW: the methods
in the QString interface are just convenience wrappers for the most
popular encodings; a much wider range is available via QTextCodec.
> And you still have to ensure that if you are comparing strings
> they are correctly normalized (to use or not use combining
> characters). And QString carries out comparisons of two individual
> characters using QChar, which only covers characters in the basic
> multilingual plane. And its character access functions also only
> return QChar and not a 32 bit type capable of holding a unicode code
> point. Indeed, as far as I can tell (but I stand ready to be
> corrected) it appears that all individual character access functions
> in QString only correctly handle the BMP, including its iterators and
> the way it indexes for its other methods such as chop(), indexOf()
> and size(). It even appears to allow a QString to be modified by
> indexed 16 bit code units. If so, that is hopeless.
That depends on what you use it for. When using it to communicate
with the OS, e.g. the file system, console, GUI etc., I have yet to
see this become an issue (and that includes Asian locales). That's
one of the biggest advantages QString has in my eyes: it's not the
class itself, it's the entire environment it's integrated into, and
integrated well.

I've mentioned this several times, but it's obviously a point
everybody painstakingly avoids addressing, for fear of admitting that
pretty much every function in the standard library that goes beyond
manipulating a mere byte buffer (i.e. pretty much everything
interfacing with the system and the environment, or in other words
everything you can't just as well implement yourself in portable user
space without resorting to platform-specific extensions) can't handle
UTF-8 (except by accident). And while QString may not be perfect,
it's several orders of magnitude better in these areas than anything
the standard provides, which is basically undefined behaviour for
anything not restricted to 7 bits of some unqualified encoding (good
luck trying to feed a UTF-8-encoded filename to fopen() on e.g.
Windows...)
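To make the fopen() point concrete: on Windows, the narrow-char fopen() runs the filename through the local ANSI code page, so UTF-8 bytes get mangled; the usual workaround is the platform-specific _wfopen() with a UTF-16 name. A hedged sketch of the wrapper everyone ends up writing (error handling omitted; fixed-size buffers for brevity):

```cpp
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

// Open a file whose name is UTF-8 encoded. On POSIX systems filenames
// are byte-transparent, so the UTF-8 bytes pass straight through; on
// Windows we must convert to UTF-16 and use the platform-specific
// _wfopen(), because fopen() interprets the name in the ANSI code page.
FILE* fopen_utf8(const char* utf8_name, const char* mode)
{
#ifdef _WIN32
    wchar_t wname[MAX_PATH], wmode[8];
    MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, wname, MAX_PATH);
    MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 8);
    return _wfopen(wname, wmode);
#else
    return fopen(utf8_name, mode);  // byte-transparent on POSIX
#endif
}
```

Which is exactly the kind of platform-specific escape hatch the standard forces on you, and exactly what Qt's file APIs already do behind the scenes.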
> I used to use frequently a string class which was designed for UTF-8.
> In the end, I stopped using it because I found it had little actual
> advantage over std::string. You still had to validate what went into
> this string class to ensure that it really was UTF-8, and convert if
> necessary. The class provided an operator[]() method which returned
> a whole unicode code point which was nice (and which QString appears
> not to), but in the end I made my own iterator class for std::string
> which iterates over the string by whole code points (and dereferences
> to a 32 bit type), and in practice I found that was just as good.
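For reference, the core of the iterator described above boils down to one decoding step like this (my reconstruction, not the poster's actual class; it assumes the std::string already holds validated UTF-8):

```cpp
#include <cstddef>
#include <string>

// Decode the UTF-8 sequence starting at byte offset i of s into a full
// 32-bit code point, advancing i past the sequence. Assumes valid UTF-8.
inline char32_t nextCodePoint(const std::string& s, std::size_t& i)
{
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;                 // 1-byte sequence (ASCII)
    int extra = (c & 0xE0) == 0xC0 ? 1      // 2-byte sequence
              : (c & 0xF0) == 0xE0 ? 2      // 3-byte sequence
              : 3;                          // 4-byte sequence
    char32_t cp = c & (0x3F >> extra);      // payload bits of the lead byte
    while (extra-- > 0)                     // 6 bits per continuation byte
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}
```

An iterator wrapping this step dereferences to char32_t, i.e. a whole code point including everything outside the BMP, which is precisely what indexing by 16-bit QChar units can't give you.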
So in conclusion:
1) you initially used a string class that's not in the standard library
2) you then had to extend the standard library yourself for string
handling
3) you need external libraries for string transcoding just to get
started with the "std::string holds UTF-8" convention
4) you need external libraries to actually do anything with these
strings beyond the simplest forms of buffer manipulation
5) you need yet other external libraries if you want these "strings"
to actually interface with something like the file system. Which
to me is really beyond insanity.
Yet people insist the standard has string support, and as far as I'm
concerned that simply does not compute. We can agree to disagree.