On Monday 06 July 2015 14:15:38 David Krauss wrote:
> > On 2015–07–06, at 1:32 PM, Thiago Macieira <thi...@macieira.org> wrote:
> >
> > Unless the proposal for char8_t comes about, I'd dismiss this as a
> > possibility. 8-bit characters have arbitrary encoding, regardless of
> > whether they were "" or u8"".
>
> What do you mean, arbitrary? Unprefixed strings use the execution character
> set. u8 strings use UTF-8. The same goes for character literals, now that
> the u8 prefix exists there too, although multibyte characters must be
> treated as substrings either way.

The problem is that the information is lost once the UTF-8 string is stored in
a char[] array. Neither the template version nor any char-based function, for
that matter, can know whether you meant UTF-8 or the execution charset.
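
To make that concrete, here is a minimal sketch. The helper name
encoding_from_type is made up for illustration, and the bytes are written as
hex escapes so the example does not depend on the source or execution charset.
It only shows that two differently encoded byte sequences pick the same
overload once they are plain char arrays, while char16_t and char32_t data
keep their encoding in the type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: reports what the *type* of the argument tells us
// about the encoding. For plain char there is nothing to report.
inline const char* encoding_from_type(const char*)     { return "unknown (char)"; }
inline const char* encoding_from_type(const char16_t*) { return "UTF-16"; }
inline const char* encoding_from_type(const char32_t*) { return "UTF-32"; }

// U+00E9 (e with acute accent) spelled out byte by byte in two encodings.
// Both arrays have element type char, so both select the first overload.
const char utf8_bytes[]   = "\xC3\xA9"; // UTF-8: two code units
const char latin1_bytes[] = "\xE9";     // ISO-8859-1: one code unit
```

encoding_from_type(utf8_bytes) and encoding_from_type(latin1_bytes) both
resolve to the const char* overload, so the callee cannot tell them apart.
A u"..." or U"..." literal reaches the char16_t or char32_t overload instead.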

The only solution I see for this problem is to force the execution charset to
UTF-8. On most OSes other than Windows, that's already the case. The trouble
is just convincing Windows and the Microsoft compilers to do the same.

> Is there a problem with letting unsigned char stand in for char8_t? I mean,
> changing types has its own problems either way, but I don’t see the
> advantage of an additional type.

That's the proposal I was talking about.

> > If you want to know the encoding, use char16_t and char32_t. Don't add the
> > overloads for 8-bit and for wchar_t.
>
> UTF-8 is popular enough to warrant a feature test like
> __cpp_execution_charset_utf8, if nothing else exists.

Isn't the execution charset, by definition, a runtime feature? I don't think a
macro would serve here.
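
As a rough sketch of why a compile-time test only answers part of the
question: the function below (the name is hypothetical) checks how the
compiler encoded one particular character in a narrow literal. It assumes
U+00E9 is representable in whatever encoding the compiler uses for narrow
literals, and it says nothing about the locale in effect when the program
actually runs:

```cpp
#include <cassert>

// Hypothetical probe: true if the compiler encoded U+00E9 in a narrow
// string literal as the UTF-8 sequence C3 A9. This inspects literals only;
// it cannot see the locale encoding chosen at run time.
constexpr bool narrow_literals_look_utf8() {
    return sizeof("\u00E9") == 3  // two code units plus the terminator
        && static_cast<unsigned char>("\u00E9"[0]) == 0xC3
        && static_cast<unsigned char>("\u00E9"[1]) == 0xA9;
}
```

The result can be used in a static_assert, but it may differ between
compilers and compiler options for the same source file.
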
> > There's no good solution for that. Moreover, it's also pretty much
> > orthogonal to the current issue. There's no way to share my UTF-8 files
> > with colleagues using Visual Studio. I don't see willingness to solve
> > that problem which is at a much lower level.
> >
> > (BOMs are not a good idea)
>
> Microsoft doesn’t represent the whole industry. I’m not saying the standard
> needs to be perfect, but compile-time string processing should support text
> consistently with the rest of the language.

That's the problem! The rest of the language has almost no features to do
that. This lack of features elsewhere should not hold back a feature that is
otherwise useful.

Don't get me wrong. I do think we should fix the woeful lack of UTF-8 and
Unicode support in the language. Compared to QString and QTextCodec, support
in the Standard Library is laughable.

> Perhaps the perceived orthogonality comes from the different use-cases of
> compile-time parsing. Source-code-like strings can be used for
> metaprogramming, and text-like strings can be used more conventionally, for
> the program’s output. Anywhere that a string is really a string (not a
> number or a function), it’s a pretty good idea to shoot for expressive
> parity with runtime strings.