N3599 for C++17


b17...@gmail.com

Jul 5, 2015, 4:37:56 AM
to std-pr...@isocpp.org
Hi,
What ever happened to N3599? As I understand it, it was proposed for inclusion in C++14. Does anyone know why it didn't get included?

A templated string literal operator was actually the first literal operator I tried to implement when C++11 came out. I was disappointed that it didn't work; it seems like an oversight in the standard. I just discovered N3599 today after looking at the code for Super Template Tetris lol. N3599 has already been implemented in Clang and GCC. Is there any momentum behind this proposal for C++17?
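
For readers who have not seen the feature, here is a minimal sketch of the N3599-style literal operator template. It is not standard C++; it assumes a compiler that provides the extension (GCC and Clang do, usually with a warning). The names `char_pack` and `_cs` are made up for illustration.

```cpp
#include <type_traits>

// Hypothetical holder type: the characters live in the type itself.
template <typename CharT, CharT... Cs>
struct char_pack {};

// N3599-style literal operator template (compiler extension, not ISO C++).
// The compiler passes each character of the literal as a template argument,
// without the terminating '\0'.
template <typename CharT, CharT... Cs>
constexpr char_pack<CharT, Cs...> operator""_cs() {
    return {};
}

static_assert(std::is_same<decltype("abc"_cs), char_pack<char, 'a', 'b', 'c'>>::value,
              "each character becomes a template argument");
```

Because every distinct string yields a distinct type, the string can take part in overload resolution and template specialization.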

-Kal

Jens Maurer

Jul 5, 2015, 2:54:47 PM
to std-pr...@isocpp.org
On 07/05/2015 10:37 AM, b17...@gmail.com wrote:
> Hi,
> What ever happened to N3599? As I understand it, it was proposed for inclusion in C++14. Does anyone know why it didn't get included?

See here:

http://cplusplus.github.io/EWG/ewg-active.html#66

I'd guess you should talk to Smith and Vandevoorde to see whether
they've come up with a revised version.

Jens

b17...@gmail.com

Jul 5, 2015, 3:30:23 PM
to std-pr...@isocpp.org
Thanks for the link. Does anyone know the specifics of "Revise with additional machinery for compile time string processing" or why people were otherwise against the original proposal?

Louis Dionne

Jul 6, 2015, 1:06:32 AM
to std-pr...@isocpp.org, b17...@gmail.com
We absolutely need templated string literals, and IMHO we don't need additional machinery for compile-time
string processing. I would personally take the proposal as-is. We will then implement additional machinery as
needed on a per-library basis.

Louis

David Krauss

Jul 6, 2015, 1:21:44 AM
to std-pr...@isocpp.org

On 2015–07–06, at 1:06 PM, Louis Dionne <ldio...@gmail.com> wrote:

> We absolutely need templated string literals, and IMHO we don't need additional machinery for compile-time
> string processing. I would personally take the proposal as-is. We will then implement additional machinery as
> needed on a per-library basis.

Who is “we”? Having everyone reinvent the wheel is asking for poor interoperability, poor performance, poor reliability…

The core language issue I wonder about, for compile-time string processing, is text encoding support. Should there be feature-test macros so metaprograms can tell what character set is in use? Something like std::char_traits (or new members in char_traits itself)? How would one differentiate u8"" literals from unprefixed ones?

Libraries that don’t care about portability should be satisfied with the status quo: rough compatibility between popular implementations that volunteer to be cutting-edge. The standard shouldn’t compromise on quality just to “get something out the door.”

Thiago Macieira

Jul 6, 2015, 1:32:22 AM
to std-pr...@isocpp.org
On Monday 06 July 2015 13:21:24 David Krauss wrote:
> The core language issue I wonder about, for compile-time string processing,
> is text encoding support. Should there be feature-test macros so
> metaprograms can tell what character set is in use? Something like
> std::char_traits (or new members in char_traits itself?) How to
> differentiate u8"" literals from unprefixed ones?

Unless the proposal for char8_t comes about, I'd dismiss this as a
possibility. 8-bit characters have arbitrary encoding, regardless of whether
they were "" or u8"".

If you want to know the encoding, use char16_t and char32_t. Don't add the
overloads for 8-bit and for wchar_t.

> The standard shouldn’t compromise on quality just to “get something out the
> door.”

There's no good solution for that. Moreover, it's also pretty much orthogonal
to the current issue. There's no way to share my UTF-8 files with colleagues
using Visual Studio. I don't see willingness to solve that problem which is at
a much lower level.

(BOMs are not a good idea)

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

Louis Dionne

Jul 6, 2015, 1:33:49 AM
to std-pr...@isocpp.org
There's at least me, Roland Bock (with sqlpp11), and Matt Bierner it seems. But I also talked with many
more people at conferences like C++Now who would have needed compile-time strings badly.

Let me clarify. I'm not opposed to additional machinery, and I would even like it if we had more tools
to manipulate compile-time strings. However, 90% of the use cases are super simple, and what we
need is nothing more than the Clang/GCC provided templated string literal. Considering this is
low-hanging fruit and that more can be added later, it would be sad to delay it 6 more years just
because the remaining 10% of use cases is not covered yet.

Louis

David Krauss

Jul 6, 2015, 2:15:59 AM
to std-pr...@isocpp.org
On 2015–07–06, at 1:32 PM, Thiago Macieira <thi...@macieira.org> wrote:

> Unless the proposal for char8_t comes about, I'd dismiss this as a
> possibility. 8-bit characters have arbitrary encoding, regardless of whether
> they were "" or u8"".

What do you mean, arbitrary? Unprefixed strings use the execution character set. u8 strings use UTF-8. The same goes for character literals, now that the u8 prefix exists there too, although multibyte characters must be treated as substrings either way.

Is there a problem with letting unsigned char stand-in for char8_t? I mean, changing types has its own problems either way, but I don’t see the advantage of an additional type.

> If you want to know the encoding, use char16_t and char32_t. Don't add the
> overloads for 8-bit and for wchar_t.

UTF-8 is popular enough to warrant a feature test like __cpp_execution_charset_utf8, if nothing else exists.

> There's no good solution for that. Moreover, it's also pretty much orthogonal
> to the current issue. There's no way to share my UTF-8 files with colleagues
> using Visual Studio. I don't see willingness to solve that problem which is at
> a much lower level.
>
> (BOMs are not a good idea)

Microsoft doesn’t represent the whole industry. I’m not saying the standard needs to be perfect, but compile-time string processing should support text consistently with the rest of the language.

Perhaps the perceived orthogonality comes from the different use-cases of compile-time parsing. Source-code-like strings can be used for metaprogramming, and text-like strings can be used more conventionally, for the program’s output. Anywhere that a string is really a string (not a number or a function), it’s a pretty good idea to shoot for expressive parity with runtime strings.

David Krauss

Jul 6, 2015, 2:41:25 AM
to std-pr...@isocpp.org
EWG in Urbana voted between N4121, which proposed a class basic_string_literal< charT, traits, N > and N4236, which proposed to further develop the Clang/GCC UDL template extension. The decision was strongly in favor of N4121, which means the extension is a dead end. And, there’s a slot there for the traits I mentioned :) .

If you’re interested in future-proof compile-time string processing, the better methodology is to forgo the literal operator template extension and unpack the characters from an ordinary string literal, using std::index_sequence, as outlined in N4121.
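
As a rough illustration of the unpacking idea (a sketch in C++14, not code taken from N4121), the characters of an ordinary literal can be expanded into a parameter pack by indexing it with a `std::index_sequence`. The names `char_pack`, `unpack`, and `hello_holder` are hypothetical.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

template <char... Cs>
struct char_pack {};

// Expands Holder::get()[0], Holder::get()[1], ... into template arguments.
// Holder must expose the literal through a static constexpr function so that
// indexing it is a constant expression.
template <typename Holder, std::size_t... I>
constexpr char_pack<Holder::get()[I]...> unpack(Holder, std::index_sequence<I...>) {
    return {};
}

// Wraps one ordinary string literal; sizeof counts the trailing '\0'.
struct hello_holder {
    static constexpr const char* get() { return "hello"; }
    static constexpr std::size_t size() { return sizeof("hello") - 1; }
};

using hello_type =
    decltype(unpack(hello_holder{}, std::make_index_sequence<hello_holder::size()>{}));

static_assert(std::is_same<hello_type, char_pack<'h', 'e', 'l', 'l', 'o'>>::value,
              "the literal was unpacked character by character");
```

The holder struct is the awkward part: libraries that use this trick usually hide it behind a macro, which is one reason people keep asking for the literal operator template.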

Bo Persson

Jul 6, 2015, 3:24:22 AM
to std-pr...@isocpp.org
On 2015-07-06 08:15, David Krauss wrote:
>
>> On 2015–07–06, at 1:32 PM, Thiago Macieira <thi...@macieira.org
>> <mailto:thi...@macieira.org>> wrote:
>>
>> Unless the proposal for char8_t comes about, I'd dismiss this as a
>> possibility. 8-bit characters have arbitrary encoding, regardless of
>> whether
>> they were "" or u8"".
>
> What do you mean, arbitrary? Unprefixed strings use the execution
> character set. u8 strings use UTF-8. The same goes for character
> literals, now that the u8 prefix exists there too, although multibyte
> characters must be treated as substrings either way.
>
> Is there a problem with letting unsigned char stand-in for char8_t? I
> mean, changing types has its own problems either way, but I don’t see
> the advantage of an additional type.
>

unsigned char is already likely to be the base of uint8_t, and it is used as a
generic 'byte' type in low-level code. Giving it yet another use can
give us "interesting" overload problems.

Not that I favor a 4th 8-bit character type either...
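
A small sketch of the kind of overload problem meant here. It assumes, as is common but not guaranteed, that `std::uint8_t` is an alias for `unsigned char`; the tag types and `classify` are made up for illustration.

```cpp
#include <cstdint>
#include <type_traits>

struct text_tag {};
struct number_tag {};

// Suppose unsigned char were adopted as "the UTF-8 code unit type".
constexpr text_tag classify(unsigned char) { return {}; }
constexpr number_tag classify(int) { return {}; }

// True on most mainstream platforms, but not required by the standard.
constexpr bool uint8_is_uchar = std::is_same<std::uint8_t, unsigned char>::value;

// Where the alias holds, a small integer is silently treated as text.
using picked = decltype(classify(std::uint8_t{42}));
static_assert(!uint8_is_uchar || std::is_same<picked, text_tag>::value,
              "uint8_t values select the 'text' overload");
```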


Bo Persson


Thiago Macieira

Jul 6, 2015, 3:27:00 AM
to std-pr...@isocpp.org
On Monday 06 July 2015 14:15:38 David Krauss wrote:
> > On 2015–07–06, at 1:32 PM, Thiago Macieira <thi...@macieira.org> wrote:
> >
> > Unless the proposal for char8_t comes about, I'd dismiss this as a
> > possibility. 8-bit characters have arbitrary encoding, regardless of
> > whether they were "" or u8"".
>
> What do you mean, arbitrary? Unprefixed strings use the execution character
> set. u8 strings use UTF-8. The same goes for character literals, now that
> the u8 prefix exists there too, although multibyte characters must be
> treated as substrings either way.

The problem is that the information is lost as the UTF-8 string is stored in a
char[] array. Neither the template version nor any char-based function, for
that matter, can know whether you meant UTF-8 or the execution charset.

The only solution in my view for this problem is to force the execution
charset to UTF-8. On most OS except for Windows, that's already the case. The
trouble is just convincing Windows and Microsoft compilers to be like that.

> Is there a problem with letting unsigned char stand-in for char8_t? I mean,
> changing types has its own problems either way, but I don’t see the
> advantage of an additional type.

That's the proposal I was talking about.

> > If you want to know the encoding, use char16_t and char32_t. Don't add the
> > overloads for 8-bit and for wchar_t.
>
> UTF-8 is popular enough to warrant a feature test like
> __cpp_execution_charset_utf8, if nothing else exists.

Isn't the execution charset, by definition, a runtime feature? I don't think a
macro would serve here.

> > There's no good solution for that. Moreover, it's also pretty much
> > orthogonal to the current issue. There's no way to share my UTF-8 files
> > with colleagues using Visual Studio. I don't see willingness to solve
> > that problem which is at a much lower level.
> >
> > (BOMs are not a good idea)
>
> Microsoft doesn’t represent the whole industry. I’m not saying the standard
> needs to be perfect, but compile-time string processing should support text
> consistently with the rest of the language.

That's the problem! The rest of the language has almost no features
to do that. This lack of features elsewhere should not hold back a feature
that is otherwise useful.

Don't get me wrong. I do think we should fix the woeful lack of UTF-8 and
Unicode support in the language. Compared to QString and QTextCodec, support
in the Standard Library is laughable.

> Perhaps the perceived orthogonality comes from the different use-cases of
> compile-time parsing. Source-code-like strings can be used for
> metaprogramming, and text-like strings can be used more conventionally, for
> the program’s output. Anywhere that a string is really a string (not a
> number or a function), it’s a pretty good idea to shoot for expressive
> parity with runtime strings.

Bo Persson

Jul 6, 2015, 3:33:09 AM
to std-pr...@isocpp.org
On 2015-07-06 09:26, Thiago Macieira wrote:
> On Monday 06 July 2015 14:15:38 David Krauss wrote:
>>> On 2015–07–06, at 1:32 PM, Thiago Macieira <thi...@macieira.org> wrote:
>>>
>>> Unless the proposal for char8_t comes about, I'd dismiss this as a
>>> possibility. 8-bit characters have arbitrary encoding, regardless of
>>> whether they were "" or u8"".
>>
>> What do you mean, arbitrary? Unprefixed strings use the execution character
>> set. u8 strings use UTF-8. The same goes for character literals, now that
>> the u8 prefix exists there too, although multibyte characters must be
>> treated as substrings either way.
>
> The problem is that the information is lost as the UTF-8 string is stored in a
> char[] array. Neither the template version nor any char-based function, for
> that matter, can know whether you meant UTF-8 or the execution charset.
>
> The only solution in my view for this problem is to force the execution
> charset to UTF-8. On most OS except for Windows, that's already the case. The
> trouble is just convincing Windows and Microsoft compilers to be like that.
>

You have another player using EBCDIC. Not likely to give up on that either.

Not everything is a desktop!


Bo Persson



b17...@gmail.com

Jul 6, 2015, 4:30:11 AM
to std-pr...@isocpp.org
N4121 is interesting. But it seems to me that N4121 and N3599 are orthogonal proposals. I mean templated string UDLs are useful on their own with or without basic_string_literal. I don't think there is any need to combine them as in N4236. Seems like it can be kept fairly simple, along the lines of what N3599 proposes.

David Krauss

Jul 6, 2015, 4:38:03 AM
to std-pr...@isocpp.org
On 2015–07–06, at 3:26 PM, Thiago Macieira <thi...@macieira.org> wrote:

> The problem is that the information is lost as the UTF-8 string is stored in a
> char[] array. Neither the template version nor any char-based function, for
> that matter, can know whether you meant UTF-8 or the execution charset.

Fortunately, the std::string_literal proposal favored by EWG provides for traits which could record the encoding. It should be on track for a Fundamentals TS (e.g., Fundamentals v2), along with string_view. Maybe char_traits itself deserves a revision, too.

> Isn't the execution charset, by definition, a runtime feature? I don't think a
> macro would serve here.

The execution character set is the one used to encode string literals.
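
Since literals are encoded by the compiler, a metaprogram can at least probe the encoding at compile time. The following is only a heuristic sketch that inspects a single character (U+00E9); it is not a substitute for a real feature-test macro, and `is_utf8_e_acute` is a hypothetical helper.

```cpp
#include <cstddef>

// Returns true if s holds U+00E9 encoded as the two UTF-8 bytes 0xC3 0xA9
// followed by the terminating '\0'.
template <typename CharT, std::size_t N>
constexpr bool is_utf8_e_acute(const CharT (&s)[N]) {
    return N == 3 &&
           static_cast<unsigned char>(s[0]) == 0xC3 &&
           static_cast<unsigned char>(s[1]) == 0xA9;
}

// Depends on how this compiler encodes unprefixed literals.
constexpr bool narrow_literals_look_utf8 = is_utf8_e_acute("\u00E9");

// u8 literals are UTF-8 regardless of the platform.
static_assert(is_utf8_e_acute(u8"\u00E9"), "u8 literals are always UTF-8");
```

The value of `narrow_literals_look_utf8` is deliberately not asserted, because it differs between implementations and compiler settings.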

> > Microsoft doesn’t represent the whole industry. I’m not saying the standard
> > needs to be perfect, but compile-time string processing should support text
> > consistently with the rest of the language.
>
> That's the problem! The rest of the language does not have almost any features
> to do that. This lack of features elsewhere should not hold back a feature
> that is otherwise useful.

The standard allows the platform to define the text encoding of string literals. I’m only saying that compile-time string processing should portably cooperate with that.

You seem to be focusing on runtime interoperability, i.e. foreign text encodings. I’m only talking about dealing with native strings at compile time.

> Don't get me wrong. I do think we should fix the woeful lack of UTF-8 and
> Unicode support in the language. Compared to QString and QTextCodec, support
> in the Standard Library is laughable.

Why not make proposals?

Roland Bock

Jul 6, 2015, 4:39:10 AM
to std-pr...@isocpp.org
Since I saw my name coming up and I started a discussion about namespace-qualified use of UDLs:

I need compile-time strings.
And they should be painless and maintainer-friendly to use.


Specifically, I need

struct A
{
   static constexpr auto x = foo("hello");
};

or

struct B
{
   using x_t = decltype(bar("hello"));
};

in a header. x or x_t must be compile-time comparable.

a) UDLs are not a good solution, unless there were a way to namespace qualify them: I cannot use "using namespace XYC" to pull in a literal in a header.
b) I do not want to have to think about whether "hello" is 5 characters long. When I change it to "good morning", it should still compile, without me having to count characters. Thus, if I understand it correctly, N4121 does not seem to be a good option either.
c) foo and bar should not be macros, of course.

I would be all for N4121 if we could get rid of the size parameter.

struct C
{
    static constexpr auto x = std::string_literal("hello"); // without <5>
};


This would be nice.
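
One way to avoid counting characters with today's language (C++14) is to let a factory function deduce the length from the literal's array type. This is only a sketch with a hypothetical `fixed_string` class, not N4121's `basic_string_literal`; the size is still part of the type, it is merely deduced instead of written out.

```cpp
#include <cstddef>

// Hypothetical stand-in for a string-literal class; N includes the trailing '\0'.
template <std::size_t N>
struct fixed_string {
    char data[N] = {};
    constexpr std::size_t size() const { return N - 1; }
};

// N is deduced from the array type of the literal, so callers never count characters.
template <std::size_t N>
constexpr fixed_string<N> make_fixed_string(const char (&s)[N]) {
    fixed_string<N> result{};
    for (std::size_t i = 0; i < N; ++i) result.data[i] = s[i];
    return result;
}

// Strings of different lengths compare unequal; otherwise compare element-wise.
template <std::size_t N, std::size_t M>
constexpr bool operator==(const fixed_string<N>& a, const fixed_string<M>& b) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i)
        if (a.data[i] != b.data[i]) return false;
    return true;
}

struct C {
    static constexpr auto x = make_fixed_string("hello");  // no <5> needed
};

static_assert(C::x == make_fixed_string("hello"), "compile-time comparable");
static_assert(!(C::x == make_fixed_string("good morning")), "different strings differ");
```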


BTW, could I then do something like the following?

template<typename T>
struct my_name;

my_name<std::string_literal("hello")>


Best,

Roland

David Krauss

Jul 6, 2015, 4:52:07 AM
to std-pr...@isocpp.org
On 2015–07–06, at 4:30 PM, b17...@gmail.com wrote:

> N4121 is interesting. But it seems to me that N4121 and N3599 are orthogonal proposals. I mean templated string UDLs are useful on their own with or without basic_string_literal. I don't think there is any need to combine them as in N4236. Seems like it can be kept fairly simple, along the lines of what N3599 proposes.

You can also get a char non-type parameter pack from a basic_string_literal, provided that it can bind to a template non-type parameter of reference type, so the feature would need justification for being a core language feature instead of a library. And, one main reason N4121 was favored was that it’s faster (more efficient for the compiler) to process char array values instead of parameter packs. It’s also easier for the programmer, since arrays are manipulated by constexpr functions but packs require TMP.
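
To make the ergonomic difference concrete, here is a sketch (C++14, hypothetical names) that counts occurrences of a character, once over a char array with a plain loop and once over a parameter pack with template recursion.

```cpp
#include <cstddef>

// Array version: an ordinary loop inside a constexpr function.
template <std::size_t N>
constexpr std::size_t count_in_array(const char (&s)[N], char c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < N; ++i)
        if (s[i] == c) ++n;
    return n;
}

template <char... Cs>
struct pack {};

// Pack version: recursion over the pack, one instantiation per character.
template <char C>
constexpr std::size_t count_in_pack(pack<>) {
    return 0;
}

template <char C, char Head, char... Tail>
constexpr std::size_t count_in_pack(pack<Head, Tail...>) {
    return (Head == C ? 1 : 0) + count_in_pack<C>(pack<Tail...>{});
}

static_assert(count_in_array("hello", 'l') == 2, "loop over the array");
static_assert(count_in_pack<'l'>(pack<'h', 'e', 'l', 'l', 'o'>{}) == 2, "recursion over the pack");
```

The pack version instantiates a new function template for every remaining suffix of the string, which is where the extra compiler work comes from.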

Parameter packs provide the benefit of a distinct C++ type per string, but at a significant cost in compiler efficiency. A better way toward string-to-type mapping would be to allow temporary objects of literal type to bind to template const& parameters. There’s a knot to be untied there, relating to linkage and object uniqueness, but that problem needs solving anyway. It’s the better way forward.

David Krauss

Jul 6, 2015, 4:57:14 AM
to std-pr...@isocpp.org

On 2015–07–06, at 4:39 PM, Roland Bock <rb...@eudoxos.de> wrote:

> I would be all for N4121 if we could get rid of the size parameter.

N4121 needs that because it’s a container class. Getting rid of the size parameter requires some other constexpr way of obtaining storage. That pretty much boils down to predynamic storage, as far as I can tell, which is still generally regarded as pie in the sky.

Hopefully string_view will work over string_literal at compile time, to roughly the same effect. Completely ignoring the underlying string_literal and using only the view sounds like a job for generalized lifetime extension.

Thiago Macieira

Jul 6, 2015, 10:56:11 AM
to std-pr...@isocpp.org
On Monday 06 July 2015 09:32:54 Bo Persson wrote:
> > The only solution in my view for this problem is to force the execution
> > charset to UTF-8. On most OS except for Windows, that's already the case.
> > The trouble is just convincing Windows and Microsoft compilers to be like
> > that.
> You have another player using EBCDIC. Not likely to give up on that either.

That's right, but that one already faces problems with people who assume that
everything is ASCII.

Thiago Macieira

Jul 6, 2015, 11:00:39 AM
to std-pr...@isocpp.org
On Monday 06 July 2015 16:37:52 David Krauss wrote:
> > Don't get me wrong. I do think we should fix the woeful lack of UTF-8 and
> > Unicode support in the language. Compared to QString and QTextCodec,
> > support in the Standard Library is laughable.
>
> Why not make proposals?

I'm not interested. I might try and find the time to write a proposal about the
PMF math and I'm following closely string_literal, reflection, etc., but
writing for the runtime standard library has no appeal to me.

b17...@gmail.com

Jul 6, 2015, 11:52:07 AM
to std-pr...@isocpp.org
I want to create a constexpr object from a UDL string. Like:

constexpr UUID id = "123e4567-e89b-12d3-a456-426655440000"_uuid;

How can this be done with basic_string_literal without an addition to string UDLs?

Thiago Macieira

Jul 6, 2015, 12:41:34 PM
to std-pr...@isocpp.org
On Monday 06 July 2015 08:52:07 b17...@gmail.com wrote:
> I want to create a constexpr object from a UDL string. Like:
>
> constexpr UUID id = "123e4567-e89b-12d3-a456-426655440000"_uuid;
>
> How can this be done with basic_string_literal without an addition to
> string UDLs?

You were thinking of doing:

template <typename CharT, CharT... str> constexpr UUID operator""_uuid();

and operate on str[i].

Why can't your function operate on literal[i] ?

David Krauss

Jul 6, 2015, 12:49:05 PM
to std-pr...@isocpp.org

Don’t use basic_string_literal. Try this.

Louis Dionne

Jul 6, 2015, 2:36:10 PM
to std-pr...@isocpp.org
David,

I understand how and why N4121-style compile-time strings have a better 
compile-time performance than type-level strings. However, type-level
strings are more powerful (strictly speaking) than constexpr strings.
Consider the following challenge:

Write a function `f` (it can be a template) which takes two arguments 
representing strings known at compile-time and which statically asserts 
that both strings are equal.

The solution with type-level strings:

    // Utilities
    template <char ...s>
    struct string { };

    template <char ...s1, char ...s2>
    constexpr std::false_type operator==(string<s1...>, string<s2...>) 
    { return {}; }

    template <char ...s>
    constexpr std::true_type operator==(string<s...>, string<s...>) 
    { return {}; }

    template <typename CharT, CharT ...c>
    constexpr auto operator""_s() { return string<c...>{}; }


    // And now the function
    template <typename S1, typename S2>
    void f(S1 s1, S2 s2) {
        static_assert(s1 == s2, "");
    }


I know of no solution with strings à la N4121, and so (if I'm right) the claim 
at the beginning of the paper that both approaches are equivalent is downright 
wrong. The thing is that the value of the string must be contained in the type
of the object that represents that string, since passing an object as an 
argument to a function discards constexpr-ness. More is explained in the 
documentation of the Hana library [1].

Note that if the above challenge seems stupid, well it is. However, the exact
same problem applies if you try to write a function which maps compile-time
strings to heterogeneous objects, for example. And this is a very, very 
important use case for compile-time strings.
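
A minimal sketch of that use case, with made-up names (`str`, `person`, `get`): the return type of the lookup depends on the key, so the key has to be encoded in a type. The keys are spelled out as packs here to keep the example in standard C++.

```cpp
#include <type_traits>

// Type-level string: the characters are part of the type.
template <char... Cs>
struct str {};

struct person {
    int age;
    double height;
};

// One overload per key; each key can map to a member of a different type.
constexpr int get(const person& p, str<'a', 'g', 'e'>) { return p.age; }
constexpr double get(const person& p, str<'h', 'e', 'i', 'g', 'h', 't'>) { return p.height; }

constexpr person alice{30, 1.7};

static_assert(std::is_same<decltype(get(alice, str<'a', 'g', 'e'>{})), int>::value,
              "\"age\" maps to an int");
static_assert(std::is_same<decltype(get(alice, str<'h', 'e', 'i', 'g', 'h', 't'>{})), double>::value,
              "\"height\" maps to a double");
static_assert(get(alice, str<'a', 'g', 'e'>{}) == 30, "lookup happens at compile time");
```

With a value-level string both keys would have the same type, so a single function could not return `int` for one and `double` for the other.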

Now, you might be aware of all this already. However, your answer

    You can also get a char non-type parameter pack from a basic_string_literal,
    provided that it can bind to a template non-type parameter of reference type,
    so the feature would need justification for being a core language feature
    instead of a library.

makes me think that you envision a different way of making this possible. Is
this right? If so, how might one get a template parameter from a non-constexpr
basic_string_literal?

Regarding performance; anyone sane would do the hardcore compile-time string 
processing with constexpr functions, and then turn the result back into a 
type-level string. The interface of the library would however be comprised of 
type-level strings, because that's what's required for most applications anyway.

Regards,
Louis

b17...@gmail.com

Jul 6, 2015, 2:38:07 PM
to std-pr...@isocpp.org
Thanks for this. I had tried something like this, but it didn't work because I tried to use static_assert for error checking. The parameters of the UDL function aren't constexpr, so they can't be used with static_assert. Yet out of these non-constexpr parameters you can construct a constexpr UUID object? How does that work?
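
A sketch of one common technique, not necessarily the code referred to above: inside a constexpr function the parameters are not constant expressions, so `static_assert` on them is rejected, but the call as a whole can still be evaluated at compile time. Errors are reported with `throw`; reaching a `throw` during constant evaluation makes the program ill-formed, so a bad literal used to initialize a constexpr variable fails to compile. The `UUID` layout and `hex_value` helper are assumptions for illustration (C++14).

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical UUID type for illustration: 16 raw bytes.
struct UUID {
    unsigned char bytes[16];
};

// Converts one hex digit; the throw is only reached for invalid input.
constexpr int hex_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0'
         : (c >= 'a' && c <= 'f') ? c - 'a' + 10
         : (c >= 'A' && c <= 'F') ? c - 'A' + 10
         : throw std::invalid_argument("invalid hex digit in UUID");
}

// Ordinary (standard) string literal operator, declared constexpr.
constexpr UUID operator""_uuid(const char* s, std::size_t n) {
    if (n != 36) throw std::invalid_argument("a UUID literal has 36 characters");
    UUID result{};
    std::size_t out = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-') throw std::invalid_argument("expected '-' in UUID");
            continue;
        }
        // Two hex digits form one byte; step over the second digit afterwards.
        result.bytes[out++] =
            static_cast<unsigned char>(hex_value(s[i]) * 16 + hex_value(s[i + 1]));
        ++i;
    }
    return result;
}

constexpr UUID sample = "123e4567-e89b-12d3-a456-426655440000"_uuid;
static_assert(sample.bytes[0] == 0x12 && sample.bytes[15] == 0x00, "parsed at compile time");
```

If the same operator is called in a non-constexpr context with a malformed literal, the error surfaces at run time as an exception instead.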

Louis Dionne

Jul 6, 2015, 3:42:57 PM
to std-pr...@isocpp.org
For completeness; I don't know what I was thinking, but I didn't test my solution
and it does not work as-is. Here's a corrected solution:


    #include <type_traits>

    // ...same as before...

    template <typename S1, typename S2>
    void f(S1 s1, S2 s2) {
        auto eq = s1 == s2;
        static_assert(eq, "");
    }

    int main() {
        f("abc"_s, "abc"_s);
    }

Regards,
Louis

Max Truxa

Jul 8, 2015, 1:58:03 PM
to std-pr...@isocpp.org
On Monday, July 6, 2015 at 8:15:59 AM UTC+2, David Krauss wrote:

> What do you mean, arbitrary? Unprefixed strings use the execution character set. u8 strings use UTF-8. The same goes for character literals, now that the u8 prefix exists there too, although multibyte characters must be treated as substrings either way.

I think what Thiago Macieira meant was that you can't distinguish between "" and u8"" based on type (e.g. overloading, std::is_same, etc.) since both are arrays of char.
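
A short sketch of the point, using a hypothetical helper `is_plain_char_array`. In C++11/14/17 both kinds of literal have element type `char`; a compiler that implements a distinct `char8_t` type (as later adopted for C++20) would report a different result for the u8 literal, which is why that value is not asserted here.

```cpp
#include <cstddef>
#include <type_traits>

// Reports whether a string literal's element type is plain char.
template <typename T, std::size_t N>
constexpr bool is_plain_char_array(const T (&)[N]) {
    return std::is_same<T, char>::value;
}

static_assert(is_plain_char_array("abc"), "unprefixed literals are arrays of char");

// True in C++11/14/17, so overloads cannot tell the two literals apart;
// false once u8 literals get their own element type.
constexpr bool u8_is_plain_char = is_plain_char_array(u8"abc");
```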

> Is there a problem with letting unsigned char stand-in for char8_t? I mean, changing types has its own problems either way, but I don’t see the advantage of an additional type.

unsigned char is not sufficient here because char's signedness is implementation-defined, which means that on a platform where char is actually unsigned char nothing would change.

Note: I'm currently working on a proposal for char8_t. Sure, yet another character type is not perfect but it's the only way to make UTF-8 string literals (and in the future UTF-8 character literals) usable. For (an incomplete) reference see my post from a few weeks ago: Distinct type of array elements in UTF-8 string literals (char8_t).

Daniel Krügler

Jul 8, 2015, 2:12:19 PM
to std-pr...@isocpp.org
2015-07-08 19:58 GMT+02:00 Max Truxa <m...@maxtruxa.com>:
> On Monday, July 6, 2015 at 8:15:59 AM UTC+2, David Krauss wrote:
>> Is there a problem with letting unsigned char stand-in for char8_t? I
>> mean, changing types has its own problems either way, but I don’t see the
>> advantage of an additional type.
>
> unsigned char is not sufficient here because char's signed-ness is
> implementation-defined which means that on a platform where char is actually
> unsigned char nothing would change.

This statement is misleading. Plain "char", "signed char", and
"unsigned char" are always three different types. There does not exist
a platform, where char is actually unsigned char (or signed char),
even though char is either a signed or an unsigned type:

#include <type_traits>

static_assert(!std::is_same<char, unsigned char>() &&
!std::is_same<char, signed char>(), "char is different");


- Daniel

Max Truxa

Jul 8, 2015, 2:30:25 PM
to std-pr...@isocpp.org
On Wednesday, July 8, 2015 at 8:12:19 PM UTC+2, Daniel Krügler wrote:

> This statement is misleading. Plain "char", "signed char", and
> "unsigned char" are always three different types. There does not exist
> a platform, where char is actually unsigned char (or signed char),
> even though char is either a signed or an unsigned type:

That didn't come across as I wanted it to. You are right, obviously. Initially I intended to refer to Bo Persson's comment about the persisting ambiguity regarding e.g. uint8_t when overloading but that got lost somehow...