A compile-time string literal library type

Michael Price - Dev

Dec 15, 2013, 10:26:59 PM
to std-pr...@isocpp.org
I'm considering writing up a proposal to standardize a library type that represents a compile-time sequence of character literals.  I'm looking for a first round of feedback.  You can view my prototype implementation at https://github.com/michaelbprice/cxx_snippets/blob/master/string_literal/string_literal.cpp

I'm using the top-of-trunk clang to compile with the following command-line:

clang++ -std=c++1y string_literal.cpp -Wno-gnu-string-literal-operator-template

At the moment, my "test" code looks like this:

void testSizes()
{
    static_assert("He"_cs.size() == 2, "size() is wrong");
    static_assert(L"Wo"_cs.size() == 2, "size() is wrong");
    static_assert(u"Hello!"_cs.size() == 6, "size() is wrong");
    static_assert(U"World!"_cs.size() == 6, "size() is wrong");
}

void testEmptyStrings()
{
    constexpr auto mb = ""_cs;
    constexpr auto wide = L""_cs;
    constexpr auto utf16 = u""_cs;
    constexpr auto utf32 = U""_cs;

    static_assert(mb.size() == 0, "Multi-byte size() is wrong");
    static_assert(wide.size() == 0, "Wide size() is wrong");
    static_assert(utf16.size() == 0, "UTF-16 size() is wrong");
    static_assert(utf32.size() == 0, "UTF-32 size() is wrong");

    static_assert(mb.length() == 0, "Multi-byte length() is wrong");
    static_assert(wide.length() == 0, "Wide length() is wrong");
    static_assert(utf16.length() == 0, "UTF-16 length() is wrong");
    static_assert(utf32.length() == 0, "UTF-32 length() is wrong");

    static_assert(mb.empty() == true, "Multi-byte empty() is wrong");
    static_assert(wide.empty() == true, "Wide empty() is wrong");
    static_assert(utf16.empty() == true, "UTF-16 empty() is wrong");
    static_assert(utf32.empty() == true, "UTF-32 empty() is wrong");

    static_assert(mb == ""_cs, "operator==() is wrong");
    static_assert(mb != " "_cs, "operator!=() is wrong");
}

void testSearching()
{
    constexpr auto hi = "Hello "_cs + "World!"_cs;

    static_assert(hi.find('?') == std::string::npos, "find() is wrong");
    static_assert(hi.find(' ') == 5, "find() is wrong");
    static_assert(hi.find('o') == 4, "find() is wrong");
    static_assert(hi.find("orl"_cs) == 7, "find() is wrong");
    static_assert(hi.rfind('o') == 7, "rfind() is wrong");
    static_assert(hi.rfind("el"_cs) == 1, "rfind() is wrong");
}

void testSubstringFunctions()
{
    constexpr auto big = "Hello, World!"_cs;
    constexpr auto smaller = big.substr<7>();
    constexpr auto lessbig = big.cdr();

    static_assert(smaller.size() == 6, "substr<>() is wrong");
    static_assert(smaller[0UL] == 'W', "substr<>() is wrong");
    static_assert(big.substr<0>().at(0) == 'H', "substr<>() is wrong");
    static_assert(big.substr<12>().at(0) == '!', "substr<>() is wrong");
    static_assert(big.size() == lessbig.size() + 1, "cdr() is wrong");
}

void testComparisons()
{
    constexpr auto first = "One"_cs;
    constexpr auto second = "Two"_cs;
    constexpr auto last = "One"_cs;

    static_assert(first.compare(last) == 0, "compare() is wrong");
    static_assert(first.compare(second) == -1, "compare() is wrong");
    static_assert(second.compare(last) == 1, "compare() is wrong");

    static_assert("A"_cs == "A"_cs, "operator== is wrong");
    static_assert("A"_cs != "a"_cs, "operator!= is wrong");
    static_assert("a"_cs > "A"_cs, "operator> is wrong");
    static_assert("1"_cs < "2"_cs, "operator< is wrong");
}

void testConversions()
{
    static_assert("0"_cs.to_number() == 0, "to_number() is wrong");
    static_assert("1"_cs.to_number() == 1, "to_number() is wrong");
    static_assert("11"_cs.to_number() == 11, "to_number() is wrong");
    static_assert("65535"_cs.to_number() == 65535, "to_number() is wrong");
    //"99kj343"_cs.to_number();

    constexpr ptrdiff_t number = "12345"_cs;
    static_assert(number == 12345, "operator size_type is wrong");
    //constexpr ptrdiff_t positive = "+1"_cs;
    //static_assert(positive == 1, "to_number() is wrong");
}

void testConcatenation()
{
    constexpr auto hello = "Hello"_cs;
    constexpr auto world = "World"_cs;

    constexpr auto together = hello + ", "_cs + world + "!"_cs;
    static_assert(together.size() == 13, "operator+ is wrong");
    static_assert(together[5UL] == ',', "operator+ is wrong");
}


template <typename Str>
struct star_wars_speaker
{
    star_wars_speaker() { std::cout << "Someone in Star Wars" << std::endl; }
};

template <>
struct star_wars_speaker<decltype("I'd just as soon kiss a Wookiee"_cs)>
{
    star_wars_speaker() { std::cout << "Carrie Fisher" << std::endl; }
};

template <>
struct star_wars_speaker<decltype("Luke, I am your father"_cs)>
{
    star_wars_speaker() { std::cout << "James Earl Jones" << std::endl; }
};

template <>
struct star_wars_speaker<decltype("Midi-chlorians"_cs)>
{
    star_wars_speaker() { std::cout << "Liam Neeson" << std::endl; }
};

void testClassTemplates()
{
    auto one = star_wars_speaker<decltype("I'd just as soon kiss a Wookiee"_cs)>();
    auto two = star_wars_speaker<decltype("Luke, I am your father"_cs)>();
    // Spelled to match the "Midi-chlorians" specialization above.
    auto thr = star_wars_speaker<decltype("Midi-chlorians"_cs)>();
    // No specialization exists for this one, so the primary template is used.
    auto fur = star_wars_speaker<decltype("How wude!"_cs)>();
}


template <typename Str>
void star_trek_speak(const Str &)
{
    std::cout << "Someone in Star Trek" << std::endl;
}

template <>
void star_trek_speak(const decltype("Illogical captain"_cs) &)
{
    std::cout << "Leonard Nimoy" << std::endl;
}

template <>
void star_trek_speak(const decltype("He's dead Jim"_cs) &)
{
    std::cout << "DeForest Kelley" << std::endl;
}

template <>
void star_trek_speak(const decltype("Oh my"_cs) &)
{
    std::cout << "George Takei" << std::endl;
}

void testFunctionTemplates()
{
    star_trek_speak("Illogical captain"_cs);
    star_trek_speak("He's dead Jim"_cs);
    star_trek_speak("Oh my"_cs);
    star_trek_speak("Khaaaaaaaaaaannnnnn!"_cs);
}

int main()
{
    testSizes();
    testEmptyStrings();
    testSubstringFunctions();
    testComparisons();
    testConversions();
    testConcatenation();
    testClassTemplates();
    testFunctionTemplates();

    return 0;
}
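
For readers who do not want to open the linked file, here is a minimal sketch of how a type like this can be built on the literal operator template extension that the command line above silences a warning for. It is not the prototype's code: the reduced interface (only size(), empty(), operator+ and operator==) is an assumption made to keep the example short, and it relies on the GNU extension accepted by GCC and Clang.

```cpp
#include <cstddef>
#include <type_traits>

// Simplified stand-in for the proposed type: the characters themselves are
// template arguments, so literals with different contents are different types.
template <typename CharT, CharT... Cs>
struct string_literal
{
    static constexpr std::size_t size() { return sizeof...(Cs); }
    static constexpr bool empty() { return sizeof...(Cs) == 0; }
};

// Concatenation produces a new type whose pack is the two packs joined.
template <typename CharT, CharT... A, CharT... B>
constexpr string_literal<CharT, A..., B...>
operator+(string_literal<CharT, A...>, string_literal<CharT, B...>)
{
    return {};
}

// Two literals compare equal exactly when they are the same type.
template <typename CharT, CharT... A, CharT... B>
constexpr bool operator==(string_literal<CharT, A...>, string_literal<CharT, B...>)
{
    return std::is_same<string_literal<CharT, A...>,
                        string_literal<CharT, B...>>::value;
}

// GNU extension (not ISO C++): the compiler passes the characters of the
// string literal as a template parameter pack.
template <typename CharT, CharT... Cs>
constexpr string_literal<CharT, Cs...> operator""_cs()
{
    return {};
}

static_assert(("Hello "_cs + "World!"_cs).size() == 12, "operator+ is wrong");
static_assert(!std::is_same<decltype("A"_cs), decltype("a"_cs)>::value,
              "different contents should give different types");
```

Because the contents live in the type, decltype("..."_cs) can be used directly as a template argument, which is what the class and function template specializations in the listing above depend on.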



Zhihao Yuan

Dec 15, 2013, 11:52:52 PM
to std-pr...@isocpp.org
On Sun, Dec 15, 2013 at 10:26 PM, Michael Price - Dev
<michael.b...@gmail.com> wrote:
> I'm considering writing up a proposal to standardize a library type that
> represents a compile-time sequence of character literals. I'm looking for a
> first round of feedback. You can view my prototype implementation at
> https://github.com/michaelbprice/cxx_snippets/blob/master/string_literal/string_literal.cpp
>
> I'm using the top-of-trunk clang to compile with the following command-line:

I think you know that the language feature you used

http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3599.html

is not a part of C++14.

Anyway.

What's the difference between `make_array("str")` and "str"_cs?

http://isocpp.org/files/papers/N3824.html

1. string_literals that differ in content have different types (so they work with templates);
2. a string_literal cannot be modified;
3. string_literal has some algorithms as member functions.

So what is the problem you want to solve? Compile-time parsing?
If so, Bristol EWG says:

1. template instantiation is not the only way to do it;
2. users are likely to overuse it (a compile-time SQL parser is fun,
but what do you gain?);
3. use cases are needed.

--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
___________________________________________________
4BSD -- http://4bsd.biz/

David Krauss

Dec 16, 2013, 1:35:14 AM
to std-pr...@isocpp.org
On 12/16/13 12:52 PM, Zhihao Yuan wrote:
> So what is the problem you want to solve? Compile-time parsing?
> If so, Bristol EWG says:
>
> 1. template instantiation is not the only way to do it;
> 2. users are likely to overuse it (a compile-time SQL parser is fun,
> but what do you gain?);
> 3. use cases are needed.

I need a template to map compile-time strings to types.

The use case is essentially an exception class template. The program, a
parser, has countless exception types, representing the various syntax
errors and such. It's nice to be able to declare a new, unique, named
exception type, without making a boilerplate declaration at namespace
scope. There are also other generic use cases in the same program.

Right now I'm making do with an imperfect constexpr hash:

typedef std::uintmax_t hash_t;
constexpr hash_t parameter_name_hash( char const *str, std::size_t offset = 0 )
    { return str[ offset ]? /* recursive hash iteration */ : 0; }

constexpr hash_t operator "" _param ( char const * str, std::size_t )
    { return parameter_name_hash( str ); }


The ud-string-literal is used directly as a non-type template argument,
so it must have scalar type, which cannot contain more information than
a std::intmax_t, or else be a reference to global, which seemingly
cannot be generated suitably. Another way of looking at this is not the
need for template string arguments, but the need for a non-type template
argument that can uniquely encode arbitrarily much data, and can be
returned from a ud-string-literal function.
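
To make the approach above concrete, here is a minimal sketch of a hash-valued literal used as a non-type template argument. The hash body in the post is elided, so the FNV-1a function below is a swapped-in assumption, not the author's hash; fnv1a and syntax_error are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

typedef std::uint64_t hash_t;

// FNV-1a (64-bit), written as a single recursive return so it is a valid
// C++11 constexpr function. Unsigned arithmetic wraps, which is intended.
constexpr hash_t fnv1a( char const *str, std::size_t len,
                        hash_t h = 14695981039346656037ULL )
{
    return len == 0
        ? h
        : fnv1a( str + 1, len - 1,
                 ( h ^ static_cast< unsigned char >( *str ) ) * 1099511628211ULL );
}

// The literal yields a scalar, so it can be a non-type template argument.
constexpr hash_t operator""_param( char const *str, std::size_t len )
{ return fnv1a( str, len ); }

// Hypothetical exception-like template keyed by the hash of its name.
template< hash_t Id >
struct syntax_error {};

// New "named" types can be introduced wherever they are needed.
typedef syntax_error< "missing semicolon"_param > missing_semicolon;
typedef syntax_error< "unbalanced brace"_param > unbalanced_brace;

// The same name always maps to the same type. Two different names map to
// different types only if their hashes do not collide, which is the
// imperfection mentioned above.
static_assert( std::is_same< missing_semicolon,
                             syntax_error< "missing semicolon"_param > >::value,
               "same name, same type" );
```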

It would be a lot easier if C-style arrays were value-semantic. But
because these uniquifying tags are used very heavily in my metaprocessing
(not as strings, just as hashed ID numbers), I wouldn't really want to
sacrifice much efficiency just to avoid hash collisions. I suspect that
one-parameter-per-character string packs could cause compile time to
suffer. (If such an extension is already part of the g++1y dialect, I
should try and see.)

Zhihao Yuan

Dec 16, 2013, 10:44:22 AM
to std-pr...@isocpp.org
On Mon, Dec 16, 2013 at 1:35 AM, David Krauss <pot...@gmail.com> wrote:
> Right now I'm making do with an imperfect constexpr hash:
> [...]
>
> The ud-string-literal is used directly as a non-type template argument, so
> it must have scalar type, which cannot contain more information than a
> std::intmax_t, or else be a reference to global, which seemingly cannot be
> generated suitably. Another way of looking at this is not the need for
> template string arguments, but the need for a non-type template argument
> that can uniquely encode arbitrarily much data, and can be returned from a
> ud-string-literal function.

AFAICS all you are building is a symbol table, so why not ask for a
language feature such as a symbol type? The compiler builds a symbol
table more cheaply and more nicely than you can :)

Sean Middleditch

Dec 16, 2013, 2:22:56 PM
to std-pr...@isocpp.org
On Sunday, December 15, 2013 8:52:52 PM UTC-8, Zhihao Yuan wrote:

> 3. use cases are needed.


We use a string literal wrapper to allow compile-time string hashing for big efficiency gains when using fixed known keys in hash tables.  This need comes up a lot with parsing (XML, JSON, INI config files, www-urlencoded POST requests, HTTP headers, command and script languages, etc.).

Given the proper interfaces to a hash table, you can have overloads or extra methods so that lookups with pre-computed hashes are possible and you can avoid the overhead of the hash function.  For smaller tables the hash function can be almost as expensive (or more expensive) than the bucket lookup and value traversal, especially if you are using a slightly non-conforming unordered_map replacement that uses certain open addressing algorithms (which might mean iterators get invalidated during insert/erase even without rehashing.)

Given the current lack of some essential C++11 features in a very popular compiler we end up using a few hacks.  One common one I see a lot is to use a macro like HASH_STR("literal") and then a preprocessor to find and replace all those with hash results.  Obviously that is far from ideal, though there are plenty of other needs for a specialized pre-processor in C++ today (it's one way to implement reflection for serialization and editor data binding).
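
On a compiler that does support constexpr, one common shape of this idea in parsing code is to switch on the hash of a key, with the case labels hashed at compile time. The sketch below assumes C++14 and uses 32-bit FNV-1a; hash_str, header and classify are hypothetical names, not taken from any code discussed in this thread.

```cpp
#include <cstdint>

// 32-bit FNV-1a over a null-terminated string, usable in constant expressions.
constexpr std::uint32_t hash_str(const char* s, std::uint32_t h = 2166136261u)
{
    return *s ? hash_str(s + 1, (h ^ static_cast<unsigned char>(*s)) * 16777619u)
              : h;
}

enum class header { content_type, content_length, unknown };

// The case labels are hashed by the compiler; only the incoming name is
// hashed when the function runs. If two labels ever hashed to the same
// value, the duplicate case would be rejected at compile time.
constexpr header classify(const char* name)
{
    switch (hash_str(name))
    {
        case hash_str("Content-Type"):   return header::content_type;
        case hash_str("Content-Length"): return header::content_length;
        default:                         return header::unknown;
    }
}
```

A matching hash does not prove that the input equals the key, so production code would normally confirm the match with a string comparison before trusting it.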

Zhihao Yuan

Dec 16, 2013, 2:32:14 PM
to std-pr...@isocpp.org
On Mon, Dec 16, 2013 at 2:22 PM, Sean Middleditch
<sean.mid...@gmail.com> wrote:
> Given the current lack of some essential C++11 features in a very popular
> compiler we end up using a few hacks. One common one I see a lot is to use
> a macro like HASH_STR("literal") and then a preprocessor to find and replace
> all those with hash results. Obviously that is far from ideal, though there
> are plenty of other needs for a specialized pre-processor in C++ today (it's
> one way to implement reflection for serialization and editor data binding).

I think you can do that with an array reference and a constexpr function.

The string_literal class proposed here is different: string literals that
differ in content also differ in type. Array references and std::array
differ in type only when the lengths differ.
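
A minimal sketch of the array-reference approach, assuming C++11 and 32-bit FNV-1a as the hash (hash_literal is a hypothetical name). The static_asserts show the distinction drawn above: the hash depends on the content, while the type of the literal depends only on its length.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hash a string literal through a reference to its array.
// N counts the terminating '\0', so the recursion stops at index N - 1.
template <std::size_t N>
constexpr std::uint32_t hash_literal(const char (&s)[N], std::size_t i = 0,
                                     std::uint32_t h = 2166136261u)
{
    return i == N - 1
        ? h
        : hash_literal(s, i + 1,
                       (h ^ static_cast<unsigned char>(s[i])) * 16777619u);
}

// Different contents give different hash values here...
static_assert(hash_literal("abc") != hash_literal("abd"), "hashes should differ");

// ...but the types are identical, because both are const char (&)[4].
static_assert(std::is_same<decltype("abc"), decltype("abd")>::value,
              "same length, same type");

// Only a different length produces a different type.
static_assert(!std::is_same<decltype("abc"), decltype("abcd")>::value,
              "different length, different type");
```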

David Krauss

Dec 16, 2013, 6:06:42 PM
to std-pr...@isocpp.org
On 12/16/13 11:44 PM, Zhihao Yuan wrote:
> On Mon, Dec 16, 2013 at 1:35 AM, David Krauss <pot...@gmail.com> wrote:
>> Right now I'm making do with an imperfect constexpr hash:
>> [...]
>>
>> The ud-string-literal is used directly as a non-type template argument, so
>> it must have scalar type, which cannot contain more information than a
>> std::intmax_t, or else be a reference to global, which seemingly cannot be
>> generated suitably. Another way of looking at this is not the need for
>> template string arguments, but the need for a non-type template argument
>> that can uniquely encode arbitrarily much data, and can be returned from a
>> ud-string-literal function.
> AFAICS all you are building is a symbol table, so why not ask for a
> language feature such as a symbol type? The compiler builds a symbol
> table more cheaply and more nicely than you can :)

I do also perform the same hash on user input as a proxy for string
equality. When this is not required, I use an elaborated-type-specifier
for local declaration of a namespace-member, incomplete class.

My use case sounds similar to Sean Middleditch's. If the declarations
were at namespace scope, and they contained two copies of the literal
(likely job for a macro), then the declared objects could have a proper
copy of the string to catch collisions.

Michael Price - Dev

Dec 17, 2013, 10:45:54 PM
to std-pr...@isocpp.org


On Sunday, December 15, 2013 10:52:52 PM UTC-6, Zhihao Yuan wrote:
> On Sun, Dec 15, 2013 at 10:26 PM, Michael Price - Dev
> <michael.b...@gmail.com> wrote:
>> I'm considering writing up a proposal to standardize a library type that
>> represents a compile-time sequence of character literals.  I'm looking for a
>> first round of feedback.  You can view my prototype implementation at
>> https://github.com/michaelbprice/cxx_snippets/blob/master/string_literal/string_literal.cpp
>>
>> I'm using the top-of-trunk clang to compile with the following command-line:
>
> I think you know that the language feature you used
>
>   http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3599.html
>
> is not a part of C++14.

Actually, I was unaware, although I had stumbled upon clang's implementation of said feature mostly by mistake.  I'm not intending this for inclusion in C++14, but rather the next version out.  I've contacted the author of that proposal to see about addressing the straw poll concerns. 
 
> Anyway.
>
> What's the difference between `make_array("str")` and "str"_cs?
>
>   http://isocpp.org/files/papers/N3824.html
>
>  1. string_literals that differ in content have different types (so they work with templates);
>  2. a string_literal cannot be modified;
>  3. string_literal has some algorithms as member functions.
>
> So what is the problem you want to solve?  Compile-time parsing?
> If so, Bristol EWG says:
>
>  1. template instantiation is not the only way to do it;
>  2. users are likely to overuse it (a compile-time SQL parser is fun,
>     but what do you gain?);
>  3. use cases are needed.

My initial motivating use case was as a proof of concept for yet another change that I will be proposing, regarding the message for static_assert.  Example:


constexpr auto library_name = "BEST"_cs;
constexpr auto static_prefix = library_name + ": "_cs;

static_assert(false, static_prefix + "Oops, did I say false"); // would print: "BEST: Oops, did I say false"


Of course, there are lots of other potential uses. See http://2012.cppnow.org/session/metaparse-complie-time-parsing-with-template-metaprogramming/ for instance (it won the best presentation award at C++Now 2012).  I can't even begin to imagine all of the cool ways that this type of thing could be used.

For me, the idea that in order to get a rich interface for strings, one must succumb to the run-time penalty of std::string runs counter to one of the fundamental underpinnings of C++, namely that we don't make people pay for the things that they don't need. This is very much the same rationale behind std::array (vs. C-arrays or std::vector).

David Krauss

Dec 17, 2013, 11:19:06 PM
to std-pr...@isocpp.org
On 12/18/13 11:45 AM, Michael Price - Dev wrote:
> Actually, I was unaware, although I had stumbled upon clang's
> implementation of said feature mostly by mistake. I'm not intending this
> for inclusion in C++14, but rather the next version out. I've contacted
> the author of that proposal to see about addressing the straw poll
> concerns.
The name of the -Wno-gnu-string-literal-operator-template option implies
that Clang emulates a GCC extension.

You should check that the extension has been proposed for
standardization. I suspect it hasn't. Your library extension could not
be accepted until after core language support has been accepted.

For a core language proposal, it might help to include information about
the experiences of folks who have used this extension in real life. Its
cross-platform support suggests there may be such projects in the wild.
To be sure, strings can easily push the scalability limits of parameter
pack expansion.

Zhihao Yuan

Dec 17, 2013, 11:42:53 PM
to std-pr...@isocpp.org
On Tue, Dec 17, 2013 at 10:45 PM, Michael Price - Dev
<michael.b...@gmail.com> wrote:
> constexpr auto library_name = "BEST"_cs;
> constexpr auto static_prefix = library_name + ": "_cs;
>
> static_assert(false, static_prefix + "Oops, did I say false"); // would

This can be built with the same technique as std::array;
string literals that differ in content really don't need
to have different types.

And that actually meets your requirements better, because
a std::array-based string, after being built at compile time,
can still be modified at run time.
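
A minimal sketch of that array-based alternative, assuming C++14 constexpr. A small hand-written aggregate is used instead of std::array itself, because std::array's non-const element access is not constexpr before C++17; fixed_string and make_fixed are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// A string whose type depends only on its length, like std::array<char, N>.
template <std::size_t N>
struct fixed_string
{
    char data[N + 1];   // N characters plus a terminating '\0'

    constexpr std::size_t size() const { return N; }
    constexpr char operator[](std::size_t i) const { return data[i]; }
};

// Build from a string literal; Len counts the literal's terminating '\0'.
template <std::size_t Len>
constexpr fixed_string<Len - 1> make_fixed(const char (&s)[Len])
{
    fixed_string<Len - 1> r{};   // zero-initialized, so already terminated
    for (std::size_t i = 0; i < Len - 1; ++i)
        r.data[i] = s[i];
    return r;
}

// Compile-time concatenation: only the lengths appear in the result type.
template <std::size_t A, std::size_t B>
constexpr fixed_string<A + B> operator+(const fixed_string<A>& a,
                                        const fixed_string<B>& b)
{
    fixed_string<A + B> r{};
    for (std::size_t i = 0; i < A; ++i) r.data[i] = a.data[i];
    for (std::size_t i = 0; i < B; ++i) r.data[A + i] = b.data[i];
    return r;
}

constexpr auto static_prefix = make_fixed("BEST") + make_fixed(": ");
static_assert(static_prefix.size() == 6, "operator+ is wrong");
static_assert(static_prefix[4] == ':', "operator+ is wrong");

// Same length means same type, whatever the contents are.
static_assert(std::is_same<decltype(make_fixed("abc")),
                           decltype(make_fixed("xyz"))>::value,
              "types depend on length only");
```

A non-const copy of such an object can be edited at run time through its data member, which a type that encodes its contents in template arguments cannot offer.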

Metaparse is cool, but think about it: every compiler already
parses at compile time. If you want to parse at compile time,
why write yet another parser? If you want a rich interface,
I think this is a good solution:

http://static.rust-lang.org/doc/0.6/tutorial-macros.html

If you just want a symbol table, then just ask the compiler
to expose its symbol table...