User defined literal for size_t


morw...@gmail.com

Jul 18, 2014, 7:03:36 PM
to std-pr...@isocpp.org
It would be great to have a standard user-defined literal for std::size_t. I often want to pass a std::size_t constant to a generic function, and since the underlying type is implementation-defined, I generally have to write a full static_cast to do so. A UDL for std::size_t, arguably one of the most used standard integer types, would make sense. The suffix would be z if we follow the printf length modifier for size_t (%zu). Example:

for (auto i: irange(2z, 12z))
{
    // whatever
}
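
For comparison, here is a minimal sketch of what user code can already write today, assuming a C++14 compiler. The namespace `size_literals` and the suffix `_z` are hypothetical names chosen for illustration; suffixes without a leading underscore are reserved for the standard library, so the bare `z` above is not available to user code.

```cpp
#include <cstddef>
#include <type_traits>

namespace size_literals {
// Hypothetical suffix; user-defined suffixes must begin with an underscore.
// The cooked form can only receive the value as unsigned long long.
constexpr std::size_t operator""_z(unsigned long long n) {
    return static_cast<std::size_t>(n);
}
}  // namespace size_literals

using namespace size_literals;

// The literal has type std::size_t whatever that typedef maps to.
static_assert(std::is_same<decltype(2_z), std::size_t>::value,
              "2_z has type std::size_t");
static_assert(12_z == 12u, "the value is preserved");
```

With this in scope, the loop above could be spelled `irange(2_z, 12_z)`.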


I don't think the other types that printf has length modifiers for are used often enough to deserve a UDL, though.

Your thoughts?

David Krauss

Jul 18, 2014, 7:18:18 PM
to std-pr...@isocpp.org
I agree, but given use of the U suffix, how often do you have a literal value that actually overflows unsigned int? And even if it does, it will bump up to long unsigned int automatically. If you specifically need size_t for metaprogramming (template argument sensitivity), then it’s better to:

a) explicitly cast or specify type (in this case, declare std::size_t i)
b) pass an explicit template argument
c) change the metaprogram (is boost::irange behavior really sensitive to this difference?)
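
Alternatives (a) and (b) can be sketched as follows. `int_range` and `make_range` are hypothetical stand-ins for a generic range factory such as boost::irange, not its actual interface; the sketch only shows how the deduced type follows the argument type.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a generic range factory.
template <class T>
struct int_range {
    T first;
    T last;
};

template <class T>
constexpr int_range<T> make_range(T first, T last) {
    return {first, last};
}

// With the U suffix alone, T is deduced as unsigned int.
static_assert(std::is_same<decltype(make_range(2u, 12u)),
                           int_range<unsigned>>::value, "U suffix");

// (a) Name the type in a declaration; T is deduced as std::size_t.
constexpr std::size_t lo = 2, hi = 12;
static_assert(std::is_same<decltype(make_range(lo, hi)),
                           int_range<std::size_t>>::value, "declared type");

// (b) Pass an explicit template argument; the int literals are converted.
static_assert(std::is_same<decltype(make_range<std::size_t>(2, 12)),
                           int_range<std::size_t>>::value, "explicit argument");
```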

There’s no reason in principle we shouldn’t have suffixes for various typedefs, particularly fixed-size types like uint32_t, but suffixes shouldn’t be encouraged where better stylistic choices exist.

Also, if lowercase z is added, uppercase Z should do the same thing.

Róbert Dávid

Jul 19, 2014, 12:03:23 PM
to std-pr...@isocpp.org


On Saturday, July 19, 2014 at 1:18:18 AM UTC+2, David Krauss wrote:
it’s better to:

a) explicitly cast or specify type (in this case, declare std::size_t i)
b) pass an explicit template argument
c) change the metaprogram (is boost::irange behavior really sensitive to this difference?)

(...)


suffixes shouldn’t be encouraged where better stylistic choices exist.

I'm not convinced that these are better stylistic choices. Consider:
vector<int> vals{1,2,3,4,5,6,7,8,9,10,11,12,13};
auto product = accumulate(begin(vals), end(vals), 1, multiplies<>()); //Probably undefined behavior!
Because accumulate takes the type it calculates in from the 3rd parameter, which is int here, and 13! = 6 227 020 800, whereas the 32 bit int max is just 2 147 483 647. Signed integer overflow is undefined behavior.

To solve this, options b) and c) are not really feasible: the return / calculation type is the second template parameter, so I can't pass it explicitly without also spelling out the iterator's type, and I don't think it is ever a good suggestion to avoid a standard algorithm.

Option a) will now look like this:
auto product = accumulate(begin(vals), end(vals), (unsigned long long)1, multiplies<>());
Since there is a suffix for unsigned long long, I can just write:
auto product = accumulate(begin(vals), end(vals), 1ull, multiplies<>());

But unsigned long long is not a 'well defined' type. What if I explicitly want 64 bit calculations?
auto product = accumulate(begin(vals), end(vals), (uint64_t)1, multiplies<>());
..but I don't have the second option now :(
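
A self-contained version of the 64 bit variant, as a sketch assuming C++14 (for the transparent `std::multiplies<>`). The function name `product_u64` is made up for illustration. Because the initial value has type std::uint64_t, accumulate carries the running product in that type.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: multiplies all elements using 64 bit unsigned arithmetic.
inline std::uint64_t product_u64(const std::vector<int>& vals) {
    // The accumulator type is taken from the third argument, not from the
    // element type, so the int elements are widened at each multiplication.
    return std::accumulate(vals.begin(), vals.end(), std::uint64_t{1},
                           std::multiplies<>());
}
```

For the vector above, `product_u64(vals)` yields 6227020800, which does not fit in a 32 bit int.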

Robert

David Krauss

Jul 19, 2014, 9:48:54 PM
to std-pr...@isocpp.org
On 2014–07–20, at 12:03 AM, Róbert Dávid <lrd...@gmail.com> wrote:

But unsigned long long is not a 'well defined' type, what if I want explicitly 64 bit calculations?
auto product = accumulate(begin(vals), end(vals), (uint64_t)1, multiplies<>());
..but I don't have the second option now :(

What second option? Can you elaborate what’s wrong with this line of code?

The three alternatives I gave are in general order of preference, though they express completely different approaches, so I don’t think it’s a cause for concern that (b) and (c) don’t apply.

Róbert Dávid

Jul 20, 2014, 6:43:46 AM
to std-pr...@isocpp.org

On Sunday, July 20, 2014 at 3:48:54 AM UTC+2, David Krauss wrote:
What second option? Can you elaborate what’s wrong with this line of code?
 
The one with a uint64_t literal. There is nothing wrong with it; it's just no better than the one that casts to unsigned long long, but there the programmer is given a less verbose choice, a literal suffix, which makes unsigned long long a.. "more equal" citizen than uint64_t.

As a totally unlikely alternative, if it is actually an intention to use (unsigned long long)1 instead of 1ull, why not remove/deprecate the literal suffixes?

Robert

David Krauss

Jul 20, 2014, 7:48:53 AM
to std-pr...@isocpp.org
On 2014–07–20, at 6:43 PM, Róbert Dávid <lrd...@gmail.com> wrote:


On Sunday, July 20, 2014 at 3:48:54 AM UTC+2, David Krauss wrote:
What second option? Can you elaborate what’s wrong with this line of code?
 
The one with a uint64_t literal. There is nothing wrong with it; it's just no better than the one that casts to unsigned long long, but there the programmer is given a less verbose choice, a literal suffix, which makes unsigned long long a.. "more equal" citizen than uint64_t.

OK. You’re not arguing for the suffix(es) at all. Thanks for clarifying.

As a totally unlikely alternative, if it is actually an intention to use (unsigned long long)1 instead of 1ull, why not remove/deprecate the literal suffixes?

The literal suffixes are not harmful enough to deprecate, and they can be useful when precise specification of a type isn’t desired. For floating point types in particular, no format-specific typedefs exist, for better or worse, and floating point “magic numbers” are much more common (per variable of applicable type) than integers.

I’d generally agree though that integer type-suffixes are poor style. There are only a few cases where there’s no substitute, such as in preprocessing conditional control expressions. Otherwise, any programmer good enough to know that safety and explicit meaning are worth a little extra verboseness will seldom be tempted to use them in the first place.
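
The preprocessing case can be illustrated with a small sketch. The constant name `wide_size` is hypothetical, and the final check assumes size_t has no padding bits. A cast cannot appear in an #if expression, because type names are not recognized there, so a suffix is the only way to mark the constant as unsigned.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// "#if SIZE_MAX > (unsigned long)0xFFFFFFFF" would not compile: the
// preprocessor does not know type names. A suffix is accepted.
#if SIZE_MAX > 0xFFFFFFFFu
constexpr bool wide_size = true;   // size_t is wider than 32 bits
#else
constexpr bool wide_size = false;  // size_t has at most 32 bits
#endif

// Cross-check against the same property computed by the compiler proper.
static_assert(wide_size == (sizeof(std::size_t) * CHAR_BIT > 32),
              "preprocessor and compiler agree on the width of size_t");
```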

Most practical ud-literal suffixes likewise have more verbose names than just Z or _Z. Usage of ud-literals is more about emphasizing the argument value over the function call, and expressing the constexpr semantic at the call site, than merely saving keystrokes.

rhalb...@gmail.com

Oct 21, 2014, 3:59:00 PM
to std-pr...@isocpp.org
I like the idea, and use it in my own code (with an underscore of course). I typed up a proposal but didn't manage to get it done before the Urbana deadline, so it will have to wait until 2015. Here's the draft proposal

Myriachan

Oct 21, 2014, 4:30:09 PM
to std-pr...@isocpp.org, rhalb...@gmail.com
On Tuesday, October 21, 2014 12:59:00 PM UTC-7, rhalb...@gmail.com wrote:

I like the idea, and use it in my own code (with an underscore of course). I typed up a proposal but didn't manage to get it done before the Urbana deadline, so it will have to wait until 2015. Here's the draft proposal

What if an architecture defines extended-precision integers, and std::size_t is defined by that architecture to be a size larger than unsigned long long?

Melissa

Daniel Krügler

Oct 21, 2014, 5:36:18 PM
to std-pr...@isocpp.org, Rein Halbersma
2014-10-21 22:30 GMT+02:00 Myriachan <myri...@gmail.com>:
> What if an architecture defines extended-precision integers, and std::size_t
> is defined by that architecture to be a size larger than unsigned long long?

While this is a valid argument, it also points to a possibly
defective restriction of user-defined literals. There already exists a
core language issue related to that:

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1266

My recommendation for the proposal is to refer to this CWG issue as well.

- Daniel

rhalb...@gmail.com

Oct 21, 2014, 7:19:49 PM
to std-pr...@isocpp.org, rhalb...@gmail.com
Thanks for the guidance. Interestingly, [c.files]/4 contains perhaps a way of resolving this: 

if and only if the type intmax_t designates an extended integer type (3.9.1), the following function
signatures are added:

intmax_t abs(intmax_t);
imaxdiv_t div(intmax_t, intmax_t); 

Would it be feasible to also add an operator"" z(size_t) that takes priority in the overload set iff size_t is an extended integer type?

Myriachan

Oct 21, 2014, 10:14:31 PM
to std-pr...@isocpp.org, rhalb...@gmail.com

I don't really understand the argument in the text of CWG 1266, but the point definitely remains that requiring "unsigned long long" is a bit silly.  Paragraph 3 of [lex.ext] implies that if you want to use very long literals, don't provide operator ""(unsigned long long) and use one of the two raw literal forms instead.
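
As a rough sketch of the raw-literal route, assuming C++11 or later: the literal operator template below receives the digits as template arguments and never goes through unsigned long long, so it would keep working if std::size_t were wider than that type. The namespace `raw_size_literals`, the helper `parse` and the suffix `_zr` are hypothetical names, and only plain decimal digits are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace raw_size_literals {
namespace detail {
// Folds the digit characters into a std::size_t, left to right.
template <std::size_t Acc, char... Cs>
struct parse;

template <std::size_t Acc>
struct parse<Acc> {
    static constexpr std::size_t value = Acc;
};

template <std::size_t Acc, char C, char... Cs>
struct parse<Acc, C, Cs...> {
    static_assert(C >= '0' && C <= '9',
                  "this sketch accepts decimal digits only");
    // Reject the literal at compile time if Acc * 10 + digit would overflow.
    static_assert(Acc <= (SIZE_MAX - (C - '0')) / 10,
                  "literal does not fit in std::size_t");
    static constexpr std::size_t value =
        parse<Acc * 10 + (C - '0'), Cs...>::value;
};
}  // namespace detail

// Literal operator template: 42_zr calls operator""_zr<'4', '2'>().
template <char... Cs>
constexpr std::size_t operator""_zr() {
    return detail::parse<0, Cs...>::value;
}
}  // namespace raw_size_literals

using namespace raw_size_literals;

static_assert(std::is_same<decltype(42_zr), std::size_t>::value, "type");
static_assert(42_zr == 42u, "value");
```

Hex prefixes and digit separators trip the first static_assert in this sketch, and a leading 0 is read as decimal rather than octal.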

CWG 1266 is nontrivial to solve.  Consider GCC's type unsigned __int128.  Though it is not formally an "extended integer type" as defined by the Standard, due to the bad design of std::uintmax_t, GCC and clang support 128-bit integers in the form of __int128 for calculations, but do not support integer literals of that size.  Requiring support for extended integer types in user-defined literals could be problematic for that reason: the compiler may not have code actually capable of converting such strings to the extended types.

(That said, I don't think the Standard supports extended integer types that do not have built-in literal forms.)

Melissa

rhalb...@gmail.com

Oct 23, 2014, 4:04:35 AM
to std-pr...@isocpp.org, rhalb...@gmail.com
On Tuesday, October 21, 2014 11:36:18 PM UTC+2, Daniel Krügler wrote:
I spotted two more issues (1620 and 1735). Attached an updated draft proposal. 
Dxxxx.html

Myriachan

Oct 24, 2014, 3:22:52 AM
to std-pr...@isocpp.org, rhalb...@gmail.com


Perhaps a proposal could fix the 1620, 1723 and 1735 issues?  Here are some things I think user-defined literals need:

  1. An operator "" function may take a single parameter of non-enumeration non-bool integral type, or floating-point type(*).  (Don't permit cv-qualification; it's pointless.)  Having more than one for a given suffix is ill-formed.

  2. Signed types are allowed, but programmers ought to be aware that negative "literals" are actually a unary minus operator applied to a literal, and thus negative values never actually get passed to operator "" functions taking a signed type as a parameter.

  3. Use of a numeric literal with an operator "" function taking numeric type such that that literal overflows the numeric type is ill-formed.

  4. Integral and floating-point literals passed to operator "" functions of the two signatures:

    T operator "" X(const [volatile] char *)
    template <some nonzero or variable number of chars> T operator "" X()

    for type T and ud-suffix X would take a string of arbitrary length of the general grammar of a numeric literal, but possibly exceeding the maximum length of a literal of any standard or extended-precision type, optionally up to an implementation-defined limit.  This implementation-defined limit would be at least sufficient to express any nonnegative value of integral type T, or the same length as supported for ordinary floating-point literals of floating-point type T.

  5. operator "" for string literals may have a new template form roughly corresponding to that available with numeric literals:

    template <class C, zero or more, or variable number, non-type template parameters of type C>
    T operator "" X()

    where C is one of char, char16_t, char32_t or wchar_t, T is arbitrary, and X is the ud-suffix.  The characters of the concatenated ([lex.ext]/6) string literal would become non-type template parameters of type C to this function.  (This I believe is already a paper.  This one is actually considered undesirable for some reason...?)

  6. Make multi-character character constants legal for user-defined literals; such literals would work like string literals, except that a) a character literal of exactly one character can call the old style (both existing would be ill-formed)  b) an unused "int" parameter is added to the function parameters to distinguish character literals from string literals.  Concatenation of character literals would be supported, but only for user-defined literals.  Same for empty character literals.  Concatenation of character literals for built-in multi-character character literals would be conditionally supported; an implementation supporting multi-character character literals would have to support concatenation for built-in characters.
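
Point 2 already applies to literal operators as they exist today, which a short sketch can show. The namespace `demo_literals` and the suffixes `_ll` and `_z` are hypothetical, and the last check assumes std::size_t is at least as wide as unsigned int, as on mainstream platforms.

```cpp
#include <cstddef>
#include <cstdint>

namespace demo_literals {
// Hypothetical suffixes; both receive the value as unsigned long long.
constexpr long long operator""_ll(unsigned long long n) {
    return static_cast<long long>(n);
}
constexpr std::size_t operator""_z(unsigned long long n) {
    return static_cast<std::size_t>(n);
}
}  // namespace demo_literals

using namespace demo_literals;

// -5_ll parses as -(5_ll): the operator sees 5, and the minus is applied
// to the returned long long afterwards.
static_assert(-5_ll == -5, "unary minus is applied to the result");

// For an unsigned result the minus wraps around instead of going negative.
static_assert(-1_z == SIZE_MAX, "-(1_z) wraps to the largest size_t value");
```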



Does what I wrote seem reasonable?

Melissa
