More efficient overloads for to_string(number).

Matthew Fioravante

Oct 17, 2014, 2:36:00 PM
to std-pr...@isocpp.org

Consider the to_string library of functions such as:

std::string to_string(int value);

This interface, while convenient, is horribly inefficient. Every time you call this function you will allocate memory for a temporary string. If you plan to do this operation many times (which is common for use cases such as file parsing routines), you cannot avoid these memory allocations with move semantics or return value optimization. The performance implications of to_string make it unusable for high-performance parsing routines.

I would propose 2 additional overloads.

string_view to_string(std::string& dest, int value);
string_view to_string(std::array_view<char> dest, int value);

The first version allows you to still parse the data into a memory buffer backed by a std::string but allows you to reuse the same string object and save on memory allocations. The second version is available if you absolutely don't want to allocate memory at all and just store the results (possibly truncated) into a fixed size buffer. Since int to string conversions always have a maximum length on the resulting string, we can easily create a fixed size buffer on the stack. Not only does creating a buffer on the stack avoid memory allocation, the stack space itself is likely to be hot in the cache already whereas a random buffer returned by operator new will not be.
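The claim that integer conversions have a bounded output length can be made concrete. The following is a minimal sketch, assuming C++17 (for std::string_view and inline variables); the names max_decimal_chars and format_on_stack are hypothetical and not part of the proposal. It derives the buffer size from std::numeric_limits rather than picking a number by hand.

```cpp
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string_view>

// Upper bound on the characters snprintf("%d"-style) needs for any value of
// integer type T in base 10: all digits, an optional sign, and the '\0'.
template <typename T>
inline constexpr std::size_t max_decimal_chars =
    std::numeric_limits<T>::digits10 + 1          // digits10 is one short of the longest value
    + (std::numeric_limits<T>::is_signed ? 1 : 0) // room for '-'
    + 1;                                          // terminator written by snprintf

// Hypothetical helper: formats into a caller-provided stack buffer.
// The buffer type guarantees it is large enough, so nothing is truncated.
inline std::string_view format_on_stack(char (&buf)[max_decimal_chars<int>], int value) {
    int len = std::snprintf(buf, sizeof buf, "%d", value);
    return std::string_view(buf, len > 0 ? static_cast<std::size_t>(len) : 0);
}
```

A caller would declare `char buf[max_decimal_chars<int>];` once and reuse it for every conversion, which is the usage pattern the fixed-size overload is meant to enable.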

Compare the following 3 code fragments:

std::cout << to_string(i); //memory allocation
std::cout << to_string(j); //memory allocation
std::cout << to_string(k); //memory allocation

std::string s;
std::cout << to_string(s, i); //memory allocation
std::cout << to_string(s, j); //likely no memory allocation, reuse the buffer
std::cout << to_string(s, k); //likely no memory allocation, reuse the buffer

char s[64]; //Choose a size big enough for the maximal string representation of decltype(i), decltype(j), and decltype(k)
std::cout << to_string(s, i); //no memory allocation
std::cout << to_string(s, j); //no memory allocation
std::cout << to_string(s, k); //no memory allocation

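The implementation of each of these can be built on top of one another and, without optimization, can be trivially implemented using snprintf:

string_view to_string(array_view<char> s, int value) {
  if (s.length() == 0) return string_view();
  int len = snprintf(s.data(), s.length(), "%d", value);
  if (len < 0) len = 0; //encoding error
  //snprintf returns the untruncated length, so clamp to what was actually stored
  if (len >= (int)s.length()) len = (int)s.length() - 1;
  return string_view(s.data(), len);
}

string_view to_string(string& s, int value) {
  char buf[32];
  auto sv = to_string(buf, value);
  s = sv; //reuses the capacity s already owns
  return s; //view into s; returning sv would dangle once buf goes out of scope
}

string to_string(int value) {
  string s;
  to_string(s, value);
  return s;
}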
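For readers who want to compile the layering, here is a minimal sketch under stated assumptions: C++17, std::string_view in place of the proposed string_view, and a hypothetical char_view struct standing in for array_view<char>, which is not a standard type. Everything lives in a made-up namespace sketch so it cannot be confused with std::to_string.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

namespace sketch {

// Hypothetical stand-in for array_view<char>: a pointer plus a length.
struct char_view {
    char* data;
    std::size_t size;
    template <std::size_t N>
    char_view(char (&a)[N]) : data(a), size(N) {}
    char_view(char* p, std::size_t n) : data(p), size(n) {}
};

// Primitive: writes into caller storage, truncating if the buffer is too small.
inline std::string_view to_string(char_view dest, int value) {
    if (dest.size == 0) return {};
    int len = std::snprintf(dest.data, dest.size, "%d", value);
    if (len < 0) return {};
    // snprintf reports the untruncated length; clamp to what was stored.
    std::size_t written = std::min(static_cast<std::size_t>(len), dest.size - 1);
    return std::string_view(dest.data, written);
}

// Reuses whatever capacity dest already owns.
inline std::string_view to_string(std::string& dest, int value) {
    char buf[32];
    dest.assign(to_string(char_view(buf), value));
    return dest;  // view into dest, not into the local buffer
}

// Convenience form, equivalent in spirit to the existing interface.
inline std::string to_string(int value) {
    std::string s;
    to_string(s, value);
    return s;
}

}  // namespace sketch
```

The view returned by the std::string overload is only valid until dest is next modified, which is one design cost of returning a view rather than a length.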

 

Nevin Liber

Oct 17, 2014, 2:47:18 PM
to std-pr...@isocpp.org
On 17 October 2014 13:35, Matthew Fioravante <fmatth...@gmail.com> wrote:

Consider the to_string library of functions such as:

std::string to_string(int value);

This interface while convenient


Which is the point of this interface.
 

is horribly inefficient.


Really?  I would expect that most numbers would fit within the small object optimization of std::string.

Of course, no one would mention something like "horribly inefficient" without measurements to back it up.  Where are your benchmarks?  Is your std::string implementation C++11/14 conforming?

Every time you call this function you will allocate memory for a temporary string.

Only for really large numbers.

If you plan to do this operation multiple times (which is common for use cases such as file parsing routines)

On what platform is file I/O not slow compared with 0-1 allocations?

you cannot avoid these memory allocations with move semantics or return value optimization.

You can avoid them most of the time with a decent std::string implementation.

The performance implications of to_string make it unusable for high performance parsing routines.

I really cannot wait to see your benchmarks. 
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com(847) 691-1404

Matthew Fioravante

Oct 17, 2014, 4:28:29 PM
to std-pr...@isocpp.org


On Friday, October 17, 2014, Nevin Liber <ne...@eviloverlord.com> wrote:
On 17 October 2014 13:35, Matthew Fioravante <fmatth...@gmail.com> wrote:

Consider the to_string library of functions such as:

std::string to_string(int value);

This interface while convenient


Which is the point of this interface.

Indeed, and I think it's great to have by default an easy-to-use, impossible-to-screw-up interface that is good enough most of the time.
 

is horribly inefficient.


Really?  I would expect that most numbers would fit within the small object optimization of std::string.

If your domain of numbers are randomly generated ids or hashes the small string optimization does not help you. Also, I don't believe the standard mandates the size for sso, so your performance can vary between implementations. 
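Whether a given number fits in the small-string buffer can be checked on a particular implementation. The sketch below is only a probe, assuming C++17; inline_capacity, max_u64_digits and u64_always_fits_inline are hypothetical names. It relies on the common (but not guaranteed) behavior that a default-constructed std::string reports its inline buffer size as its capacity.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Capacity of an empty string. With a small-string optimization this is
// typically the inline buffer size; the standard does not fix the value.
inline std::size_t inline_capacity() {
    return std::string().capacity();
}

// Longest decimal representation of a 64-bit unsigned value (20 digits).
inline constexpr std::size_t max_u64_digits =
    std::numeric_limits<std::uint64_t>::digits10 + 1;

// True if every 64-bit id or hash can be stored without a heap allocation
// on the implementation this is compiled against.
inline bool u64_always_fits_inline() {
    return max_u64_digits <= inline_capacity();
}
```

Commonly reported inline capacities are 15 characters for libstdc++ and MSVC and 22 for 64-bit libc++, so the answer differs between implementations, which is the portability concern raised above.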

Of course, no one would mention something like "horribly inefficient" without measurements to back it up.  Where are your benchmarks?  Is your std::string implementation C++11/14 conforming?

I agree benchmarks should be provided. One example to consider is Andrei Alexandrescu's optimization talk, where he optimizes a simple int-to-string routine that was a major hot spot of his entire application. He uses char*, and I doubt he would change it to return strings by value.

Every time you call this function you will allocate memory for a temporary string.

Only for really large numbers.

Sometimes we need to work with large numbers. 

If you plan to do this operation multiple times (which is common for use cases such as file parsing routines)

On what platform is file I/O not slow compared with 0-1 allocations?

This is a tempting conclusion to make, but it's not always correct. I've had production code where optimizing a CSV file parser more than tripled the performance of a data loading routine. The application was CPU bound, not I/O bound. If you or the OS is doing asynchronous I/O, the I/O delay is mitigated even further.

Finally int to string is used in a lot more places than writing to files.

you cannot avoid these memory allocations with move semantics or return value optimization.

You can avoid them most of the time with a decent std::string implementation.

Summarizing my earlier points, that's highly dependent on many factors, and in some scenarios it does not work.

The performance implications of to_string make it unusable for high performance parsing routines.

I really cannot wait to see your benchmarks. 

Sure
 

--

---
You received this message because you are subscribed to a topic in the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this topic, visit https://groups.google.com/a/isocpp.org/d/topic/std-proposals/OHF7J9jU1gc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.


--
Sent from my phone, please ignore spelling and grammar.

Sean Middleditch

Oct 17, 2014, 4:44:38 PM
to std-pr...@isocpp.org
As an example I recently ran into, trying to format a simple set of floats to display on screen in a game was taking almost a fourth of a millisecond (you have 16.6 at most, assuming you're not targeting a 120 Hz display or doing stereoscopic 3D) using any standard library routine. I switched to an interface that didn't return a std::string; basically, snprintf to a local buffer I explicitly made large enough. It now completes in barely more than a few microseconds, and most of that is building the render queue commands. No I/O involved.

Allocations are expensive, especially in implementations that have debug heaps, or in constrained environments that have to worry more about total allocation size or fragmentation than latency, etc.

Nevin Liber

Oct 17, 2014, 4:48:59 PM
to std-pr...@isocpp.org
On 17 October 2014 15:28, Matthew Fioravante <fmatth...@gmail.com> wrote:

Really?  I would expect that most numbers would fit within the small object optimization of std::string.

If your domain of numbers are randomly generated ids or hashes the small string optimization does not help you.

Yes, every single idea on this mailing list has at least one theoretical use case.  That doesn't mean every single idea on this mailing list belongs in the standard.

How big is that audience?  If you are parsing that many ids/hashes for the performance to matter, what are you doing with them which doesn't require storing the strings?  Why do you need/want std::string as an intermediary at all?
 
Also, I don't believe the standard mandates the size for sso, so your performance can vary between implementations. 

That goes for most things in the standard library.  If the performance matters that much, why would you even have the dependency on the standard library?

After all, it isn't terribly hard to write a to_array() clone that fills in a std::array (possibly internal to a custom class that also keeps track of the length) of the maximum number of digits the conversion can make.  I might even be weakly for such a function, as opposed to overloading the interface of to_string().

Matthew Fioravante

Oct 17, 2014, 5:03:09 PM
to std-pr...@isocpp.org


On Friday, October 17, 2014, Nevin Liber <ne...@eviloverlord.com> wrote:
On 17 October 2014 15:28, Matthew Fioravante <fmatth...@gmail.com> wrote:

Really?  I would expect that most numbers would fit within the small object optimization of std::string.

If your domain of numbers are randomly generated ids or hashes the small string optimization does not help you.

Yes, every single idea on this mailing list has at least one theoretical use case.  That doesn't mean every single idea on this mailing list belongs in the standard.

How big is that audience?  If you are parsing that many ids/hashes for the performance to matter, what are you doing with them which doesn't require storing the strings?  Why do you need/want std::string as an intermediary at all?

Maybe you have your own string class. Maybe you are using the data for a while on the stack and then throwing it away. Maybe you have some other storage mechanism. There are many possibilities, which I've seen used in production code. snprintf is still very popular, and with good reason.

std::string is often useful, but it's wrong to impose it on everyone. The array_view interface puts no restrictions on where the target string bytes will end up and is optimally efficient.

 
 
Also, I don't believe the standard mandates the size for sso, so your performance can vary between implementations. 

That goes for most things in the standard library.  If the performance matters that much, why would you even have the dependency on the standard library?

Converting ints to strings is about as generic a programming tool as one can have. These things should be in the standard library, and they should be as efficient as possible.
 

After all, it isn't terribly hard to write a to_array() clone that fills in a std::array (possibly internal to a custom class that also keeps track of the length) of the maximum number of digits the conversion can make.  I might even be weakly for such a function, as opposed to overloading the interface of to_string().

That's exactly what the array_view overload does, without creating more string types. If people would rather it had a name other than to_string, I'm fine with that.


--
Sent from my phone, please pardon my spelling and grammar.

Nevin Liber

Oct 18, 2014, 2:49:13 AM
to std-pr...@isocpp.org
On 17 October 2014 15:28, Matthew Fioravante <fmatth...@gmail.com> wrote:


If your domain of numbers are randomly generated ids or hashes the small string optimization does not help you.

So you are parsing large numbers from binary into a textual representation?  That seems like the opposite of what most parsing algorithms do, because manipulating binary numbers is far faster than manipulating text. 

If I'm in that domain, I can think of other optimizations I may want.  For instance, if I have an exact number of digits that I can left-fill with '0's when the number is occasionally smaller, my entire conversion loop can be unrolled w/o any pesky "if" checks in the code.

I can always come up with pathological cases that your interface is sub-optimal for.
 
One example to consider is Andrei Alexandrescu's optimization talk, where he optimizes a simple int-to-string routine that was a major hot spot of his entire application. He uses char*, and I doubt he would change it to return strings by value.

I also doubt he would replace his code with the standard library one you propose, because it is unlikely you can even guarantee things like whether or not the call will or will not be inlined.
 
This is a tempting conclusion to make, but it's not always correct. I've had production code where optimizing a CSV file parser

A CSV file parser with binary numbers in it??

Summarizing my earlier points, that's highly dependent on many factors, and in some scenarios it does not work.

So will anything you come up with.

Jens Maurer

Oct 19, 2014, 2:31:20 PM
to std-pr...@isocpp.org
On 10/17/2014 10:48 PM, Nevin Liber wrote:
> How big is that audience? If you are parsing that many ids/hashes
> for the performance to matter, what are you doing with them which
> doesn't require storing the strings? Why do you need/want
> std::string as an intermediary at all?

I agree extending the std::string-based interfaces in the standard
library is probably too special.

> ... If the
> performance matters that much, why would you even have the dependency
> on the standard library?

I disagree: I think the standard library should provide
non-allocating fast binary-to-string (and reverse)
conversions. What we currently have for binary-to-string
(even with direct streambuf access) is factors slower than
hand-written code. This is mostly due to over-generalization,
e.g. using locales. We should also offer different functions
for differences currently hidden in formatting flags, e.g.
fill (right align) vs. left align, since the choice of your
application is often a fixed one at compile time.

(See c++std-lib-ext-1088 and c++std-lib-ext-1128 for sample
code for integers and benchmarks.)
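As a point of comparison that postdates this thread: C++17 added std::to_chars in <charconv>, a non-allocating, locale-independent conversion that takes a character range and reports errors through a return value. The sketch below wraps it in a hypothetical helper (format_with_to_chars) and should be read as one possible shape for such a primitive, not as the code referenced in the reflector messages above.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Formats value in base 10 into [buf, buf + cap) without allocating and
// without consulting any locale. Returns an empty view if the buffer is
// too small; std::to_chars writes no terminating '\0'.
inline std::string_view format_with_to_chars(char* buf, std::size_t cap, long long value) {
    std::to_chars_result r = std::to_chars(buf, buf + cap, value);
    if (r.ec != std::errc()) return {};
    return std::string_view(buf, static_cast<std::size_t>(r.ptr - buf));
}
```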

Jens

gmis...@gmail.com

Oct 19, 2014, 2:38:22 PM
to std-pr...@isocpp.org
+1 

dgutson .

Oct 19, 2014, 2:45:31 PM
to std-proposals
Absolutely agree.
BTW, locales are often useless in embedded systems, especially those
that have zero human interaction at all, and that's another issue to
address (it has been very hard to get rid of them so far in some
implementations).


--
Who’s got the sweetest disposition?
One guess, that’s who?
Who’d never, ever start an argument?
Who never shows a bit of temperament?
Who's never wrong but always right?
Who'd never dream of starting a fight?
Who get stuck with all the bad luck?

Thiago Macieira

Oct 19, 2014, 7:52:03 PM
to std-pr...@isocpp.org
On Sunday 19 October 2014 15:45:30 dgutson . wrote:
> BTW, locales is often useless in embedded systems, specially those
> that have zero human interaction at all, and that's another issue to
> address (it was very hard to get rid of so far in some
> implementations).

I wouldn't say locales are useless, even in embedded systems. If they have a
screen, they should understand locales.

However, locales are useless in data interchange. The data should not depend
on locale and that's one of the most fatal mistakes in the standard C library,
that things like strtol and sprintf are locale-dependent. You simply cannot
write a localised application that can do proper interchange.

In other words, there's room for both: a locale-bearing API for displaying
information to the user and a non-locale API for data interchange. Though they
can also be one and the same (cf. strtol_l and sprintf_l).
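The split between a display path and an interchange path might look like the sketch below. The function names are hypothetical. The display function uses snprintf, whose decimal separator follows the process-wide C locale set by setlocale; the interchange function pins a stream to the classic "C" locale so its output does not change with the user's settings. The stream is used here for clarity of intent, not for speed.

```cpp
#include <cstddef>
#include <cstdio>
#include <locale>
#include <sstream>
#include <string>

// Display path: output depends on LC_NUMERIC of the current C locale.
inline std::string format_for_display(double v) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%.2f", v);
    if (n < 0) return {};
    std::size_t len = static_cast<std::size_t>(n);
    if (len >= sizeof buf) len = sizeof buf - 1;  // output was truncated
    return std::string(buf, len);
}

// Interchange path: always uses '.' as the decimal separator.
inline std::string format_for_interchange(double v) {
    std::ostringstream os;
    os.imbue(std::locale::classic());
    os.setf(std::ios::fixed, std::ios::floatfield);
    os.precision(2);
    os << v;
    return os.str();
}
```

In a program that has called setlocale with, say, a German locale, the first function may produce a comma separator while the second keeps producing the same bytes, which is what a file format or wire protocol needs.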

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

dgutson .

Oct 19, 2014, 10:15:18 PM
to std-proposals
On Sun, Oct 19, 2014 at 8:51 PM, Thiago Macieira <thi...@macieira.org> wrote:
> On Sunday 19 October 2014 15:45:30 dgutson . wrote:
>> BTW, locales is often useless in embedded systems, specially those
>> that have zero human interaction at all, and that's another issue to
>> address (it was very hard to get rid of so far in some
>> implementations).
>
> I wouldn't say locales are useless, even in embedded systems. If they have a
> screen, they should understand locales.

That's greater than zero human interaction :)

>
> However, locales are useless in data interchange. The data should not depend
> on locale and that's one of the most fatal mistakes in the standard C library,
> that things like strtol and sprintf are locale-dependent. You simply cannot
> write a localised application that can do proper interchange.
>
> In other words, there's room for both: a locale-bearing API for displaying
> information to the user and a non-locale API for data interchange. Though they
> can also be one and the same (cf. strtol_l and sprintf_l).
>

Magnus Fromreide

Oct 20, 2014, 3:01:42 AM
to std-pr...@isocpp.org
On Mon, Oct 20, 2014 at 07:51:46AM +0800, Thiago Macieira wrote:
> On Sunday 19 October 2014 15:45:30 dgutson . wrote:
> > BTW, locales is often useless in embedded systems, specially those
> > that have zero human interaction at all, and that's another issue to
> > address (it was very hard to get rid of so far in some
> > implementations).
>
> I wouldn't say locales are useless, even in embedded systems. If they have a
> screen, they should understand locales.
>
> However, locales are useless in data interchange. The data should not depend
> on locale and that's one of the most fatal mistakes in the standard C library,
> that things like strtol and sprintf are locale-dependent. You simply cannot
> write a localised application that can do proper interchange.
>
> In other words, there's room for both: a locale-bearing API for displaying
> information to the user and a non-locale API for data interchange. Though they
> can also be one and the same (cf. strtol_l and sprintf_l).

I disagree - if you allow the interfaces to be the same ones and just add a
locale parameter then you end up with iostreams and those do suffer from
always having to pay the localization cost, even when localization isn't
needed.

With that said, the _l functions do have uses, but being quick isn't
among them.

This is, by the way, what has gnawed at me through the whole to_string thread:
what do they gain over a stringstream?

The one useful addition could be a stringrefstream that you initialize with
a reference to a string variable that will be updated:

string s;
stringrefstream(s) << a_number;

appends a_number to the end of s and gives you the whole range of iostream
operations, not just numbers.

Please make new interfaces better than that if you want to use speed as an
argument for them.

/MF

Nevin Liber

Oct 20, 2014, 11:21:22 AM
to std-pr...@isocpp.org
On 19 October 2014 13:31, Jens Maurer <Jens....@gmx.net> wrote:
We should also offer different functions
for differences currently hidden in formatting flags, e.g.
fill (right align) vs. left align, since the choice of your
application is often a fixed one at compile time.

Most formatting is fixed at compile time.  I just don't believe that breaking down each possible way to format a number into its own function is the right way to proceed; usability-wise, it doesn't scale.

The problem is, every single choice you offer at least doubles the number of functions you need.

1.  Locale vs. no-locale. x2
2.  Exceptions vs. return codes for error handling. x2
3.  No padding vs right align vs left align. x3
4.  String vs. array_view. x2

24 functions per number type (well, you could combine some, with a possible loss in performance, as long as the type promotion isn't losing information) so far, and I'm sure if I went through what snprintf and ostream offer, I could get a lot more.

And that is just converting one number.  With formatting, I typically have to do a lot more than that.

We can, of course, mitigate this with template policies or defer it to runtime with defaulted parameters, but those both have their own issues.

If someone wishes to write up and present a paper on this design space, I'm certainly not against it, but it seems almost as ambitious as revamping ostreams.

Sean Middleditch

Oct 20, 2014, 2:15:31 PM
to std-pr...@isocpp.org
On Mon, Oct 20, 2014 at 8:20 AM, Nevin Liber <ne...@eviloverlord.com> wrote:
> On 19 October 2014 13:31, Jens Maurer <Jens....@gmx.net> wrote:
>>
>> We should also offer different functions
>> for differences currently hidden in formatting flags, e.g.
>> fill (right align) vs. left align, since the choice of your
>> application is often a fixed one at compile time.
>
>
> Most formatting is fixed at compile time. I just don't believe that
> breaking down each possible way to format a number into its own function is
> the right way to proceed; usability-wise, it doesn't scale.
>
> The problem is, every single choice you offer at least doubles the number of
> functions you need.

Not at all. Default arguments or a "conversion context" parameter
(again, defaulting to the most common behavior) removes the need for
umpteen function variants. The result type can't be handled without a
little duplication, but this is mostly a layering issue. e.g. the
std::string version can just be a facade over the array_view version.

This isn't free, of course, but it has other advantages.

For instance, see all the various "type-safe printf" libraries for C++
(a frequently-desired library feature). They need to call out to
someone to do the final formatting. Today that's usually snprintf;
with a conversion-context-based to_string, it could become that
instead. (std::stringstream is very rarely used due to its large
performance overhead compared to alternatives.)

I'm of the belief that the library can never be a complete tool for
every use case. I'd argue that it should aim for generalization (in a
way that keeps its simplicity for the common case, which a
context-based version would, since the context would be optional)
rather than minimalness or even maximal efficiency (though it should
aim for as much efficiency as it can while serving the other needs).

>
> 1. Locale vs. no-locale. x2
> 2. Exceptions vs. return codes for error handling. x2
> 3. No padding vs right align vs left align. x3

All three of these can be handled via a context object, with at worst
a little branching overhead in the implementation (which is cheaper
than either the exception or the locales themselves, at least).

> 4. String vs. array_view. x2

I'm thinking this is more of a +1 interface rather than a x2, since
you would be adding a facade over the existing functions.

I think it might just be four interfaces per type at worst. That could
even be reduced to a set of four templates for most related types like
all the builtin numeric types (one for signed, one for unsigned, one
for floating point).

void to_string(array_view, value, context);
void to_string(array_view, value); // default context
string to_string(value, context); // forward to array_view version
string to_string(value); // forward to array_view version
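The four signatures above can be fleshed out roughly as follows. This is a sketch under assumptions, not a proposed API: format_context, align and the namespace sketch are hypothetical names, a pointer plus size stands in for array_view, the primitive returns a std::string_view instead of void so the caller learns the length, and only alignment is modeled. A defaulted context parameter collapses the four declarations into two.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>

namespace sketch {

enum class align { none, left, right };

// Hypothetical "conversion context"; the defaults describe the common case.
struct format_context {
    align alignment = align::none;
    int width = 0;  // minimum field width, used when alignment != none
};

// Primitive: formats into caller storage, truncating if it does not fit.
inline std::string_view to_string(char* buf, std::size_t cap, int value,
                                  const format_context& ctx = {}) {
    if (cap == 0) return {};
    int len = 0;
    switch (ctx.alignment) {
        case align::left:  len = std::snprintf(buf, cap, "%-*d", ctx.width, value); break;
        case align::right: len = std::snprintf(buf, cap, "%*d", ctx.width, value); break;
        case align::none:  len = std::snprintf(buf, cap, "%d", value); break;
    }
    if (len < 0) return {};
    std::size_t n = static_cast<std::size_t>(len);
    if (n >= cap) n = cap - 1;  // snprintf reported more than was stored
    return std::string_view(buf, n);
}

// Facade: same behavior, result owned by a std::string.
inline std::string to_string(int value, const format_context& ctx = {}) {
    char buf[64];
    return std::string(to_string(buf, sizeof buf, value, ctx));
}

}  // namespace sketch
```

The switch on ctx.alignment is the "little branching overhead" mentioned above; whether that cost is acceptable is exactly what the rest of the thread debates.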

Nevin Liber

Oct 20, 2014, 2:26:56 PM
to std-pr...@isocpp.org
On 20 October 2014 13:15, Sean Middleditch <se...@middleditch.us> wrote:
Not at all. Default arguments or a "conversion context" parameter
(again, defaulting to the most common behavior) removes the need for
umpteen function variants. The result type can't be handled without a
little duplication, but this is mostly a layering issue. e.g. the
std::string version can just be a facade over the array_view version.

This isn't free, of course, but it has other advantages.

In other words, not high performance.  It seems like the goal posts keep moving...

Like I said, if you wish to write a paper and come to a couple of meetings to try and push it through, go for it...

Jens Maurer

Oct 20, 2014, 2:32:17 PM
to std-pr...@isocpp.org
On 10/20/2014 05:20 PM, Nevin Liber wrote:
> On 19 October 2014 13:31, Jens Maurer <Jens....@gmx.net <mailto:Jens....@gmx.net>> wrote:
>
> We should also offer different functions
> for differences currently hidden in formatting flags, e.g.
> fill (right align) vs. left align, since the choice of your
> application is often a fixed one at compile time.
>
>
> Most formatting is fixed at compile time. I just don't believe that breaking down each possible way to format a number into its own function is the right way to proceed; usability-wise, it doesn't scale.

Random thoughts in this direction:
- I agree this is a usability concern.
- This space should be explored afresh with C++14 expressive power.
- Whole-program optimizations can certainly propagate constant arguments
into function calls, making a seemingly-runtime "if" a compile-time choice.
With state modifications (such as in iostreams) or virtual functions,
this gets more difficult.

> The problem is, every single choice you offer at least /doubles/ the number of functions you need.
>
> 1. Locale vs. no-locale. x2

No locales. If you want locales, use iostreams.

> 2. Exceptions vs. return codes for error handling. x2

No exceptions. This is low-level bare-bones functionality.
If you want exceptions, use a wrapper template.

> 3. No padding vs right align vs left align. x3

Yes. (Left align with padding is probably uninteresting.)

> 4. String vs. array_view. x2

No, "char *". If you want strings/array_view/whatever on top, use
a wrapper template.

5. base (octal vs. hexadecimal vs. decimal vs. binary vs. arbitrary base)

We want to differentiate these at compile-time, because conversion to
hex has a different implementation strategy than conversion to decimal.
Plus, convert-to-hex right-justified, zero-filled is so fast the additional
"if" will certainly hurt you.

> 24 functions /per number type/

3*5 = 15

> and I'm sure if I went through what
> snprintf and ostream offer, I could get a lot more.

What else is there?

> And that is just converting /one/ number. With formatting, I
> typically have to do a lot more than that.

Are you somehow mixing the library implementation effort
(providing all the overloads) with the usage effort?
What would "a lot more than that" be?

> If someone wishes to write up and present a paper on this design
> space, I'm certainly not against it, but it seems almost as ambitious
> as revamping ostreams.

The trick is carving out a small enough subset that provides useful
building blocks, but remains manageable.

Jens

Nevin Liber

Oct 20, 2014, 2:48:07 PM
to std-pr...@isocpp.org
On 20 October 2014 13:32, Jens Maurer <Jens....@gmx.net> wrote:
> 3.  No padding vs right align vs left align. x3

Yes.  (Left align with padding is probably uninteresting.)

I wish. :-(  Some third party protocols I deal with have this requirement.
 
> 4.  String vs. array_view. x2

No, "char *".  If you want strings/array_view/whatever on top, use
a wrapper template.

Why would I want an unsafe C type interface?
 
5. base (octal vs. hexadecimal vs. decimal vs. binary vs. arbitrary base)

We want to differentiate these at compile-time, because conversion to
hex has a different implementation strategy than conversion to decimal.
Plus, convert-to-hex right-justified, zero-filled is so fast the additional
"if" will certainly hurt you.

Then comes the choice of lower case vs. upper case.  More and more choices...
 
Are you somehow mixing the library implementation effort
(providing all the overloads) with the usage effort?
What would "a lot more than that" be?

In every discussion about this, people want more and more customization.  And we haven't even started discussing floating point formatting...
 
> If someone wishes to write up and present a paper on this design
> space, I'm certainly not against it, but it seems almost as ambitious
> as revamping ostreams.

The trick is carving out a small enough subset that provides useful
building blocks, but remains manageable.

And getting consensus on that subset.

Jens Maurer

Oct 20, 2014, 2:54:50 PM
to std-pr...@isocpp.org
On 10/20/2014 08:47 PM, Nevin Liber wrote:
> On 20 October 2014 13:32, Jens Maurer <Jens....@gmx.net <mailto:Jens....@gmx.net>> wrote:

> > 4. String vs. array_view. x2
>
> No, "char *". If you want strings/array_view/whatever on top, use
> a wrapper template.
>
>
> Why would I want an unsafe C type interface?

Because it's the most fundamental interface choice that
can be adapted to everything else.

I would not expect anyone to actually use that interface
in application-level code, but use wrappers that have the
right safety / performance / flexibility trade-off for the
problem at hand. Currently, the standard library
does not provide ANY fast building blocks for formatting.

(array_view might work as well, and probably has the same
safety issues.)

Jens

Matthew Fioravante

Oct 20, 2014, 3:02:53 PM
to std-pr...@isocpp.org

On Monday, October 20, 2014 2:54:50 PM UTC-4, Jens Maurer wrote:
On 10/20/2014 08:47 PM, Nevin Liber wrote:
> On 20 October 2014 13:32, Jens Maurer <Jens....@gmx.net <mailto:Jens....@gmx.net>> wrote:

>     > 4.  String vs. array_view. x2
>
>     No, "char *".  If you want strings/array_view/whatever on top, use
>     a wrapper template.
>
>
> Why would I want an unsafe C type interface?

Because it's the most fundamental interface choice that
can be adapted to everything else.
 
 
Unless you want strict C compatibility, (char*, size_t) is not necessary; it only makes the interface clumsier to use and makes buffer overflows easier to write. Instead you should use array_view, because the pointer and length are stored together within the type. You can directly pass a C array or a std::array (or anything else that's contiguous) without having to do careful sizeof() calls. In short, I see absolutely no advantage to sticking with char*.
 
See this blog post for a motivating example as to why (char*, size_t) is bad:
 
Also, the function needs to return the length of the string that was written to the array. You can either just return an int / size_t or return a string_view. The latter offers additional flexibility in that you can immediately compose to_string with something else which operates on string data, such as cout.
 
string_view to_string(array_view<char> buf, int val /* other params*/);
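A minimal sketch of how such an overload might be implemented, assuming C++17. array_view was only a proposal when this thread was written, so the char_buffer struct below is a hypothetical stand-in for array_view<char>, and format_int is an illustrative name rather than anything proposed here. std::to_chars from <charconv> generates the digits. It reports failure instead of truncating, so this sketch returns an empty view when the buffer is too small, which is only one of several possible policies.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for the proposed array_view<char>:
// pointer and length travel together in one type.
struct char_buffer {
    char* data;
    std::size_t size;

    // Accept a C array directly, so callers never write sizeof().
    template <std::size_t N>
    char_buffer(char (&arr)[N]) : data(arr), size(N) {}

    char_buffer(char* p, std::size_t n) : data(p), size(n) {}
};

// Writes the decimal form of val into buf and returns a view of the
// characters written. Returns an empty view if buf is too small
// (an assumption of this sketch, not part of the proposal).
inline std::string_view format_int(char_buffer buf, int val) {
    std::to_chars_result r = std::to_chars(buf.data, buf.data + buf.size, val);
    if (r.ec != std::errc()) {
        return std::string_view();
    }
    return std::string_view(buf.data,
                            static_cast<std::size_t>(r.ptr - buf.data));
}
```

With this shape, a call such as `std::cout << format_int(buf, 42)` works for a local `char buf[16]` without any heap allocation, and the returned view stays valid as long as buf does.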

 

Matthew Fioravante

Oct 20, 2014, 4:02:30 PM10/20/14
to std-pr...@isocpp.org

On Saturday, October 18, 2014 2:49:13 AM UTC-4, Nevin ":-)" Liber wrote:
 
A CSV file parser with binary numbers in it??
 
 
No, it's just an example demonstrating that serialization is not always IO bound.
 

On Monday, October 20, 2014 11:21:22 AM UTC-4, Nevin ":-)" Liber wrote:
 
The problem is, every single choice you offer at least doubles the number of functions you need.

1. Locale vs. no-locale. x2
 
This idea is a separate proposal in and of itself. In order to have programmer-specified locale parsing routines / IO, one would first need to formalize how to specify locales. Using "string" names with runtime lookup would be too slow and would also mandate runtime error handling. We would need a set of locale tags, via an enum, type traits, a set of global constexpr tags, or some other mechanism. That means the standard would need to specify a list of locales that it supports, optionally allowing implementations to supply additional platform-specific ones.
 
2. Exceptions vs. return codes for error handling. x2
 
In general yes, but for int to string we can sidestep this question. What kinds of errors can occur for int to string? The input is a binary integer / float and all 2^(sizeof(T)*CHAR_BIT) values have some valid string representation. The only errors I can think of are related to the destination buffer, where we either run out of space (truncation) or are unable to allocate string memory (throw / crash).
 
3. No padding vs right align vs left align. x3
 
A higher-level interface on top of the int to string primitive could be written to do column alignment.
 
4. String vs. array_view. x2
 
array_view should be the primitive, with string being a convenience overload.
 

If someone wishes to write up and present a paper on this design space, I'm certainly not against it, but it seems almost as ambitious as revamping ostreams.
 
The problem is that with all of the possible variations people could want, we end up with a long discussion and no results. One could also argue that if you have such specialized needs, then you may want to write such a routine yourself, building on top of the int to string primitive provided by the standard library.
 
At a minimum, I'd just be proposing an array_view overload for improved efficiency: a small incremental improvement rather than an attempt to solve the huge problem of all possible int to string conversions. If the real desire is for a full featured cover all possible scenarios interface, then why was to_string() created in the first place?
 

On Monday, October 20, 2014 2:32:17 PM UTC-4, Jens Maurer wrote:

 - Whole-program optimizations can certainly propagate constant arguments
into function calls, making a seemingly-runtime "if" a compile-time choice.
With state modifications (such as in iostreams) or virtual functions,
this gets more difficult.
 
 
Compile and link times are already very long in C++ and enabling LTO makes them even longer. One approach is for the implementation to do the configuration branching inline, and then call helpers to do the actual parsing work out of line. Then when the function is called with compile-time configuration arguments the branching is optimized out, and when you really want to decide at runtime whether to pad left or pad right, the branch remains to make that decision.
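The inline-dispatch idea can be sketched roughly as follows, assuming C++17. All names here (align, format_options, format_uint and the detail helpers) are hypothetical and exist only for illustration. In a real library the detail functions would be defined out of line in a separate translation unit; they are defined in place here only so the sketch is self-contained. Whether a particular compiler actually removes the switch for constant arguments is not something this sketch can promise.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <system_error>

enum class align { none, left, right };

// Hypothetical configuration object with defaults for the common case.
struct format_options {
    align alignment = align::none;
    std::size_t width = 0;
};

namespace detail {
// Each helper returns one past the last character written,
// or nullptr if [first, last) is too small.
char* write_plain(char* first, char* last, unsigned value) {
    std::to_chars_result r = std::to_chars(first, last, value);
    return r.ec == std::errc() ? r.ptr : nullptr;
}

char* write_left(char* first, char* last, unsigned value, std::size_t width) {
    char* p = write_plain(first, last, value);
    if (p == nullptr) return nullptr;
    while (static_cast<std::size_t>(p - first) < width) {
        if (p == last) return nullptr;
        *p++ = ' ';  // pad on the right
    }
    return p;
}

char* write_right(char* first, char* last, unsigned value, std::size_t width) {
    char* p = write_plain(first, last, value);
    if (p == nullptr) return nullptr;
    std::size_t len = static_cast<std::size_t>(p - first);
    if (len >= width) return p;
    if (static_cast<std::size_t>(last - first) < width) return nullptr;
    std::memmove(first + (width - len), first, len);  // shift digits right
    std::memset(first, ' ', width - len);             // pad on the left
    return first + width;
}
}  // namespace detail

// Inline front end: the only branching on options happens here, so a call
// with constant options gives the optimizer a chance to drop the switch.
inline std::string_view format_uint(char* first, char* last, unsigned value,
                                    format_options opt = {}) {
    char* end = nullptr;
    switch (opt.alignment) {
        case align::none:  end = detail::write_plain(first, last, value); break;
        case align::left:  end = detail::write_left(first, last, value, opt.width); break;
        case align::right: end = detail::write_right(first, last, value, opt.width); break;
    }
    if (end == nullptr) return std::string_view();
    return std::string_view(first, static_cast<std::size_t>(end - first));
}
```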
 

 

Jens Maurer

Oct 20, 2014, 4:10:05 PM10/20/14
to std-pr...@isocpp.org
On 10/20/2014 09:02 PM, Matthew Fioravante wrote:
>
> On Monday, October 20, 2014 2:54:50 PM UTC-4, Jens Maurer wrote:

> > Why would I want an unsafe C type interface?
>
> Because it's the most fundamental interface choice that
> can be adapted to everything else.
>
>
>
> Unless you want strict C compatibility, (char*, size_t) is not
> necessary and only makes the interface more clumsy to use and easier
> to write buffer overflows. Instead you should use array_view because
> the pointer and length are stored together within the type. You can
> directly pass a C array or a std::array (or anything else thats
> contiguous) without having to do careful sizeof() calls. In short, I
> see absolutely no advantage to sticking with char*.

I don't want a (char*, size_t) interface. In my experience, this leads
to inefficient code, because you're incrementing your "current pointer"
as well as decrementing the "remaining size". This takes up two
registers. A range-style (char*begin, char*end) interface avoids
that. (Look, it's not a C interface, it's a C++ iterator-based
interface.)

I've already said that array_view might be an interface option,
but I need to see the generated assembly to be convinced it's
zero-overhead compared to the (char*begin, char*end) interface.
One of the surprising aspects is that a "char *" can point to
anything, so any read/write through a "char *" pessimizes code
generation quite a bit, unless only local variables are involved
(see also 3.10p10).
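For illustration, the two loop shapes being compared look roughly like this. The function names fill_counted and fill_range are hypothetical. The sketch only shows the difference in what changes per iteration; it makes no claim about the code any particular compiler generates.

```cpp
#include <cassert>
#include <cstddef>

// (char*, size_t) style: both the cursor and the remaining count are
// updated on every iteration.
inline char* fill_counted(char* dest, std::size_t size, char c) {
    while (size != 0) {
        *dest++ = c;
        --size;
    }
    return dest;
}

// Range style: only the cursor moves; end stays fixed for the whole loop.
inline char* fill_range(char* begin, char* end, char c) {
    assert(begin <= end);
    while (begin != end) {
        *begin++ = c;
    }
    return begin;
}
```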

Jens

Jens Maurer

Oct 20, 2014, 4:15:24 PM10/20/14
to std-pr...@isocpp.org
On 10/20/2014 10:02 PM, Matthew Fioravante wrote:
> If
> the real desire is for a full featured cover all possible scenarios
> interface, then why was to_string() created in the first place?

In my recollection, to_string() was created as a beginner-friendly
facility. Using stringstreams is embarrassingly heavy on the syntax
for such a trivial task.

(Sometimes, I think WG21 focuses a bit too much on beginner-friendly
instead of considering the whole picture and finding the beginner-
friendly shim around a more useful, well-performing core functionality
that gets you the extra mile.)

Jens

Matthew Fioravante

Oct 20, 2014, 4:32:54 PM10/20/14
to std-pr...@isocpp.org

On Monday, October 20, 2014 4:10:05 PM UTC-4, Jens Maurer wrote:

I don't want a (char*, size_t) interface.  In my experience, this leads
to inefficient code, because you're incrementing your "current pointer"
as well as decrementing the "remaining size".  This takes up two
registers.  A range-style  (char*begin, char*end)  interface avoids
that.  (Look, it's not a C interface, it's a C++ iterator-based
interface.)

I've already said that array_view might be an interface option,
but I need to see the generated assembly to be convinced it's
zero-overhead compared to the  (char*begin, char*end)  interface.
One of the surprising aspects is that a "char *" can point to
anything, so any read/write through a "char *" pessimizes code
generation quite a bit, unless only local variables are involved
(see also 3.10p10).

 
This sounds like QoI to me. An implementation can easily convert (char*, size_t) into (char*, char*), an index-based for loop, (char* restrict, char* restrict), or whatever else is fastest on that platform. array_view may be implemented using pointer pairs or pointer / length, but it can be converted to whatever is most optimal.
 

gmis...@gmail.com

Oct 20, 2014, 4:37:12 PM10/20/14
to std-pr...@isocpp.org


On Tuesday, October 21, 2014 9:15:24 AM UTC+13, Jens Maurer wrote:
On 10/20/2014 10:02 PM, Matthew Fioravante wrote:
>  If
> the real desire is for a full featured cover all possible scenarios
> interface, then why was to_string() created in the first place?

In my recollection, to_string() was created as a beginner-friendly
facility.
 
 Using stringstreams is embarrassingly heavy on the syntax
for such a trivial task.

Exactly.


(Sometimes, I think WG21 focuses a bit too much on beginner-friendly
instead of considering the whole picture and finding the beginner-
friendly shim around a more useful, well-performing core functionality
that gets you the extra mile.)

Jens

We definitely need faster conversion routines.

I also think the exception-only conversion functions were a mistake if that's all we have.

Sean Middleditch

Oct 20, 2014, 4:47:07 PM10/20/14
to std-pr...@isocpp.org


On Oct 20, 2014 11:26 AM, "Nevin Liber" <ne...@eviloverlord.com> wrote:
>
> On 20 October 2014 13:15, Sean Middleditch <se...@middleditch.us> wrote:
>>
>> Not at all. Default arguments or a "conversion context" parameter
>> (again, defaulting to the most common behavior) removes the need for
>> umpteen function variants. The result type can't be handled without a
>> little duplication, but this is mostly a layering issue. e.g. the
>> std::string version can just be a facade over the array_view version.
>>
>> This isn't free, of course, but it has other advantages.
>
>
> In other words, not high performance.  It seems like the goal posts keep moving...

I don't know where you get that idea. "High performance" by itself is a vague term, though. I can say with absolute certainty that a context approach is more than sufficient for many soft-real-time needs; implemented well it's identical in functionality and overhead to sprintf (and is how some sprintf implementations work internally already), minus the format parsing overhead.

You could also just make to_string take a format string parameter I suppose, but that is... icky.

>
> Like I said, if you wish to write a paper and come to a couple of meetings to try and push it through, go for it...

Or discuss it with other interested parties on the mailing list meant for such discussion to help decide what such a paper should contain. :)

> --
>  Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  (847) 691-1404
>

Matthew Fioravante

Oct 20, 2014, 6:01:43 PM10/20/14
to std-pr...@isocpp.org

On Monday, October 20, 2014 4:47:07 PM UTC-4, Sean Middleditch wrote:


On Oct 20, 2014 11:26 AM, "Nevin Liber" <ne...@eviloverlord.com> wrote:
>
> On 20 October 2014 13:15, Sean Middleditch <se...@middleditch.us> wrote:
>>
>> Not at all. Default arguments or a "conversion context" parameter
>> (again, defaulting to the most common behavior) removes the need for
>> umpteen function variants. The result type can't be handled without a
>> little duplication, but this is mostly a layering issue. e.g. the
>> std::string version can just be a facade over the array_view version.
>>
>> This isn't free, of course, but it has other advantages.
>
>
> In other words, not high performance.  It seems like the goal posts keep moving...

I don't know where you get that idea. "High performance" by itself is a vague term, though. I can say with absolute certainty that a context approach is more than sufficient for many soft-real-time needs; implemented well it's identical in functionality and overhead to sprintf (and is how some sprintf implementations work internally already), minus the format parsing overhead.

If we're going to go with the fully functional approach, the context or configuration object approach is probably the best. The alternative is to have an explosion of different functions / overloads or an explosion of parameters. The first is not scalable and the second is very hard to use and possibly has very poor performance from passing so much crap on the stack. Both are hard to use.
 
As I mentioned earlier, if you do the branching on the context parameters inline, it can be optimized out when the defaults are used or when non-default compile-time values are used.
 

You could also just make to_string take a format string parameter I suppose, but that is... icky.

I don't think there's much value in replicating snprintf(). It also makes it very hard to use this low level interface to build higher level parsing abstractions.

Thiago Macieira

Oct 20, 2014, 6:49:33 PM10/20/14
to std-pr...@isocpp.org
On Monday 20 October 2014 22:10:00 Jens Maurer wrote:
> I've already said that array_view might be an interface option,
> but I need to see the generated assembly to be convinced it's
> zero-overhead compared to the (char*begin, char*end) interface.

Wouldn't this be a QoI issue depending on how array_view is internally
represented?

Jens Maurer

Oct 21, 2014, 1:46:28 AM10/21/14
to std-pr...@isocpp.org
Partly, yes. I start getting concerned when the interface
requires heroic compiler effort to make a function efficient.
I've failed to get any restrict-based version to optimize
properly with gcc. But maybe I'm simply not understanding restrict.

A range-based interface also composes remarkably well:

Suppose you have conversion functions like this, returning the
new pointer past the consumed output range:

char * convert(char * begin, char * end, unsigned int value);
char * convert(char * begin, char * end, const std::string_view&);
char * convert_hex(char * begin, char * end, unsigned int value);

Then you can do

p = convert(p, end, 42);
p = convert(p, end, "some string");
p = convert_hex(p, end, 99);

etc.

This is as safe as iterator-based interfaces will get, and the
syntax complexity still looks manageable (yet not minimal, of
course).
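One way the three declarations above might be implemented, as a sketch assuming C++17. The bodies are not from the original message. In particular, the overflow policy (return begin unchanged when the value does not fit, so nothing is appended) is an assumption made here; the message does not say what should happen. The string_view parameter is taken by value rather than by const reference, which is the more common idiom.

```cpp
#include <algorithm>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Each function writes into [begin, end) and returns the new write position.
// Assumed policy for this sketch: if the value does not fit, nothing is
// appended and begin is returned unchanged.

inline char* convert(char* begin, char* end, unsigned int value) {
    std::to_chars_result r = std::to_chars(begin, end, value);
    return r.ec == std::errc() ? r.ptr : begin;
}

inline char* convert(char* begin, char* end, std::string_view text) {
    if (static_cast<std::size_t>(end - begin) < text.size()) return begin;
    return std::copy(text.begin(), text.end(), begin);
}

inline char* convert_hex(char* begin, char* end, unsigned int value) {
    // Base 16: lowercase digits, no "0x" prefix.
    std::to_chars_result r = std::to_chars(begin, end, value, 16);
    return r.ec == std::errc() ? r.ptr : begin;
}
```

After a chain of calls starting from `char* p = buf;`, the text produced so far is the range [buf, p), which can be wrapped in a string_view when a string value is needed.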

What's the equivalent syntax using array_view?

Jens

Olaf van der Spek

Oct 21, 2014, 7:39:31 AM10/21/14
to std-pr...@isocpp.org
On Tuesday, October 21, 2014 7:46:28 AM UTC+2, Jens Maurer wrote:
Partly, yes.  I start getting concerned when the interface
requires heroic compiler effort to make a function efficient.
I've failed to get any restrict-based version to optimize
properly with gcc.  But maybe I'm simply not understanding restrict.

A range-based interface also composes remarkably well:

Suppose you have conversion functions like this, returning the
new pointer past the consumed output range:

  char * convert(char * begin, char * end, unsigned int value);
  char * convert(char * begin, char * end, const std::string_view&);
  char * convert_hex(char * begin, char * end, unsigned int value);

Then you can do

   p = convert(p, end, 42);
   p = convert(p, end, "some string");
   p = convert_hex(p, end, 99);

etc.

This is as safe as iterator-based interfaces will get, and the
syntax complexity still looks manageable (yet not minimal, of
course).

What's the equivalent syntax using array_view?

void convert(array_view&, T); 

convert(av, 42);
convert(av, "some string");
convert_hex(av, 99);

This probably requires two variants of convert, one taking array_view by value and one by reference.

Olaf van der Spek

Oct 21, 2014, 7:43:32 AM10/21/14
to std-pr...@isocpp.org
On Monday, October 20, 2014 10:10:05 PM UTC+2, Jens Maurer wrote:
I don't want a (char*, size_t) interface.  In my experience, this leads
to inefficient code, because you're incrementing your "current pointer"
as well as decrementing the "remaining size".  This takes up two
registers.  A range-style  (char*begin, char*end)  interface avoids
that.  (Look, it's not a C interface, it's a C++ iterator-based
interface.)

Two registers are required anyway but a pointer pair saves you from having to decrease the second one.
An even stronger argument against a (char*, size_t) overload is correctness. It causes tons of security bugs (off by one errors etc).

Matthew Fioravante

Oct 21, 2014, 10:51:02 AM10/21/14
to std-pr...@isocpp.org
I'm not sure taking the array_view param by reference is a good idea. It could lead to surprising results. It's also very clumsy when we don't want to modify the lvalue array_view that we are passing in.
 
Admittedly, using a pair of pointers and returning the tail does make it easier to cascade multiple calls together to build a full string. Doing the same with my example would be harder:
 
string_view convert(array_view<char> buf, T val);

size_t len = 0;
array_view<char> buf = /* something */;

len += convert(buf, 42).length();
len += convert({buf.begin() + len, buf.end()}, "something else").length();
len += convert({buf.begin() + len, buf.end()}, x).length();


Both Jens's example and my example do no checking on the size of the output buffer.
 
My example works better for common cases where you want to convert one thing and use the result right away as a string value type.
 
cout << convert(buf, 42) << endl;

The pointer pair version only works better for cascading because it returns the tail instead of returning something representing the actual string that was parsed (either a null terminated char* (bad idea imo, we should move away from null termination), or a string_view (good idea imo)).
 
char buf[/*some size*/];
char* p = convert(begin(buf), end(buf), 42);
string_view sv = {buf, p - buf };
cout << sv;

//Very easy to do this by mistake
cout << convert(begin(buf), end(buf), 42);
 
Returning a pointer to the end of the written output is much more clumsy for the most common case. If you want a tail pointer, it might be better to make the user compute it themselves, or to have an optional tail parameter (either an array_view or char*) which the user can pass in if they need it.
 
string_view convert(array_view<char> buf, T val);
string_view convert(array_view<char>& tail, array_view<char> buf, T val);
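A sketch of how this pair of overloads could behave, assuming C++17 and restricted to int for brevity. char_view is a hypothetical stand-in for array_view<char>, and returning an empty view when the value does not fit is an assumption of the sketch. The tail overload returns the written text and also reports the unused remainder of the buffer, so both the single-conversion case and the cascading case are covered.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for array_view<char>.
struct char_view {
    char* first;
    char* last;
};

// Basic form: returns the characters written (empty if val does not fit).
inline std::string_view convert(char_view buf, int val) {
    std::to_chars_result r = std::to_chars(buf.first, buf.last, val);
    if (r.ec != std::errc()) return {};
    return {buf.first, static_cast<std::size_t>(r.ptr - buf.first)};
}

// Tail form: same result, and tail receives the unused remainder of buf.
inline std::string_view convert(char_view& tail, char_view buf, int val) {
    std::string_view written = convert(buf, val);
    tail = char_view{buf.first + written.size(), buf.last};
    return written;
}
```

Cascading then becomes a matter of passing the previous tail as the next buffer, for example `convert(tail, tail, x)`, without any manual length arithmetic at the call site.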


Even if a pointer pair version is adopted, an array_view convenience overload would be good to have to avoid error prone size computations. Since it accepts raw pointers only, begin() and end() may not be usable for all source array types and thus we are put back into the C world of buffer overflows with bugs in size computations.
 

Jens Maurer

Oct 21, 2014, 2:19:43 PM10/21/14
to std-pr...@isocpp.org
On 10/21/2014 04:51 PM, Matthew Fioravante wrote:
> string_view convert(array_view<char>buf,T val);
> string_view convert(array_view<char>&tail,array_view<char>buf,T val);
>
>
> Even if a pointer pair version is adopted, an array_view convenience overload would be good to have to avoid error prone size computations. Since it accepts raw pointers only, begin() and end() may not be usable for all source array types and thus we are put back into the C world of buffer overflows with bugs in size computations.

Absolutely. The low-level interface (pointer pair?) need not be the
only interface we offer to users. In fact, I envision quite a few
template wrappers that are a lot more convenient to use than the
pointer pair version.

Examples:

template<class T>
string_view convert(array_view<char> buf, T val);

template<class T>
std::string to_string(T val); // might not work due to existing to_string()


and possibly even "iostreams light":

template<class T>
bufstream& operator<<(bufstream&, T val);


For these kinds of convenience wrappers, we need a good idea how to
pass the multitude of formatting options so that we can have a
handful of wrapper templates per idiom, at most.

template<class T, class ... Opt>
string_view convert(array_view<char> buf, T val, Opt... options);

might work.
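A rough sketch of what such a variadic wrapper could look like, assuming C++17 and integer values only. The option types as_hex_t and radix, the settings struct and the apply functions are all hypothetical and invented for this illustration, as is char_view, which stands in for array_view<char>. Each option is a small tag or value type, and a fold expression applies them in order to a settings object before the conversion runs.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for array_view<char>.
struct char_view {
    char* first;
    char* last;
};

// Hypothetical option types.
struct as_hex_t {};
constexpr as_hex_t as_hex{};
struct radix { int value; };

// Settings gathered from the options; defaults cover the common case.
struct settings {
    int base = 10;
};

inline void apply(settings& s, as_hex_t) { s.base = 16; }
inline void apply(settings& s, radix r) { s.base = r.value; }

template <class T, class... Opt>
std::string_view convert(char_view buf, T val, Opt... options) {
    settings s;
    (apply(s, options), ...);  // C++17 fold: apply each option in order
    std::to_chars_result r = std::to_chars(buf.first, buf.last, val, s.base);
    if (r.ec != std::errc()) return {};
    return {buf.first, static_cast<std::size_t>(r.ptr - buf.first)};
}
```

Adding a new formatting option in this design means adding one option type and one apply overload, rather than another convert overload.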

Jens