Consider the to_string library of functions such as:
std::string to_string(int value);
This interface, while convenient, is horribly inefficient. Every time you call this function you will allocate memory for a temporary string. If you plan to do this operation multiple times (which is common for use cases such as file parsing routines) you cannot avoid these memory allocations with move semantics or return value optimization. The performance implications of to_string make it unusable for high performance parsing routines.
I would propose 2 additional overloads.
string_view to_string(std::string& dest, int value);
string_view to_string(std::array_view<char> dest, int value);
The first version allows you to still parse the data into a memory buffer backed by a std::string but allows you to reuse the same string object and save on memory allocations. The second version is available if you absolutely don't want to allocate memory at all and just store the results (possibly truncated) into a fixed size buffer. Since int to string conversions always have a maximum length on the resulting string, we can easily create a fixed size buffer on the stack. Not only does creating a buffer on the stack avoid memory allocation, the stack space itself is likely to be hot in the cache already whereas a random buffer returned by operator new will not be.
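The "maximum length" claim can be made concrete. The following is a minimal sketch, not part of the proposal: `max_decimal_chars` and `format_int` are hypothetical helper names, and the bound is derived from `std::numeric_limits` rather than picked by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>

// Hypothetical helper: an upper bound on the characters needed to print any
// value of integer type T in base 10, including a sign and the trailing NUL.
template <class T>
constexpr std::size_t max_decimal_chars() {
    // digits10 digits always fit; the largest value can need one more.
    // Add 1 for '-' when T is signed and 1 for the terminating '\0'.
    return static_cast<std::size_t>(std::numeric_limits<T>::digits10) + 1
         + (std::numeric_limits<T>::is_signed ? 1 : 0)
         + 1;
}

// Hypothetical helper: formats `value` into a caller-provided array and
// returns the number of characters written (excluding the NUL).
template <std::size_t N>
int format_int(char (&buf)[N], int value) {
    static_assert(N >= max_decimal_chars<int>(),
                  "buffer too small for some int values");
    int len = std::snprintf(buf, N, "%d", value);
    assert(len >= 0 && static_cast<std::size_t>(len) < N);
    return len;
}
```

A caller would declare `char buf[max_decimal_chars<int>()];` on the stack and pass it to `format_int`; the `static_assert` rejects buffers that could truncate.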
Compare the following 3 code fragments:
std::cout << to_string(i); //memory allocation
std::cout << to_string(j); //memory allocation
std::cout << to_string(k); //memory allocation

std::string s;
std::cout << to_string(s, i); //memory allocation
std::cout << to_string(s, j); //likely no memory allocation, reuse the buffer
std::cout << to_string(s, k); //likely no memory allocation, reuse the buffer

char s[64]; //Chose some size big enough for the maximal string representation for decltype(i), decltype(j), and decltype(k)
std::cout << to_string(s, i); //no memory allocation
std::cout << to_string(s, j); //no memory allocation
std::cout << to_string(s, k); //no memory allocation

The implementation of each of these can be built on top of one another and, without optimization, can be trivially implemented using snprintf:
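string_view to_string(array_view<char> s, int value) {
  // snprintf returns the length it wanted to write, which may exceed the buffer
  int len = snprintf(s.data(), s.length(), "%d", value);
  if (len < 0 || s.length() == 0) len = 0;
  else if (size_t(len) >= s.length()) len = int(s.length()) - 1; // truncated
  return string_view(s.data(), len);
}
string_view to_string(string& s, int value) {
  char buf[32];
  auto sv = to_string(buf, value);
  s.assign(sv.data(), sv.size()); // reuses the capacity s already has
  return string_view(s.data(), s.size()); // view into s, not the local buf
}
string to_string(int value) {
  string s;
  to_string(s, value);
  return s;
}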
On 17 October 2014 13:35, Matthew Fioravante <fmatth...@gmail.com> wrote:
> Consider the to_string library of functions such as:
> std::string to_string(int value);
> This interface while convenient
Which is the point of this interface.
> is horribly inefficient.
Really? I would expect that most numbers would fit within the small object optimization of std::string.
Of course, no one would mention something like "horribly inefficient" without measurements to back it up. Where are your benchmarks? Is your std::string implementation C++11/14 conforming?
> Every time you call this function you will allocate memory for a temporary string.
Only for really large numbers.
> If you plan to do this operation multiple times (which is common for use cases such as file parsing routines)
On what platform is file I/O not slow compared with 0-1 allocations?
> you cannot avoid these memory allocations with move semantics or return value optimization.
You can avoid them most of the time with a decent std::string implementation.
> The performance implications of to_string make it unusable for high performance parsing routines.
I really cannot wait to see your benchmarks.
--
---
You received this message because you are subscribed to a topic in the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this topic, visit https://groups.google.com/a/isocpp.org/d/topic/std-proposals/OHF7J9jU1gc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.
Allocations are expensive, especially in implementations that have debug heaps, or in constrained environments that have to worry more about total allocation size or fragmentation than latency, etc.
> Really? I would expect that most numbers would fit within the small object optimization of std::string.
If your domain of numbers is randomly generated ids or hashes, the small string optimization does not help you.
Also, I don't believe the standard mandates the size for SSO, so your performance can vary between implementations.
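Since the small-string buffer size is implementation-defined, one way to settle the question for a given platform is to probe it. This is a rough diagnostic sketch only: `uses_inline_storage` and `to_string_avoids_heap` are hypothetical names, and the address comparison is a common trick rather than anything the standard promises to be meaningful.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical probe: returns true when the string's characters are stored
// inside the std::string object itself, i.e. no separate heap buffer is used.
inline bool uses_inline_storage(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(s);
    // std::less gives a total order even for pointers into different objects.
    std::less<const char*> before;
    return !before(s.data(), obj_begin) && before(s.data(), obj_end);
}

// Hypothetical check: did std::to_string(value) stay within the inline buffer
// on this standard library implementation?
inline bool to_string_avoids_heap(long long value) {
    return uses_inline_storage(std::to_string(value));
}
```

Running `to_string_avoids_heap` over a sample of the ids or hashes in question would show whether the small string optimization covers them on a particular implementation. It says nothing about other implementations.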
On 17 October 2014 15:28, Matthew Fioravante <fmatth...@gmail.com> wrote:
>> Really? I would expect that most numbers would fit within the small object optimization of std::string.
> If your domain of numbers are randomly generated ids or hashes the small string optimization does not help you.
Yes, every single idea on this mailing list has at least one theoretical use case. That doesn't mean every single idea on this mailing list belongs in the standard.
How big is that audience? If you are parsing that many ids/hashes for the performance to matter, what are you doing with them which doesn't require storing the strings? Why do you need/want std::string as an intermediary at all?
> Also, I don't believe the standard mandates the size for sso, so your performance can vary between implementations.
That goes for most things in the standard library. If the performance matters that much, why would you even have the dependency on the standard library?
After all, it isn't terribly hard to write a to_array() clone that fills in a std::array (possibly internal to a custom class that also keeps track of the length) of the maximum number of digits the conversion can make. I might even be weakly for such a function, as opposed to overloading the interface of to_string().
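A to_array() along those lines might look roughly like the sketch below. The type name `int_chars` and its members are hypothetical, and snprintf stands in for a real digit-generation routine.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string_view>

// Hypothetical result type: a fixed-size buffer plus the number of
// characters actually used.
struct int_chars {
    // Room for every int: up to digits10 + 1 digits, '-', and snprintf's NUL.
    std::array<char, std::numeric_limits<int>::digits10 + 3> buf{};
    std::size_t len = 0;

    // The view is only valid while this object is alive.
    std::string_view view() const { return {buf.data(), len}; }
};

// Hypothetical to_array(): converts an int without touching the heap.
inline int_chars to_array(int value) {
    int_chars out;
    int n = std::snprintf(out.buf.data(), out.buf.size(), "%d", value);
    assert(n >= 0 && static_cast<std::size_t>(n) < out.buf.size());
    out.len = static_cast<std::size_t>(n);
    return out;
}
```

Because the characters live inside the returned object, `auto c = to_array(i); std::cout << c.view();` is safe, whereas holding on to `to_array(i).view()` past the end of the full expression is not.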
If your domain of numbers are randomly generated ids or hashes the small string optimization does not help you.
One example to consider is Andrei Alexandrescu's optimization talk, where he optimizes a simple int to string routine which was a major hot spot of his entire application. He uses char* and I doubt he would change it to return strings by value.
This is a tempting conclusion to make but it's not always correct. I've had production code where optimizing a csv file parser
Summarizing my earlier points, that's highly dependent on many factors and in some scenarios it does not work.
We should also offer different functions
for differences currently hidden in formatting flags, e.g.
fill (right align) vs. left align, since the choice of your
application is often a fixed one at compile time.
Not at all. Default arguments or a "conversion context" parameter
(again, defaulting to the most common behavior) removes the need for
umpteen function variants. The result type can't be handled without a
little duplication, but this is mostly a layering issue. e.g. the
std::string version can just be a facade over the array_view version.
This isn't free, of course, but it has other advantages.
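To illustrate the "conversion context" idea, here is a minimal sketch in which one core routine takes a defaulted context and the std::string version is a facade over it. Every name (`conv_context`, `convert`, `to_string_ctx`) and the choice of fields is an assumption for illustration, not a proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical "conversion context": bundles formatting choices so that one
// function covers what would otherwise be many variants.
struct conv_context {
    unsigned base = 10;     // supported range here: 2..16
    std::size_t width = 0;  // minimum field width; 0 means no padding
    char fill = ' ';        // padding character (output is right-aligned)
};

// Core routine: writes into [begin, end) and returns one past the last
// character written, or nullptr if the output does not fit.
inline char* convert(char* begin, char* end, unsigned value,
                     const conv_context& ctx = {}) {
    assert(ctx.base >= 2 && ctx.base <= 16);
    char tmp[std::numeric_limits<unsigned>::digits];  // worst case: base 2
    std::size_t n = 0;
    do {  // digits are produced least significant first
        tmp[n++] = "0123456789abcdef"[value % ctx.base];
        value /= ctx.base;
    } while (value != 0);
    std::size_t total = n < ctx.width ? ctx.width : n;
    if (static_cast<std::size_t>(end - begin) < total) return nullptr;
    for (std::size_t pad = total - n; pad != 0; --pad) *begin++ = ctx.fill;
    while (n != 0) *begin++ = tmp[--n];
    return begin;
}

// Facade: the std::string version layered over the buffer version.
// Returns an empty string if the requested width exceeds the local buffer.
inline std::string to_string_ctx(unsigned value, const conv_context& ctx = {}) {
    char buf[64];
    char* p = convert(buf, buf + sizeof buf, value, ctx);
    return p ? std::string(buf, p) : std::string();
}
```

The cost mentioned above is visible here: base and padding are runtime values, so the inner loop cannot specialize for them unless the compiler propagates constants through the call.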
> 3. No padding vs right align vs left align. x3
Yes. (Left align with padding is probably uninteresting.)
> 4. String vs. array_view. x2
No, "char *". If you want strings/array_view/whatever on top, use
a wrapper template.
5. base (octal vs. hexadecimal vs. decimal vs. binary vs. arbitrary base)
We want to differentiate these at compile-time, because conversion to
hex has a different implementation strategy than conversion to decimal.
Plus, convert-to-hex right-justified, zero-filled is so fast the additional
"if" will certainly hurt you.
Are you somehow mixing the library implementation effort
(providing all the overloads) with the usage effort?
What would "a lot more than that" be?
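On the point about hex having a different implementation strategy than decimal: a fixed-width, zero-filled hex conversion needs no division and no digit counting. The sketch below is only an illustration of that shape; `convert_hex8` is a hypothetical name and the 8-digit width assumes a 32-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: writes exactly 8 lowercase hex digits for `value`,
// right-justified and zero-filled, into out[0..8). No NUL is appended.
// The loop has a fixed trip count and no data-dependent branches.
inline char* convert_hex8(char* out, std::uint32_t value) {
    assert(out != nullptr);
    static const char digits[] = "0123456789abcdef";
    for (int i = 7; i >= 0; --i) {
        out[i] = digits[value & 0xFu];  // low nibble goes to the rightmost slot
        value >>= 4;
    }
    return out + 8;  // one past the last character written
}
```

A decimal routine, by contrast, has to divide by 10 and does not know the number of digits up front, which is why folding both behind a runtime base parameter adds a branch to the fast path.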
> If someone wishes to write up and present a paper on this design
> space, I'm certainly not against it, but it seems almost as ambitious
> as revamping ostreams.
The trick is carving out a small enough subset that provides useful
building blocks, but remains manageable.
On 10/20/2014 08:47 PM, Nevin Liber wrote:
> On 20 October 2014 13:32, Jens Maurer <Jens....@gmx.net <mailto:Jens....@gmx.net>> wrote:
> > 4. String vs. array_view. x2
>
> No, "char *". If you want strings/array_view/whatever on top, use
> a wrapper template.
>
>
> Why would I want an unsafe C type interface?
Because it's the most fundamental interface choice that
can be adapted to everything else.
string_view to_string(array_view<char> buf, int val /* other params*/);
A CSV file parser with binary numbers in it??
The problem is, every single choice you offer at least doubles the number of functions you need.
1. Locale vs. no-locale. x2
2. Exceptions vs. return codes for error handling. x2
3. No padding vs right align vs left align. x3
4. String vs. array_view. x2
If someone wishes to write up and present a paper on this design space, I'm certainly not against it, but it seems almost as ambitious as revamping ostreams.
- Whole-program optimizations can certainly propagate constant arguments
into function calls, making a seemingly-runtime "if" a compile-time choice.
With state modifications (such as in iostreams) or virtual functions,
this gets more difficult.
I don't want a (char*, size_t) interface. In my experience, this leads
to inefficient code, because you're incrementing your "current pointer"
as well as decrementing the "remaining size". This takes up two
registers. A range-style (char*begin, char*end) interface avoids
that. (Look, it's not a C interface, it's a C++ iterator-based
interface.)
I've already said that array_view might be an interface option,
but I need to see the generated assembly to be convinced it's
zero-overhead compared to the (char*begin, char*end) interface.
One of the surprising aspects is that a "char *" can point to
anything, so any read/write through a "char *" pessimizes code
generation quite a bit, unless only local variables are involved
(see also 3.10p10).
On 10/20/2014 10:02 PM, Matthew Fioravante wrote:
> If
> the real desire is for a full featured cover all possible scenarios
> interface, then why was to_string() created in the first place?
In my recollection, to_string() was created as a beginner-friendly
facility.
Using stringstreams is embarrassingly heavy on the syntax
for such a trivial task.
(Sometimes, I think WG21 focuses a bit too much on beginner-friendly
instead of considering the whole picture and finding the beginner-
friendly shim around a more useful, well-performing core functionality
that gets you the extra mile.)
Jens
On Oct 20, 2014 11:26 AM, "Nevin Liber" <ne...@eviloverlord.com> wrote:
>
> On 20 October 2014 13:15, Sean Middleditch <se...@middleditch.us> wrote:
>>
>> Not at all. Default arguments or a "conversion context" parameter
>> (again, defaulting to the most common behavior) removes the need for
>> umpteen function variants. The result type can't be handled without a
>> little duplication, but this is mostly a layering issue. e.g. the
>> std::string version can just be a facade over the array_view version.
>>
>> This isn't free, of course, but it has other advantages.
>
>
> In other words, not high performance. It seems like the goal posts keep moving...
I don't know where you get that idea. "High performance" by itself is a vague term, though. I can say with absolute certainty that a context approach is more than sufficient for many soft-real-time needs; implemented well it's identical in functionality and overhead to sprintf (and is how some sprintf implementations work internally already), minus the format parsing overhead.
You could also just make to_string take a format string parameter I suppose, but that is... icky.
>
> Like I said, if you wish to write a paper and come to a couple of meetings to try and push it through, go for it...
Or discuss it with other interested parties on the mailing list meant for such discussion to help decide what such a paper should contain. :)
> --
> Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> (847) 691-1404
>
Partly, yes. I start getting concerned when the interface
requires heroic compiler effort to make a function efficient.
I've failed to get any restrict-based version to optimize
properly with gcc. But maybe I'm simply not understanding restrict.
A range-based interface also composes remarkably well:
Suppose you have conversion functions like this, returning the
new pointer past the consumed output range:
char * convert(char * begin, char * end, unsigned int value);
char * convert(char * begin, char * end, const std::string_view&);
char * convert_hex(char * begin, char * end, unsigned int value);
Then you can do
p = convert(p, end, 42);
p = convert(p, end, "some string");
p = convert_hex(p, end, 99);
etc.
This is as safe as iterator-based interfaces will get, and the
syntax complexity still looks manageable (yet not minimal, of
course).
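For readers who want to try the chaining, here is one possible way to fill in those declarations. It is a sketch under assumptions: the overflow policy (write nothing and return `begin` unchanged) is chosen only for illustration, and `detail::write_digits` and `demo` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string_view>

namespace detail {
// Writes `value` in the given base into [begin, end). Returns the new
// position, or `begin` unchanged if the digits do not fit (assumed policy).
inline char* write_digits(char* begin, char* end, unsigned value, unsigned base) {
    assert(begin <= end && base >= 2 && base <= 16);
    char tmp[std::numeric_limits<unsigned>::digits];  // worst case: base 2
    std::size_t n = 0;
    do {
        tmp[n++] = "0123456789abcdef"[value % base];
        value /= base;
    } while (value != 0);
    if (static_cast<std::size_t>(end - begin) < n) return begin;
    while (n != 0) *begin++ = tmp[--n];
    return begin;
}
}  // namespace detail

inline char* convert(char* begin, char* end, unsigned value) {
    return detail::write_digits(begin, end, value, 10);
}

inline char* convert_hex(char* begin, char* end, unsigned value) {
    return detail::write_digits(begin, end, value, 16);
}

inline char* convert(char* begin, char* end, std::string_view text) {
    if (static_cast<std::size_t>(end - begin) < text.size()) return begin;
    return std::copy(text.begin(), text.end(), begin);
}

// Chains three conversions into one caller-provided buffer and returns a
// view of everything written.
inline std::string_view demo(char* begin, char* end) {
    char* p = begin;
    p = convert(p, end, 42);
    p = convert(p, end, " some string ");
    p = convert_hex(p, end, 99);
    return {begin, static_cast<std::size_t>(p - begin)};
}
```

Only one pointer advances between calls, which is the register argument made earlier in the thread. A real design would still need a way to report overflow that the caller cannot confuse with an empty write.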
What's the equivalent syntax using array_view?
> I don't want a (char*, size_t) interface. In my experience, this leads
> to inefficient code, because you're incrementing your "current pointer"
> as well as decrementing the "remaining size". This takes up two
> registers. A range-style (char*begin, char*end) interface avoids
> that. (Look, it's not a C interface, it's a C++ iterator-based
> interface.)
string_view convert(array_view<char> buf, T val);

size_t len = 0;
array_view<char> buf = /* something */;
len += convert(buf, 42).length();
len += convert({buf.begin() + len, buf.end()}, "something else").length();
len += convert({buf.begin() + len, buf.end()}, x).length();

cout << convert(buf, 42) << endl;

char buf[/*some size*/];
char* p = convert(begin(buf), end(buf), 42);
string_view sv = {buf, p - buf };
cout << sv;
//Very easy to do this by mistake
cout << convert(begin(buf), end(buf), 42);
string_view convert(array_view<char> buf, T val);
string_view convert(array_view<char>& tail, array_view<char> buf, T val);