
Slow std integer to a string conversion (to_string/sprintf)


JiiPee

Nov 1, 2014, 7:02:49 PM
I wonder why the C++ standard library has not implemented a very fast
integer-to-string conversion function. It has sprintf, snprintf,
stringstream and to_string.

From this site:

http://zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html

we can see that sprintf is the fastest of those.
But then I googled and found this site:

http://stackoverflow.com/questions/4351371/c-performance-challenge-integer-to-stdstring-conversion

with a very fast version written by somebody. I ran tests (using
integers from 0 to 1294967295 in steps of 60 and converted all of them
to strings. I also used the character at index 6 of the result,
cU += str2[6];, inside the loop and then printed it at the end so that
the compiler would not optimize things out).

Results: the user-made conversion function from the site was about 130
times faster than sprintf (and thus would be about 200 times faster than
to_string).

How is it possible there is no fast std version? This is sometimes an
important function, for example when parsing a very large text and
checking integers from it.

Not really complaining, but I want to start a discussion about this: why
does the standard library sometimes not provide fast functions/classes?

For me, I have already added that new fast conversion function to my
toolset... I'd rather use it than the library versions. By the way, if
somebody wants my fast function, please ask and I can post it. Also, if
somebody knows a faster way to convert, I'm ready to test it against
this! I have a test platform ready.

Luca Risolia

Nov 1, 2014, 8:55:43 PM
On 01/11/2014 23:02, JiiPee wrote:

> For me, I already added that new fast conversion function to my
> toolset... I rather use it than library versions. btw if somebody wants
> my fast function please ask, I can give it then. Also, if somebody knows
> faster way to convert, am ready to test it against this! I have a test
> platform ready.

And how can we reproduce your tests? Please post the full, compilable
test program together with all the information about the compiler,
compiler version and compilation flags that you used.

JiiPee

Nov 1, 2014, 9:19:38 PM
I also tested that intToStr converts correctly: the first 20 million
values are correct. Testing all integers would take 2 hours... I'll do
that sometime.

Compiler: GCC 4.8.1, C++11, -O3, -s

Here you go:

#include <iostream>
#include <ctime>

// requires the digit_pairs lookup table of two-char pairs "00".."99"
inline void intToStr(unsigned val, char* c)
{
    int size;
    if (val >= 10000) {
        if (val >= 10000000) {
            if (val >= 1000000000)
                size = 10;
            else if (val >= 100000000)
                size = 9;
            else
                size = 8;
        } else {
            if (val >= 1000000)
                size = 7;
            else if (val >= 100000)
                size = 6;
            else
                size = 5;
        }
    } else {
        if (val >= 100) {
            if (val >= 1000)
                size = 4;
            else
                size = 3;
        } else {
            if (val >= 10)
                size = 2;
            else if (val == 0) {
                c[0] = '0';
                c[1] = '\0';
                return;
            } else
                size = 1;
        }
    }

    c += size - 1;
    while (val >= 100) {
        int pos = val % 100;
        val /= 100;
        // note: this type-punned short store is discussed in the follow-ups
        *(short*)(c - 1) = *(short*)(digit_pairs + 2 * pos);
        c -= 2;
    }
    while (val > 0) {
        *c-- = '0' + (val % 10);
        val /= 10;
    }
    c[size + 1] = '\0';
}

// speed test function:

// intToStr is 132x faster than snprintf
// loop to 1294967295, step 60 (i += 60): intToStr 0.41 s, snprintf 54.4 s (x132), sprintf 54.2 s (x131)
// (max uint = 4294967295)
// (max uint=4294967295)
void benchmark() {

    clock_t begin = clock();

    char str2[16];
    char cU = 0;  // initialized so the += accumulation below is well-defined
    for (unsigned int i = 0; i < 1294967295; i += 60) {

        // HERE COMMENT OUT THE FUNCTION YOU WANT TO TEST:
        //intToStr(i, str2);
        //snprintf(str2, 15, "%d", i);
        sprintf(str2, "%d", i);
        //itoa(i, str2, 10);  // non-standard

        cU += str2[6];  // use the string so that the compiler will not optimize things out
    }

    clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

    std::cout << "Time: " << elapsed_secs << cU << std::endl;
}

JiiPee

Nov 1, 2014, 9:23:40 PM
On 02/11/2014 01:19, JiiPee wrote:
> // HERE COMMENT OUT THE FUNCTION YOU WANT TO TEST:

obviously, below that only one conversion function should be left
uncommented; the others should be commented out. E.g. if testing snprintf:

//intToStr(i, str2);
snprintf (str2, 15, "%d",i);
//sprintf(str2,"%d",i);
//itoa (i,str2,10);

Jorgen Grahn

Nov 1, 2014, 10:10:37 PM
On Sat, 2014-11-01, JiiPee wrote:
> I wonder how C++ standard library has not implemented a very fast
> integer-to-string conversion function. They have sprintf, snprintf,
> stringstream and to_string.
>
> From this site:
>
> http://zverovich.net/2013/09/07/integer-to-string-conversion-in-cplusplus.html
>
> we can see that sprintf is the fastest one of those.
> But then I googled and found from this site:
>
> http://stackoverflow.com/questions/4351371/c-performance-challenge-integer-to-stdstring-conversion
>
> a very fast version, done by somebody. I run tests (using a integers
> from 0 to 1294967295 by step 60 and converted all of them to strings. I
> also used the sixth item from the result: cU += str2[6]; inside the loop
> and then printed it at the end so that the compiler would not optimize
> things out).

That's not a bad setup, but making benchmarks is even harder than that.
E.g. how does this implementation affect the data and code caches?
Unlike in real code, the caches are probably warm when you enter the
function you're benchmarking.

> Results: the user made conversion function from the site was about 130
> times faster than sprintf (and thus would be about 200 times faster than
> to_string ).
>
> How is it possible there is no fast std version? This is sometimes an
> important function, for example when parsing a very large text and
> checking integers from it.
>
> Not really complaining but want to start a discussion about this: why
> std sometimes does not make fast functions/classes?

The C++ standard has not much to do with it; it's a matter of whatever
implementation of the standard library you use.

But /that/ is an interesting question: are we losing performance due
to suboptimal standard library implementations, and is it feasible to
do something about it?

I know, after earlier discussions here, that std::ostream
implementations tend to be slower than C stdio. And I don't see that
they /have/ to be.

> For me, I already added that new fast conversion function to my
> toolset... I rather use it than library versions. btw if somebody wants
> my fast function please ask, I can give it then. Also, if somebody knows
> faster way to convert, am ready to test it against this! I have a test
> platform ready.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Paavo Helde

Nov 2, 2014, 3:08:30 AM
JiiPee <n...@notvalid.com> wrote in news:yEf5w.641279$O%1.13...@fx31.am4:

> On 02/11/2014 00:55, Luca Risolia wrote:
>> Il 01/11/2014 23:02, JiiPee ha scritto:
>>
>>> For me, I already added that new fast conversion function to my
>>> toolset... I rather use it than library versions. btw if somebody
>>> wants my fast function please ask, I can give it then. Also, if
>>> somebody knows faster way to convert, am ready to test it against
>>> this! I have a test platform ready.
>>
>> And how can we reproduce your tests? Please post the full, compilable
>> test program together with all the informations about the compiler,
>> compiler version and compilation flags that you used.
>>
>
> I also tested intToStr that it converts correctly: first 20 million is
> correct. Needs to test all integers, takes 2 hours.... gonna do
> sometimes.
>
> compiler: GCC 4.8.1, use C++11, -O3, -s


It appears that the crux of the fast algorithm is a static lookup
buffer (digit_pairs) which encodes the ASCII representation of the
numbers 0..99. (You forgot to include that; I copied it from the
stackoverflow page.) So it trades some memory for speed. So far, so
good.

For fast operation it is essential that the digit_pairs array is in the
cache. In your test this is most probably the case, but in real code the
int-to-string conversions probably happen much less often, so the
performance would not be so great there. Standard library implementations
probably operate only with data in registers, so cache access is not
an issue.

Another point is that in C++ you want to wrap the result in a std::string
or something, and that will also slow things down. In my test runs the
wrapping time was in the same range as the conversion itself, and if
std::string happens not to use SSO then I guess it may be even much
slower.

So it is not really clear how big the win would be in real programs.
Anyway, here are my test results from running your test program (MSVC2012,
Release x64 mode):

Using _snprintf, no wrapping: 4.622 s
Using intToStr, no wrapping: 1.143 s (4.0 times faster)

Using _snprintf, wrapping to a std::string: 6.622 s
Using boost::lexical_cast<std::string>: 8.75 s
Using intToStr, wrapping to a std::string: 2.996 s (2.2 times faster than
_snprintf and 2.9 times faster than boost::lexical_cast).

So here the differences are around 2-3 times, and probably smaller in
real code where the lookup array is out of the cache. Not sure if this
justifies optimizing something which is already quite fast anyway.

OTOH, if there are indeed differences like 130 times on some platform,
then I guess that standard library implementation must be quite bad
indeed.

Cheers
Paavo

Christian Gollwitzer

Nov 2, 2014, 3:36:04 AM
On 02.11.14 03:10, Jorgen Grahn wrote:
> I know, after earlier discussions here, that std::ostream
> implementations tend to be slower than C stdio. And I don't see that
> they /have/ to be.
>

For sprintf vs. intToStr (or itoa), I'm also not surprised - sprintf has
to interpret the format string at runtime, unless the compiler has a
special optimization to inline sprintf.

Christian

JiiPee

Nov 2, 2014, 5:09:58 AM
Hmmm, interesting. How is it possible that you see only 3 times faster
while I consistently get 130 times faster here? Any idea? Did you also
try with GCC?

Jorgen Grahn

Nov 2, 2014, 7:06:22 AM
On Sun, 2014-11-02, Christian Gollwitzer wrote:
> On 02.11.14 03:10, Jorgen Grahn wrote:
>> I know, after earlier discussions here, that std::ostream
>> implementations tend to be slower than C stdio. And I don't see that
>> they /have/ to be.
>>
>
> For sprintf vs. intToStr (or itoa), I'm also not surprised - sprintf has
> to interpret the format string at runtime,

Oh, I'm not surprised by /that/ -- I was only talking about iostreams
above.

> unless the compiler has a
> special optimization to inline sprintf.

I was going to say "sprintf() seems unlikely to be one of those
functions", but gcc apparently has it as a builtin. No idea why,
though -- the documentation doesn't say.

Jouko Koski

Nov 2, 2014, 7:21:07 AM
"JiiPee" wrote:

> I wonder how C++ standard library has not implemented a very fast
> integer-to-string conversion function. They have sprintf, snprintf,
> stringstream and to_string.

Considering what these functions do, they may actually perform pretty well.
It is easy to implement a function which is simple, fast, and produces
incorrect results. It is obvious that 1234 should yield "1234", but what
if the result, say, "१२३४" were preferred?

--
Jouko

Paavo Helde

Nov 2, 2014, 8:11:52 AM
JiiPee <n...@notvalid.com> wrote in news:Kpn5w.543663$J62....@fx03.am4:

> hmmm, interesting. How is it possible you have only 3 times faster and
> I have here constantly 130 times faster? Any idea? Did you try also
> with GCC?
>

Gcc/Linux results:

snprintf: 2.85 s
intToStr: 0.4 s (7.1 times faster)

snprintf + std::string wrap: 4.37 s
boost::lexical_cast to std::string: 2.44 s
intToStr + std::string wrap: 1.67 s

Command-line options: g++ -O3 -Wall

> g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.6/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --
mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --
enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-
checking=release --with-gxx-include-dir=/usr/include/c++/4.6 --enable-ssp
--disable-libssp --disable-plugin --with-bugurl=http://bugs.opensuse.org/
--with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --
with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-
libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-
specific-runtime-libs --program-suffix=-4.6 --enable-linux-futex --
without-system-libunwind --with-arch-32=i586 --with-tune=generic --
build=x86_64-suse-linux
Thread model: posix
gcc version 4.6.2 (SUSE Linux)

JiiPee

Nov 2, 2014, 9:22:57 AM
and it needs this one as well:

const char digit_pairs[201] = {
"00010203040506070809"
"10111213141516171819"
"20212223242526272829"
"30313233343536373839"
"40414243444546474849"
"50515253545556575859"
"60616263646566676869"
"70717273747576777879"
"80818283848586878889"
"90919293949596979899"
};

On 02/11/2014 01:19, JiiPee wrote:

JiiPee

Nov 2, 2014, 9:25:53 AM
You have the same intToStr time as I have, but I get something like 40
secs for snprintf. I have a 32-bit computer, though, so maybe that's the
reason.

I ran it again; it took more than 50 secs with snprintf. Don't know....

Ben Bacarisse

Nov 2, 2014, 9:48:36 AM
Paavo Helde <myfir...@osa.pri.ee> writes:
<snip>
>>> Il 01/11/2014 23:02, JiiPee ha scritto:
>>>
>>>> For me, I already added that new fast conversion function to my
>>>> toolset... I rather use it than library versions. btw if somebody
>>>> wants my fast function please ask, I can give it then. Also, if
>>>> somebody knows faster way to convert, am ready to test it against
>>>> this! I have a test platform ready.
<snip>
> It appears that the crux of the fast algorithm is to have some static
> lookup buffer (digit_pairs) which encodes the ASCII representation of
> numbers 0..99.

Worth pointing out that the code that copies the pairs is not portable
for a number of reasons, and in those cases where it works, the
performance may depend on the buffer alignment. I changed the code to
copy the two bytes portably and saw no significant penalty, at least in
optimised code.

Dragons:
>> int pos = val % 100;
>> val /= 100;
>> *(short*)(c-1)=*(short*)(digit_pairs+2*pos);
>> c-=2;

<snip>
--
Ben.

Öö Tiib

Nov 2, 2014, 10:03:26 AM
On Sunday, 2 November 2014 16:25:53 UTC+2, JiiPee wrote:
> you have the same intToStr time as I have, but I have like 40 secs for
> snprintf. But I have 32 bit computer, so maybe thats the reason.

No. You likely did something wrong, and have possibly already discovered
what. Or why else would you avoid posting command lines, compiler
versions and system configurations? Test carefully. Otherwise you are
just spreading groundless FUD.

Öö Tiib

Nov 2, 2014, 10:20:29 AM
On Sunday, 2 November 2014 01:02:49 UTC+2, JiiPee wrote:
> I wonder how C++ standard library has not implemented a very fast
> integer-to-string conversion function. They have sprintf, snprintf,
> stringstream and to_string.

In my tests all of the conversions ('snprintf', 'boost::lexical_cast',
'std::to_string' etc.) are already blindingly fast. 'boost::format', for
example, compares slowest but is still tolerable, since nothing needs to
turn hundreds of millions of 'int's into readable text in under a second.

If the string conversions affect performance, then I can bet that the
whole algorithm can be made faster by a few orders of magnitude, since
something is wrong with its whole nature. What do you do with
*gigabytes* of *human-readable* ints on your 32-bit computer?

JiiPee

Nov 2, 2014, 11:03:15 AM
Can you show me exactly what you changed so I can test that version here?

Paavo Helde

Nov 2, 2014, 11:21:31 AM
JiiPee <n...@notvalid.com> wrote in news:WAs5w.636289$3F6....@fx02.am4:
Probably something like
// *(short*)(c-1)=*(short*)(digit_pairs+2*pos);

*(c-1) = digit_pairs[2*pos];
*c = digit_pairs[2*pos+1];

JiiPee

Nov 2, 2014, 11:34:33 AM
On 02/11/2014 16:21, Paavo Helde wrote:
> *(c-1) = digit_pairs[2*pos];
> *c = digit_pairs[2*pos+1];

that does not make intToStr any slower

JiiPee

Nov 2, 2014, 11:41:20 AM
If I want to convert 1234 to a string, why would I want it to be "१२३४"?
:) I guess you are talking about other languages.. but even then, we
could limit this problem to the European representation.

Paavo Helde

Nov 2, 2014, 12:09:31 PM
JiiPee <n...@notvalid.com> wrote in news:i2t5w.798795$ZX5....@fx32.am4:
Yes, that's the point. There was no need to mess around with short*
casting.

Cheers
Paavo


Christian Gollwitzer

Nov 2, 2014, 12:22:03 PM
On 02.11.14 13:06, Jorgen Grahn wrote:
GCC can issue warnings when the format string and the actual parameters
mismatch. I don't know if it just issues the warning or also generates
static calls (which it could). Too lazy to check the assembly right now.

Christian

Jouko Koski

Nov 2, 2014, 12:48:14 PM
"JiiPee" wrote:
>
> If I want to convert 1234 to a string, why would I want it to be "१२३४"?
> :) I guess you are talking about other languages.. but even then, we could
> limit this proglem to europian 123344 representation.

Yes, I mean locales, or even different character encodings used by terminal
devices. Standard library functions generally take these into account while
most "fast" functions do not.

Limiting this to Western number representation sounds as myopic as an
insular Yankee insisting that everybody should be fine with 7-bit ASCII
encoding.

--
Jouko

Ben Bacarisse

Nov 2, 2014, 2:24:26 PM
Yes, something like that, or maybe even exactly like that!

--
Ben.

JiiPee

Nov 2, 2014, 4:52:12 PM
We can put both in the code: when normal characters are wanted, use the
fast one; otherwise the std one.

Chris Vine

Nov 2, 2014, 7:36:22 PM
And it is also correct. Yours gave rise to undefined behaviour (and I
am surprised your compiler did not warn about it). Did you have -Wall
set?

Chris

JiiPee

Nov 2, 2014, 7:48:33 PM
I'm too lazy to think... can you tell me what the undefined behaviour is
here? And yes, I used -Wall, and did not get a warning on that line.

Chris Vine

Nov 2, 2014, 8:30:23 PM
On Mon, 03 Nov 2014 00:48:19 +0000
JiiPee <n...@notvalid.com> wrote:

> On 03/11/2014 00:36, Chris Vine wrote:
> > On Sun, 02 Nov 2014 16:34:21 +0000
> > JiiPee <n...@notvalid.com> wrote:
> >> On 02/11/2014 16:21, Paavo Helde wrote:
> >>> *(c-1) = digit_pairs[2*pos];
> >>> *c = digit_pairs[2*pos+1];
> >> that does not make intToStr any slower
> > And it is also correct. Yours gave rise to undefined behaviour
> > (and I am surprised your compiler did not warn about it). Did you
> > have -Wall set?
> >
> > Chris
>
> Am lazy thinking... can you inform me what is the undefined behaviour
> here? And yes I used Wall, and did not get warning on that line.

Your version gave undefined behaviour because you cannot alias an object
whose dynamic type is char by dereferencing a pointer to short, in
any version of C or C++. (You can, however, do the reverse.) So first,
the compiler can simply elide code doing this, and gcc-4.8 and 4.9 are
reputed to do that when provoked with a sufficiently high optimisation
level. Secondly, you may find that object alignments do not match (but
you should be fine on x86).

Chris

JiiPee

Nov 2, 2014, 8:40:22 PM
Yes, I'm starting to understand now... I just checked the code and I see
what is happening there. Yeah, maybe that part needs to be changed....
but otherwise the code seems to be OK.

Good observation... I'll have to change the code.

Also I can see that we could make the code even faster by extending the
lookup table to cover the numbers 0-9999, which would take up 40000
chars of memory. That could perhaps be behind a compile flag, to get
super fast code with a bigger exe file.

Paavo Helde

Nov 3, 2014, 1:04:53 AM
JiiPee <n...@notvalid.com> wrote in news:Z1B5w.743535$Jh7.7...@fx27.am4:
> Also I can see that we can make the code even faster by doubling the
> lookup table say to see 0-9999 numbers, which would take up memory 40000
> char s. That could be a compiling flag maybe to get super fast code with
> bigger exe file.

Don't be so sure it will be faster; memory access is often the bottleneck
nowadays. It might even happen that the whole program slows down because
IntToStr() is pushing some useful data out of the caches.

To be honest, this whole thread seems like an ultimate exercise in
premature optimization. I am pretty sure there are not many programs
where int-to-string conversion would show up in the profiler. Trading
40000 bytes of process size to (potentially) speed it up seems dubious;
I know there are people who whine already about 100 bytes.

Cheers
Paavo

Paavo Helde

Nov 3, 2014, 1:57:46 AM
JiiPee <n...@notvalid.com> wrote in news:G9r5w.611195$_u1....@fx30.am4:

> you have the same intToStr time as I have, but I have like 40 secs
> for snprintf. But I have 32 bit computer, so maybe thats the reason.
>
> I ran it again, it took more than 50 secs with snprintf. Dont know....

One possibility for the 10x slowdown might be that you have some weird
libc debugging options set up, or some weird locale settings. Check your
environment variables. And of course you can try to profile the program
with e.g. valgrind and see where it spends its time.

Cheers
Paavo

JiiPee

Nov 3, 2014, 2:15:23 AM
I know, but that's why I said it should be under a compile flag, so only
for extreme situations where a big loop is used and there is nothing else
running in the program.


JiiPee

Nov 3, 2014, 2:29:06 AM
It's a release build, with only the -O3 and -s flags on.
But I guess I need to test it next in Visual Studio to see if I get the
same there...

Öö Tiib

Nov 3, 2014, 2:48:23 AM
On Monday, 3 November 2014 09:15:23 UTC+2, JiiPee wrote:
> On 03/11/2014 06:04, Paavo Helde wrote:
> > JiiPee <n...@notvalid.com> wrote in news:Z1B5w.743535$Jh7.7...@fx27.am4:
> >> Also I can see that we can make the code even faster by doubling the
> >> lookup table say to see 0-9999 numbers, which would take up memory
> >> 40000 char s. That could be a compiling flag maybe to get super fast
> >> code with bigger exe file.
> > Don't be so sure it will be faster, memory access is often the
> > bottleneck nowadays. It might even happen that the whole program
> > slows down because IntToStr() is pushing some useful data out of
> > the caches.
> >
> > To be honest, all this thread seems like an ultimate exercise in
> > premature optimization. I am pretty sure there will not be many
> > programs where int-to-string conversion would show up in the
> > profiler. Trading 40000 bytes of process size to (potentially)
> > speed it up seems dubious, I know there are people who are whining
> > already about 100 bytes.
>
> I know, but thats why I said it should be under a compile flag, so
> only in extreme situations where a big loop is used and there is
> nothing else running in the program.

Can you picture such a real "extreme" situation? I can. Every time
such text processing shows up in a profiler, there is some sort of
needless text communication ... for example a JSON layer between two
intensely communicating modules. Even there the bottleneck is usually
not the serialization of ints but the dynamic memory management of the
little texts. Removing the whole pointless monkey-business text exchange
layer itself will always achieve far better performance gains than
optimizing the text composers and parsers. Using up most of the cache,
however (40000 bytes is most of a 64k L1), will certainly slow it down
even more.

Paavo Helde

Nov 3, 2014, 3:16:17 AM
JiiPee <n...@notvalid.com> wrote in news:4YF5w.907401$an2.6...@fx09.am4:

> On 03/11/2014 06:04, Paavo Helde wrote:
>> To be honest, all this thread seems like an ultimate exercise in
>> premature optimization. I am pretty sure there will not be many
>> programs where int-
>> to-string conversion would show up in the profiler. Trading 40000
>> bytes of process size to (potentially) speed it up seems dubious, I
>> know there are people who are whining already about 100 bytes.
>
> I know, but thats why I said it should be under a compile flag, so
> only in extreme situations where a big loop is used and there is
> nothing else running in the program.

This has not much to do with the compiler; sprintf is a C standard
library function. The C standard library is meant as a general-purpose
library making a decent choice of tradeoffs (speed vs memory usage vs
development and maintenance effort, etc.) so that it is good enough for
most use cases. As a result of those tradeoffs it cannot be the best
choice in all situations.

In particular, sprintf() is defined as a locale-dependent function, with
all the legacy and drawbacks associated with that. It is also defined as
a function driven by a separate mini-language which it must interpret at
runtime. These two facts already basically exclude the possibility that
it could be the fastest tool around, so if some program requires maximum
performance in this area, it cannot use sprintf anyway, regardless of
whether it can be fine-tuned by some compiler switches or not.

In special corner cases like your hypothetical program doing only
zillions of int-string conversions one may need to use special custom
functions or libraries. You know, there are lots of special purpose
libraries in C and C++. This special IntToStr() function looks like a
good candidate for such a library.

All this does not mean that the standard C library version of sprintf()
could not be made better. With glibc everybody can actually do that in
principle, it's an open-source collaborative project. The toughest part
would probably be reaching an agreement about what is "better".

Cheers
Paavo

Luca Risolia

Nov 3, 2014, 11:19:27 AM
JiiPee wrote:

> On 02/11/2014 00:55, Luca Risolia wrote:
>> Il 01/11/2014 23:02, JiiPee ha scritto:
>>
>>> For me, I already added that new fast conversion function to my
>>> toolset... I rather use it than library versions. btw if somebody wants
>>> my fast function please ask, I can give it then. Also, if somebody knows
>>> faster way to convert, am ready to test it against this! I have a test
>>> platform ready.
>>
>> And how can we reproduce your tests? Please post the full, compilable
>> test program together with all the informations about the compiler,
>> compiler version and compilation flags that you used.
>>
>
> I also tested intToStr that it converts correctly: first 20 million is
> correct. Needs to test all integers, takes 2 hours.... gonna do sometimes.
>
> compiler: GCC 4.8.1, use C++11, -O3, -s
>
> Here you go:
>
> #include <iostream>
> #include <ctime>
>
> inline void intToStr(unsigned val, char* c)


This should be ~4x faster, just for fun...

#include <cstdint>
#include <limits>

template <class T>
constexpr T pow(T const& x, std::size_t n) {
    return n > 0 ? x * pow(x, n - 1) : 1;
}

constexpr auto dgts = std::numeric_limits<unsigned int>::digits10;
constexpr auto max = pow(10u, dgts);

template<int> struct S;

template<>
struct S<0> {
    template<int>
    static inline void f(unsigned int, char*) noexcept {}
};

template<int P>
struct S {
    template<int Q>
    static inline void f(unsigned int i, char* c) noexcept {
        constexpr auto a = pow(10u, Q + 1 - P), b = a / 10;
        S<P - 1>::template f<Q>(i, c);
        c[P - 1] = '0' + (i % a) / b;
    }
};

template<int> void g(unsigned int, char*) noexcept;

template<> inline void g<dgts + 1>(unsigned int i, char* c) noexcept {
    if (i < max) {
        S<dgts>::f<dgts>(i, c);
        c[dgts] = '\0';
    } else {
        c[0] = '0' + (i / max);
        S<dgts>::template f<dgts>(i, c + 1);
        c[dgts + 1] = '\0';
    }
}

template<int X> inline void g(unsigned int i, char* c) noexcept {
    constexpr unsigned int Y = pow(10u, X);
    if (i < Y) {
        S<X>::template f<X>(i, c);
        c[X] = '\0';
    } else
        g<X + 1>(i, c);
}

inline void uint_to_str(unsigned int i, char* c) noexcept {
    g<1>(i, c);
}

int main() {
    // example
    char str[16];
    uint_to_str(100, str);
}


Martijn Lievaart

Nov 3, 2014, 11:50:15 AM
On Mon, 03 Nov 2014 17:19:07 +0100, Luca Risolia wrote:

>
> template<int X> inline void g(unsigned int i, char* c) noexcept {
> constexpr unsigned int Y = pow(10u, X);
> if (i < Y) {
> S<X>::template f< X > (i, c);
> c[X] = '\0';
> } else
> g < X + 1 > (i, c);
> }

What if i is slightly smaller than std::numeric_limits<unsigned>::max()?
In that case Y would eventually not fit in an unsigned, resulting in an
infinite loop.

M4

Luca Risolia

Nov 3, 2014, 7:27:32 PM
On 03/11/2014 17:47, Martijn Lievaart wrote:
>> template<int X> inline void g(unsigned int i, char* c) noexcept {
>> constexpr unsigned int Y = pow(10u, X);
>> if (i < Y) {
>> S<X>::template f< X > (i, c);
>> c[X] = '\0';
>> } else
>> g < X + 1 > (i, c);
>> }
>
> What if i is slightly smaller than std::numeric_limits<unsigned>::max()?

> In that case Y would eventually not fit an unsigned, resulting in an
> infinite loop.

I think that case is not possible. X can vary from 1 to
std::numeric_limits<unsigned>::digits10 (inclusive), which implies that
the greatest Y is Y0 = 10^std::numeric_limits<unsigned>::digits10, where
std::numeric_limits<unsigned>::digits10 is one less than the number of
decimal digits needed to represent the maximum unsigned value
std::numeric_limits<unsigned>::max(). For i >= Y0, there is the
g<std::numeric_limits<unsigned>::digits10 + 1>() template
specialization, but no Y is calculated there.

Martijn Lievaart

Nov 4, 2014, 3:00:11 AM
Buzz. Think again. I see at least two errors in this explanation.

M4

Luca Risolia

Nov 4, 2014, 4:44:46 AM
Please be clear: describe a way to reproduce the case where, in your
opinion, the algorithm would "eventually" result in an infinite loop.
Unfortunately, I cannot spend much time on this topic, which was
initially supposed to take me no more than 5 minutes.

Martijn Lievaart

Nov 4, 2014, 5:55:44 AM
If X equals std::numeric_limits<unsigned>::digits10, Y cannot be
represented in an unsigned, so the result is implementation defined, but
at most std::numeric_limits<unsigned>::max. So if i is
std::numeric_limits<unsigned>::max, it will never be smaller than Y,
resulting in unbounded recursion. For any i >= pow(10u, Y) where Y ==
std::numeric_limits<unsigned>::digits10, either the result is wrong, or
unbounded recursion occurs.

To make it simple, assume 8 bits. std::numeric_limits<unsigned>::digits10
== 3 => Y should be 1000, but that cannot be represented in 8 bits. (I
know an unsigned cannot be 8 bits, that is not the point).

M4

Luca Risolia

Nov 4, 2014, 6:11:53 AM
Martijn Lievaart wrote:

> On Tue, 04 Nov 2014 10:44:33 +0100, Luca Risolia wrote:
>>>> I think that case is not possible. X can vary from 1 to
>>>> std::numeric_limits<unsigned>::digits10 (included), which implies that
>>>> the greatest Y is Y0 = 10^std::numeric_limits<unsigned>::digits10,
>>>> where std::numeric_limits<unsigned>::digits10 is one less than the
>>>> number of decimal digits needed to represent the maximum unsigned
>>>> value std::numeric_limits<unsigned>::max(). For i >= Y0, there is the
>>>> g<std::numeric_limits<unsigned>::digits10 + 1>() template
>>>> specialization, but no Y is calculated there.
>>>
>>> Buzz. Think again. I see at least two errors in this explanation.
>>
>> Please be clear, define a way to reproduce the case where the algorithm
>> would "eventually" result in an infinite loop in your opinion.
>> Unfortunately, I cannot spend much time on this topic which was
>> initially supposed to take me no more than 5 minutes.


> To make it simple, assume 8 bits. std::numeric_limits<unsigned>::digits10
> == 3 => Y should be 1000, but that cannot be represented in 8 bits. (I
> know an unsigned cannot be 8 bits, that is not the point).

No. Again, this case is not possible. By the standard, an 8-bit unsigned type
cannot have std::numeric_limits<unsigned>::digits10 == 3. A concrete example
on my C++ implementation is unsigned char, for which CHAR_BIT is 8 and
std::numeric_limits<unsigned char>::digits10 is 2, as it should be.

Martijn Lievaart

unread,
Nov 5, 2014, 5:35:13 PM11/5/14
to
Brainfart on my part. You are right.

M4

JiiPee

unread,
Nov 13, 2014, 1:05:19 PM11/13/14
to
Thanks, I tested the uint version, and found no errors.

JiiPee

unread,
Nov 13, 2014, 1:12:29 PM11/13/14
to
On 03/11/2014 08:16, Paavo Helde wrote:
> JiiPee <n...@notvalid.com> wrote in news:4YF5w.907401$an2.6...@fx09.am4:
>
>> On 03/11/2014 06:04, Paavo Helde wrote:
>>> To be honest, all this thread seems like an ultimate exercise in
>>> premature optimization. I am pretty sure there will not be many
>>> programs where int-
>>> to-string conversion would show up in the profiler. Trading 40000
>>> bytes of process size to (potentially) speed it up seems dubious, I
>>> know there are people who are whining already about 100 bytes.
>> I know, but thats why I said it should be under a compile flag, so
>> only in extreme situations where a big loop is used and there is
>> nothing else running in the program.
> This has not much to do with the compiler, sprintf is a C standard
> library function. The C standard library is meant as a general-purpose
> library making a decent choice of different tradeoffs (like speed vs
> memory usage vs development and maintenance efforts, etc) so that it is
> good enough for most usage cases. As a result of tradeoffs it cannot be
> the best choice in all situations.
>
> In particular, sprintf() is defined as a locale-dependent function, with
> all the legacy and drawbacks associated with it.

OK, but I'm not talking about sprintf here; I'm talking about our IntToStr
function. If I were making this into a library (which I'm actually doing, I
think.. I have my own "ctools" library where I collect all kinds of useful
functions/tools for my projects, and I just add that file to each project...
so I'll add this one there as well), then I would put in a compile-time flag,
maybe a template parameter, to select which version I want: the super fast
one or the fast one.

JiiPee

unread,
Nov 13, 2014, 1:21:13 PM11/13/14
to
On 03/11/2014 06:04, Paavo Helde wrote:
> JiiPee <n...@notvalid.com> wrote in news:Z1B5w.743535$Jh7.7...@fx27.am4:
>> Also I can see that we can make the code even faster by doubling the
>> lookup table say to see 0-9999 numbers, which would take up memory 40000
>> char s. That could be a compiling flag maybe to get super fast code with
>> bigger exe file.
> Don't be so sure it will be faster, memory access is often the bottleneck
> nowadays. It might even happen that the whole program slows down because
> IntToStr() is pushing some useful data out of the caches.
>
>

It would be interesting to test your "theory"... I would like to do it. But
I would need example code from you where you think it would be slower; then
we could test it.

As for the 40000-byte trade-off: it's only allocated once per program
because it's a static array, and for me 40000 bytes is not much memory, as I
mostly write normal PC programs. It's about 0.001% of RAM on a machine with
3 GB of memory.
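For reference, the two-digit lookup-table idea the thread keeps mentioning can be sketched like this. The names digit_pairs and u32toa are my own, not taken from the thread, and a 0-9999 table would extend the same pattern with four-character entries:

```cpp
#include <cstdint>
#include <string>

// 100 two-character entries: pair n lives at indices [2n, 2n+1].
static const char digit_pairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

std::string u32toa(std::uint32_t v) {
    char buf[10];                 // 4294967295 has at most 10 digits
    char* p = buf + sizeof buf;   // fill the buffer from the end
    while (v >= 100) {            // peel off two digits per iteration
        const unsigned idx = (v % 100) * 2;
        v /= 100;
        *--p = digit_pairs[idx + 1];
        *--p = digit_pairs[idx];
    }
    if (v >= 10) {                // two digits remain
        const unsigned idx = v * 2;
        *--p = digit_pairs[idx + 1];
        *--p = digit_pairs[idx];
    } else {                      // one digit remains (also handles 0)
        *--p = static_cast<char>('0' + v);
    }
    return std::string(p, buf + sizeof buf - p);
}
```

The point of the pair table is that each division/modulo by 100 produces two output characters instead of one, roughly halving the number of divisions compared with the naive digit-at-a-time loop.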

Öö Tiib

unread,
Nov 13, 2014, 8:02:59 PM11/13/14
to
On Thursday, 13 November 2014 20:21:13 UTC+2, JiiPee wrote:
> But the 40000 trade off - its only allocated once in a program because
> its a static array. and for me 40000 is not much memory waste, as I do
> normal pc program mostly. its like 0.001 % of the RAM memory if having 3
> GIG memory.

Memory costs nothing, a gigabyte here or there. It is a full L1 cache that
costs something, and L1 is 64k. Platform architects say that 64k is the
optimal middle ground between too many transistors and too little L1 cache.
Your tests (if they only run algorithms that use the 64k L1 fully, and 40k
is almost all of it) will lie to you! In a real application no algorithm can
use all of L1 as its private storage of nonsense sub-strings.

Dombo

unread,
Nov 16, 2014, 10:03:09 AM11/16/14
to
On 14-Nov-14 2:02, Öö Tiib wrote:
> On Thursday, 13 November 2014 20:21:13 UTC+2, JiiPee wrote:
>> But the 40000 trade off - its only allocated once in a program because
>> its a static array. and for me 40000 is not much memory waste, as I do
>> normal pc program mostly. its like 0.001 % of the RAM memory if having 3
>> GIG memory.
>
> Memory costs nothing, a gigabyte here or there. It is a full L1 cache that
> costs something, and L1 is 64k. Platform architects say that 64k is the
> optimal middle ground between too many transistors and too little L1
> cache.

The problem is not so much the number of transistors needed for the
cache but the trade off between cache size and cache latency. The larger
the cache the larger the latency. That is the reason why CPU have
multi-level caches; a small but low latency L1 cache and a larger but
slower L2 cache and nowadays typically an even larger and slower L3
cache. If it were just a matter of transistor count there would be
little point in integrating multiple cache levels on the same die.

Considering physics I have little hope that the size of L1 caches will
grow significantly in the near future. For example the AMD Athlon (K7)
introduced back in 1999 had a L1 cache of 64 kB + 64 kB (Data +
Instructions), whereas the Intel Core i7 has a L1 cache of 32 kB + 32 kB
(Data + Instructions). Since memory accesses can potentially be very
expensive performance-wise, it may make more sense to reduce the amount of
memory that needs to be accessed than to reduce the number of instructions
that need to be executed to accomplish a given task.
