
int to string conversion tests + faster version (post 2)


JiiPee

Nov 13, 2014, 10:31:08 PM
To continue the int-to-string conversion discussion:
I have now tested the standard C/C++ library int-to-string conversion
functions again with Visual Studio and gcc, and I still find that the
library functions are far slower than our conversion functions
(including my fastest, which I just wrote). (All the code is below the
message.)

Some of you are already familiar with the function somebody posted on
another site; I call it intToStr. It uses a 200-byte char lookup table
to convert, for example, 49 to '4','9'. I suggested an even faster
version using a 40000-byte lookup table that converts four digits at a
time, e.g. 4691 -> '4','6','9','1'. I have now implemented it, and it
does make things even faster! I call it intToStrSuperFast.

My setup: Windows XP 32-bit, single CPU, Visual Studio 2008 and 2010,
and GCC 4.8.1 (using C++11, -O3, -s).
Here are the results:

loop: for(i = 0; i < 4294966000; i += 20)
-> about 215 million ints

Seconds it took to convert ints to string:
intToStrSuperFast : 3.415 s
intToStr : 4.687 s (+37%)
snprintf (gcc) : 588 s (+17118%)
_snprintf (VC2008) : 112 s (+3179%)
_snprintf (VC2010) : 140 s (+3999%)
_itoa (VC2008) : 36.9 s (+980%)

As you can see, intToStrSuperFast is 37% faster than intToStr.
I also tested _itoa, which is much faster than snprintf but still
roughly 8 times slower than intToStr (36.9 s vs 4.7 s).
It is strange that GCC's snprintf is 5 times slower than Microsoft's.
Many said that in a real program it would not be this fast, so next I
would like to see in what kind of program that would be the case.

I am also still confused why others say they ran this same code and
intToStr was only 2 times faster than snprintf. Did you definitely run
exactly the same code I am running below? Or is this because 64-bit
machines/Windows 7 do things differently?

It would be nice if somebody could make a simple program which uses
intToStrSuperFast and makes it slow, as many here said it could be when
creating a normal program.

Can somebody test the code below, so we can see whether you get results
similar to mine?

code:

#include <stdio.h>
#include <iostream>
#include <ctime>
#include <stdlib.h>
#include <string>
#include <vector>
#include <limits>
#include <cstring>

const char digit_pairs[201] = {
"00010203040506070809"
"10111213141516171819"
"20212223242526272829"
"30313233343536373839"
"40414243444546474849"
"50515253545556575859"
"60616263646566676869"
"70717273747576777879"
"80818283848586878889"
"90919293949596979899"
};

char digit_pairs_sf[50000]; // 10000 four-digit groups * 4 chars = 40000 bytes used

void init_digit_pairs_sf()
{
    int ind = 0;
    for (int a = 0; a <= 9; ++a)
    {
        for (int b = 0; b <= 9; ++b)
        {
            for (int c = 0; c <= 9; ++c)
            {
                for (int d = 0; d <= 9; ++d)
                {
                    digit_pairs_sf[ind++] = '0' + a;
                    digit_pairs_sf[ind++] = '0' + b;
                    digit_pairs_sf[ind++] = '0' + c;
                    digit_pairs_sf[ind++] = '0' + d;
                }
            }
        }
    }
}

inline void intToStrSuperFast(unsigned int val, char* c)
{
    // Determine the digit count with at most four comparisons.
    int size;
    if (val >= 10000)
    {
        if (val >= 10000000)
        {
            if (val >= 1000000000)
                size = 10;
            else if (val >= 100000000)
                size = 9;
            else
                size = 8;
        }
        else
        {
            if (val >= 1000000)
                size = 7;
            else if (val >= 100000)
                size = 6;
            else
                size = 5;
        }
    }
    else
    {
        if (val >= 100)
        {
            if (val >= 1000)
                size = 4;
            else
                size = 3;
        }
        else
        {
            if (val >= 10)
                size = 2;
            else if (val == 0)
            {
                c[0] = '0';
                c[1] = '\0';
                return;
            }
            else
                size = 1;
        }
    }

    // Write back to front, four digits at a time from the 40000-byte table.
    c += size - 1;
    while (val >= 10000)
    {
        int pos = val % 10000;
        val /= 10000;

        *(c - 3) = digit_pairs_sf[4 * pos];
        *(c - 2) = digit_pairs_sf[4 * pos + 1];
        *(c - 1) = digit_pairs_sf[4 * pos + 2];
        *c = digit_pairs_sf[4 * pos + 3];

        c -= 4;
    }
    // The remaining 1-4 leading digits, one at a time.
    while (val > 0)
    {
        *c-- = '0' + (val % 10);
        val /= 10;
    }
    c[size + 1] = '\0'; // c now points one before the buffer start
}
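// Usage note (a sketch): the 40000-byte table must be filled before the
// first call, e.g.
//
//   init_digit_pairs_sf();
//   char buf[16];
//   intToStrSuperFast(4691u, buf); // buf now holds "4691"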


//13/11/14
//Tested that it works with all unsigned ints (0-4294967295). No errors.

inline void intToStr(unsigned int val, char* c)
{
    int size;
    if (val >= 10000)
    {
        if (val >= 10000000)
        {
            if (val >= 1000000000)
                size = 10;
            else if (val >= 100000000)
                size = 9;
            else
                size = 8;
        }
        else
        {
            if (val >= 1000000)
                size = 7;
            else if (val >= 100000)
                size = 6;
            else
                size = 5;
        }
    }
    else
    {
        if (val >= 100)
        {
            if (val >= 1000)
                size = 4;
            else
                size = 3;
        }
        else
        {
            if (val >= 10)
                size = 2;
            else if (val == 0)
            {
                c[0] = '0';
                c[1] = '\0';
                return;
            }
            else
                size = 1;
        }
    }

    c += size - 1;
    while (val >= 100)
    {
        int pos = val % 100;
        val /= 100;

        *(c - 1) = digit_pairs[2 * pos];
        *c = digit_pairs[2 * pos + 1];

        c -= 2;
    }
    while (val > 0)
    {
        *c-- = '0' + (val % 10);
        val /= 10;
    }
    c[size + 1] = '\0';
}


void benchmarkSF() {
    init_digit_pairs_sf();
    clock_t begin = clock();

    char str1[16] = {0}; // zero-filled: str1[6] is read below even for short numbers
    char cU = 0;
    unsigned int errorCount = 0;
    unsigned int i;
    int count = 0;
    for (i = 0; i < 4294966000u; i += 20) {

        // HERE COMMENT OUT THE FUNCTION YOU WANT TO TEST:
        intToStrSuperFast(i, str1);
        //intToStr(i, str1);
        //snprintf(str1, 15, "%u", i);
        //sprintf(str1, "%d", i);
        //itoa(i, str1, 10); // if using this, change uint to int and end the loop at 4294966000/2

        cU += str1[6]; // use the string so that the compiler will not optimize things out

        ++count;
    }

    clock_t end = clock();
    double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;

    std::cout << "Time: " << elapsed_secs << ", integers count: " << count
              << ", " << cU << std::endl;
    std::cout << "errors: " << errorCount << std::endl;
}

int main() {
    benchmarkSF();
}

/*
Test:
- for(i = 0; i < 4294966000; i += 20)
- about 215 million ints
- Average of 4 tests:
intToStrSuperFast : 3.415 s
intToStr : 4.687 s (+37%)
snprintf : 588 s (+17118%)
_snprintf (VC2008) : 112 s (+3179%)
_snprintf (VC2010) : 140 s (+3999%)
_itoa (VC2008) : 36.9 s (+980%)

*/

JiiPee

Nov 13, 2014, 10:45:56 PM
On 14/11/2014 01:02, Öö Tiib wrote:
> On Thursday, 13 November 2014 20:21:13 UTC+2, JiiPee wrote:
>> But the 40000 trade off - its only allocated once in a program because
>> its a static array. and for me 40000 is not much memory waste, as I do
>> normal pc program mostly. its like 0.001 % of the RAM memory if having 3
>> GIG memory.
> Memory costs nothing. Gigabyte here or there. It is full L1 cache that
> costs something and L1 is 64k. Platform architects say that 64k is
> the optimal middle ground between too many transistors and too little L1
> cache. Your tests (if they only work with algorithms that fully use the
> 64k L1, and 40k is almost all of it) will lie to you!

OK, can you make/show a simple program where this would happen? A
program where using my conversion functions makes things slow? I would
like to test it on my computer.

(I have attached the full code to test my new version, which uses the
40000-entry lookup table, in the main message above.)

Ian Collins

Nov 13, 2014, 11:50:26 PM
JiiPee wrote:
>
> Can somebody test the code below so we see if you get similar results as
> I got?

> - Average of 4 tests:
> intToStrSuperFast : 3.415 s
> intToStr : 4.687 s (+37%)
> snprintf : 588 s (+17118%)
> _snprintf (VC2008) : 112 s (+3179%)
> _snprintf (VC2010) : 140 s (+3999%)
> _itoa (VC2008) : 36.9 s (+980%)
>

On my Solaris system with the native compiler:

intToStrSuperFast 2.4s
intToStr 3.3s
lltostr 4.9s
snprintf 12.8s

--
Ian Collins

JiiPee

Nov 14, 2014, 1:27:08 AM
OK, interesting. So there is the same 37% difference, but your snprintf
is much faster than mine. lltostr is not in the C++ standard, so it does
not really count. I guess you would get portability problems with
lltostr?

Maybe it's my old computer that is doing this. I'll have to test on
another computer.

JiiPee

Nov 14, 2014, 1:29:00 AM
Clearly it's not a GCC issue, because VC gives me similar results.

Ian Collins

Nov 14, 2014, 2:24:59 AM
JiiPee wrote:
> On 14/11/2014 04:50, Ian Collins wrote:
>> JiiPee wrote:
>>>
>>> Can somebody test the code below so we see if you get similar results as
>>> I got?
>>
>>> - Average of 4 tests:
>>> intToStrSuperFast : 3.415 s
>>> intToStr : 4.687 s (+37%)
>>> snprintf : 588 s (+17118%)
>>> _snprintf (VC2008) : 112 s (+3179%)
>>> _snprintf (VC2010) : 140 s (+3999%)
>>> _itoa (VC2008) : 36.9 s (+980%)
>>>
>>
>> On my Solaris system with the native compiler:
>>
>> intToStrSuperFast 2.4s
>> intToStr 3.3s
>> lltostr 4.9s
>> snprintf 12.8s
>>
>
> ok interesting. So there is the same 37% difference, but snprintf is
> much faster than mine. lltostr is not in C++ standard, so it does not
> really count. I guess you would get portability problems with lltostr?

You can see the source here:

https://github.com/joyent/illumos-joyent/blob/master/usr/src/lib/libc/port/gen/lltostr.c

--
Ian Collins

Luca Risolia

Nov 14, 2014, 3:00:50 AM
On 14/11/2014 04:30, JiiPee wrote:
> Would be nice if somebody could make a simple program which uses
> intToStrSuperFast and makes it slow as many here said it could be when
> creating a normal program.
>
> Can somebody test the code below so we see if you get similar results as
> I got?

I get similar results. But on my machine the implementation of your
super-unuseful function that I provided in a previous post is still ~4x
faster. In fact, if you think about it, a great part of the calculations
can be done at compile time; there is no need to read from an array in
memory. Furthermore, that version will also convert unsigned integers of
any size, while your function does not (the upper limit is 10^10,
apparently).
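For readers wondering what "done at compile time" can look like, here is
a sketch; it is not necessarily Luca's actual code (Digits and
uint_to_str_ct are made-up names), just one way to express the idea.
Every divisor is a template constant, so the compiler can turn each
division into multiply-and-shift instructions, and no table is read from
memory:

#include <cstdint>

template <unsigned P>
struct Digits {
    static char* put(unsigned i, char* c) {
        if (i >= P) // skip leading zeros
            *c++ = static_cast<char>('0' + (i / P) % 10);
        return Digits<P / 10>::put(i, c); // next lower decimal place
    }
};

template <>
struct Digits<1> { // the ones place is always printed
    static char* put(unsigned i, char* c) {
        *c++ = static_cast<char>('0' + i % 10);
        return c;
    }
};

inline void uint_to_str_ct(unsigned i, char* c) {
    *Digits<1000000000>::put(i, c) = '\0'; // 10 decimal places cover 32 bits
}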

Alain Ketterlin

Nov 14, 2014, 3:21:35 AM
JiiPee <n...@notvalid.com> writes:

[...]
> Some of you are already familiar with the function somebody made on
> one site, I call it intToStr. It uses 200 size char-string look-up
> table to convert for example 49 to '4','9'. I suggested even faster
> version by using 40000 look-up table converting four integers: like
> 4691 -> '4' '6' '9' '1' . I just implemented it, and it makes it even
> faster! I call it intToStrSuperFast.

Make it an array of 4 billion entries of length 10, and you get a
constant time int-to-string conversion! OK, I'm kidding.

> I have Windows XP 32 bit, using 1 cpu, Visual Studio 2008 and 2010,
> gcc (using O3 optimization). compiler: GCC 4.8.1, use C++11, -O3, -s
> Here are the results:
>
> loop: for(i = 0; i < 4294966000; i += 20)
> ->about. 215 million ints
>
> Seconds it took to convert ints to string:
> intToStrSuperFast : 3.415 s
> intToStr : 4.687 s (+37%)
> snprintf (gcc) : 588 s (+17118%)
> _snprintf (VC2008) : 112 s (+3179%)
> _snprintf (VC2010) : 140 s (+3999%)
> _itoa (VC2008) : 36.9 s (+980%)

There is something really wrong with your snprintf.

Anyway, using snprintf as the baseline for conversion time doesn't mean
much:

- your function is a specialized, inline routine that gets optimized at
compile time inside a loop (i.e., it may be unrolled, etc.), and since
you're building an executable and your function is declared inline,
even if it is not actually inlined the compiler is not required to
respect the ABI (this may lead to dramatic improvements on
register-starved archs like x86, especially for functions with 2
parameters).

- snprintf is a general, variable-argument, library routine, and the
compiler doesn't even see it. It has probably been compiled
conservatively, to work on any processor sold in the last 10 years or
so.

My guess is that you are measuring the overhead of the call much more
than the conversion time itself. If you want something more useful, make
your function's parameters var-arg, use a va_list to pop the data,
compile this into a shared library (with no arch-specific optimization),
rewrite your main to use it, and link against the shared lib.

Then, measure and compare. You'll still win (because there is no format
string to parse, etc.), but I would be surprised if the difference were
significant.
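A sketch of the setup Alain describes (the library name and conv_uint are
made-up; the simple digit loop stands in for whichever conversion routine
is being measured):

// conv.cpp -- compile into a shared library, e.g.
//   g++ -fPIC -shared conv.cpp -o libconv.so
#include <cstdarg>

extern "C" void conv_uint(char* out, ...)
{
    va_list ap;
    va_start(ap, out); // pop the value var-arg style, like printf does
    unsigned int val = va_arg(ap, unsigned int);
    va_end(ap);

    char buf[10];
    int n = 0;
    do {
        buf[n++] = '0' + static_cast<char>(val % 10);
        val /= 10;
    } while (val > 0);
    while (n > 0)
        *out++ = buf[--n]; // emit most significant digit first
    *out = '\0';
}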

-- Alain.

JiiPee

Nov 14, 2014, 7:20:11 AM
But the C++ standard says that the maximum uint is 4294967295, so it's
doing the job it's meant to do. I did not even require converting
anything bigger than that, as this is only a replacement for
std::snprintf, which I guess is also limited like that.

So the task was to convert integers up to 4294967295 as fast as
possible....
Using integers greater than that would give problems anyway, as uint
could not take them (at least not on every machine, so portability
problems).

But if somebody wants bigger than 4294967295, then we could easily modify
intToStr to do that.

JiiPee

Nov 14, 2014, 7:26:39 AM
On 14/11/2014 08:21, Alain Ketterlin wrote:
>
>
> My guess is that you are measuring the overhead of the call, much more
> than the conversion time itself. If you want something more useful, make
> your function parameters var-arg, use a va_list to pop data, compile
> this into a shared-library (with no arch-specific optimization), rewrite
> your main to use this, link against the shared-lib.
>
> Then, measure and compare. You'll still win (because there is no format
> sting to parse, etc), but I would be surprised if the difference were
> significant.
>
> -- Alain.

OK, but the task/challenge was always to create as fast an int-to-string
function as possible, regardless of other issues, though obviously not
using too much memory, so it's useful in real life. And my functions
already meet that criterion. So I do not see any need to try to make
them into C++ library functions. But that can be the explanation, yes.


JiiPee

Nov 14, 2014, 7:30:44 AM
On 14/11/2014 08:00, Luca Risolia wrote:
Oh, I missed your function... now I found it. I have to try that as well
and compare. I'll try to do it later. Obviously, if it does not use any
memory and is faster than mine, then it's much better. But let me
check/test that.

Luca Risolia

Nov 14, 2014, 8:40:09 AM
On 14/11/2014 13:20, JiiPee wrote:
> But C++ standard says that maximum uint is 4294967295, so its doing the
> job its meant to do.

There is no such limit in the standard for an unsigned integer
(assuming that by "uint" you mean unsigned int). The maximum value
depends on a number of things and is provided by a given C++
implementation via std::numeric_limits<unsigned int>::max().

> I did not even require to convert bigger than that
> as this is only a replacement of std::snprintf which I guess is also
> limited like that.

I don't think that limit exists for std::snprintf.

> So the task was to convert integers less than 4294967295 as fast as
> possible....
> Using integers greater than that would give problems anyway as uint
> could not take them (at least not on every machine, so portability
> problems).

This does not make any sense..

> But if somebody wants bigger than 4294967295 then we could modify easily
> intToStr to do that.

I don't buy it :) Honestly, this function is going to be useful to you only.

JiiPee

Nov 14, 2014, 2:25:40 PM
On 14/11/2014 13:40, Luca Risolia wrote:
> On 14/11/2014 13:20, JiiPee wrote:
>> But C++ standard says that maximum uint is 4294967295, so its doing the
>> job its meant to do.
>
> There is not such limit in the standard for an unsigned integer
> (assuming that you mean unsigned integer by "uint"). The maximum value
> depends on a number of things and is provided by a given C++
> implementation via std::numeric:limits<unsigned integer>::max().

ULONG_MAX is 4294967295 or greater. So it can be 4294967295 ....

>
>> I did not even require to convert bigger than that
>> as this is only a replacement of std::snprintf which I guess is also
>> limited like that.
>
> I don't think that limit exists for std::snprintf.

But the biggest type snprintf can take is uint (or unsigned long). So
that means on my computer 4294967295 at most. Or can you show me code
which would convert something bigger than 4294967295 to a string on my
computer? I think the compiler will complain if I put in anything bigger
than that....

>
>> So the task was to convert integers less than 4294967295 as fast as
>> possible....
>> Using integers greater than that would give problems anyway as uint
>> could not take them (at least not on every machine, so portability
>> problems).
>
> This does not make any sense..

intToStr(myInt);
...
foo(myInt);

foo() can take at most:
void foo(unsigned int),
so the parameter cannot be more than UINT_MAX.

Most functions in normal programs take uint, and thus cannot
operate on values bigger than that. So using bigger ints than that is
not very useful, as most functions cannot use them...

>
>> But if somebody wants bigger than 4294967295 then we could modify easily
>> intToStr to do that.
>
> I don't buy it :) Honestly, this function is going to be useful to you
> only.

We could indeed modify it :). That remains a fact :).


Scott Lurndal

Nov 14, 2014, 2:33:41 PM
JiiPee <n...@notvalid.com> writes:
>On 14/11/2014 13:40, Luca Risolia wrote:
>> On 14/11/2014 13:20, JiiPee wrote:
>>> But C++ standard says that maximum uint is 4294967295, so its doing the
>>> job its meant to do.
>>
>> There is no such limit in the standard for an unsigned integer
>> (assuming that by "uint" you mean unsigned int). The maximum value
>> depends on a number of things and is provided by a given C++
>> implementation via std::numeric_limits<unsigned int>::max().
>
>ULONG_MAX is 4294967295 or greater. So it can be 4294967295 ....

Not on my system. ULONG_MAX here is:

$ printf "%u\n" $(( 0xffffffffffffffff ))
18446744073709551615


>
>>
>>> I did not even require to convert bigger than that
>>> as this is only a replacement of std::snprintf which I guess is also
>>> limited like that.
>>
>> I don't think that limit exists for std::snprintf.
>
>But the biggest snprintf can take is uint (or long uint). So that means

Most modern systems are 64-bit, and unsigned long is 64-bit. snprintf
will even take an unsigned long long, which is a 64-bit value even on
32-bit CPUs, and the GNU compilers offer 128-bit scalar types
(__int128_t, __uint128_t).
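A sketch illustrating both points: 64-bit values go through the ll
length modifier, while the 128-bit extension types have no printf
format at all and need a manual digit loop like the ones in this thread.

#include <cstdio>

// 64-bit: snprintf with the "ll" length modifier (C99/C++11)
void print_u64(unsigned long long v) {
    char buf[24]; // 2^64-1 has 20 digits
    std::snprintf(buf, sizeof buf, "%llu", v);
    std::puts(buf);
}

// 128-bit (GCC/Clang extension): converted by hand
void u128_to_str(__uint128_t v, char* c) {
    char buf[40]; // 2^128-1 has 39 digits
    int n = 0;
    do {
        buf[n++] = '0' + static_cast<char>(v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0)
        *c++ = buf[--n];
    *c = '\0';
}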

JiiPee

Nov 14, 2014, 2:45:11 PM
On 14/11/2014 19:33, Scott Lurndal wrote:
> JiiPee <n...@notvalid.com> writes:
>> On 14/11/2014 13:40, Luca Risolia wrote:
>>> On 14/11/2014 13:20, JiiPee wrote:
>>>> But C++ standard says that maximum uint is 4294967295, so its doing the
>>>> job its meant to do.
>>> There is no such limit in the standard for an unsigned integer
>>> (assuming that by "uint" you mean unsigned int). The maximum value
>>> depends on a number of things and is provided by a given C++
>>> implementation via std::numeric_limits<unsigned int>::max().
>> ULONG_MAX is 4294967295 or greater. So it can be 4294967295 ....
> Not on my system. ULONG_MAX here is:
>
> $ printf "%u\n" $(( 0xffffffffffffffff ))
> 18446744073709551615

But on my system it is....

>
>
>>>> I did not even require to convert bigger than that
>>>> as this is only a replacement of std::snprintf which I guess is also
>>>> limited like that.
>>> I don't think that limit exists for std::snprintf.
>> But the biggest snprintf can take is uint (or long uint). So that means
> Most modern systems are 64-bit, and unsigned long is 64-bit. snprintf will
> even take an unsigned long long, which is a 64-bit value on 32-bit CPU's
> and the gnu compilers offer 128-bit scalar types (__int128_t, __uint128_t).

But it does not take anything bigger than the built-in types, the
largest of which is unsigned long long, does it? And that was the point.

JiiPee

Nov 14, 2014, 2:49:00 PM
Because whatever built-in types we use (even int128), surely my function
can also handle them. So it does not change the "setting". The question
was more whether snprintf and the others can convert bigger integers
than my function can. And it seems not...

JiiPee

Nov 14, 2014, 9:07:32 PM
On 14/11/2014 08:00, Luca Risolia wrote:
> On 14/11/2014 04:30, JiiPee wrote:
>> Would be nice if somebody could make a simple program which uses
>> intToStrSuperFast and makes it slow as many here said it could be when
>> creating a normal program.
>>
>> Can somebody test the code below so we see if you get similar results as
>> I got?
>
> I get similar results. But on my machine the implementation of your
> super unuseful function that I provided in a previous post is still
> ~4x faster.

I just tested that, and I must take my hat off to you.... yes, it was
much faster. I improved my superFast even more (by about 0.5 s), but
yours is still 2.6 times faster. I also tested that it converts all
integers correctly. So I guess I'll start using your version then :).
Now I want to understand how it is done.... it looks complicated, though.

Funny that on that web site nobody found this kind of solution.

It's unbelievably fast: on my PC it converts all unsigned integers in 22
seconds.

Jorgen Grahn

Nov 15, 2014, 4:26:15 AM
On Fri, 2014-11-14, JiiPee wrote:
> On 14/11/2014 01:02, Öö Tiib wrote:
>> On Thursday, 13 November 2014 20:21:13 UTC+2, JiiPee wrote:
>>> But the 40000 trade off - its only allocated once in a program because
>>> its a static array. and for me 40000 is not much memory waste, as I do
>>> normal pc program mostly. its like 0.001 % of the RAM memory if having 3
>>> GIG memory.
>> Memory costs nothing. Gigabyte here or there. It is full L1 cache that
>> costs something and L1 is 64k. Platform architects say that 64k is
>> the optimal middle ground between too many transistors and too little L1
>> cache. Your tests (if they only work with algorithms that fully use the
>> 64k L1, and 40k is almost all of it) will lie to you!
>
> Ok, can you make/show a simple program where this would happen? A
> program where using my conversion functions things get slow. I would
> like to test it on my computer.

Perhaps you can run a program in the background which wants all of the
cache, and also reports how well it performs.

There are also tools for looking at the cache performance, like perf
on Linux. But it's a difficult topic and I don't know it well enough.

> (I have attached the full code to test my new version which uses 40000
> look up table in the main message above.)

Side note: it's better in cases like this to put it on github or
something, and here just mention which version you're talking about --
since you're going to change the code, and others will too.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

JiiPee

Nov 15, 2014, 7:31:14 AM
On 15/11/2014 09:26, Jorgen Grahn wrote:
> On Fri, 2014-11-14, JiiPee wrote:
>> On 14/11/2014 01:02, Öö Tiib wrote:
>>> On Thursday, 13 November 2014 20:21:13 UTC+2, JiiPee wrote:
>>>> But the 40000 trade off - its only allocated once in a program because
>>>> its a static array. and for me 40000 is not much memory waste, as I do
>>>> normal pc program mostly. its like 0.001 % of the RAM memory if having 3
>>>> GIG memory.
>>> Memory costs nothing. Gigabyte here or there. It is full L1 cache that
>>> costs something and L1 is 64k. Platform architects say that 64k is
>>> the optimal middle ground between too many transistors and too little L1
>>> cache. Your tests (if they only work with algorithms that fully use the
>>> 64k L1, and 40k is almost all of it) will lie to you!
>> Ok, can you make/show a simple program where this would happen? A
>> program where using my conversion functions things get slow. I would
>> like to test it on my computer.
> Perhaps you can run a program in the background which wants all of the
> cache, and also reports how well it performs.

I don't really know how to do this.

>
> There are also tools for looking at the cache performance, like perf
> on Linux. But it's a difficult topic and I don't know it well enough.
>
>> (I have attached the full code to test my new version which uses 40000
>> look up table in the main message above.)
> Side note: it's better in cases like this to put it on github or
> something, and here just mention which version you're talking about --
> since you're going to change the code, and others will too.
>
> /Jorgen

Oh OK, never used it. Maybe I need to check that.

>

JiiPee

Nov 15, 2014, 7:44:27 AM
On 15/11/2014 09:26, Jorgen Grahn wrote:
> On Fri, 2014-11-14, JiiPee wrote:
That was good practice for me, improving that intToString function,
although Luca has an even faster version, so it's best to use that (I
don't understand that code, though). But I did speed the original
version up to almost twice as fast, which I'm happy about. I also
learned a couple of things, e.g. that the for-loop makes things slower
there, so using a switch is better.

Jorgen Grahn

Nov 15, 2014, 9:46:46 AM
> oh ok. never used it. maybe I need the check that.

I suspect a lot of people haven't, but things like that are becoming
quite common. It's so convenient for everyone: I'm more likely to
clone a repo than download or cut & paste a file, because I know I can
leave the repo somewhere on my disk, and a year from now I can still
see where it came from, if I have modified it, if there are new
versions, and so on.

Jorgen Grahn

Nov 15, 2014, 9:58:48 AM
On Sat, 2014-11-15, JiiPee wrote:
> On 15/11/2014 09:26, Jorgen Grahn wrote:
>> On Fri, 2014-11-14, JiiPee wrote:
>>> On 14/11/2014 01:02, Öö Tiib wrote:
>>>> On Thursday, 13 November 2014 20:21:13 UTC+2, JiiPee wrote:
>>>>> But the 40000 trade off - its only allocated once in a program because
>>>>> its a static array. and for me 40000 is not much memory waste, as I do
>>>>> normal pc program mostly. its like 0.001 % of the RAM memory if having 3
>>>>> GIG memory.
>>>> Memory costs nothing. Gigabyte here or there. It is full L1 cache that
>>>> costs something and L1 is 64k. Platform architects say that 64k is
>>>> the optimal middle ground between too many transistors and too little L1
>>>> cache. Your tests (if they only work with algorithms that fully use the
>>>> 64k L1, and 40k is almost all of it) will lie to you!
>>> Ok, can you make/show a simple program where this would happen? A
>>> program where using my conversion functions things get slow. I would
>>> like to test it on my computer.

>> Perhaps you can run a program in the background which wants all of the
>> cache, and also reports how well it performs.
>
> I dont really know how to do this.

I think it's enough to e.g. loop over a 1MB std::vector and sum up the
elements, over and over again. Or maybe update it? Anyway, something
which is hard on the data cache but nice to the instruction cache and
the CPU itself.
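A minimal sketch of such a background load: walk a buffer far bigger
than a 64k L1 over and over, so the conversion functions' lookup tables
keep getting evicted.

#include <vector>
#include <numeric>
#include <cstdint>
#include <cstddef>

uint64_t thrash_cache(std::size_t passes)
{
    std::vector<uint32_t> big(1 << 18); // 262144 * 4 bytes = 1 MB
    std::iota(big.begin(), big.end(), 0u);
    uint64_t sum = 0;
    for (std::size_t p = 0; p < passes; ++p)
        for (uint32_t v : big) // touches every cache line each pass
            sum += v;
    return sum; // return the sum so the loops are not optimized away
}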

>> There are also tools for looking at the cache performance, like perf
>> on Linux.
>> But it's a difficult topic and I don't know it well enough.

That caveat still applies ...

Paavo Helde

Nov 16, 2014, 8:34:42 AM
JiiPee <n...@notvalid.com> wrote in
news:zUH9w.1104999$Fy6.9...@fx25.am4:

> On 15/11/2014 09:26, Jorgen Grahn wrote:
>> Side note: it's better in cases like this to put it on github or
>> something, and here just mention which version you're talking about
>> -- since you're going to change the code, and others will too.
>>
>
> That was a good practice for me to improve that intToString function,
> although Luca has even faster version so best to use that (dont
> understand that code though).

Luca's code is also trading memory for speed, but this time it is the
compiled code size (in my experiments Luca's version compiled into a
2564-byte .o and your version with the 200-byte lookup table compiled
into 1328 bytes). And in my tests it was slower, not faster. Probably I
had too old a gcc version. Or it was something else.

Cheers
Paavo


JiiPee

Nov 16, 2014, 9:10:54 AM
His version was slower than my first version? So the SuperFast one would
be even faster? I think I'll do more tests on other computers.

OK, this means that this is not straightforward at all :). It seems to
depend on the system which version is the fastest. So what would be the
best way to find out which version to use in which setting/computer? Run
all of them and start using the one which is fastest for that computer?

Dombo

Nov 16, 2014, 9:32:42 AM
On 16-Nov-14 15:10, JiiPee wrote:

> Ok, this means that this is not straightforward at all :).

Unfortunately not.

> It seems to depend on the system which version is the fastest.

Not only that; what else is running on the system also affects the speed
of your function. E.g. another function may trash the cache, slowing
down your function if it heavily depends on memory accesses, or your
function may trash the cache, making the other functions slower.

> So what would be the best way to find out what version to use
> in which setting/computer? Run all of them and start using the one which
> is fastest for that computer?

Measuring on the target system is generally the best way and often the
only practical way to determine which is fastest. Guessing/assuming what
will be fastest is the worst way since there are too many variables, of
which many are unknown in most cases, that affect the performance.

Johann Klammer

Nov 16, 2014, 10:07:16 AM
On 11/15/2014 03:46 PM, Jorgen Grahn wrote:
>
> I suspect a lot of people haven't, but things like that are becoming
> quite common. It's so convenient for everyone: I'm more likely to
> clone a repo than download or cut & paste a file, because I know I can
> leave the repo somewhere on my disk, and a year from now I can still
> see where it came from, if I have modified it, if there are new
> versions, and so on.
>
> /Jorgen
>

Yes, convenient...
Not all of them have the same functionality...
There's a list on wikipedia...

<http://en.wikipedia.org/wiki/Comparison_of_open-source_software_hosting_facilities>

JiiPee

Nov 16, 2014, 10:12:55 AM
On 16/11/2014 14:34, Dombo wrote:
> On 16-Nov-14 15:10, JiiPee wrote:
>
>
> Measuring on the target system is generally the best way and often the
> only practical way to determine which is fastest. Guessing/assuming
> what will be fastest is the worst way since there are too many
> variables, of which many are unknown in most cases, that affect the
> performance.
>

In speed-critical situations, I would feel the same way... the program
could first test all of them and see which one is fastest. To be
perfect, it would need to do statistical tests over weeks or months,
because the PC usage might change.
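A sketch of that idea (assuming the intToStr and intToStrSuperFast
functions from the main message, with init_digit_pairs_sf() already
called): time each candidate briefly at startup and keep a function
pointer to the winner.

#include <ctime>

typedef void (*ConvFn)(unsigned int, char*);

ConvFn pick_fastest()
{
    ConvFn candidates[] = { intToStr, intToStrSuperFast };
    ConvFn best = candidates[0];
    double best_secs = 1e30;
    char buf[16] = {0};
    for (ConvFn f : candidates) {
        std::clock_t t0 = std::clock();
        for (unsigned int i = 0; i < 10000000; i += 7) // short calibration run
            f(i, buf);
        double secs = double(std::clock() - t0) / CLOCKS_PER_SEC;
        if (secs < best_secs) {
            best_secs = secs;
            best = f;
        }
    }
    return best; // call through this pointer from then on
}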

Luca Risolia

Nov 16, 2014, 10:58:01 AM
On 16/11/2014 15:10, JiiPee wrote:
> On 16/11/2014 13:34, Paavo Helde wrote:
>> JiiPee <n...@notvalid.com> wrote in
>> news:zUH9w.1104999$Fy6.9...@fx25.am4:
>>
>>> On 15/11/2014 09:26, Jorgen Grahn wrote:
>>>> Side note: it's better in cases like this to put it on github or
>>>> something, and here just mention which version you're talking about
>>>> -- since you're going to change the code, and others will too.
>>>>
>>> That was a good practice for me to improve that intToString function,
>>> although Luca has even faster version so best to use that (dont
>>> understand that code though).
>> Luca's code is also trading memory for speed, but this time it is the
>> compiled code size (in my experimentation Luca's version compiled into
>> 2564
>> bytes .o and your version with 200-byte lookup table compiled into 1328
>> bytes. And in my tests it was slower, not faster. Probably I had too old
>> gcc version. Or it was something else.
>>
>> Cheers
>> Paavo
>>
>>
>
> His version was slower than my first version? So the Superfast would be
> even faster? I think I ll do more tests of other computers.

Test the function below. It is equivalent to the code that the compiler
should be able to generate from my original (and more generic) version
on implementations where unsigned int is 32 bits.

void uint_to_str(unsigned int i, char* c) {
    if (i < 10) {
        c[0] = '0' + (i % 10);
        c[1] = '\0';
    } else if (i < 100) {
        c[0] = '0' + (i / 10);
        c[1] = '0' + (i % 10);
        c[2] = '\0';
    } else if (i < 1000) {
        c[0] = '0' + (i / 100);
        c[1] = '0' + (i % 100) / 10;
        c[2] = '0' + (i % 10);
        c[3] = '\0';
    } else if (i < 10000) {
        c[0] = '0' + (i / 1000);
        c[1] = '0' + (i % 1000) / 100;
        c[2] = '0' + (i % 100) / 10;
        c[3] = '0' + (i % 10);
        c[4] = '\0';
    } else if (i < 100000) {
        c[0] = '0' + (i / 10000);
        c[1] = '0' + (i % 10000) / 1000;
        c[2] = '0' + (i % 1000) / 100;
        c[3] = '0' + (i % 100) / 10;
        c[4] = '0' + (i % 10);
        c[5] = '\0';
    } else if (i < 1000000) {
        c[0] = '0' + (i / 100000);
        c[1] = '0' + (i % 100000) / 10000;
        c[2] = '0' + (i % 10000) / 1000;
        c[3] = '0' + (i % 1000) / 100;
        c[4] = '0' + (i % 100) / 10;
        c[5] = '0' + (i % 10);
        c[6] = '\0';
    } else if (i < 10000000) {
        c[0] = '0' + (i / 1000000);
        c[1] = '0' + (i % 1000000) / 100000;
        c[2] = '0' + (i % 100000) / 10000;
        c[3] = '0' + (i % 10000) / 1000;
        c[4] = '0' + (i % 1000) / 100;
        c[5] = '0' + (i % 100) / 10;
        c[6] = '0' + (i % 10);
        c[7] = '\0';
    } else if (i < 100000000) {
        c[0] = '0' + (i / 10000000);
        c[1] = '0' + (i % 10000000) / 1000000;
        c[2] = '0' + (i % 1000000) / 100000;
        c[3] = '0' + (i % 100000) / 10000;
        c[4] = '0' + (i % 10000) / 1000;
        c[5] = '0' + (i % 1000) / 100;
        c[6] = '0' + (i % 100) / 10;
        c[7] = '0' + (i % 10);
        c[8] = '\0';
    } else if (i < 1000000000) {
        c[0] = '0' + (i / 100000000);
        c[1] = '0' + (i % 100000000) / 10000000;
        c[2] = '0' + (i % 10000000) / 1000000;
        c[3] = '0' + (i % 1000000) / 100000;
        c[4] = '0' + (i % 100000) / 10000;
        c[5] = '0' + (i % 10000) / 1000;
        c[6] = '0' + (i % 1000) / 100;
        c[7] = '0' + (i % 100) / 10;
        c[8] = '0' + (i % 10);
        c[9] = '\0';
    } else {
        c[0] = '0' + (i / 1000000000);
        c[1] = '0' + (i % 1000000000) / 100000000;
        c[2] = '0' + (i % 100000000) / 10000000;
        c[3] = '0' + (i % 10000000) / 1000000;
        c[4] = '0' + (i % 1000000) / 100000;
        c[5] = '0' + (i % 100000) / 10000;
        c[6] = '0' + (i % 10000) / 1000;
        c[7] = '0' + (i % 1000) / 100;
        c[8] = '0' + (i % 100) / 10;
        c[9] = '0' + (i % 10);
        c[10] = '\0';
    }
}

JiiPee

Nov 16, 2014, 11:15:49 AM
OK, this looks more readable :). It's also very fast, so well done. But
it's 4% slower than your first version.

Alain Ketterlin

Nov 16, 2014, 11:16:43 AM
Luca Risolia <luca.r...@linux-projects.org> writes:

> Test the function below. It is equivalent to the code that the
> compiler should be able to generate from my original (and more
> generic) version on implementations where unsigned int is 32 bits.
>
> void uint_to_str(unsigned int i, char* c) {
> if (i < 10) {
[...]
> } else if (i < 100) {
[...]
> } else if (i < 1000) {
[...]
> } else if (i < 10000) {
[... and so on]

Note that reversing the order of the tests may provide even better
results in the given experimental setup (all integers were multiples of
20 or something, IIRC). Or you could use dichotomy, as the OP did, to
ensure never doing more than 4 tests. And you could use div() to get the
quotient and remainder in (hopefully) a single operation.
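For the record, a sketch of the div() idea, using std::lldiv since plain
div() takes int and cannot hold the full unsigned range (JiiPee reports
below that this measured slower for him):

#include <cstdlib>

void uint_to_str_div(unsigned int i, char* c)
{
    char buf[10];
    int n = 0;
    long long v = i;
    do {
        std::lldiv_t qr = std::lldiv(v, 10); // quotient and remainder in one call
        buf[n++] = '0' + static_cast<char>(qr.rem);
        v = qr.quot;
    } while (v > 0);
    while (n > 0)
        *c++ = buf[--n];
    *c = '\0';
}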

(Anyway, the C++ version was impressive.)

-- Alain.

JiiPee

Nov 16, 2014, 11:34:00 AM
No, using div() makes it 4 times slower. But the order surely matters.
In a real program, though, we normally use small integers, so I think
the current version is best for that.

Öö Tiib

Nov 16, 2014, 3:58:09 PM
On Friday, 14 November 2014 05:45:56 UTC+2, JiiPee wrote:
> On 14/11/2014 01:02, Öö Tiib wrote:
> > On Thursday, 13 November 2014 20:21:13 UTC+2, JiiPee wrote:
> >> But the 40000 trade off - its only allocated once in a program because
> >> its a static array. and for me 40000 is not much memory waste, as I do
> >> normal pc program mostly. its like 0.001 % of the RAM memory if having 3
> >> GIG memory.
> > Memory costs nothing. Gigabyte here or there. It is full L1 cache that
> > costs something and L1 is 64k. Platform architects say that 64k is
> > the optimal middle ground between too many transistors and too little L1
> > cache. Your tests (if they only work with algorithms that fully use the
> > 64k L1, and 40k is almost all of it) will lie to you!
>
> Ok, can you make/show a simple program where this would happen? A
> program where using my conversion functions things get slow. I would
> like to test it on my computer.

It does not make the tests simple, but it makes their results more
useful and reliable.

First, do not let the processor predict the branches: use pseudorandom
input instead of adjacent values.

Also arrange things so that the rest of the program is what takes most
of the time (and so deserves most of the cache). For example, collect
the results of the conversion into a 'boost::flat_multiset' or the like.
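A sketch of the pseudorandom-input part (fixed-seed xorshift32 so runs
are repeatable; assumes intToStrSuperFast from the main message, table
already initialized):

#include <cstdint>

unsigned int benchmark_random(unsigned int iterations)
{
    uint32_t state = 2463534242u; // any non-zero seed works
    char buf[16] = {0};
    unsigned int sink = 0;
    for (unsigned int n = 0; n < iterations; ++n) {
        state ^= state << 13; // xorshift32: the branch predictor cannot learn this
        state ^= state >> 17;
        state ^= state << 5;
        intToStrSuperFast(state, buf);
        sink += buf[0]; // keep the result live
    }
    return sink;
}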

> (I have attached the full code to test my new version which uses 40000
> look up table in the main message above.)

For me Luca's function feels more efficient. I haven't tested it since
converting millions of ints to text feels useless. If text-based
communication is too slow then I switch to binary ('htonl'/'htons').

JiiPee

Nov 16, 2014, 8:36:58 PM
On 16/11/2014 16:16, Alain Ketterlin wrote:
Checking first
if (i > 10000)

increases the speed by another 10%.

And it is really impressive that it takes only about 17 seconds on my
machine (1 CPU) to do all the integers from 0 to 4294967293.

Nobody

Nov 19, 2014, 4:48:10 AM
On Fri, 14 Nov 2014 19:33:29 +0000, Scott Lurndal wrote:

> Most modern systems are 64-bit, and unsigned long is 64-bit.

On "64-bit" versions of Windows, long is still only 32 bits.

Öö Tiib

Nov 21, 2014, 2:22:44 AM
Out of the desire to have lots of 'goto' and 'c++' in the code, I wrote
this:

#include <cstdint>

void uint32ToStr_goto(uint32_t i, char* c)
{
    if (i < 1000000)
    {
        if (i < 10000)
        {
            if (i < 100)
            {
                if (i < 10)
                    goto OneDigit;
                goto TwoDigits;
            }
            if (i < 1000)
                goto ThreeDigits;
            goto FourDigits;
        }
        if (i < 100000)
            goto FiveDigits;
        goto SixDigits;
    }
    if (i < 100000000)
    {
        if (i < 10000000)
            goto SevenDigits;
        goto EightDigits;
    }
    if (i < 1000000000)
        goto NineDigits;

    // ten digits: 1000000000-4294967295
    *c++ = '0' + (i / 1000000000);

NineDigits:
    *c++ = '0' + (i % 1000000000) / 100000000;
EightDigits:
    *c++ = '0' + (i % 100000000) / 10000000;
SevenDigits:
    *c++ = '0' + (i % 10000000) / 1000000;
SixDigits:
    *c++ = '0' + (i % 1000000) / 100000;
FiveDigits:
    *c++ = '0' + (i % 100000) / 10000;
FourDigits:
    *c++ = '0' + (i % 10000) / 1000;
ThreeDigits:
    *c++ = '0' + (i % 1000) / 100;
TwoDigits:
    *c++ = '0' + (i % 100) / 10;
OneDigit:
    *c++ = '0' + (i % 10);
    *c = '\0';
}

I don't think it is the best performer, since the main point was
aesthetically and logically clear 'goto' usage. :P

Martijn Lievaart

Nov 21, 2014, 7:35:13 AM
On Thu, 20 Nov 2014 23:22:29 -0800, Öö Tiib wrote:

>
> I don't think it is best performer since the main point was
> aesthetically and logically clear 'goto' usage. :P

Everyone knows you should not use gotos in C++!

#include <array>
#include <algorithm>
#include <functional>
#include <cstdint>

void uint32ToStr_duff(uint32_t i, char* c)
{
    std::array<uint32_t, 10> x =
        { 10, 100, 1000, 10000, 100000,
          1000000, 10000000, 100000000, 1000000000 };

    auto it = std::find_if(x.begin(), x.end(),
                           std::bind1st(std::less<uint32_t>(), i));

    switch (it - x.begin()) {
    case 9: *c++ = '0' + (i / 1000000000);
    case 8: *c++ = '0' + (i % 1000000000) / 100000000;
    case 7: *c++ = '0' + (i % 100000000) / 10000000;
    case 6: *c++ = '0' + (i % 10000000) / 1000000;
    case 5: *c++ = '0' + (i % 1000000) / 100000;
    case 4: *c++ = '0' + (i % 100000) / 10000;
    case 3: *c++ = '0' + (i % 10000) / 1000;
    case 2: *c++ = '0' + (i % 1000) / 100;
    case 1: *c++ = '0' + (i % 100) / 10;
    case 0: *c++ = '0' + (i % 10);
            *c = '\0';
    }
}

:-)

M4

Juha Nieminen

Nov 21, 2014, 10:19:35 AM
Martijn Lievaart <m...@rtij.nl.invlalid> wrote:
> std::array<uint32_t, 10> x =
> { 10,100,1000,10000,100000,
> 1000000,10000000,100000000,1000000000 };
>
> auto it = std::find_if(x.begin(), x.end(),
> std::bind1st(std::less<uint32_t>(), i));

Wouldn't std::lower_bound() be much better for that purpose?

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---

Öö Tiib

Nov 21, 2014, 10:35:39 AM
On Friday, 21 November 2014 14:35:13 UTC+2, Martijn Lievaart wrote:
> On Thu, 20 Nov 2014 23:22:29 -0800, Öö Tiib wrote:
> >
> > I don't think it is best performer since the main point was
> > aesthetically and logically clear 'goto' usage. :P
>
> Everyone knows you should not use gotos in C++!

Everyone has been wrong before ... that the earth is flat and whatnot.

> void uint32ToStr_duff(uint32_t i, char* c)
> {
> std::array<uint32_t, 10> x =
> { 10,100,1000,10000,100000,
> 1000000,10000000,100000000,1000000000 };
> auto it = std::find_if(x.begin(), x.end(),
> std::bind1st(std::less<uint32_t>(), i));
> switch (it-x.begin()) {
> case 9: *c++ = '0' + (i / 1000000000);
> case 8: *c++ = '0' + (i % 1000000000) / 100000000;
> case 7: *c++ = '0' + (i % 100000000) / 10000000;
> case 6: *c++ = '0' + (i % 10000000) / 1000000;
> case 5: *c++ = '0' + (i % 1000000) / 100000;
> case 4: *c++ = '0' + (i % 100000) / 10000;
> case 3: *c++ = '0' + (i % 10000) / 1000;
> case 2: *c++ = '0' + (i % 1000) / 100;
> case 1: *c++ = '0' + (i % 100) / 10;
> case 0: *c++ = '0' + (i % 10);
> *c = '\0';
> }
> }
>
> :-)

I don't get the joke.
Demo:

int main()
{
    char str1[20] = "*garbage*";
    uint32ToStr_goto(1234567890, str1);
    std::cout << "gotos:" << str1 << ":work" << std::endl;

    char str2[20] = "*garbage*";
    uint32ToStr_duff(1234567890, str2);
    std::cout << "fun:" << str2 << ":here" << std::endl;
}

Output:

gotos:1234567890:work
fun:*garbage*:here

Program ended with exit code: 0

:-/

Martijn Lievaart

Nov 22, 2014, 6:00:52 AM
On Fri, 21 Nov 2014 15:19:15 +0000, Juha Nieminen wrote:

> Martijn Lievaart <m...@rtij.nl.invlalid> wrote:
>> std::array<uint32_t, 10> x =
>> { 10,100,1000,10000,100000,
>> 1000000,10000000,100000000,1000000000 };
>>
>> auto it = std::find_if(x.begin(), x.end(),
>> std::bind1st(std::less<uint32_t>(), i));
>
> Wouldn't std::lower_bound() be much better for that purpose?

Thanks! Did not think of that.

M4


JiiPee

Nov 22, 2014, 6:57:28 AM
You guys are taking this whole thing to a new level now :)
I was just trying to make simple code... :)

Martijn Lievaart

Nov 22, 2014, 8:10:12 AM
On Fri, 21 Nov 2014 07:35:25 -0800, Öö Tiib wrote:

> On Friday, 21 November 2014 14:35:13 UTC+2, Martijn Lievaart wrote:
>> On Thu, 20 Nov 2014 23:22:29 -0800, Öö Tiib wrote:
>> >
>> > I don't think it is best performer since the main point was
>> > aesthetically and logically clear 'goto' usage. :P
>>
>> Everyone knows you should not use gotos in C++!
>
> Everyone have been wrong before ... that earth is flat and what not.

What, the earth isn't flat?


> I don't get the joke.
> Demo:
>
> int main()
> {
>     char str1[20] = "*garbage*";
>     uint32ToStr_goto(1234567890, str1);
>     std::cout << "gotos:" << str1 << ":work" << std::endl;
>
>     char str2[20] = "*garbage*";
>     uint32ToStr_duff(1234567890, str2);
>     std::cout << "fun:" << str2 << ":here" << std::endl;
> }
>
> Output:
>
> gotos:1234567890:work
> fun:*garbage*:here
>
> Program ended with exit code: 0

OK, modulo the bugs:


#include <array>
#include <algorithm>
#include <iterator>
#include <cstdint>

static std::array<uint32_t, 9> x = {
    10,
    100,
    1000,
    10000,
    100000,
    1000000,
    10000000,
    100000000,
    1000000000
};

void uint32ToStr_duff(uint32_t i, char* c)
{
    // upper_bound (first element strictly greater than i) rather than
    // lower_bound, so exact powers of ten such as 10 or 100 keep all
    // their digits
    switch (std::distance(x.begin(),
                          std::upper_bound(x.begin(), x.end(), i))) {
    case 9: *c++ = '0' + (i / 1000000000);
    case 8: *c++ = '0' + (i % 1000000000) / 100000000;
    case 7: *c++ = '0' + (i % 100000000) / 10000000;
    case 6: *c++ = '0' + (i % 10000000) / 1000000;
    case 5: *c++ = '0' + (i % 1000000) / 100000;
    case 4: *c++ = '0' + (i % 100000) / 10000;
    case 3: *c++ = '0' + (i % 10000) / 1000;
    case 2: *c++ = '0' + (i % 1000) / 100;
    case 1: *c++ = '0' + (i % 100) / 10;
    case 0: *c++ = '0' + (i % 10);
            *c = '\0';
    }
}

:-)

M4

Luca Risolia

Nov 22, 2014, 1:24:33 PM
Recursion may make the code simpler to understand:

void uint_to_str_(unsigned int i, char*& s) {
    unsigned int j = i % 10;
    if (i /= 10)
        uint_to_str_(i, s); // recurse first, so the most significant digit is written first
    *s++ = '0' + j;
}

void uint_to_str(unsigned int i, char* s) {
    uint_to_str_(i, s);
    *s = '\0';
}
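A quick usage sketch of the pair above:

#include <iostream>

int main()
{
    char buf[16];
    uint_to_str(40923u, buf); // the recursion unwinds from the top digit down
    std::cout << buf << '\n'; // prints 40923
}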

Öö Tiib

Nov 22, 2014, 5:25:56 PM
No. The simplest is to use the standard library functions. You can't
get simpler than that by using recursion instead.
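For reference, a sketch of that standard-library route (C++11):

#include <string>
#include <cstdio>
#include <cstddef>

void standard_ways(unsigned int i, char* buf, std::size_t n)
{
    std::string s = std::to_string(i); // the one-liner
    std::snprintf(buf, n, "%u", i);    // the classic C way
}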

Martijn Lievaart

Nov 22, 2014, 6:10:48 PM
On Sat, 22 Nov 2014 14:25:34 -0800, Öö Tiib wrote:

> No. Most simple is to use standard library functions.
> Can't get simpler by doing recursion instead of it.

True! And let's not forget that! However, it is both fun and instructive
to rewrite a standard function. I got reacquainted with std::lower_bound,
for instance.

M4

Luca Risolia

Nov 22, 2014, 6:42:38 PM
The mere fact that using standard algorithms makes your code longer
should be a good sign that they are going to be useful to solve your
problem, not to mention that they can hardly help with converting
integers of any size.

Luca Risolia

Nov 22, 2014, 6:44:41 PM
On 23/11/2014 00:42, Luca Risolia wrote:
> The mere fact that using standard algorithms makes your code longer
> should be a good sign that they are going to be useful to solve your

Typo.. I meant to say "*NOT* going to be useful".
