On 18 December 2012 00:25, bulk88 via RT <
perlbug-...@perl.org> wrote:
> On Sat Dec 15 18:28:47 2012, bulk88 wrote:
>> On Sat Dec 15 09:58:46 2012, demerphq wrote:
>> >
>> > It should be easier to compare now that I pushed smoke-me/tinymt32
>> >
>> > :-)
>> >
>> > Yves
>> >
>> >
>>
>> I will benchmark that soon.
>>
>
> VC2008, activeperl 5.12.3 32 bit, opteron 875 Server 2003 x64, nothing
> inlined, tinymt32, wellrng, and drand48 were all func calls in asm in
> RunRand, Perl_tinymt32_generate_U32 was NOT inlined away (unlike VC 2003
> later on)
> ______________________________________________________________________
> new time=14.107506, opt=-MD -Zi -DNDEBUG -O1 4970233.211236
> old time=0.639507, opt=-MD -Zi -DNDEBUG -O1 9997076.043274
> new kmx time=2.266837, opt=-MD -Zi -DNDEBUG -O1 10000148.315117
> new kmx bulk88 1 time=2.649125, opt=-MD -Zi -DNDEBUG -O1 10001037.278379
> new kmx bulk88 2 time=2.376385, opt=-MD -Zi -DNDEBUG -O1 9999965.239603
> new kmx bulk88 3 time=3.024081, opt=-MD -Zi -DNDEBUG -O1 9999970.013225
> just rand time=2.130050, opt=-MD -Zi -DNDEBUG -O1 1310540572828.000000
> rand_s time=16.052782, opt=-MD -Zi -DNDEBUG -O1 4999226.450462
> drand48 time=0.510524, opt=-MD -Zi -DNDEBUG -O1 10000680.628264
> TINYMT32 time=0.747539, opt=-MD -Zi -DNDEBUG -O1 9998574.703049
> WELLRNG512A time=0.465683, opt=-MD -Zi -DNDEBUG -O1 9998797.404216
> ______________________________________________________________________
> VC 2008, x64 Perl 5.17 (93a641ae382638ffd1980378be4810244d04f4b0),
> opteron 875 Server 2003 x64, asm shows everything in the DLL was inlined
> automatically by the compiler (drand48, tinymt32 and wellrng and all
> children calls were inlined; rand and rand_s can't be inlined for obvious
> reasons). This is apples to oranges, so I will have to repeat this with
> inlining disabled.
> ______________________________________________________________________
> new time=10.479223, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 4970314.444337
> old time=0.427892, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 9998732.212921
> new kmx time=1.629771, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 9999588.845403
> new kmx bulk88 1 time=1.711441, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 10001166.772142
> new kmx bulk88 2 time=1.639529, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 9999228.837134
> new kmx bulk88 3 time=1.484076, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 10000031.193598
> just rand time=1.429209, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 1310703320700.000000
> rand_s time=10.888868, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 5000939.396292
> drand48 time=0.068243, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 10001741.884865
> TINYMT32 time=0.427861, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 9997611.127257
> WELLRNG512A time=0.254936, opt=-MD -Zi -DNDEBUG -O1 -GL -GS- -favor:AMD64 -fp:precise 10001033.492300
> ______________________________________________________________________
> VC 2003, 32 bit Perl 5.17 93a641ae382638ffd1980378be4810244d04f4b0,
> Windows XP 32 bit, core 2 duo t7200, nothing was inlined in asm in
> RunRand, Perl_tinymt32_generate_U32 was inlined away.
> ______________________________________________________________________
> new time=6.801278, opt=-O1 -GL -arch:SSE2 -GF 4970279.404719
> old time=0.342173, opt=-O1 -GL -arch:SSE2 -GF 9998786.194519
> new kmx time=1.348873, opt=-O1 -GL -arch:SSE2 -GF 10001408.239714
> new kmx bulk88 1 time=1.284122, opt=-O1 -GL -arch:SSE2 -GF 9998050.657198
> new kmx bulk88 2 time=1.723125, opt=-O1 -GL -arch:SSE2 -GF 10001843.021335
> new kmx bulk88 3 time=1.273438, opt=-O1 -GL -arch:SSE2 -GF 10000463.515876
> just rand time=1.175337, opt=-O1 -GL -arch:SSE2 -GF 1310685646068.000000
> rand_s time=16.356017, opt=-O1 -GL -arch:SSE2 -GF 9997707.487858
> drand48 time=0.498862, opt=-O1 -GL -arch:SSE2 -GF 9999970.685309
> TINYMT32 time=0.601499, opt=-O1 -GL -arch:SSE2 -GF 10001623.470601
> WELLRNG512A time=0.488246, opt=-O1 -GL -arch:SSE2 -GF 9999946.450606
> _______________________________________________________________________
>
> I'm not sure how to interpret the benchmark results, since wellrng512a
> and drand48 take about the same time on 32 bit. drand48 on 32 bit Windows
> makes a call to __allmul, while Perl_wellrng512a_generate_double makes no
> calls. On x64 drand48 is many times faster than wellrng512a (both were
> inlined, I will try to stop that in the near future). wellrng512a takes
> 68 bytes (according to the asm in the x64 DLL), tinymt32 takes 28 bytes
> (according to the asm in the x64 DLL), and drand48 takes 8 bytes. x64
> Windows isn't aligning the U32s to U64. I don't know if any other 64 bit
> platforms will try to align all those U32s to U64s. drand48 is in an OS's
> clib. Do any OSes use wellrng512a and tinymt32 in their clibs?
>
> If someone is interested, I will write a patch to use drand48 on Windows
> only.
I will push a patch so everyone can use the same "drand48".
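For anyone following along, here is a minimal sketch of what a portable drand48-style generator can look like. It is not the code from any branch or patch in this ticket, and every sketch_ name is made up for illustration. It assumes the usual POSIX drand48 parameters (multiplier 0x5DEECE66D, increment 0xB, modulus 2^48) and the srand48 seeding convention. The 48 bit state is stored in a single 64 bit integer, so the state is 8 bytes here, and the 64 bit multiply is presumably what shows up as the __allmul call on 32 bit Windows.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. */
typedef struct {
    uint64_t x;                       /* only the low 48 bits are used */
} sketch_drand48_state;

#define SKETCH_DRAND48_A    UINT64_C(0x5DEECE66D)     /* multiplier */
#define SKETCH_DRAND48_C    UINT64_C(0xB)             /* increment  */
#define SKETCH_DRAND48_MASK UINT64_C(0xFFFFFFFFFFFF)  /* 2^48 - 1   */

/* Seed the same way POSIX srand48 does: the 32 bit seed goes in the
 * high bits and the low 16 bits are set to 0x330E. */
static void sketch_srand48(sketch_drand48_state *s, uint32_t seed)
{
    s->x = ((uint64_t)seed << 16) | UINT64_C(0x330E);
}

/* One step of the linear congruential recurrence
 *     x = (a * x + c) mod 2^48
 * scaled to a double in [0, 1). */
static double sketch_drand48(sketch_drand48_state *s)
{
    s->x = (s->x * SKETCH_DRAND48_A + SKETCH_DRAND48_C) & SKETCH_DRAND48_MASK;
    return (double)s->x / 281474976710656.0;          /* divide by 2^48 */
}
```

Usage would be sketch_srand48(&st, seed) once, then sketch_drand48(&st) per random number. Only 48 bits of each result are random, which is fewer than the 53 bit mantissa of a double.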
> One thing not addressed in this thread so far: what happens on platforms
> where an NV is a float (32), or a long double (80), or a quad
> float/__float128? None of the solutions in this ticket addresses that,
> other than my uselessly slow first try (with the Perl_pow and Perl_fmod calls).
I picked 32 bit based RNGs. There are other RNGs that are more
suitable for 64 bit. I was thinking for instance of supporting TinyMT64
as well as TinyMT32. Also there are other RNGs to try. :-)
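On the NV width question, one common approach with a 32 bit generator is to use only as many random bits as the mantissa holds, and to combine two draws when one is not enough. The sketch below is an illustration under that assumption, not something taken from any of the patches in this ticket, and the sketch_ function names are hypothetical. It covers a 24 bit float mantissa and a 53 bit double mantissa. Long double and __float128 would need more draws and are not shown.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* float has a 24 bit mantissa: keep the top 24 bits of one draw.
 * Result is in [0, 1). */
static float sketch_u32_to_float(uint32_t r)
{
    return (float)(r >> 8) * (1.0f / 16777216.0f);        /* 2^-24 */
}

/* double has a 53 bit mantissa: take 27 bits from one draw and
 * 26 bits from another, then scale by 2^-53. Result is in [0, 1). */
static double sketch_two_u32_to_double(uint32_t hi, uint32_t lo)
{
    uint64_t bits = ((uint64_t)(hi >> 5) << 26) | (uint64_t)(lo >> 6);
    return (double)bits * (1.0 / 9007199254740992.0);     /* 2^-53 */
}
```

The cost is one extra generator call per double, so it would change the timings above for any generator that is not inlined.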