> But if you're computing integer square roots for a large contiguous
> range of integer values, it's much faster to do it backwards, by
> iterating over the square roots rather than over the squares:
>
> Even if this isn't what you're doing, a bit more description might lead
> to a considerably more efficient solution.
>
> --
> Keith Thompson (The_Other_Keith)
kst...@mib.org <
http://www.ghoti.net/~kst>
> Working, but not speaking, for JetHead Development, Inc.
> "We must do something. This is something. Therefore, we must do this."
> -- Antony Jay and Jonathan Lynn, "Yes Minister"
Keith Thompson: You are brilliant!
This approach reminded me when I first started this adventure seeking an answer to the as yet unsolved problem back in 1986. I wrote my program in Fortran and ran it for several months (until I got busted) on an IBM 360 32bit mainframe. I wrote the program a second time to support 100 digits after exhausting the 32bit number space.
So, yesterday I wrote the code to implement the "Thompson" method of scanning to
the next expected result using squares instead of Square Roots. (just as I did so long ago). I was able to reduce the scanning code to the following Single C Statement:
int128 i, c, c2;
i = bla(starting);
c2 = bla(target);
for ( c = i; (c*c) < c2; c++ );
/* test for exact */
Turns out this works fantastic until the numbers get big, (64bits) then it bogs.
Scanning ultimately takes longer than any SQRT function.
SQRT is KING.
The following data are hardly accurate results, BUT, yields a "feel" for my misery. :)
All tests done on lcc-win-64 with various configurations for accomplishing Square Root.
(includes at least 5% housekeeping and a few writes to disk)
67,10,18,982 64bit Integer SQRT Assembler on all 10 digit numbers - 18 seconds.
Fantastic speed but limited to the 64bit number space!
In all tests the FSQRT function takes longer except when an exact result exists.
and also there is the overhead from long float to I128 translation.
671,018,982 128 Integer Sqrt on 10 digit numbers using modified Newton method.
(the C code for this is listed in a previous post.)
174,464,936,096 128bit Sqrts Mod Newton method on 11-12 digit nums 87 hours 51 minutes
249,999,999,998 128bit Sqrts Mod Newton method on 12 digit numbers 85 hours 43 minutes
299,999,999,997 128bit Sqrts Mod Newton method on 12 digit numbers 124 Hours 19 minutes
Apples and Oranges:
lcc-win 64bit/128bit tests were run on:
Intel Core i7 975 Extreme Edition Bloomfield 4.2GHz DDR3 2600
in a VMware Workstation VM (Total system overhead already at 42%)
lcc-win 128bit modified Newton method were run on:
Intel Xeon W3680 3.33GHz DDR3 1333
in a VMware ESXi 6.1 VM (Total system overhead already at 56%)
My Haswell total screamer is arriving next week! Oops, ordered before it was
pointed out to me that the AVS enhanced assembler instruction SQRTpd does not do 128bits! DOH!
Now, I'm looking at storing previous results in a GIANT array for successive
passes on new numbers. This would use the "Thompson" method as an index into results!
Thinking out loud here, there are 1,099,511,627,776 bytes in a gigabyte.
Therefore there are 68,719,476,736 128bit integers in a Gigabyte.
So, there are 2,199,023,255,552 128bit integers storable in 32Gigs of RAM.
and 8,796,093,022,208 128 integers storable in 128Gigs of RAM.
(Side note: a Database is out of the question even if it is on an SSD,
the overhead still doesn't even come close to a one reference RAM array)
I could use the RAM array beginning at the 64bit boundary just like to do now
by switching to the Mod Newton method when the assembler 64bit SQRT runs out of gas.
An GIANT (several Gazillion) array of 128bit results does provide some leeway.
The question becomes how to implement in lcc-win64 ?!?
int128 Big128[ 8796093022208 ];
or
malloc ( 549755813888 * sizeof(int128) ];
ROFL man... are we having fun yet? shouldn't do multiple allocs as that turns the objective into a kind of BTree solution. More and more indexes...
What is the largest static array of 128bit ints that can be defined in lcc-win64?
What is the largest contiguous dynamic memory allocation that can be done in lcc-win64?
Regards,
KW