I have an interesting request: I am writing some scientific software that might be run on a PowerPC G5. I know that the G5 penalizes one quite severely for doing an int->float or a float->int conversion (speed-wise that is). I also know that other modern processors are also slow for conversions.
I'm looking for a pure floating point random number generator in C. I have searched all over the net and have not been able to find such a thing. The rationale is that a pure FPU PRNG would be inherently faster on the G5 than doing a conversion from an int PRNG to get random floating point numbers. (Not to mention that many modern CPUs actually do floating point math faster than int math anyway.)
I don't know, but in numerical computations all real-values random numbers (of diverse distributions) ultimately stem from underlying PRNGs (often only one PRNG) that work on integers, not floating numbers, as far as I am aware. So, I would think that such a thing doesn't exist. But one could on the other hand use the floating point hardware to do integer computations somehow (I have never attempted to learn actual details of this, so I may well err and someone would correct me in that case).
> I have an interesting request: I am writing some scientific > software that might be run on a PowerPC G5. I know that the > G5 penalizes one quite severely for doing an int->float or > a float->int conversion (speed-wise that is). I also know > that other modern processors are also slow for conversions.
> I'm looking for a pure floating point random number generator > in C. I have searched all over the net and have not been able > to find such a thing. The rationale is that a pure FPU PRNG > would be inherently faster on the G5 than doing a conversion > from an int PRNG to get random floating point numbers. (Not to > mention that many modern CPUs actually do floating point math > faster than int math anyway.)
> So, has anyone ever heard of such a thing?
I once tried writing a floating point PRNG by performing an IFFT on an array of 131,072 spectral components that I had set to 1+j0. Results were disappointing to say the least. The period was too short for any practical application and the speed was much slower than generating a random integer and fudging to float.
> I have an interesting request: I am writing some scientific > software that might be run on a PowerPC G5. I know that the > G5 penalizes one quite severely for doing an int->float or > a float->int conversion (speed-wise that is). I also know > that other modern processors are also slow for conversions.
Do you actually need floating point arithmetic or can you use 64 integer with scaling? Sometimes called fixed point.
> "Adam Ierymenko" <a...@N0SP4Mclearlight.com> wrote in message > news:pan.2004.02.04.03.20.39.760148@N0SP4Mclearlight.com... > > I have an interesting request: I am writing some scientific > > software that might be run on a PowerPC G5. I know that the > > G5 penalizes one quite severely for doing an int->float or > > a float->int conversion (speed-wise that is). I also know > > that other modern processors are also slow for conversions.
> Do you actually need floating point arithmetic or can you use > 64 integer with scaling? Sometimes called fixed point.
I have a question: I had the (vague/unsure) information that on machines with both integer and floating-point arithmetic units one could manage to have both units do integer computations, thus effectively getting more work done. Could this be true? If yes, could someone explain a little bit or give some references? Thanks.
> > "Adam Ierymenko" <a...@N0SP4Mclearlight.com> wrote in message > > news:pan.2004.02.04.03.20.39.760148@N0SP4Mclearlight.com... > > > I have an interesting request: I am writing some scientific > > > software that might be run on a PowerPC G5. I know that the > > > G5 penalizes one quite severely for doing an int->float or > > > a float->int conversion (speed-wise that is). I also know > > > that other modern processors are also slow for conversions.
> > Do you actually need floating point arithmetic or can you use > > 64 integer with scaling? Sometimes called fixed point.
> I have a question: I had the (vague/unsure) information > that on machines with both integer and floating-point > arithmetic units one could manage to have both units > do integer computations, thus effectively getting more > work done. Could this be true? If yes, could someone > explain a little bit or give some references? Thanks.
It is true.
Reference: Google.
The problem is though, you have to use smaller types. For example, I've ported the comba multiplier from LTM to use doubles with [iirc] 21-bit digits. The code worked flawlessly but with 21-bit digits you use more time [cuz there is little to no savings in cycle count on the Athlon for a fp mul vs. alu mul]
On older processors where the FP multiplier was heavily optimized and the ALU one wasn't [e.g. 486-586 era] this was a good idea. Even with the smaller digits you ended up saving time.
> The problem is though, you have to use smaller types.
[snip]
Then one could under circumstances use both units in parallel to get more work done. That could be valuable. For the OP, this means that he could anyway use the FP unit to realize the PRNGs in integers.
> > The problem is though, you have to use smaller types. > [snip]
> Then one could under circumstances use both units in > parallel to get more work done. That could be valuable. > For the OP, this means that he could anyway use the FP > unit to realize the PRNGs in integers.
Certainly. I think you need SSE2 [SSE1 only has single precision types] to accomplish this. You also have to make sure you're input/output is a multiple of two. But you also run into the trouble of non-128-bit-aligned stores... hmm...
Anyways, yes you can do it but you pay a price in making special considerations and other packing related woes [not so much with SSE].
Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote in message <news:pan.2004.02.04.03.20.39.760148@N0SP4Mclearlight.com>... > I have an interesting request: I am writing some scientific > software that might be run on a PowerPC G5. I know that the > G5 penalizes one quite severely for doing an int->float or > a float->int conversion (speed-wise that is). I also know > that other modern processors are also slow for conversions.
> I'm looking for a pure floating point random number generator > in C. I have searched all over the net and have not been able > to find such a thing. The rationale is that a pure FPU PRNG > would be inherently faster on the G5 than doing a conversion > from an int PRNG to get random floating point numbers. (Not to > mention that many modern CPUs actually do floating point math > faster than int math anyway.)
> So, has anyone ever heard of such a thing?
Yes, I have.
In Knuth II (The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Third Edition, by Donald Knuth) in Section 3.2.2, under Program A (Additive Number Generator) we find:
"Furthermore, as Richard Brent has observed, it [the Additive RNG] can be made to work correctly with floating point numbers, avoiding the need to convert between integers and fractions (see exercise 23)." (p. 28)
Unfortunately, exercise 23 does not really address the required magic. Presumably we only care about summing the fraction fields and stuff the exponent with the range we want at the end.
That said, I have doubts about how much time a finished design could possibly save.
/* this message intends to be a valid ANSI C program
In article <pan.2004.02.04.03.20.39.760...@N0SP4Mclearlight.com>,
Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote: > I'm looking for a pure floating point random number > generator in C.
My 2 cents worth: */
/* <--- put what follows in a header file ---- */
/* doubleFastRng version 2 A random number generator using floating point in standard C. NOT crypto quality, but will still pass many tests.
2004-02-05 V2 Lower odds that gFastRngA enters a short cycle Added doubleFastRng 2004-02-04 V1 Creation
Public domain, by Francois Grieu */
/* Meat of the generator Use the following doubleFastRng() macro as you would use: double doubleFastRng(void);
Result is random-like and near-uniform in [0 to 1]
Note: low-order bits of mantissa are NOT very good Note: speed depends a lot on the speed of floor() Note: results are architecture dependent; this shows after like a hundred iterations.
Rationale: after the first pass, then from pass to pass, - gFastRngA is in range [0.07438 to 1.62009], non-uniform - gFastRngB is in range [0 to 1], near-uniform - gFastRngC is in range [0 to 1], random-like Also - self-feedback on gFastRngA is non-linear ("chaotic" ?) - gFastRngC has a small feedback on gFastRngA, this attempts to prevent a short cycle on gFastRngA */ #define doubleFastRng() ( \ (gFastRngA += gFastRngC*(1./9973)+3./11-floor(gFastRngA)), \ (gFastRngB += (gFastRngA *= gFastRngA)), \ (gFastRngC += (gFastRngB -= floor(gFastRngB))), \ (gFastRngC -= floor(gFastRngC)) \ )
/* Optional: seed the RNG Use the following doubleFastRng() macro as you would use: double doubleFastRngSeed(double x);
Seed value x can be any non-negative double (preferably < 1) */ #define doubleFastRngSeed(x) do { \ gFastRngA = (x); gFastRngB = gFastRngC = 0; \ } while(0)
/* Optional: define an additional state of the random generator in a local context, which might be faster. Seed value x can be any non-negative double (preferably < 1) */ #define NEWFASTRNG(x) \ double gFastRngA = (x), gFastRngB = 0, gFastRngC = 0
/* Define a default global context using above macro Note: will harmlessly define multiple equivalent contexts if the header file is included many times */ NEWFASTRNG(0);
#include <math.h> /* needed for floor(), used in doubleFastRng */
/* --- put the above in a header file ----> */
/* Test code for the above; values shown on second and third columns are expected to be of either sign, and often (but not allways) in the range [-1 to +1] */ #include <stdio.h> #include <stdlib.h> int main(int argc, char *argv[]) { NEWFASTRNG(0); /* define a local context for faster code */ double s=0,t=0,r; unsigned long j; if (argc>1 && argv[1]) /* seed the RNG if an argument is given */ doubleFastRngSeed(atof(argv[1])); for(j=1; j<=1<<28; ++j) { s += r = doubleFastRng() - 1./2; t += r*r - 1./12; if ((j&(j-1))==0) /* j a power of two */ printf("%10lu %-+12g %-+12g\n",j,s*2./sqrt(j),t*12./sqrt(j)); } return 0; }
Mok-Kong Shen <mok-kong.s...@t-online.de> writes: > I had the (vague/unsure) information that on machines with both > integer and floating-point arithmetic units one could manage to have > both units do integer computations, thus effectively getting more > work done. Could this be true? If yes, could someone explain a little > bit or give some references?
A fairly dramatic example was given by Bernstein at ECC 2001, where field arithmetic was implemented primarily with floating point registers.
A brief outline of the technique appears in our Springer book
Details for related work appears on Bernstein's site http://cr.yp.to.
The expensive conversions between integer and floating point formats are performed infrequently in elliptic curve scalar (point) multiplication, and his results were much faster than other published timings with conventional registers on common hardware (such as the Sun UltraSPARC and Intel Pentium 4, where integer multiplication is relatively slow).
I'm uncertain, however, if this matches your request to have "both units do integer computations". Bernstein's approach will use floating-point registers for most of the computations, in part since conversion tends to be expensive.
/* this message intends to be a valid ANSI C program
In article <pan.2004.02.04.03.20.39.760...@N0SP4Mclearlight.com>,
Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote: > I'm looking for a pure floating point random number > generator in C.
My 2 cents worth: */
/* <--- put what follows in a header file ---- */
/* doubleFastRng version 2 A random number generator using floating point in standard C. NOT crypto quality, but will still pass many tests.
2004-02-05 V2.1 Better rationale 2004-02-05 V2 Lower odds that gFastRngA enters a short cycle Added doubleFastRng 2004-02-04 V1 Creation
Public domain, by Francois Grieu */
/* Meat of the generator Use the following doubleFastRng() macro as you would use: double doubleFastRng(void);
Result is random-like and near-uniform in [0 to 1]
Note: low-order bits of mantissa are NOT very good see below for more serious limitations Note: speed depends a lot on the speed of floor() Note: results are architecture dependent; this shows after like a hundred iterations.
Rationale: - some chaotic generator A is built - A is turned into a uniform generator B; consecutive pairs of B are NOT independent. - B is turned into a better uniform generator C; consecutive pairs of B sampled at random points should be well distributed, but triplets of C are NOT, this may matter in some applications (e.g. selecting random points in a cube); men this is a SIMPLE generator, not a perfect one ! After the first pass, then from pass to pass, - A is in range [0.07438 to 1.62009], non-uniform - B is in range [0 to 1], near-uniform - C is in range [0 to 1], random-like Also - ignoring a C*k term and precision considerations, the feedback for A is A <- ((A mod 1) + 3/11) ^ 2 - this transformation has no stationary point - order 2 cycles are unstable, and it is conjectured that higher order cycles are unstable, excluding numerical precision issues - but it is conceivable that a particlar seed will give a short cycle on a given architecture, and worst that such cycle is entered by accident - in a (perhap silly) attempt to counter this, generator C adds a small feedback on A; this probably makes the distribution of the output slightly less uniform; a binning test would reveal this, I guess.
/* Optional: seed the RNG Use the following doubleFastRng() macro as you would use: double doubleFastRngSeed(double x);
Seed value x can be any non-negative, non-huge double */ #define doubleFastRngSeed(x) do { \ gFastRngA = (x); gFastRngB = gFastRngC = 0; \ } while(0)
/* Optional: define an additional state of the random generator in a local context, which might be faster. Seed value x can be any non-negative, non-huge double */ #define NEWFASTRNG(x) \ double gFastRngA = (x), gFastRngB = 0, gFastRngC = 0
/* Define a default global context using above macro Note: will harmlessly define multiple equivalent contexts if the header file is included many times */ NEWFASTRNG(0);
#include <math.h> /* needed for floor(), used in doubleFastRng */
/* --- put the above in a header file ----> */
/* Test code for the above; values shown on second and third columns are expected to be of either sign, and often (but not allways) in the range [-1 to +1] */ #include <stdio.h> #include <stdlib.h> int main(int argc, char *argv[]) { NEWFASTRNG(0); /* define a local context for faster code */ double s=0,t=0,r; unsigned long j; if (argc>1 && argv[1]) /* seed the RNG if an argument is given */ doubleFastRngSeed(atof(argv[1])); for(j=1; j<=1<<28; ++j) { s += r = doubleFastRng() - 1./2; t += r*r - 1./12; if ((j&(j-1))==0) /* j a power of two */ printf("%10lu %-+12g %-+12g\n",j,s*2./sqrt(j),t*12./sqrt(j)); } return 0; }
> I have an interesting request: I am writing some scientific > software that might be run on a PowerPC G5. I know that the > G5 penalizes one quite severely for doing an int->float or > a float->int conversion (speed-wise that is). I also know > that other modern processors are also slow for conversions.
> I'm looking for a pure floating point random number generator > in C. I have searched all over the net and have not been able > to find such a thing. The rationale is that a pure FPU PRNG > would be inherently faster on the G5 than doing a conversion > from an int PRNG to get random floating point numbers. (Not to > mention that many modern CPUs actually do floating point math > faster than int math anyway.)
> So, has anyone ever heard of such a thing?
A method for generating 32-bit floating point random numbers directly, without the usual floating of a random integers, is in
Toward a universal random number generator, Statistics & Probability Letters, Volume 66, Issue 2, January 1990, Pages 35-39 George Marsaglia , Arif Zaman and Wai Wan Tsang . while a 64 bit version is in
The 64-bit universal RNG Statistics & Probability Letters, Volume 66, Issue 2, 15 January 2004, Pages 183-187 George Marsaglia and Wai Wan Tsang