
C code for pure float PRNG


Adam Ierymenko

Feb 3, 2004, 10:19:40 PM
I have an interesting request: I am writing some scientific
software that might be run on a PowerPC G5. I know that the
G5 penalizes one quite severely for doing an int->float or
a float->int conversion (speed-wise that is). I also know
that other modern processors are also slow for conversions.

I'm looking for a pure floating point random number generator
in C. I have searched all over the net and have not been able
to find such a thing. The rationale is that a pure FPU PRNG
would be inherently faster on the G5 than doing a conversion
from an int PRNG to get random floating point numbers. (Not to
mention that many modern CPUs actually do floating point math
faster than int math anyway.)

So, has anyone ever heard of such a thing?

Mok-Kong Shen

Feb 4, 2004, 4:12:26 AM

Adam Ierymenko wrote:
>

> So, has anyone ever heard of such a thing?

I don't know, but in numerical computations all
real-valued random numbers (of diverse distributions)
ultimately stem from underlying PRNGs (often only
one PRNG) that work on integers, not floating-point
numbers, as far as I am aware. So I would think that
such a thing doesn't exist. On the other hand, one
could use the floating-point hardware to do integer
computations somehow (I have never attempted to learn
the actual details of this, so I may well err, and
someone should correct me in that case).

M. K. Shen

John E. Hadstate

Feb 4, 2004, 6:36:01 AM

"Adam Ierymenko" <a...@N0SP4Mclearlight.com> wrote in message
news:pan.2004.02.04....@N0SP4Mclearlight.com...

I once tried writing a floating point PRNG by performing an IFFT on an array
of 131,072 spectral components that I had set to 1+j0. Results were
disappointing to say the least. The period was too short for any practical
application and the speed was much slower than generating a random integer
and fudging to float.


Andrew Swallow

Feb 4, 2004, 9:13:54 AM
"Adam Ierymenko" <a...@N0SP4Mclearlight.com> wrote in message
news:pan.2004.02.04....@N0SP4Mclearlight.com...
> I have an interesting request: I am writing some scientific
> software that might be run on a PowerPC G5. I know that the
> G5 penalizes one quite severely for doing an int->float or
> a float->int conversion (speed-wise that is). I also know
> that other modern processors are also slow for conversions.
>
Do you actually need floating point arithmetic, or can you use
64-bit integers with scaling? Sometimes called fixed point.

Andrew Swallow

Mok-Kong Shen

Feb 4, 2004, 9:23:30 AM

I have a question: I had the (vague/unsure) information
that on machines with both integer and floating-point
arithmetic units one could manage to have both units
do integer computations, thus effectively getting more
work done. Could this be true? If yes, could someone
explain a little bit or give some references? Thanks.

M. K. Shen

Tom St Denis

Feb 4, 2004, 9:44:26 AM

"Mok-Kong Shen" <mok-ko...@t-online.de> wrote in message
news:40210062...@t-online.de...

It is true.

Reference: Google.

The problem, though, is that you have to use smaller types. For example, I've
ported the comba multiplier from LTM to use doubles with [iirc] 21-bit
digits. The code worked flawlessly, but with 21-bit digits you use more time
[because there is little to no saving in cycle count on the Athlon for an FP
mul vs. an ALU mul].

On older processors where the FP multiplier was heavily optimized and the
ALU one wasn't [e.g. the 486-586 era] this was a good idea. Even with the
smaller digits you ended up saving time.

Tom

Mok-Kong Shen

Feb 4, 2004, 9:53:00 AM

Tom St Denis wrote:
>

> It is true.
>
> Reference: Google.
>
> The problem is though, you have to use smaller types.

[snip]

Then one could under some circumstances use both units in
parallel to get more work done. That could be valuable.
For the OP, this means that he could in any case use the
FP unit to realize the PRNG in integers.

M. K. Shen

Tom St Denis

Feb 4, 2004, 9:58:32 AM

"Mok-Kong Shen" <mok-ko...@t-online.de> wrote in message
news:4021074C...@t-online.de...

Certainly. I think you need SSE2 [SSE1 only has single-precision types] to
accomplish this. You also have to make sure your input/output is a
multiple of two. But you also run into the trouble of non-128-bit-aligned
stores... hmm...

Anyways, yes you can do it but you pay a price in making special
considerations and other packing related woes [not so much with SSE].

Tom


Terry Ritter

Feb 4, 2004, 1:00:30 PM
Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote in message news:<pan.2004.02.04....@N0SP4Mclearlight.com>...

Yes, I have.

In Knuth II (The Art of Computer Programming, Volume 2:
Seminumerical Algorithms, Third Edition, by Donald
Knuth) in Section 3.2.2, under Program A (Additive
Number Generator) we find:

"Furthermore, as Richard Brent has observed, it [the
Additive RNG] can be made to work correctly with
floating point numbers, avoiding the need to convert
between integers and fractions (see exercise 23)."
(p. 28)

Unfortunately, exercise 23 does not really address the
required magic. Presumably we only care about summing
the fraction fields, stuffing the exponent with the
range we want at the end.

That said, I have doubts about how much time a finished
design could possibly save.

---
Terry Ritter http://www.ciphersbyritter.com
Crypto Glossary http://www.ciphersbyritter.com/GLOSSARY.HTM

Randy Howard

Feb 4, 2004, 2:03:29 PM
In article <4021074C...@t-online.de>, mok-ko...@t-online.de says...

> Then one could under circumstances use both units in
> parallel to get more work done. That could be valuable.

It would be a lot simpler to just get a SMP system if you need that much
"horsepower".

--
Randy Howard
2reply remove FOOBAR

Francois Grieu

Feb 4, 2004, 6:13:10 PM
/* this message intends to be a valid ANSI C program

In article <pan.2004.02.04....@N0SP4Mclearlight.com>,
Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote:

> I'm looking for a pure floating point random number
> generator in C.

My 2 cents worth:
*/

/* <--- put what follows in a header file ---- */

/* doubleFastRng version 2
A random number generator using floating point in standard C.
NOT crypto quality, but will still pass many tests.

2004-02-05 V2 Lower odds that gFastRngA enters a short cycle
Added doubleFastRng
2004-02-04 V1 Creation

Public domain, by Francois Grieu
*/


/* Meat of the generator
Use the following doubleFastRng() macro as you would use:
double doubleFastRng(void);

Result is random-like and near-uniform in [0 to 1]

Note: low-order bits of mantissa are NOT very good
Note: speed depends a lot on the speed of floor()
Note: results are architecture dependent; this shows
after like a hundred iterations.

Rationale: after the first pass, then from pass to pass,
- gFastRngA is in range [0.07438 to 1.62009], non-uniform
- gFastRngB is in range [0 to 1], near-uniform
- gFastRngC is in range [0 to 1], random-like
Also
- self-feedback on gFastRngA is non-linear ("chaotic" ?)
- gFastRngC has a small feedback on gFastRngA, this attempts
to prevent a short cycle on gFastRngA
*/
#define doubleFastRng() ( \
    (gFastRngA += gFastRngC*(1./9973) + 3./11 - floor(gFastRngA)), \
    (gFastRngB += (gFastRngA *= gFastRngA)), \
    (gFastRngC += (gFastRngB -= floor(gFastRngB))), \
    (gFastRngC -= floor(gFastRngC)) \
)


/* Optional: seed the RNG
Use the following doubleFastRngSeed() macro as you would use:
double doubleFastRngSeed(double x);

Seed value x can be any non-negative double (preferably < 1)
*/
#define doubleFastRngSeed(x) do { \
    gFastRngA = (x); gFastRngB = gFastRngC = 0; \
} while(0)


/* Optional: define an additional state of the random generator
in a local context, which might be faster.
Seed value x can be any non-negative double (preferably < 1)
*/
#define NEWFASTRNG(x) \
double gFastRngA = (x), gFastRngB = 0, gFastRngC = 0


/* Define a default global context using the above macro
Note: include this header only once per program; a second
inclusion would re-define the context variables
*/
NEWFASTRNG(0);


#include <math.h> /* needed for floor(), used in doubleFastRng */

/* --- put the above in a header file ----> */

/* Test code for the above; values shown in the second and third
columns are expected to be of either sign, and often
(but not always) in the range [-1 to +1] */
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    NEWFASTRNG(0); /* define a local context for faster code */
    double s = 0, t = 0, r;
    unsigned long j;
    if (argc > 1 && argv[1]) /* seed the RNG if an argument is given */
        doubleFastRngSeed(atof(argv[1]));
    for (j = 1; j <= 1UL << 28; ++j)
    {
        s += r = doubleFastRng() - 1./2;
        t += r*r - 1./12;
        if ((j & (j-1)) == 0) /* j a power of two */
            printf("%10lu %-+12g %-+12g\n", j, s*2./sqrt(j), t*12./sqrt(j));
    }
    return 0;
}

/* results of the above on a PowerPC G4 with MrC under MPW
1 -0.85124 +1.17383
2 -0.928248 +0.574721
4 -0.88665 +0.221971
8 -0.32914 +0.705442
16 -0.630065 +0.736995
32 -0.0536323 +1.27563
64 -0.291745 +0.51509
128 -0.37266 +0.745991
256 -0.758367 +0.8036
512 -0.23023 +0.40877
1024 +0.322027 +0.863811
2048 +0.686953 +0.403408
4096 +0.274433 +1.66491
8192 -0.109999 +0.515839
16384 -0.487148 +0.28429
32768 -0.368597 +0.917197
65536 -0.364967 +1.70257
131072 +0.0397136 +1.73647
262144 +0.21818 +1.96605
524288 -0.369138 +0.767671
1048576 -0.804561 -0.926822
2097152 -0.154766 -1.0003
4194304 +0.440308 -1.4173
8388608 +0.443935 -2.18429
16777216 -0.507174 -0.808045
33554432 -0.250194 +0.234557
67108864 -0.384728 -1.01272
134217728 -0.40035 -1.04248
268435456 +0.576571 -0.562746
*/

/* results of the above on an Athlon XP with GCC under Win32
1 -0.85124 +1.17383
2 -0.928248 +0.574721
4 -0.88665 +0.221971
8 -0.32914 +0.705442
16 -0.630065 +0.736995
32 -0.0536323 +1.27563
64 -0.291745 +0.51509
128 -0.514703 +1.52217
256 -0.212286 +1.24088
512 -0.422173 -0.272143
1024 -1.14005 +0.293329
2048 -0.952017 +0.819124
4096 -0.583247 +0.982798
8192 -0.589248 +1.13498
16384 -0.215161 +2.21373
32768 -0.308418 +0.281441
65536 -0.402067 +0.0375496
131072 +0.191811 +0.447173
262144 +0.25173 +0.156406
524288 +0.345341 +0.834386
1048576 +0.240871 +0.985605
2097152 +0.0712098 +0.489712
4194304 +0.224906 +0.160624
8388608 -0.661534 +0.23568
16777216 -0.73162 -0.421065
33554432 -0.581571 -0.299703
67108864 +0.677287 -0.984173
134217728 +0.150005 -0.23203
268435456 +0.352258 +0.0239879
*/

/* Francois Grieu */

Darrel Hankerson

Feb 4, 2004, 7:16:23 PM
Mok-Kong Shen <mok-ko...@t-online.de> writes:

> I had the (vague/unsure) information that on machines with both
> integer and floating-point arithmetic units one could manage to have
> both units do integer computations, thus effectively getting more
> work done. Could this be true? If yes, could someone explain a little
> bit or give some references?

A fairly dramatic example was given by Bernstein at ECC 2001, where
field arithmetic was implemented primarily with floating point
registers.

A brief outline of the technique appears in our Springer book

Guide to Elliptic Curve Cryptography
http://www.cacr.math.uwaterloo.ca/ecc/

Details of related work appear on Bernstein's site http://cr.yp.to.

The expensive conversions between integer and floating point formats
are performed infrequently in elliptic curve scalar (point)
multiplication, and his results were much faster than other published
timings with conventional registers on common hardware (such as the
Sun UltraSPARC and Intel Pentium 4, where integer multiplication is
relatively slow).

I'm uncertain, however, if this matches your request to have
"both units do integer computations". Bernstein's approach will use
floating-point registers for most of the computations, in part since
conversion tends to be expensive.

--Darrel Hankerson

Francois Grieu

Feb 5, 2004, 2:28:55 AM
/* this message intends to be a valid ANSI C program

In article <pan.2004.02.04....@N0SP4Mclearlight.com>,
Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote:

> I'm looking for a pure floating point random number
> generator in C.

My 2 cents worth:
*/

/* <--- put what follows in a header file ---- */

/* doubleFastRng version 2
A random number generator using floating point in standard C.

NOT crypto quality, but will still pass many tests.

2004-02-05 V2.1 Better rationale
2004-02-05 V2 Lower odds that gFastRngA enters a short cycle
Added doubleFastRng
2004-02-04 V1 Creation

Public domain, by Francois Grieu
*/


/* Meat of the generator
Use the following doubleFastRng() macro as you would use:
double doubleFastRng(void);

Result is random-like and near-uniform in [0 to 1]

Note: low-order bits of mantissa are NOT very good
      (see below for more serious limitations)
Note: speed depends a lot on the speed of floor()
Note: results are architecture dependent; this shows
after like a hundred iterations.

Rationale:
- some chaotic generator A is built
- A is turned into a uniform generator B; consecutive
  pairs of B are NOT independent.
- B is turned into a better uniform generator C;
  consecutive pairs of C sampled at random points should
  be well distributed, but triplets of C are NOT;
  this may matter in some applications (e.g. selecting
  random points in a cube); mind, this is a SIMPLE
  generator, not a perfect one!
After the first pass, then from pass to pass,
- A is in range [0.07438 to 1.62009], non-uniform
- B is in range [0 to 1], near-uniform
- C is in range [0 to 1], random-like
Also
- ignoring a C*k term and precision considerations,
  the feedback for A is
  A <- ((A mod 1) + 3/11) ^ 2
- this transformation has no stationary point
- order-2 cycles are unstable, and it is conjectured
  that higher-order cycles are unstable, excluding
  numerical precision issues
- but it is conceivable that a particular seed will give
  a short cycle on a given architecture, and worse, that
  such a cycle is entered by accident
- in a (perhaps silly) attempt to counter this, generator
  C adds a small feedback on A; this probably makes the
  distribution of the output slightly less uniform;
  a binning test would reveal this, I guess.
*/
#define doubleFastRng() ( \
    (gFastRngA += gFastRngC*(1./9973) + 3./11 - floor(gFastRngA)), \
    (gFastRngB += (gFastRngA *= gFastRngA)), \
    (gFastRngC += (gFastRngB -= floor(gFastRngB))), \
    (gFastRngC -= floor(gFastRngC)) \
)


/* Optional: seed the RNG
Use the following doubleFastRngSeed() macro as you would use:
double doubleFastRngSeed(double x);

Seed value x can be any non-negative, non-huge double
*/
#define doubleFastRngSeed(x) do { \
    gFastRngA = (x); gFastRngB = gFastRngC = 0; \
} while(0)


/* Optional: define an additional state of the random generator
in a local context, which might be faster.

Seed value x can be any non-negative, non-huge double

George Marsaglia

Feb 5, 2004, 7:19:31 AM

"Adam Ierymenko" <a...@N0SP4Mclearlight.com> wrote in message
news:pan.2004.02.04....@N0SP4Mclearlight.com...


A method for generating 32-bit floating-point random numbers directly,
without the usual floating of a random integer, is in

Toward a universal random number generator,
Statistics & Probability Letters, Volume 8, 1990, Pages 35-39,
George Marsaglia, Arif Zaman and Wai Wan Tsang,
while a 64-bit version is in

The 64-bit universal RNG,
Statistics & Probability Letters, Volume 66, Issue 2, 15 January 2004, Pages
183-187,
George Marsaglia and Wai Wan Tsang

George Marsaglia

