Feb 3, 2004, 10:19:40 PM

I have an interesting request: I am writing some scientific
software that might be run on a PowerPC G5. I know that the
G5 penalizes one quite severely for doing an int->float or
a float->int conversion (speed-wise, that is). I also know
that other modern processors are also slow for conversions.

I'm looking for a pure floating point random number generator

in C. I have searched all over the net and have not been able

to find such a thing. The rationale is that a pure FPU PRNG

would be inherently faster on the G5 than doing a conversion

from an int PRNG to get random floating point numbers. (Not to

mention that many modern CPUs actually do floating point math

faster than int math anyway.)

So, has anyone ever heard of such a thing?

Feb 4, 2004, 4:12:26 AM

Adam Ierymenko wrote:

>

> So, has anyone ever heard of such a thing?

I don't know, but in numerical computations all
real-valued random numbers (of diverse distributions)
ultimately stem from underlying PRNGs (often only
one PRNG) that work on integers, not floating-point
numbers, as far as I am aware. So I would think that
such a thing doesn't exist. But one could, on the other
hand, use the floating point hardware to do integer
computations somehow (I have never attempted to learn
the actual details of this, so I may well err; someone
would correct me in that case).

M. K. Shen

Feb 4, 2004, 6:36:01 AM

"Adam Ierymenko" <a...@N0SP4Mclearlight.com> wrote in message

news:pan.2004.02.04....@N0SP4Mclearlight.com...

I once tried writing a floating point PRNG by performing an IFFT on an array

of 131,072 spectral components that I had set to 1+j0. Results were

disappointing to say the least. The period was too short for any practical

application and the speed was much slower than generating a random integer

and fudging to float.

Feb 4, 2004, 9:13:54 AM

"Adam Ierymenko" <a...@N0SP4Mclearlight.com> wrote in message

news:pan.2004.02.04....@N0SP4Mclearlight.com...


> I have an interesting request: I am writing some scientific

> software that might be run on a PowerPC G5. I know that the

> G5 penalizes one quite severely for doing an int->float or

> a float->int conversion (speed-wise that is). I also know

> that other modern processors are also slow for conversions.

>

Do you actually need floating point arithmetic, or can you use a
64-bit integer with scaling? Sometimes called fixed point.

Andrew Swallow
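
A minimal sketch of the fixed-point idea, assuming a Q32.32 format in a
64-bit integer; the type and helper names are made up for illustration,
and the 128-bit intermediate relies on a common compiler extension:

#include <stdint.h>
#include <stdio.h>

/* Q32.32 fixed point: the stored integer equals value * 2^32.
   All arithmetic stays in the integer unit; no int<->float
   conversion occurs until the final printout. */
typedef int64_t q32_32;

#define Q_ONE ((q32_32)1 << 32)

static q32_32 q_from_int(int n) { return (q32_32)n << 32; }

static q32_32 q_mul(q32_32 a, q32_32 b)
{
    /* a 128-bit intermediate avoids overflow; __int128 is a
       widespread compiler extension, not standard C */
    return (q32_32)(((__int128)a * b) >> 32);
}

int main(void)
{
    q32_32 x = q_from_int(3) / 2;   /* 1.5 in Q32.32 */
    q32_32 y = q_mul(x, x);         /* 2.25 in Q32.32 */
    printf("%f\n", (double)y / (double)Q_ONE);   /* prints 2.250000 */
    return 0;
}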

Feb 4, 2004, 9:23:30 AM

I have a question: I had the (vague/unsure) impression
that on machines with both integer and floating-point
arithmetic units one could manage to have both units
do integer computations, thus effectively getting more
work done. Could this be true? If yes, could someone
explain a little bit or give some references? Thanks.

M. K. Shen

Feb 4, 2004, 9:44:26 AM

"Mok-Kong Shen" <mok-ko...@t-online.de> wrote in message

news:40210062...@t-online.de...

It is true.

Reference: Google.

The problem, though, is that you have to use smaller types. For example, I've
ported the comba multiplier from LTM to use doubles with [iirc] 21-bit
digits. The code worked flawlessly, but with 21-bit digits you use more time
[because there is little to no saving in cycle count on the Athlon for an FP
mul vs. an ALU mul].

On older processors where the FP multiplier was heavily optimized and the

ALU one wasn't [e.g. 486-586 era] this was a good idea. Even with the

smaller digits you ended up saving time.

Tom
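
A sketch of the general technique described above (not the actual LTM
port): with 21-bit digits a single digit product fits in 42 bits, so a
whole column of products can be accumulated exactly inside a double's
53-bit mantissa. The names and digit size below are illustrative.

#include <math.h>

/* Schoolbook multiply of n-digit numbers whose 21-bit digits are
   stored in doubles. Each digit product is < 2^42, so a column of
   up to ~2^11 products plus carry still sums exactly within the
   53-bit mantissa; no rounding occurs anywhere. */
#define DIGIT_BASE 2097152.0   /* 2^21 */

static void fp_mul(const double *a, const double *b, double *c, int n)
{
    double carry = 0.0;
    int i, k;
    for (k = 0; k < 2 * n - 1; k++) {
        double acc = carry;
        int lo = (k < n) ? 0 : k - n + 1;
        int hi = (k < n) ? k : n - 1;
        for (i = lo; i <= hi; i++)
            acc += a[i] * b[k - i];       /* exact: each product < 2^42 */
        carry = floor(acc / DIGIT_BASE);  /* exact: division by 2^21 */
        c[k] = acc - carry * DIGIT_BASE;  /* digit in [0, 2^21) */
    }
    c[2 * n - 1] = carry;
}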

Feb 4, 2004, 9:53:00 AM

Tom St Denis wrote:

>

> It is true.

>

> Reference: Google.

>

> The problem is though, you have to use smaller types.

[snip]

Then one could, under some circumstances, use both units in
parallel to get more work done. That could be valuable.
For the OP, this means that he could in any case use the FP
unit to realize integer PRNGs.

M. K. Shen

Feb 4, 2004, 9:58:32 AM

news:4021074C...@t-online.de...

Certainly. I think you need SSE2 [SSE1 only has single precision types] to

accomplish this. You also have to make sure your input/output is a

multiple of two. But you also run into the trouble of non-128-bit-aligned

stores... hmm...

Anyways, yes you can do it but you pay a price in making special

considerations and other packing related woes [not so much with SSE].

Tom

Feb 4, 2004, 1:00:30 PM

Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote in message news:<pan.2004.02.04....@N0SP4Mclearlight.com>...

Yes, I have.

In Knuth II (The Art of Computer Programming, Volume 2:

Seminumerical Algorithms, Third Edition, by Donald

Knuth) in Section 3.2.2, under Program A (Additive

Number Generator) we find:

"Furthermore, as Richard Brent has observed, it [the

Additive RNG] can be made to work correctly with

floating point numbers, avoiding the need to convert

between integers and fractions (see exercise 23)."

(p. 28)

Unfortunately, exercise 23 does not really address the
required magic. Presumably we only care about summing
the fraction fields and stuffing the exponent with the
range we want at the end.

That said, I have doubts about how much time a finished

design could possibly save.

---

Terry Ritter http://www.ciphersbyritter.com

Crypto Glossary http://www.ciphersbyritter.com/GLOSSARY.HTM
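
A guess at what Brent's floating-point variant might look like, as a
minimal sketch: lags (24, 55) from Knuth's additive generator, all
arithmetic mod 1 in doubles, with an illustrative golden-ratio fill
standing in for proper seeding (this is not the book's code).

#include <math.h>

/* Additive lagged-Fibonacci generator kept entirely in doubles:
   x[n] = (x[n-24] + x[n-55]) mod 1. No int<->float conversion
   is needed per draw. */
#define LAG_BIG   55
#define LAG_SMALL 24

static double lfg[LAG_BIG];
static int lfg_i = 0;                    /* index of x[n-55] */
static int lfg_j = LAG_BIG - LAG_SMALL;  /* index of x[n-24] */

static void lfg_seed(double s)
{
    int k;
    s -= floor(s);                       /* reduce seed to [0,1) */
    for (k = 0; k < LAG_BIG; k++) {      /* crude Weyl-sequence fill */
        s += 0.61803398874989485;        /* fractional golden ratio */
        s -= floor(s);
        lfg[k] = s;
    }
    lfg_i = 0;
    lfg_j = LAG_BIG - LAG_SMALL;
}

static double lfg_next(void)
{
    double x = lfg[lfg_i] + lfg[lfg_j];
    if (x >= 1.0) x -= 1.0;              /* mod 1 without floor() */
    lfg[lfg_i] = x;                      /* overwrite the oldest value */
    if (++lfg_i == LAG_BIG) lfg_i = 0;
    if (++lfg_j == LAG_BIG) lfg_j = 0;
    return x;
}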

Feb 4, 2004, 2:03:29 PM

In article <4021074C...@t-online.de>, mok-ko...@t-online.de says...

> Then one could under circumstances use both units in

> parallel to get more work done. That could be valuable.


It would be a lot simpler to just get a SMP system if you need that much

"horsepower".

--

Randy Howard

To reply remove FOOBAR

Feb 4, 2004, 6:13:10 PM

/* this message intends to be a valid ANSI C program

In article <pan.2004.02.04....@N0SP4Mclearlight.com>,

Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote:

> I'm looking for a pure floating point random number

> generator in C.

My 2 cents worth:

*/

/* <--- put what follows in a header file ---- */

/* doubleFastRng version 2

A random number generator using floating point in standard C.

NOT crypto quality, but will still pass many tests.

2004-02-05 V2 Lower odds that gFastRngA enters a short cycle

Added doubleFastRng

2004-02-04 V1 Creation

Public domain, by Francois Grieu

*/

/* Meat of the generator

Use the following doubleFastRng() macro as you would use:

double doubleFastRng(void);

Result is random-like and near-uniform in [0 to 1]

Note: low-order bits of mantissa are NOT very good

Note: speed depends a lot on the speed of floor()

Note: results are architecture dependent; this shows

after like a hundred iterations.

Rationale: after the first pass, then from pass to pass,

- gFastRngA is in range [0.07438 to 1.62009], non-uniform

- gFastRngB is in range [0 to 1], near-uniform

- gFastRngC is in range [0 to 1], random-like

Also

- self-feedback on gFastRngA is non-linear ("chaotic" ?)

- gFastRngC has a small feedback on gFastRngA, this attempts

to prevent a short cycle on gFastRngA

*/

#define doubleFastRng() ( \

(gFastRngA += gFastRngC*(1./9973)+3./11-floor(gFastRngA)), \

(gFastRngB += (gFastRngA *= gFastRngA)), \

(gFastRngC += (gFastRngB -= floor(gFastRngB))), \

(gFastRngC -= floor(gFastRngC)) \

)

/* Optional: seed the RNG

Use the following doubleFastRngSeed() macro as you would use:

double doubleFastRngSeed(double x);

Seed value x can be any non-negative double (preferably < 1)

*/

#define doubleFastRngSeed(x) do { \

gFastRngA = (x); gFastRngB = gFastRngC = 0; \

} while(0)

/* Optional: define an additional state of the random generator

in a local context, which might be faster.

Seed value x can be any non-negative double (preferably < 1)

*/

#define NEWFASTRNG(x) \

double gFastRngA = (x), gFastRngB = 0, gFastRngC = 0

/* Define a default global context using above macro

Note: will harmlessly define multiple equivalent contexts

if the header file is included many times

*/

NEWFASTRNG(0);

#include <math.h> /* needed for floor(), used in doubleFastRng */

/* --- put the above in a header file ----> */

/* Test code for the above; values shown on second and third

columns are expected to be of either sign, and often

(but not always) in the range [-1 to +1] */

#include <stdio.h>

#include <stdlib.h>

int main(int argc, char *argv[])

{

NEWFASTRNG(0); /* define a local context for faster code */

double s=0,t=0,r;

unsigned long j;

if (argc>1 && argv[1]) /* seed the RNG if an argument is given */

doubleFastRngSeed(atof(argv[1]));

for(j=1; j<=1<<28; ++j)

{

s += r = doubleFastRng() - 1./2;

t += r*r - 1./12;

if ((j&(j-1))==0) /* j a power of two */

printf("%10lu %-+12g %-+12g\n",j,s*2./sqrt(j),t*12./sqrt(j));

}

return 0;

}

/* results of the above on a PowerPC G4 with MrC under MPW

1 -0.85124 +1.17383

2 -0.928248 +0.574721

4 -0.88665 +0.221971

8 -0.32914 +0.705442

16 -0.630065 +0.736995

32 -0.0536323 +1.27563

64 -0.291745 +0.51509

128 -0.37266 +0.745991

256 -0.758367 +0.8036

512 -0.23023 +0.40877

1024 +0.322027 +0.863811

2048 +0.686953 +0.403408

4096 +0.274433 +1.66491

8192 -0.109999 +0.515839

16384 -0.487148 +0.28429

32768 -0.368597 +0.917197

65536 -0.364967 +1.70257

131072 +0.0397136 +1.73647

262144 +0.21818 +1.96605

524288 -0.369138 +0.767671

1048576 -0.804561 -0.926822

2097152 -0.154766 -1.0003

4194304 +0.440308 -1.4173

8388608 +0.443935 -2.18429

16777216 -0.507174 -0.808045

33554432 -0.250194 +0.234557

67108864 -0.384728 -1.01272

134217728 -0.40035 -1.04248

268435456 +0.576571 -0.562746

*/

/* results of the above on an Athlon XP with GCC under Win32

1 -0.85124 +1.17383

2 -0.928248 +0.574721

4 -0.88665 +0.221971

8 -0.32914 +0.705442

16 -0.630065 +0.736995

32 -0.0536323 +1.27563

64 -0.291745 +0.51509

128 -0.514703 +1.52217

256 -0.212286 +1.24088

512 -0.422173 -0.272143

1024 -1.14005 +0.293329

2048 -0.952017 +0.819124

4096 -0.583247 +0.982798

8192 -0.589248 +1.13498

16384 -0.215161 +2.21373

32768 -0.308418 +0.281441

65536 -0.402067 +0.0375496

131072 +0.191811 +0.447173

262144 +0.25173 +0.156406

524288 +0.345341 +0.834386

1048576 +0.240871 +0.985605

2097152 +0.0712098 +0.489712

4194304 +0.224906 +0.160624

8388608 -0.661534 +0.23568

16777216 -0.73162 -0.421065

33554432 -0.581571 -0.299703

67108864 +0.677287 -0.984173

134217728 +0.150005 -0.23203

268435456 +0.352258 +0.0239879

*/

/* Francois Grieu */

Feb 4, 2004, 7:16:23 PM

Mok-Kong Shen <mok-ko...@t-online.de> writes:

> I had the (vague/unsure) information that on machines with both

> integer and floating-point arithmetic units one could manage to have

> both units do integer computations, thus effectively getting more

> work done. Could this be true? If yes, could someone explain a little

> bit or give some references?

A fairly dramatic example was given by Bernstein at ECC 2001, where

field arithmetic was implemented primarily with floating point

registers.

A brief outline of the technique appears in our Springer book

Guide to Elliptic Curve Cryptography

http://www.cacr.math.uwaterloo.ca/ecc/

Details of related work appear on Bernstein's site http://cr.yp.to.

The expensive conversions between integer and floating point formats

are performed infrequently in elliptic curve scalar (point)

multiplication, and his results were much faster than other published

timings with conventional registers on common hardware (such as the

Sun UltraSPARC and Intel Pentium 4, where integer multiplication is

relatively slow).

I'm uncertain, however, if this matches your request to have

"both units do integer computations". Bernstein's approach will use

floating-point registers for most of the computations, in part since

conversion tends to be expensive.

--Darrel Hankerson

Feb 5, 2004, 2:28:55 AM

/* this message intends to be a valid ANSI C program

In article <pan.2004.02.04....@N0SP4Mclearlight.com>,

Adam Ierymenko <a...@N0SP4Mclearlight.com> wrote:

> I'm looking for a pure floating point random number

> generator in C.

My 2 cents worth:

*/

/* <--- put what follows in a header file ---- */

/* doubleFastRng version 2

A random number generator using floating point in standard C.

NOT crypto quality, but will still pass many tests.

2004-02-05 V2.1 Better rationale

2004-02-05 V2 Lower odds that gFastRngA enters a short cycle

Added doubleFastRng

2004-02-04 V1 Creation

Public domain, by Francois Grieu

*/

/* Meat of the generator

Use the following doubleFastRng() macro as you would use:

double doubleFastRng(void);

Result is random-like and near-uniform in [0 to 1]

Note: low-order bits of mantissa are NOT very good

see below for more serious limitations

Note: speed depends a lot on the speed of floor()

Note: results are architecture dependent; this shows

after like a hundred iterations.

Rationale:

- some chaotic generator A is built

- A is turned into a uniform generator B; consecutive

pairs of B are NOT independent.

- B is turned into a better uniform generator C;

consecutive pairs of C sampled at random points should
be well distributed, but triplets of C are NOT;
this may matter in some applications (e.g. selecting
random points in a cube); mind, this is a SIMPLE
generator, not a perfect one!

After the first pass, then from pass to pass,

- A is in range [0.07438 to 1.62009], non-uniform

- B is in range [0 to 1], near-uniform

- C is in range [0 to 1], random-like

Also

- ignoring a C*k term and precision considerations,

the feedback for A is

A <- ((A mod 1) + 3/11) ^ 2

- this transformation has no stationary point

- order 2 cycles are unstable, and it is conjectured

that higher order cycles are unstable, excluding

numerical precision issues

- but it is conceivable that a particular seed will give
a short cycle on a given architecture, and worse, that
such a cycle is entered by accident
- in a (perhaps silly) attempt to counter this, generator
C adds a small feedback on A; this probably makes the
distribution of the output slightly less uniform;
a binning test would reveal this, I guess.

*/

#define doubleFastRng() ( \

(gFastRngA += gFastRngC*(1./9973)+3./11-floor(gFastRngA)), \

(gFastRngB += (gFastRngA *= gFastRngA)), \

(gFastRngC += (gFastRngB -= floor(gFastRngB))), \

(gFastRngC -= floor(gFastRngC)) \

)

/* Optional: seed the RNG

Use the following doubleFastRngSeed() macro as you would use:

double doubleFastRngSeed(double x);

Seed value x can be any non-negative, non-huge double

*/

#define doubleFastRngSeed(x) do { \

gFastRngA = (x); gFastRngB = gFastRngC = 0; \

} while(0)

/* Optional: define an additional state of the random generator
in a local context, which might be faster.
Seed value x can be any non-negative, non-huge double
*/
#define NEWFASTRNG(x) \
double gFastRngA = (x), gFastRngB = 0, gFastRngC = 0

Feb 5, 2004, 7:19:31 AM

"Adam Ierymenko" <a...@N0SP4Mclearlight.com> wrote in message

news:pan.2004.02.04....@N0SP4Mclearlight.com...

A method for generating 32-bit floating point random numbers directly,
without the usual floating of random integers, is in

Toward a universal random number generator,
Statistics & Probability Letters, Volume 9, Issue 1, January 1990, Pages 35-39,
George Marsaglia, Arif Zaman and Wai Wan Tsang,

while a 64 bit version is in

The 64-bit universal RNG

Statistics & Probability Letters, Volume 66, Issue 2, 15 January 2004, Pages

183-187

George Marsaglia and Wai Wan Tsang

George Marsaglia
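
For reference, the core step of the 1990 generator is a lagged-Fibonacci
subtraction on reals with lags (97, 33), combined with a slowly varying
arithmetic sequence, all in floating point. A sketch from memory follows;
the constants are the published ones, but the seeding is a simplified
stand-in for the paper's bit-level initialization from four integer seeds.

#include <math.h>

/* Core of the Marsaglia-Zaman-Tsang universal generator:
   x[n] = (x[n-97] - x[n-33]) mod 1, combined with the sequence
   c -= 7654321/2^24 (mod 16777213/2^24). Everything stays in
   floating point. */
static double u[97];
static double uni_c        = 362436.0   / 16777216.0;
static const double uni_cd = 7654321.0  / 16777216.0;
static const double uni_cm = 16777213.0 / 16777216.0;
static int i97 = 96, j97 = 32;

static void uni_seed(double s)   /* simplified fill, NOT the paper's */
{
    int k;
    s -= floor(s);
    for (k = 0; k < 97; k++) {
        s += 0.61803398874989485;
        s -= floor(s);
        u[k] = s;
    }
    i97 = 96; j97 = 32;
}

static double uni(void)
{
    double x = u[i97] - u[j97];
    if (x < 0.0) x += 1.0;
    u[i97] = x;
    if (--i97 < 0) i97 = 96;
    if (--j97 < 0) j97 = 96;
    uni_c -= uni_cd;
    if (uni_c < 0.0) uni_c += uni_cm;
    x -= uni_c;
    if (x < 0.0) x += 1.0;
    return x;
}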
