PractRand 0.86 is ready for release; I'm just packaging things up
now. It includes fixes for the trivium RNG and the jsf16 RNG, and a
tweak to the seeding of the efiix RNGs. It also adds a multithreaded
test program to improve performance on multicore CPUs. The
multithreaded test program supports only pthreads and WIN32; platforms
that offer neither can't build that one.
On Jan 28, 3:34 pm, Karl-Uwe Frank <
karl.fr...@freecx.co.uk> wrote:
> On 28.01.12 06:40, orz wrote:
>
> [...snip...]
>
> > PractRand doesn't need to get recompiled all the time. test.cpp (or
> > anything else in the test/ folder) can be recompiled independently of
> > everything else. I just don't have a makefile to do that properly
> > with gcc, as I have difficulty dealing with gnu makefiles.
>
> Don't worry, I use a small shell script now for calling the compile process
>
> ---------------------------------------
> #! /bin/bash
> #
>
> HOME=$(echo ~)
> THIS=$(pwd)
>
> cp tt32_practrand.cpp $HOME/usr/PractRand_0.84/test/
>
> cd $HOME/usr/PractRand_0.84/
>
> g++ -Iinclude -O3 -o $THIS/tt32_practrand test/tt32_practrand.cpp
> src/*.cpp src/RNGs/*.cpp src/RNGs/entropy_pools/*.cpp src/RNGs/other/*.cpp
>
> cd $THIS
>
> exit 0
> ---------------------------------------
I think that script will recompile everything every time, while a
proper makefile would recompile only what actually needs to be
recompiled. Of course, a modified script could be made to emulate the
behavior of a makefile.
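For what it's worth, a minimal GNU makefile along those lines might look roughly like this. The source list and include path are guessed from the compile script below, so they may not match the real tree:

```make
# Hypothetical sketch; source layout guessed from the compile script
# in this thread, so adjust SRCS and include paths to the real tree.
SRCS := $(wildcard src/*.cpp src/RNGs/*.cpp \
                   src/RNGs/entropy_pools/*.cpp src/RNGs/other/*.cpp) \
        test/tt32_practrand.cpp
OBJS := $(SRCS:.cpp=.o)

tt32_practrand: $(OBJS)
	g++ -O3 -o $@ $(OBJS)

# Only .cpp files newer than their .o get recompiled.
%.o: %.cpp
	g++ -Iinclude -O3 -c $< -o $@

clean:
	rm -f $(OBJS) tt32_practrand
```

The per-file object rule is what gives the incremental rebuild behavior the shell script lacks.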
> > want, you can make Raw_DummyRng::raw32() just call any function you
> > want (though you'll have to figure out seeding on your own if you do
> > that, as PractRand won't be able to find any state that walk_state
> > doesn't tell it about). For instance, you could have it say:
> > Uint32 raw32() {
> >     Uint32 return_value;
> >     if (1 == std::fread(&return_value, sizeof(return_value), 1, stdin))
> >         return return_value;
> >     std::fprintf(stderr, "error reading input\n");
> >     std::exit(1);
> > }
> > And it would take piped-in input. Well, on unix... when I try that
> > on windows I get strange results.
>
> Thanks so far for that clever code snippet. I will give feedback soon
> on how it behaves on Mac OS X.
>
> > About what I'd expect. Except the speed... that looks like
> > optimization is disabled (ie the "-O3" part of the command line was
> > omitted). Note that for most of the tests in the expanded test set it
> > won't even guesstimate p-values, just classify results as one of
> > "suspicious?", "very suspicious?", or "FAIL?".
>
> Now I never forget parameter "-O3" because it's in the compile script :-)
On my computer, the expanded standard test set takes about 70 seconds
per GB with optimization enabled (compared to about 13 seconds per GB
on the standard test set), using a single core. Your numbers there
were on the order of 400 seconds per gigabyte for the same set of
tests. Two basic possibilities come to mind:
1. Your build was somehow much lower performance, perhaps due to
optimization not being enabled or some other difference in compilers
or compiler settings. But my performance varies very little across
the compilers I can easily test on, so disabled optimization seems
more likely than any other compiler difference.
2. Your hardware is much worse for the purposes of these tests than
mine. On the surface this seems unlikely, since I get similar (maybe
10% slower) results on my old Core2, and not that much worse on my
older-still Athlon 64 X2 (which I can no longer test on, as it
literally exploded two years ago). However... these test speeds can
be extremely sensitive to cache size in some cases; if your CPU has a
smaller cache than the CPUs I've tested on, that could drastically
reduce performance. I've tried to tune the parameters of the standard
test set to minimize sensitivity to cache sizes where possible, but I
have made much less effort on the expanded test set, and I haven't
touched any of that up lately, so it could have been detuned a bit
somehow.
> >>> If you want a not-exactly-barrel-shift
> >>> that accomplishes the same basic thing, instead try replacing the
> >>> bitwise-or with a bitwise-xor: (q>> 16) ^ (q<< 11). That's not as
> >>> fast as a barrel shift on many architectures, but it's slightly better
> >>> than a barrel shift in terms of what it accomplishes for a chaotic
> >>> RNG. In that form of not-barrel-shift the two shift amounts need to
> >>> add up to less than the size of the integer.
>
> >> I will give that a try, let's see how it inflects the results.
>
> I'll leave it as tz = (q >> 16) | (q << 16);
>
> >>> or my current candidate for version 4 of that:
> >>> Uint16 a, b, c, counter;
> >>> Uint16 raw16() {
> >>>     Uint16 old = a + b + counter++;
> >>>     a = b ^ (b >> 3);
> >>>     b = c + (c << 2);
> >>>     c = ((c << 7) | (c >> (16-7))) + tmp;
> >>>     return old;
> >>> }
>
> By the way, shouldn't it read
>
> c = ((c << 7) | (c >> (16-7))) + old;
Yes. My mishmash of retyped code and cut-and-pasted code sometimes
ends up with silly things like that in it.
> instead of
>
> c = ((c << 7) | (c >> (16-7))) + tmp;
>
> But nonetheless it is quite compact and elegant.
>
> >>> Mostly the same stuff, but with 3 state variables besides the counter
> >>> now. Note that "c + (c<< 2)" is equivalent to "c * 5", but is faster
> >>> than multiplication on most architectures.
>
> I see; from that I have now learned to avoid "normal"
> multiplication on variables for speed; in fact I haven't used it
> since pqRNGr2.
Note that smart compilers will usually automatically pick the
implementation best suited to the hardware they are targeting, which
may or may not be the hardware you actually run on. But for things
intended to potentially be used on platforms without fast
multiplication, choosing a constant with a simple implementation can
be very important. Though do realize that constants with simple
implementations do a lot less mixing than those that require fast
multiplication. In this case it doesn't matter much, as even a bad
constant is good enough.
> >> Still unbelievable how a 16-bit PRNG can pass BigCrush? Do you
> >> concatenate two 16-bit outputs into one 32-bit value before testing?
>
> > Yes. I concatenate 2 16 bit values or 4 8 bit values to make each 32
> > bit value for TestU01. For 64 bit values I've changed back and forth a
> > bit, sometimes returning first the lower 32 bits then the upper 32
> > bits, sometimes returning only the lower 32 bits and discarding the
> > rest, but really, I often don't bother testing the 64 bit version on
> > TestU01 if the 16 and 32 bit versions both pass. RNGs that produce
> > anything other than 8, 16, 32, or 64 bits at once I generally refuse
> > to look at, but sometimes I'll scale something down to 4 or 2 or 1 bit
> > math and of course pack them together to produce 32 bits for TestU01
> > purposes or 8 bits for PractRand purposes.
>
> Hopefully the piping of values into PractRand works, because as I
> mentioned earlier it finds problems much faster than TestU01.
>
> > Note that there is no fundamental need for large integer sizes to
> > produce good quality (though there is a need for at least a minimal
> > state size), though it usually helps. For instance, my efiix RNG
> > algorithm intended to be cryptographically secure (though I'm not 100%
> > sure on that, and the crypto community certainly wouldn't accept it),
>
> Now I remember reading about efiix as one of the PRNGs listed
> first in the speed comparison. Some of them are your work; I hadn't
> really realised that before.
HC-256 and Trivium are standard crypto RNGs, and mt19937 is a standard
RNG; jsf and ISAAC are from Bob Jenkins (though he never named jsf, so
I named it that, for Jenkins' Small Fast RNG). I wrote sfc, efiix,
xsm, sha2_based_pool (though that's mostly just SHA2-512, a standard
crypto hash function), and arbee (though that's mostly just jsf).
Efiix is not particularly fast in any absolute sense (i.e. it's
significantly slower than jsf or sfc), though it is faster than
similar algorithms like ISAAC, HC-256, or RC4 (which are all dynamic
s-box crypto RNG algorithms... generally the fastest category of
crypto RNG algorithms that I've seen in software) and common
algorithms like mt19937. It's the fastest not-obviously-broken
intended-for-crypto algorithm that I know of.
> Any reason why the crypto community refuse to take a closer look at it?
1. I don't actually know how to find or talk to the crypto
community; this usenet group is about all I know. Though I did
recently find this forum (I haven't tried posting there yet, as I
think it's specifically for ecrypt algorithms):
http://www.ecrypt.eu.org/stream/phorum/list.php?1
2. The crypto community has very high standards. Which I like, for
the most part - I wish the non-crypto PRNG community had standards as
high. But in the context of their desire for security and
provability, this ends up meaning that modern crypto RNGs must be
designed concurrently with proofs about them, a practice I am not
particularly fond of and would have some trouble following myself.
3. My impression is that the crypto community is not terribly fond of
algorithms by outsiders. Possibly for good reason (see the thread in
this group titled "BulletProof Random Number Generator"), but still.
Look at the response to Jenkins' ISAAC, for instance: two papers that
I can find by people who appear to be members of the crypto community,
one of which treats ISAAC seriously but doesn't manage to accomplish
much, the other of which is rather rude to ISAAC and seems to make
some significant conceptual flaws in its analysis, rendering the whole
paper meaningless. And a third paper in some Asian publication that I
haven't managed to find, apparently based upon a mis-scan or typo,
attacking an RNG that isn't ISAAC but that they mistook for ISAAC.
That's over like a decade and a half. I doubt I could get any better
response than he got without a ton of effort, plus I'm still revising
efiix a bit.
> > should easily pass any and all statistical tests even when scaled down
> > to operate on 4 bit integers with a total state size of just 112 bits,
> > despite having a normal state size of 24832 bits when operating on 64
> > bit integers.
>
> Sounds quite astonishing. Did you ever publish it for review?
Um... sort of? I've posted earlier versions of it in this usenet
group, and its source code is included in PractRand. Note that the
technical merits I described there don't actually mean much on their
own, though in the context of the algorithm's other details it comes
out looking pretty good for its speed.
> >> Regarding the counter, I have changed one line from
>
> >> s = (s + tt + (s>> 14));
>
> >> into
>
> >> s = (s + tt + ((s<< 13) ^ (s>> 19)));
>
> >> because when I visualise s (which is the "s"tepper) it became clear
> >> that it was increasing too slowly but then falling back too fast.
>
> > Note that that is equivalent to:
> > s += (s<< 13) ^ (s>> 19);
> > s += tt;
>
> Yes, that's true but does it really matter, for speed for example?
That wasn't supposed to be an independent point, just a poorly worded
lead in to the part about reversibility.
Note:
1. FLEA is actually very similar to earlier versions of my efiix
algorithm, though it predates mine IIRC. Note that Jenkins does not
think well of its security. It does not meet the current standards of
the crypto community.
2. RC4 is not secure, though it functionally is if you skip the first
few hundred outputs. Even then though, it does not meet (or even come
close to) the current standards of the crypto community.
3. The current standards of the crypto community are very high,
particularly with regard to seeding functions. The only RNG I have
examined that meets those standards is HC-256. Assuming I'm
interpreting what they say correctly.