Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

statistical tests for random number generators

86 views

Skip to first unread message

orz

unread,

Aug 13, 2010, 2:42:24 PM8/13/10

This thread is intended for discussion of statistical tests for random
number generators and software packages that include such tests.

Specifically, the software packages that I'm aware of:

TestU01, by Richard Simard and Pierre L'Ecuyer
http://www.iro.umontreal.ca/~simardr/testu01/tu01.html

RaBiGeTe, by Cristiano
http://www.webalice.it/cristiano.pi/rabigete/

Diehard, by George Marsaglia
http://www.stat.fsu.edu/pub/diehard/

ENT, by John Walker
http://www.fourmilab.ch/random/

Dieharder, by Robert G. Brown (rgb) and Dirk Eddelbuettel and David
Bauer
http://www.phy.duke.edu/~rgb/General/dieharder.php

PractRand, by myself, not yet released but I intend to release this
weekend.

orz

unread,

Aug 13, 2010, 2:43:17 PM8/13/10

On my list of software packages I forgot to include the NIST
package.

On Aug 13, 8:20 am, "Cristiano" <cristiaN...@gmail.com> wrote:
> orz wrote:
> > I have not been very impressed with Diehard or the NIST stuff or RaBiGeTe
> > or ENT.
>
> As I told you, I got little feedback for RaBiGeTe. What should I do to get
> RaBiGeTe more "impressive"?
>
> Cristiano

Well... things I'd like to see:
1. A clearly defined standard set of tests. Or several with clear
and simple definitions of what they're specialized for. ie a set
optimized for finding bias efficiently on a per-time basis and another
for finding bias efficiently on a per-bit basis.

2. A list of how RaBiGeTes standard batteries of tests perform on a
long list of different RNGs. And preferably a comparison to how those
RNGs perform on other batteries of tests. Grab the TestU01 paper and
take a look at pages 28 and 29 - there's a list there of like 100
different RNGs, with the number of subtests failed and the number of
subtests that gave suspicious results for each RNG on SmallCrush,
Crush, and BigCrush. Failure there is defined as p < 10^-10 (or >
1-10^-10), suspicious as p<10^-6 (or > 1-10^-6). You can glance back
a few posts to see an informal poorly formated semi-equivalent for my
test suite. Hopefully your list of RNGs would include a great deal of
diversity... I'd like to see a spectrum of low to high quality RNGs
for each of simple arithmatic/bitwise type RNGs, simple multiplicative
RNGs, small medium and large cyclic buffer (fibonaci-style) RNGs,
small and medium indirection-based RNGs, and complex / special RNGs.
Preferably including single-cycle reversible RNGs, multicycle
reversible RNGs, and multicycle irreversible RNGs in each category,
and maybe a few single-cycle irreversible RNGs.

3. Hopefully the stuff documented in #2 above shows you in a good
light. TestU01, for instance, finds flaws in a variety of RNGs,
including some that are considered good quality by some folks like
MT19937, one version of Marsaglia's KISS, etc. Some of those bias
detections seem a bit brittle - I initially failed to reproduce the
MT19937 bias detection because I accidentally walked the buffer
backwards in my implementation of MT19937 and TestU01 couldn't find
flaws with that tiny change made, but still, any bias detection in
MT19937 is pretty good. My own tests fail to find flaws in MT19937
(though they will if the tempering is removed from MT19937, or if the
GFSR size is reduced, and those bias detections are not brittle), but
they find flaws in everything else that TestU01 BigCrush does that
I've tested so far (though in some cases it takes up to twice as
long), and my stuff finds flaws in quite a bit of stuff that TestU01
misses, including some that my stuff finds quickly.

4. The ability to pass random bits directly to the tests without the
overhead or limitations of writing them to disk first. You say that's
supported by RaBiGeTe, but I find no mention of it in the
documentation accompanying the latest release and I didn't see it
mentioned on the website. There is source code there, so I considered
adapting it to patch my RNG output straight in to your tests, but
there were problems with that:
4 A. I did not see documentation on the interfaces for calling your
tests or test suites directly. Maybe I should have looked harder?
4 B. When I glanced at your code, I came away with the impression
that it was GPLed. Software that must be linked with to be useful
does not mix well with viral licensing. On further look now, the
license picture appears not so bad.
4 C. If you're intending other people to link their code with you,
perhaps you should be compiling to a library instead of or in addition
to an executable? That's what TestU01 does anyway, and what I'm doing
in the package I'm cleaning up for release (though not what I do in
the package I'm using for research).

5. A standard output that is easy to interpret even if you've never
seen it before. TestU01 does really well in that respect. RaBiGeTe
does okay, not great. My own stuff... not so well, though not nearly
as bad as some I've seen. Hopefully I'll get that improved soon.

orz

unread,

Aug 13, 2010, 2:47:10 PM8/13/10

For my own package, I consider my strengths relative to TestU01 to be:

1. Most of the RNGs incorporated are actually intended for real world
practical use, not research. Meaning both a nicer interface for
mundane software unrelated to RNG research, and a set of RNG
algorithms that seem more appropriate for real world usage - some are
significantly faster than any published RNG that I can find of
comarable bias level and significantly lower bias than any published
RNG I can find of comparable speed. In other words, it works (or at
least is intended to work) as a random number generation library for
normal RNG users, not just for researchers.

2. Original tests. TestU01 seems to implement pretty much every test
that has ever appeared in prominent literature, which is pretty nice,
and use smart parameterizations of them in its main test batteries,
which is even nicer. But my test suite mostly focuses on original
tests, particularly ones that in my limited testing can distinguish a
wider variety of RNGs than commonly used tests, especially RNGs that
do well on other tests.

3. A clearer focus on testing binary data rather than floating point
data. This is both a strength and a weakness, but I consider TestU01s
decision to completely ignore the bottom bit of output in its best
supported test batteries to be just bizarre (TestU01 Rabbit tests all
the bits, but Rabbit crashes for me if I give it a long sequence of
data - I think some of the other batteries test all bits as well, but
their descriptions left me with the impression that I should mainly
stick to SmallCrush/Crush/BigCrush).

Dann Corbit

unread,

Aug 13, 2010, 4:06:59 PM8/13/10

In article <1206e4fe-074b-48bf-acbf-a44b30172cc6
@x24g2000pro.googlegroups.com>, cdh...@gmail.com says...

NIST statistical test:
http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html

Cristiano

unread,

Aug 15, 2010, 11:47:37 AM8/15/10

orz wrote:
> PractRand, by myself, not yet released but I intend to release this
> weekend.

Well, I'll try it with my new huge state generator. :-)

Cristiano

unread,

Aug 15, 2010, 12:30:49 PM8/15/10

orz wrote:
> On Aug 13, 8:20 am, "Cristiano" <cristiaN...@gmail.com> wrote:
>> orz wrote:
>>> I have not been very impressed with Diehard or the NIST stuff or
>>> RaBiGeTe or ENT.
>>
>> As I told you, I got little feedback for RaBiGeTe. What should I do
>> to get RaBiGeTe more "impressive"?
>>
>> Cristiano
>
> Well... things I'd like to see:
> 1. A clearly defined standard set of tests. Or several with clear
> and simple definitions of what they're specialized for. ie a set
> optimized for finding bias efficiently on a per-time basis and another
> for finding bias efficiently on a per-bit basis.

I think that your point is partly fulfilled by the new version of RaBiGeTe
(still in beta).
Look at the bottom left of this image:
http://www.webalice.it/cristiano.pi/rabigeteV3/1_Params.tif

there is the combo box "Presets".
I created 5 presets which include 1 preset for a set of good and fast tests
and 1 preset for a set of effective but slow tests (the user can save as
many presets as he like).

> 2. A list of how RaBiGeTes standard batteries of tests perform on a
> long list of different RNGs. And preferably a comparison to how those
> RNGs perform on other batteries of tests.

It's hard to do that with RaBiGeTe, beacuse it can run with many
combinations of tests and each test can be configured in many ways (look at
the above image).

> 3. Hopefully the stuff documented in #2 above shows you in a good light.

I fear I'll stay in a bad light. :-)

> TestU01, for instance, finds flaws in a variety of RNGs,
> including some that are considered good quality by some folks like

> MT19937, one version of Marsaglia's KISS, etc. [...]

The only way for RaBiGeTe to find flaws in MT19937 is to run an high order
rank test, but the time and the memory needed would be much too high.

Please, would you post the code (or a link) of KISS which fails TestU01?

> 4. The ability to pass random bits directly to the tests without the
> overhead or limitations of writing them to disk first. You say that's
> supported by RaBiGeTe, but I find no mention of it in the
> documentation accompanying the latest release and I didn't see it
> mentioned on the website. There is source code there, so I considered
> adapting it to patch my RNG output straight in to your tests, but
> there were problems with that:
> 4 A. I did not see documentation on the interfaces for calling your
> tests or test suites directly. Maybe I should have looked harder?

I didn't write anything about that because there are some examples in the
module RBG.cpp with the function "void RBG(void)".
Anyway, you're right: not easy to find. :-)

> 4 B. When I glanced at your code, I came away with the impression
> that it was GPLed. Software that must be linked with to be useful
> does not mix well with viral licensing. On further look now, the
> license picture appears not so bad.

In the very first version, I didn't write any license, but a good guy told
me to write something and he gave me that example.
To be honest, I don't know exactly what that license means. :-)

> 4 C. If you're intending other people to link their code with you,
> perhaps you should be compiling to a library instead of or in addition
> to an executable?

Very good idea!
If I publish the new version of RaBiGeTe, I think to compile only to a
library.

> 5. A standard output that is easy to interpret even if you've never
> seen it before. TestU01 does really well in that respect. RaBiGeTe
> does okay, not great. My own stuff... not so well, though not nearly
> as bad as some I've seen. Hopefully I'll get that improved soon.

I improved also this point with the new version of RaBiGeTe.
The program now shows a table of p-values while the tests run and when the
testing phase ends:
http://www.webalice.it/cristiano.pi/rabigeteV3/2_Table.tif

Even if the table is easy to read, I don't think that it's easy to interpret
without an explanation, but what I need when I test a generator is some
usefull informations to see whether the generator seems good or bad without
wasting my time waiting the end of the test.

Talking about your point #2, notice how my AMLS test is the only test in the
suite which finds a weakness in the generator under test (it's a generator
which I'm still developing).

When the testing phase ends, we have the above table (the most important
report, I think), the Pearson's chi-squared report:
http://www.webalice.it/cristiano.pi/rabigeteV3/3_Pearson.tif

and a more usefull graph (I'll fix the p_KS and p_AD values for .99):
http://www.webalice.it/cristiano.pi/rabigeteV3/4_Graph.tif

There is also a page for messages (warnings and errors):
http://www.webalice.it/cristiano.pi/rabigeteV3/5_Msg.tif

Notice the good usage of the 2 cores of my E6400 (Running time: 1:40.76,
time taken by the tests: 3:14.91, around 97% of CPU usage). I guess that the
program scales very well on any multi-core CPU, but I only have dual-core
CPUs.

Cristiano

orz

unread,

Aug 16, 2010, 4:56:48 PM8/16/10

On Aug 15, 9:30 am, "Cristiano" <cristiaN...@gmail.com> wrote:
> orz wrote:
> > Well... things I'd like to see:
> > 1. A clearly defined standard set of tests. Or several with clear
> > and simple definitions of what they're specialized for. ie a set
> > optimized for finding bias efficiently on a per-time basis and another
> > for finding bias efficiently on a per-bit basis.
>
> I think that your point is partly fulfilled by the new version of RaBiGeTe
> (still in beta).
> Look at the bottom left of this image:http://www.webalice.it/cristiano.pi/rabigeteV3/1_Params.tif
>
> there is the combo box "Presets".
> I created 5 presets which include 1 preset for a set of good and fast tests
> and 1 preset for a set of effective but slow tests (the user can save as
> many presets as he like).
>
> > 2. A list of how RaBiGeTes standard batteries of tests perform on a
> > long list of different RNGs. And preferably a comparison to how those
> > RNGs perform on other batteries of tests.
>
> It's hard to do that with RaBiGeTe, beacuse it can run with many
> combinations of tests and each test can be configured in many ways (look at
> the above image).

TestU01 can run with many combinations of tests and each test can be
configured in many ways. To a lesser extent the same is true of
PractRand. Neither have any trouble producing such lists except the
amount of CPU time needed and figuring out which RNGs should be on the
list. Similarly it shouldn't be hard to do on your presets.

Take a glance at pages 28 and 29 of http://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf
That is the kind of thing I'd really expect to see with a battery of
tests - a reasonable amount of empirical data about just how good
those tests are, organized in a way that's not too hard to
understand. Unfortunately it takes quite a bit of CPU time to put
together that kind of table for the higher quality PRNGs, so my own
comparisons are still a bit incomplete.

> > 3. Hopefully the stuff documented in #2 above shows you in a good light.
>
> I fear I'll stay in a bad light. :-)

Well, at the very least, few other test suites out there have GUIs,
the only one I can recall of is the NIST stuff. Hopefully you'll have
some other advantages as well, or at least lack some disadvantages.
PractRand certainly has enough disadvantages, and I've found several
in TestU01 in my limited use of TestU01:
Problems with PractRand:
1. No actually release yet. There should be a preliminary release
really really soon, but likely a lot of important things in will be
inferior or just missing (ie documentation, sample programs, etc) in
the preliminary release. Probably each of those should be its own
separate problem, but then the list would go on forever. And I'm sure
there will be plenty of bugs in the initial release.
2. Few tests, and not very good bias detection on a per-bit basis.
PractRand normally makes up for that by testing a lot of bits quickly
with tests that detect a broad variety of biases (the basic set of
PractRand tests can manage a few gigabytes per minute on my
computer). However, if the RNG being tested is very slow, it may be
impractical to do that. Or if the thing being tested is not really an
RNG then there may only be a finite number of bytes available to
test.
3. No equations to calculate (either exactly or approximately) the
distributions on most of the tests. PractRand normally makes up for
this with large empirically determined distribution tables and a few
educated guesses, but those have limited resolution for determining
proper p-values.

Things that annoy me in TestU01 so far:
1. The interface does not allow client programs to call an entire
test suite in a multithreaded fashion. I'm not sure how much work it
would be to change that.
2. If you want to call a preset battery of tests on an RNG, either
each test in the battery must generate its own set of random numbers
or all of the numbers to be tested have to be written in to memory or
on to disk in advance. Really obnoxious for longer tests on slower
RNGs.
3. The "SmallCrush", "Crush", and "BigCrush" test batteries
documentation claims they do not use the bottom bit out of each 32
bits, and only occasionally use the 2nd lowest bit. That seems
bizarre and counter-productive to me.
4. The "Rabbit" battery of tests crashes for me if I try to test too
long a sequence.
5. It has a tendency to print out annoying messages when an RNG fails
some tests particularly badly (if, say, the RNG fails some test in
some astronomical way that violates some of the assumptions that
TestU01 makes to figure out the distribution for that test).
6. I've found bugs in one of the RNGs included... and I haven't tried
to look for bugs, so there could be plenty more where that came
from.

> > TestU01, for instance, finds flaws in a variety of RNGs,
> > including some that are considered good quality by some folks like
> > MT19937, one version of Marsaglia's KISS, etc. [...]
>
> The only way for RaBiGeTe to find flaws in MT19937 is to run an high order
> rank test, but the time and the memory needed would be much too high.
>
> Please, would you post the code (or a link) of KISS which fails TestU01?

TestU01 lists it as KISS93, I believe meaning a combination RNG
published by Marsaglia in 1993. I suspect TestU01 named it that, as
Marsaglia seems to like leaving them unnamed (asside from calling them
all KISS) or naming them after arbitrary large constants in the
implementation. I use the TestU01 implementation of it (I have some
wrappers that convert TestU01 format RNGs to my format RNGs and vice
versa). The code is in TestU01's umarsa.c, and is this:
typedef struct {
unsigned int S1, S2, S3, S4, carry;
} KISS_state;
static unsigned long KISS93_Bits (void *junk, void *vsta)
{
KISS_state *state = vsta;
unsigned int b;

state->S1 = 69069 * state->S1 + 23606797;
b = state->S2 ^ (state->S2 << 17);
state->S2 = (b >> 15) ^ b;
b = ((state->S3 << 18) ^ state->S3) & MASK31;
state->S3 = (b >> 13) ^ b;
return state->S1 + state->S2 + state->S3;
}
unif01_Gen *umarsa_CreateKISS93 (unsigned int s1, unsigned int s2,
unsigned int s3)
{
unif01_Gen *gen;
KISS_state *state;
size_t leng;
char name[LEN + 1];

if (s3 > 2147483647)
util_Error ("umarsa_CreateKISS93: s3 >= 2^31");

gen = util_Malloc (sizeof (unif01_Gen));
state = util_Malloc (sizeof (KISS_state));

strcpy (name, "umarsa_CreateKISS93:");
addstr_Uint (name, " x0 = ", s1);
addstr_Uint (name, ", y0 = ", s2);
addstr_Uint (name, ", z0 = ", s3);
leng = strlen (name);
gen->name = util_Calloc (leng + 1, sizeof (char));
strncpy (gen->name, name, leng);

state->S1 = s1;
state->S2 = s2;
state->S3 = s3;

gen->GetBits = &KISS93_Bits;
gen->GetU01 = &KISS93_U01;
gen->Write = &WrKISS93;
gen->param = NULL;
gen->state = state;
return gen;
}

That RNG passes all tests in SmallCrush, fails a single test in Crush,
and fails a single test in BigCrush. It's a relatively fast RNG. If
I recall correctly it fails my tests around 2 TB (which IIRC took a
lot longer than Crush, and significantly longer than BigCrush), mainly
due to the lowest bit being of poorer quality than the rest.

If you're just looking for RNGs to test, consider Sapparot. Sapparot
is a small simple RNG. It's of low enough quality that a number of
tests can break it very quickly. It fails the PractRand standard test
set in a few seconds, and fails TestU01 Rabbit after about 1 minute.
On the other hand it is of high enough quality that some entire test
batteries seem to fail or take a long time against it. It passes
TestU01 SmallCrush and Crush. I have not run BigCrush on it yet.
Sapparot: http://www.literatecode.com/2004/10/18/sapparot/

> > 4. The ability to pass random bits directly to the tests without the
> > overhead or limitations of writing them to disk first. You say that's
> > supported by RaBiGeTe, but I find no mention of it in the
> > documentation accompanying the latest release and I didn't see it
> > mentioned on the website. There is source code there, so I considered
> > adapting it to patch my RNG output straight in to your tests, but
> > there were problems with that:
> > 4 A. I did not see documentation on the interfaces for calling your
> > tests or test suites directly. Maybe I should have looked harder?
>
> I didn't write anything about that because there are some examples in the
> module RBG.cpp with the function "void RBG(void)".
> Anyway, you're right: not easy to find. :-)

I'll take a look. Sometime soon anyway, I'm currently trying to
figure out what will and won't being in the initial release of
PractRand and make it happen.

> > 5. A standard output that is easy to interpret even if you've never
> > seen it before. TestU01 does really well in that respect. RaBiGeTe
> > does okay, not great. My own stuff... not so well, though not nearly
> > as bad as some I've seen. Hopefully I'll get that improved soon.
>
> I improved also this point with the new version of RaBiGeTe.
> The program now shows a table of p-values while the tests run and when the
> testing phase ends:http://www.webalice.it/cristiano.pi/rabigeteV3/2_Table.tif
>
> Even if the table is easy to read, I don't think that it's easy to interpret
> without an explanation, but what I need when I test a generator is some
> usefull informations to see whether the generator seems good or bad without
> wasting my time waiting the end of the test.
>
> Talking about your point #2, notice how my AMLS test is the only test in the
> suite which finds a weakness in the generator under test (it's a generator
> which I'm still developing).
>
> When the testing phase ends, we have the above table (the most important
> report, I think), the Pearson's chi-squared report:http://www.webalice.it/cristiano.pi/rabigeteV3/3_Pearson.tif
>
> and a more usefull graph (I'll fix the p_KS and p_AD values for .99):http://www.webalice.it/cristiano.pi/rabigeteV3/4_Graph.tif
>
> There is also a page for messages (warnings and errors):http://www.webalice.it/cristiano.pi/rabigeteV3/5_Msg.tif
>
> Notice the good usage of the 2 cores of my E6400 (Running time: 1:40.76,
> time taken by the tests: 3:14.91, around 97% of CPU usage). I guess that the
> program scales very well on any multi-core CPU, but I only have dual-core
> CPUs.
>
> Cristiano

It seems pretty clear that hardware multi-threading (ie multicore and/
or hyperthreading) is here, here in a big way, and here to stay.
PractRand is going to have a slightly harder time with that than other
testing packages because the smaller number of tests typically enabled
mean fewer things that can easily be done at once.

orz

unread,

Aug 18, 2010, 10:02:31 AM8/18/10

And it's finally up. Still needs more work of course, and I'm sure I
added plenty of bugs when cleaning it up for release and it probably
already had a fair number of bugs. But it's there:
https://sourceforge.net/projects/pracrand/

Cristiano

unread,

Aug 18, 2010, 4:52:18 PM8/18/10

It's easy, but there are two problems: the time needed and the little
significance of that list, because I can test a generator using a
configuration which finds nothing, but changing the configuration a little,
maybe the test finds a weakness.

> Problems with PractRand:

> 2. Few tests, and not very good bias detection on a per-bit basis.
> PractRand normally makes up for that by testing a lot of bits quickly
> with tests that detect a broad variety of biases (the basic set of
> PractRand tests can manage a few gigabytes per minute on my
> computer).

It could be useful. Some tests in my suite (which works mostly at bit level)
is much slower.

> 3. No equations to calculate (either exactly or approximately) the
> distributions on most of the tests. PractRand normally makes up for
> this with large empirically determined distribution tables and a few
> educated guesses, but those have limited resolution for determining
> proper p-values.

It could be a big problem. You should try, at least, to find an empirical
equation; using only tables, the p-values can be too binned.

>>> TestU01, for instance, finds flaws in a variety of RNGs,
>>> including some that are considered good quality by some folks like
>>> MT19937, one version of Marsaglia's KISS, etc. [...]
>>
>> The only way for RaBiGeTe to find flaws in MT19937 is to run an high
>> order
>> rank test, but the time and the memory needed would be much too high.
>>
>> Please, would you post the code (or a link) of KISS which fails
>> TestU01?
>
> TestU01 lists it as KISS93, I believe meaning a combination RNG
> published by Marsaglia in 1993. I suspect TestU01 named it that, as
> Marsaglia seems to like leaving them unnamed (asside from calling them
> all KISS) or naming them after arbitrary large constants in the
> implementation. I use the TestU01 implementation of it (I have some
> wrappers that convert TestU01 format RNGs to my format RNGs and vice
> versa). The code is in TestU01's umarsa.c, and is this:

[...]

>
> That RNG passes all tests in SmallCrush, fails a single test in Crush,
> and fails a single test in BigCrush. It's a relatively fast RNG. If
> I recall correctly it fails my tests around 2 TB (which IIRC took a
> lot longer than Crush, and significantly longer than BigCrush), mainly
> due to the lowest bit being of poorer quality than the rest.

That generator miserably fails the test I called windowed autocorrelation.
RaBiGeTe takes less than 2 seconds to show the systematic failure, after 17
3-Mbit sequences have been tested:
http://www.webalice.it/cristiano.pi/rabigeteV3/KISS93_failure.tif

> If you're just looking for RNGs to test, consider Sapparot. Sapparot
> is a small simple RNG. It's of low enough quality that a number of
> tests can break it very quickly. It fails the PractRand standard test
> set in a few seconds, and fails TestU01 Rabbit after about 1 minute.
> On the other hand it is of high enough quality that some entire test
> batteries seem to fail or take a long time against it. It passes
> TestU01 SmallCrush and Crush. I have not run BigCrush on it yet.
> Sapparot: http://www.literatecode.com/2004/10/18/sapparot/

RaBiGeTe finds nothing (I tested 1280 Mbits excluding some slow tests).
The transition function of the generator seems good to me, its only problem
is the small state. I can guess that you tested many numbers compared to the
small period; how many numbers did you test?

Cristiano

orz

unread,

Aug 19, 2010, 1:02:31 AM8/19/10

On PractRand (currently pracrand instead of PractRand on
sourceforge.net ... hopefully I can get that changed before too
terribly long):
The sample program tests a dummy RNG built in to the sample program.
The dummy RNG is one I made up off the top of my head; it is very
fast, passes TestU01 SmallCrush, fails a single subtest in Crush
unambiguously, and several subtests in BigCrush.
The sample program keeps testing the dummy RNG indefinitely (actually
until 512 terabytes are tested IIRC). It's intended to be aborted
after a little bit by pressing control-break. It does not do any I/O
or take command line options or anything. I may add that kind of
thing in a future version of the sample program.
You can replace the dummy RNG at the top of the sample program with
any other RNG of your choice. If your RNG produces a number of bits
per call other than 32 then you'll need to change the portions of it
that include "32" (which is the return type (Uint32), the method name
(raw32), and the base class (vRNG32)) to include the new number of
bits, which must be 8, 16, 32, or 64.
Note that the test selection there cheats a tiny tiny bit - it
examines the metadata for the RNG to figure out what number of bits
the RNG produces at a time, and then tweaks the test set specifically
to focus portions of the output that RNGs that produce that many bits
at a time tend to screw up.
The current set of tests in PractRand is just the core 3 and the
transforms on that. Those are the tests that I normally run with. In
the next version I'll move more of my tests in, but the sample program
won't use the rest because those 3 are the only ones that are
recommended for normal RNG testing, at least on fast RNGs.
PractRand 0.80: http://sourceforge.net/projects/pracrand/

On Aug 18, 1:52 pm, "Cristiano" <cristiaN...@gmail.com> wrote:
> > TestU01 can run with many combinations of tests and each test can be
> > configured in many ways. To a lesser extent the same is true of
> > PractRand. Neither have any trouble producing such lists except the
> > amount of CPU time needed and figuring out which RNGs should be on the
> > list. Similarly it shouldn't be hard to do on your presets.
>
> It's easy, but there are two problems: the time needed and the little
> significance of that list, because I can test a generator using a
> configuration which finds nothing, but changing the configuration a little,
> maybe the test finds a weakness.

If you can't figure out what configuration is best on an RNG, then
your users certainly won't be able to either. My approach is more or
less:
1. Pick a set of criteria for when you might use various kinds of
tests. Like, you might use slow tests on slow RNGs, because you'll be
waiting for the RNG output for so long you might as well spend more
cycles testing them. Or you might use some tests only on PRNGs with
more than a kilobyte of state.
2. Review your criteria. Try to bring it down to a small simple set
of criteria that even the clueless can apply. Users are likely to be
clueless relative to yourself because they'll always have less
experience with your code and its uses, and because of Murphy's Law.
3. For each category of RNGs distinguished by your criterias, put
together a list of candidate tests.
4. For each category of RNGs distinguished by your criterias, put
together a list of RNGs of various quality.
5. For each category of RNGs distinguished by your criterias, try the
list of RNGs on the list of candidate tests out to various lengths.
Put together a chart that shows how long each candidate test took on
each RNG. This may take a while... one solution is to fill up your
lists of RNGs with ringers, deliberately flawed ultra-low quality RNGs
of various types, but this biases the results unfortunately.
6. Rate each test according to the number of RNGs they find bias in
that no other test finds bias in or no other test finds bias in nearly
as quickly, and the number of RNGs they find bias in total, and things
like that.
7. Look through the higher rated tests for minimal subsets capable of
finding bias in as many things as possible as quickly as possible.
Put those tests in as your presets.
8. Put together simplified versions of your charts from step 5 that
show how your presets compare to other test suites presets, like
PractRand and TestU01.

Here's a sample chart produced during this process... I'm pretty sure
that the formatting on it will get horribly mangled in posting it here
though, as it was never intended for transmission:
1234 56 7 89ABC DEF
* * * *
slcg32_32a C6A5 3B 2 12466 51D
lcg32a DAEC 47 8 AEDDD 7AA
clcg64_16 EB-C 68 9 ADDDE 4BC
bin48a16 DDDB DD D 479CC CCD
jlcg64 FFFC FF E DFFDE FFF
cmwc CBAE 95 C 88CFF FC-
ibaa8_2 CCCC CC C BCDDD CCD
binbjx16a ---C -- E 68BDE ---
sclcg64_32 ---- -- - H---- ?--
jsf16 ---- -- - ----- ---
mt19937_unh. ---- -- F ----- ---
slcg64_32a ---- -- - ----- ---
slcg64_16a ---- -- - ----- ---
tlcg64 ---- -- - ----- ---
sapparot -H-F -- H 79EHH -D-
lcg64a16 ---- -- - ----- ---
lcg64a ---- -- - ----- ---
ibaa8_4 ---- -- - H---- ---
clcg96 ---- -- - ----- ---
mm FDG- G- E ----- ---
mm2 ---- -- - ----- ---
fran EEI- FF I FI--- IG-
mwlc ---E -- A --DFF 0--
mwlac ---- -- - ----- ---
tmp16a ---- -- I 8CF-- ---
tmp16b ---- -- - EI--- ---

The vertical axi is different RNGs, the horizontal axi is different
tests. At the intersection of each tests column with each RNGs row
the number is the log based 2 of the number of megabytes of RNG output
needed to find bias. If that number was too high to fit in a single
digit then I replaced it with a letter according to 10=A,11=B,12=C,
13=D,14=E,15=F,16=G,17=H,18=I. A '-' means that that RNG did not fail
during my testing, though I stopped testing early on binbjx16a IIRC,
the ones below that were mostly all tested to 256 GB or so, the ones
above that mostly failed all tests quickly so didn't need more
testing. ? indicates a highly inconsistent result.
The tests marked with asterisks there, 4, 5, 7, and 8, are the 4 I was
considering at the time for inclusion in my standard test set, but I
eventually settled on 6, 7, and 8. 5 was initially prefered over 6 as
it was a bit faster and it found bias in several RNGs using many fewer
MB than 6 did, but after further analysis 6 was chosen over 5 because
it was theoretically more stringent and it was found that anything 5
caught 6 would eventually catch, and the reverse was not true. And 6
wasn't that much slower than 5. 4 was dropped despite being very fast
and pretty good there because it tended to fare poorly against higher
quality RNGs and because anything it caught would also be caught by at
least one of 6, 7, or 8. 6 corresponds to Gap-16, 7 to BCFN-13/4, and
8 to DC6:9x1Byte:1. Different parameterizations of the same test code
are grouped next to each other, otherwise tests are separated by
tabs.
That set of tests did not include the standard data transforms that do
things like separate the lowest bit or byte out of each output to form
a 2nd data stream and test the new stream. With standard transforms
added I think all of those RNGs fail tests except for jsf16 and
mwlac32. Test columns D, E, and F did use some transforms
internally.
Those RNG names followed earlier naming schemes and do not exactly
correspond with names in PractRand.

> > Problems with PractRand:
> > 2. Few tests, and not very good bias detection on a per-bit basis.
> > PractRand normally makes up for that by testing a lot of bits quickly
> > with tests that detect a broad variety of biases (the basic set of
> > PractRand tests can manage a few gigabytes per minute on my
> > computer).
>
> It could be useful. Some tests in my suite (which works mostly at bit level)
> is much slower.

I have only a single bit-level test (a sort of bastardized overlapping
frequency test applied to overlapping sequences of 28 bits), and it's
not included in the PractRand 0.80 release, hopefully it will make it
in to the next release. I considered using it in the standard set
with a transform that applied it to only the lowest bit of each byte,
but I found that even with that transform it was far too slow and
failed to find bias in a lot of RNGs that I knew had problems in their
lowest bit of output. I think that was the column labelled "D" in the
chart earlier.

> > 3. No equations to calculate (either exactly or approximately) the
> > distributions on most of the tests. PractRand normally makes up for
> > this with large empirically determined distribution tables and a few
> > educated guesses, but those have limited resolution for determining
> > proper p-values.
>
> It could be a big problem. You should try, at least, to find an empirical
> equation; using only tables, the p-values can be too binned.

It is a big problem, but I wouldn't really trust a formulaic version
either. A very big problem considering that even the approximated p-
values printed by the sample program are only available when the
standard parameterizations are used.
All of the tests I use these days have a very strong tendency to
return similar distributions to each other... on all of them a 15 or
above or -15 or below is a failure and an 8 or -8 is at the very least
suspicious. But somehow I don't think users would appreciate being
told to just assume that about any tests they were using.
I'm probably going to add some interpolation between the thresholds in
the current tables, and try to get more precision in the tables as
well. Beyond that, I don't think there's all that much I can do,
unless I can figure out the math proper for this stuff. L'Ecuyer
talked about math that was better than chi-squared tests for most of
the kinds of things I used and produced more exact p-values than the
primitive correction schemes on chi-squared tests that some academics
have used on tests similar to my own, but I haven't found anything in
his code that looked like that, let alone managed to understand it.

> >>> TestU01, for instance, finds flaws in a variety of RNGs,
> >>> including some that are considered good quality by some folks like
> >>> MT19937, one version of Marsaglia's KISS, etc. [...]
>
> >> The only way for RaBiGeTe to find flaws in MT19937 is to run an high
> >> order
> >> rank test, but the time and the memory needed would be much too high.
>
> >> Please, would you post the code (or a link) of KISS which fails
> >> TestU01?
>
> > TestU01 lists it as KISS93, I believe meaning a combination RNG
> > published by Marsaglia in 1993. I suspect TestU01 named it that, as
> > Marsaglia seems to like leaving them unnamed (asside from calling them
> > all KISS) or naming them after arbitrary large constants in the
> > implementation. I use the TestU01 implementation of it (I have some
> > wrappers that convert TestU01 format RNGs to my format RNGs and vice
> > versa). The code is in TestU01's umarsa.c, and is this:
> [...]
>
> > That RNG passes all tests in SmallCrush, fails a single test in Crush,
> > and fails a single test in BigCrush. It's a relatively fast RNG. If
> > I recall correctly it fails my tests around 2 TB (which IIRC took a
> > lot longer than Crush, and significantly longer than BigCrush), mainly
> > due to the lowest bit being of poorer quality than the rest.
>
> That generator miserably fails the test I called windowed autocorrelation.
> RaBiGeTe takes less than 2 seconds to show the systematic failure, after 17
> 3-Mbit sequences have been tested:http://www.webalice.it/cristiano.pi/rabigeteV3/KISS93_failure.tif

Interesting. And pretty good, and conceivably not just for that KISS
variant... most of the later KISS variants have 3 components, where 2
of them are pretty similar to the original KISS93. If you have a
class of test that find bias in that pair quickly, it might be
possibly to find bias in the whole thing as well for later versions
too.

> > If you're just looking for RNGs to test, consider Sapparot. Sapparot
> > is a small simple RNG. It's of low enough quality that a number of
> > tests can break it very quickly. It fails the PractRand standard test
> > set in a few seconds, and fails TestU01 Rabbit after about 1 minute.
> > On the other hand it is of high enough quality that some entire test
> > batteries seem to fail or take a long time against it. It passes
> > TestU01 SmallCrush and Crush. I have not run BigCrush on it yet.
> > Sapparot:http://www.literatecode.com/2004/10/18/sapparot/
>
> RaBiGeTe finds nothing (I tested 1280 Mbits excluding some slow tests).
> The transition function of the generator seems good to me, its only problem
> is the small state. I can guess that you tested many numbers compared to the
> small period; how many numbers did you test?

I spent the CPU time to test Sapparot on the full BigCrush, it failed
1 subtest and got suspicious results on 3 more.
PractRand finds bias in Sapparot after about 128 MB, sometimes 64 MB.
If I adjust the transforms a bit I can get it to fail quicker, but I
consider that to be more or less cheating.

I would say that the transition function of that RNG is not good.
There are other similar RNGs that pass all tests I've tried easily,
even with the same size of state. In PractRand 0.80, if you take a
look at src/RNGs/jsf.cpp, you'll find the code for jsf16 which looks
roughly like this:
Uint16 a, b, c, d;
Uint16 raw16() {
Uint16 e = a - ((b << 13) | (b >> 3));
a = b ^ ((c << 8) | (c >> 8));
b = c + d;
c = d + e;
d = e + a;
return b;
}
That passes all tests I've tried. And that's the reduced strength
version - the original version operates on 32 bit integers, so it's a
bit larger in state, but the 32 bit version is very very fast for its
statistical performance. The original version was written by Robert
Jenkins, the same guy who wrote ISAAC. IIRC he posted it on usenet a
year or three ago without a name. I named if JSF for Jenkins Small
Fast RNG. Even the 8 bit variant of that does okay, though its period
is too short to do very well and it fails a few tests shortly before
its period expires.
Another small fast RNG that does well on statistical tests is my own
mwlac, found in src/RNGs/mwlac.cpp in PractRand 0.80. PractRand 0.80
does not include the 16 bit variant, but the 16 bit variant looks like
this:
Uint16 a, b, c, d;
Uint16 raw16() {
Uint16 oa = a;
a = (b * 0x9785) ^ (a >> 7);
b = c + (oa >> 2);
c = d;
d += ~oa;
return c;
}
The 32 bit variant of that is very fast, and passes all statistical
tests I've tried. I consider it inferior to jsf because the 16 bit
variant fails statistical tests eventually and because it uses
multiplication, which is fast on high end CPUs but slow on some cheap
embedded CPUs. According to my notes the 16 bit variant fails default
PractRand tests after 128 gigabytes (well under an hour), but passes
SmallCrush / Crush / BigCrush.

Cristiano

unread,

Aug 19, 2010, 5:14:56 AM8/19/10

orz wrote:
> https://sourceforge.net/projects/pracrand/

I downloaded the code. Just few things to say.

I see that there is the code to write in the registry (which now is
disabled). Please, don't write anything in the registry. You can simply use
a local file.

You included PractRand.ncb, but it's not needed.

PractRand.lib e PractRand_Full.lib are really big (you also wrote that in
your "to_do.txt" file). It's worth trying to play with the compiler setting
to reduce the size.

You use:
Uint32 t;
__asm {
RDTSC
mov [t], eax
}
rng->add_entropy32(t);

You can just #include <intrin.h> and write rng->add_entropy32(__rdtsc());
but be carefull; when I read the TSC in my E6400, the 3 LSB are *always*
000. Check that out.

I ran your program and it's very fast compared to RaBiGeTe.
I'll try to use it to test some PRNGs which I wrote.

Cristiano

unread,

Aug 20, 2010, 5:01:32 AM8/20/10

orz wrote:
> On Aug 18, 1:52 pm, "Cristiano" <cristiaN...@gmail.com> wrote:
>>> TestU01 can run with many combinations of tests and each test can be
>>> configured in many ways. To a lesser extent the same is true of
>>> PractRand. Neither have any trouble producing such lists except the
>>> amount of CPU time needed and figuring out which RNGs should be on
>>> the list. Similarly it shouldn't be hard to do on your presets.
>>
>> It's easy, but there are two problems: the time needed and the little
>> significance of that list, because I can test a generator using a
>> configuration which finds nothing, but changing the configuration a
>> little, maybe the test finds a weakness.
>
> If you can't figure out what configuration is best on an RNG, then
> your users certainly won't be able to either.

Sure, nobody knows the best configuration.
One can only guess a reasonable configuration based on the RNG.

We know that some kinds of PRNG fail sistematically some kinds of test (GFSR
fail the rank test, LFSR fail the linear complexity test, LCG fail almost
all the test, and so on).
If there is no known weakness for the generator under test, then you need to
start with all the tests. The tables don't help.

> Here's a sample chart produced during this process... [...]

Well, now you have your table and you need to test a generator that you just
wrote.
Do you read the table to see the test to use? I don't think.

When I test a generator, I start with the first preset, no matter what
generator is. That preset will find almost any kind of correlation and any
significative "deviation" in the bit distribution (about the same number of
bit 0 and 1, 00, 01, 11, and so on).
If the first preset finds nothing, then I start to add some tests, changing
the configuration of the already used test. Again, no table needed.

>>> 3. No equations to calculate (either exactly or approximately) the
>>> distributions on most of the tests. PractRand normally makes up for
>>> this with large empirically determined distribution tables and a few
>>> educated guesses, but those have limited resolution for determining
>>> proper p-values.
>>
>> It could be a big problem. You should try, at least, to find an
>> empirical
>> equation; using only tables, the p-values can be too binned.
>
> It is a big problem, but I wouldn't really trust a formulaic version
> either. A very big problem considering that even the approximated p-
> values printed by the sample program are only available when the
> standard parameterizations are used.

You're right, it's a very big problem. I strongly encourage to find a better
way.

>>> If you're just looking for RNGs to test, consider Sapparot. Sapparot
>>> is a small simple RNG. It's of low enough quality that a number of
>>> tests can break it very quickly. It fails the PractRand standard
>>> test set in a few seconds, and fails TestU01 Rabbit after about 1
>>> minute. On the other hand it is of high enough quality that some
>>> entire test batteries seem to fail or take a long time against it.
>>> It passes TestU01 SmallCrush and Crush. I have not run BigCrush on
>>> it yet. Sapparot:http://www.literatecode.com/2004/10/18/sapparot/
>>
>> RaBiGeTe finds nothing (I tested 1280 Mbits excluding some slow
>> tests).
>> The transition function of the generator seems good to me, its only
>> problem
>> is the small state. I can guess that you tested many numbers
>> compared to the
>> small period; how many numbers did you test?
>
> I spent the CPU time to test Sapparot on the full BigCrush, it failed
> 1 subtest and got suspicious results on 3 more.
> PractRand finds bias in Sapparot after about 128 MB, sometimes 64 MB.

May be I should try other tests in my suite or I need to change some
parameters.
Or may be that RaBiGeTe doesn't have a test for that generator. :-)

Cristiano

orz

unread,

Aug 20, 2010, 11:27:02 AM8/20/10

On Aug 19, 2:14 am, "Cristiano" <cristiaN...@gmail.com> wrote:
> orz wrote:

> >https://sourceforge.net/projects/pracrand/
>
> I downloaded the code. Just few things to say.
>
> I see that there is the code to write in the registry (which now is
> disabled). Please, don't write anything in the registry. You can simply use
> a local file.

The registry code has been rendered obsolete by the code above it a
bit labeled "win32 crypto PRNG". I think. I'm not 100% sure yet
that I'm interpreting MSDN docs on that correctly, or what if anything
could cause it to fail.
For non-windows platforms I'm not yet sure what the correct way of
obtaining a decent (but not necessarily cryptographic) quality seed
is. I'd really like to read from /dev/random or /dev/urandom for
that, but I can't even figure out exactly what platforms that would be
a good idea on, let alone exactly what #ifdefs would identify those
platforms without false positives. In the past when I've tried to
write unixy code I've ended up needing very different #includes and
slightly different code for each different OS, and in some cases for
different distributions of the same OS.

> You included PractRand.ncb, but it's not needed.

Good point. Killing that should reduce my download size a bit.

> PractRand.lib e PractRand_Full.lib are really big (you also wrote that in
> your "to_do.txt" file). It's worth trying to play with the compiler setting
> to reduce the size.

Yeah. I think there was one recompile where the size went from 1-2 MB
to 6 MB, but I was focused on other stuff at the time and didn't pay
attention to it. Now I need to figure it out and fix it if
possible.

> You use:
> Uint32 t;
> __asm {
> RDTSC
> mov [t], eax
> }
> rng->add_entropy32(t);
>
> You can just #include <intrin.h> and write rng->add_entropy32(__rdtsc());
> but be carefull; when I read the TSC in my E6400, the 3 LSB are *always*
> 000. Check that out.

Might be cleaner on MSVC 2005 and later, but right next to that is
going to go code to do the same thing on gcc, so it won't make much
difference. Three less bits of resolution on the time is not good,
but it's not like there's any source for higher resolution time
available.

> I ran your program and it's very fast compared to RaBiGeTe.
> I'll try to use it to test some PRNGs which I wrote.

I've attempted to link the RaBiGeTe source code with my research
codebase (which is broadly similar to the released stuff, but bigger
and uglier). I disabled DFT to eliminate the dependency, moved all
symbols in to a namespace, and modified RBG.cpp to let my code specify
any of my RNGs for it at runtime, and adjusted a few details like the
locations it looked for some txt files in.
It's now more or less working inside my code. I've attempted to adapt
it to reset global and static variables back to initial values during
startup so that I can call it multiple times per run, but haven't
managed to succeed at that yet. I'm having trouble with longer tests
on it... if I tell it 4 Gb it complains about 0 not being valid, and
if I tell it 511 Mb then it crashes partway through. Probably I broke
something(s) when connecting it to my code and whatnot.

On Aug 20, 2:01 am, "Cristiano" <cristiaN...@gmail.com> wrote:
> orz wrote:
> > On Aug 18, 1:52 pm, "Cristiano" <cristiaN...@gmail.com> wrote:
> >>> TestU01 can run with many combinations of tests and each test can be
> >>> configured in many ways. To a lesser extent the same is true of
> >>> PractRand. Neither have any trouble producing such lists except the
> >>> amount of CPU time needed and figuring out which RNGs should be on
> >>> the list. Similarly it shouldn't be hard to do on your presets.
>
> >> It's easy, but there are two problems: the time needed and the little
> >> significance of that list, because I can test a generator using a
> >> configuration which finds nothing, but changing the configuration a
> >> little, maybe the test finds a weakness.
>
> > If you can't figure out what configuration is best on an RNG, then
> > your users certainly won't be able to either.
>
> Sure, nobody knows the best configuration.
> One can only guess a reasonable configuration based on the RNG.
>
> We know that some kinds of PRNG fail sistematically some kinds of test (GFSR
> fail the rank test, LFSR fail the linear complexity test, LCG fail almost
> all the test, and so on).
> If there is no known weakness for the generator under test, then you need to
> start with all the tests. The tables don't help.

Start with all tests? There's an infinite number of possible tests, a
large number of already implemented tests, and a practically infinite
number of distinct parameterizations of tests that have been
implemented. Depth-first searching for bias is not the way to go:
http://xkcd.com/761/

I think this is kind of important so I'll try to say this stuff again
in a different way:
While it's very difficult to define objectively, I believe that some
statistical tests are better than others. I broadly evaluate tests on
qualities like speed (how quickly they can reject a significant number
of RNGs), breadth (how wide a variety of RNGs they can reject), and
orthogonality/originality (to what degree their merits are redundant
give the broader set of tests available - their marginal contribution
to the breadth of an intelligently chosen set of tests). I suggest
systematically evaluating those kinds of metrics in the most objective
way you can when deciding which tests with which parameters should be
included in a preset.

> >>> 3. No equations to calculate (either exactly or approximately) the
> >>> distributions on most of the tests. PractRand normally makes up for
> >>> this with large empirically determined distribution tables and a few
> >>> educated guesses, but those have limited resolution for determining
> >>> proper p-values.
>
> >> It could be a big problem. You should try, at least, to find an
> >> empirical
> >> equation; using only tables, the p-values can be too binned.
>
> > It is a big problem, but I wouldn't really trust a formulaic version
> > either. A very big problem considering that even the approximated p-
> > values printed by the sample program are only available when the
> > standard parameterizations are used.
>
> You're right, it's a very big problem. I strongly encourage to find a better
> way.

There's not that much I can do. There would be an extraordinary
amount of work involved in merely making a serious attempt at working
out the math to an approximate p-value formula. I could work
backwards from the distribution to an equation, but in 2 of the 3
important tests I don't think that would work any better than the
table.

> >>> If you're just looking for RNGs to test, consider Sapparot. Sapparot
> >>> is a small simple RNG. It's of low enough quality that a number of
> >>> tests can break it very quickly. It fails the PractRand standard
> >>> test set in a few seconds, and fails TestU01 Rabbit after about 1
> >>> minute. On the other hand it is of high enough quality that some
> >>> entire test batteries seem to fail or take a long time against it.
> >>> It passes TestU01 SmallCrush and Crush. I have not run BigCrush on
> >>> it yet. Sapparot:http://www.literatecode.com/2004/10/18/sapparot/
>
> >> RaBiGeTe finds nothing (I tested 1280 Mbits excluding some slow
> >> tests).
> >> The transition function of the generator seems good to me, its only
> >> problem
> >> is the small state. I can guess that you tested many numbers
> >> compared to the
> >> small period; how many numbers did you test?
>
> > I spent the CPU time to test Sapparot on the full BigCrush, it failed
> > 1 subtest and got suspicious results on 3 more.
> > PractRand finds bias in Sapparot after about 128 MB, sometimes 64 MB.
>
> May be I should try other tests in my suite or I need to change some
> parameters.
> Or may be that RaBiGeTe doesn't have a test for that generator. :-)

I think that many RNG test suites neglect the broad category of simple
reversible multicyclic RNGs like sapparot or jsf or mwlac or the dummy
RNG in the PractRand example program.

James Waldby

unread,

Aug 20, 2010, 1:39:28 PM8/20/10

On Fri, 20 Aug 2010 08:27:02 -0700, orz wrote:
> On Aug 19, 2:14 am, "Cristiano" <cristiaN...@gmail.com> wrote:
>> orz wrote:
>> >https://sourceforge.net/projects/pracrand/

...

>> You use:
>> Uint32 t;
>> __asm {
>> RDTSC
>> mov [t], eax
>> }
>> rng->add_entropy32(t);
>>
>> You can just #include <intrin.h> and write
>> rng->add_entropy32(__rdtsc()); but be carefull; when I read
>> the TSC in my E6400, the 3 LSB are *always* 000. Check that out.
>
> Might be cleaner on MSVC 2005 and later, but right next to that is going
> to go code to do the same thing on gcc, so it won't make much
> difference. Three less bits of resolution on the time is not good, but
> it's not like there's any source for higher resolution time available.

...

Do you know what's causing that problem with the low 3 bits of TSC?
It seems to me that the most likely explanations are (1) the cycle
counter is being incremented only every 8th cycle, or (2) the cycle
count is shifted up 3 bits.

In case (2), just right shift the EDX EAX register (where RDTSC stores
the 64-bit Time Stamp Counter) by 3 bits and use the low 32 bits of that.

In case (1), deduce the low 3 bits, via something like the following:
(a) write a set S of instructions that take 8k+1 cycles to read TSC and
store it in A[j++], where A is an array and j is an index that advances
once per execution of S, and k is any small integer.
(b) Routine R inlines S 8 times.
(c) After R executes with interrupts off, A[i+1]-A[i] is the same for
all i (i=0 to 7) except for one index, say i'. Use result (i' | A[0]).

--
jiw

Cristiano

unread,

Aug 21, 2010, 2:17:05 PM8/21/10

orz wrote:
> On Aug 19, 2:14 am, "Cristiano" <cristiaN...@gmail.com> wrote:
>> I ran your program and it's very fast compared to RaBiGeTe.
>> I'll try to use it to test some PRNGs which I wrote.
>
> I've attempted to link the RaBiGeTe source code with my research
> codebase (which is broadly similar to the released stuff, but bigger
> and uglier). I disabled DFT to eliminate the dependency, moved all
> symbols in to a namespace, and modified RBG.cpp to let my code specify
> any of my RNGs for it at runtime, and adjusted a few details like the
> locations it looked for some txt files in.
> It's now more or less working inside my code. I've attempted to adapt
> it to reset global and static variables back to initial values during
> startup so that I can call it multiple times per run, but haven't
> managed to succeed at that yet. I'm having trouble with longer tests
> on it... if I tell it 4 Gb it complains about 0 not being valid, and
> if I tell it 511 Mb then it crashes partway through. Probably I broke
> something(s) when connecting it to my code and whatnot.

The variable for the sequence length ('n') must be <= 2^32-1 (n is stored in
a 32 bits unsigned long).
Perhaps you forgot to call some test initializations or I forgot some "out
of memory error" messages.

Now that I'm trying to adapt the code for multiple calls in a run (because
of the GUI), the static variables are a big problem also for me.

LOL :-) I meant all the tests in the suite, not all the tests in the
universe. :-)

> While it's very difficult to define objectively, I believe that some
> statistical tests are better than others.

I agree, but sometimes happens that only 1 test in the suite finds a
weakness (the current version of RaBiGeTe includes 29 tests).
Some days ago, during the testing of a PRNG which I'm still developing, only
the AMLS test failed; it is slow and weak in many cases.

> I broadly evaluate tests on
> qualities like speed (how quickly they can reject a significant number
> of RNGs), breadth (how wide a variety of RNGs they can reject), and
> orthogonality/originality (to what degree their merits are redundant
> give the broader set of tests available - their marginal contribution
> to the breadth of an intelligently chosen set of tests). I suggest
> systematically evaluating those kinds of metrics in the most objective
> way you can when deciding which tests with which parameters should be
> included in a preset.

In my view, the time taken is much too high compared to the benefit you get.
Sorry, in this point we cannot agree.

Cristiano

orz

unread,

Aug 22, 2010, 8:15:04 PM8/22/10

The excessive file sizes for the static libraries is due to the
combination of "whole program optimization" with STL strings.

So either I disable whole program optimization, or I change
vRNG::get_name to accept a pointer at a buffer for C-style strings
instead. The former probably makes more sense though it may or may
not cost in performance, but I'm kind of tempted by the later due to
general dislike for STL strings. And STL strings are the only
templates left in the basic PractRand package.

progress on 0.81:
1. Figured out where the giant file sizes came from (see above)
2. Changed vRNG8, vRNG16, vRNG32, vRNG64 (base classes for 8, 16, 32,
& 64 bit polymorphic RNGs) to be compatible with Boost / C++0x TR1 RNG
distributions. I am considering changing vRNG (base class for all
polymorphic RNGs) itself to be compatible, but that would be
inherently inefficient. Though the distribution implementations in
TR1 look pretty inefficient, so maybe that's okay.
3. Reviewed license issues on the code & algorithms involved. It
looks like everything is completely free except the SHA-2
implementation used (which is mostly free, but requires copyright
notices in source and binaries), which I'll replace either with
another implementation, and possibly some of the mt19937 code used
(also mostly free, but I can't find the original source for it to
check the license) so I'll rewrite the last bits of the mt19937
implementation that are not my own.
The objective here is to get public domain implementations of every
recommended algorithm, and verify that the recommended algorithms
themselves are freely usable (ie unpatented) as well. This applies
only to the recommended RNGs - those in the basic PractRand package.
In later versions the full package will include extra lower quality
RNGs intended for research purposes only, and those RNGs I won't sweat
the licenses so much since no real product would want to ship with
them anyway.
4. Minor changes to the jsf algorithm: constants for the 64 bit
variant were adjusted to match Robert Jenkins 64 bit variant (for
improved statistical properties and sort of more standard
compliance).
5. Some documentation changes & fixes

additional intended changes:
1. Add a few more tests (the three already in are the good tests, but
a wider variety would help in some circumstances)
2. Revise my "efiix" algorithm again... I'm having a hard time making
it both secure and fast, and might end up splitting it in to one
focusing on speed and another focusing on security.
3. More documentation changes & fixes.

delayed till 0.82:
1. Put together a chart showing which RNGs fail which tests & test
suites in how many seconds.
2. Add a variant of the sample program that uses multiple threads (via
SDL multithreading).
3. Add some checks to verify that the RNGs and whatnot are behaving
correctly.
4. Add a few lower quality RNGs to the full version for demonstrating
the tests on.

orz

unread,

Aug 24, 2010, 1:57:12 PM8/24/10

PractRand Version 0.81 is up.
http://sourceforge.net/projects/pracrand/
I didn't get around to getting any more tests in to it.

The changes are something like:
fixes:
- fixed a missing symbol "randi_fast_implementation"
- fixed the bloated file sizes
- fixed some rare issues with uniform floating point distribution
other:
- improved various bits of documentation
- removed various files that didn't belong
- now supports limited compatibility with Boost / C++0x TR1 RNG
distributions
- checked up on some license issues
tweaked some specific RNGs:
- jsf: some small changes to make it better match Robert Jenkins code
- arbee: improved slightly, eliminated short cycles
- efiix: reverted to an older, faster version

other stuff will have to wait for 0.82

orz

unread,

Sep 15, 2010, 8:17:31 PM9/15/10

PractRand is up to version 0.83 now, with version 0.84 likely heading
out the door sometime in the next week.

PractRand results on various RNGs vs TestU01 SmallCrush / Crush /
BigCrush:

RNG PractRand TestU01
standard *Crush
*the worst of the recommended RNGs
jsf16 > 2 TB pass
mt19937 > 4 TB 0/2/2
clcg96_32 > 4 TB pass
lcg64_32 32 GB 0/4/?
*some simple RNGs:
sapparot 128 MB 0/0/1
clcg64_16 4 GB 0/4/5
garthy16 32 MB 0/15/?
garthy32 256 GB pass
mwlac16 64 GB pass
*some cyclic buffer / fibonacci style RNGs:
mm32 4 MB 1/8/12
mm32_16 1 GB 0/0/2
mwc4691 1 GB pass
cbuf_accum 4 GB pass
dual_cbuf 128 GB pass
dual_cbuf_accum 1 TB pass
mt19937_unhashed 32 GB 0/2/2
*some indirection-based RNGs:
ibaa8x2 2 GB 0/8/19
ibaa8x4 128 GB ? pass
ibaa8x8 > 1 TB pass
ibaa16x2 16 GB pass
ibaa16x4 > 2 TB pass
ibaa32x2 2 TB pass
rc4_weakened 32 GB 0/1/1
rc4 4 TB pass

note on RNG selection criteria:
suggestions for RNGs to include would be welcome... the fact that I
made up many of these RNGs off the top of my head is a possible source
of bias.
requirements for RNGs to include are:
1. likely to fail *some* statistical test eventually (or tunable
quality that can be reduced until it is forced to fail a test)
2. fast enough to not slow testing down to a crawl
3. produces either 8, 16, 32, or 64 random bits at a time

note on comparing results between PractRand and TestU01 *Crush:
BigCrush takes about 5.5 hours on a fast RNG on the test computer.
Calling SmallCrush, Crush, and BigCrush all together takes over 6
hours on a fast RNG on the test computer.
PractRand takes about 4 hours to test 1 TB, or 1 minute to test 4 GB,
on a fast RNG on the test computer.
So, a result of "0/0/1" on *Crush implies that the total amount of CPU
time required to find that bias was over 6 hours, but in theory
probably only about 2.5 hours of that were actually necessary since in
theory an implementation could be made that aborted the test battery
as soon as a single failure was found.
So, in theory a result "0/0/1" is closest, in terms of computation
required, to a PractRand standard battery result of "512 GB" (which
takes about 2 hours). In practice however, it actually takes more
time than a PractRand standard battery result of "1 TB".
comparison:
*Crush - PractRand standard - time required
0/0/1 - 512 GB or 1 TB - 2.5 hours or 6 hours
0/1/? - 64 GB or 128 GB - 17 minutes or 35 minutes
1/?/? - 256 MB - 4.5 or 9 seconds

Failure is defined as p <= 1.0e-10 or p >= 1 - 1.0e-10, though I have
to be a bit fuzzy with that in the PractRand tests due to imprecise p-
value approximations.

0 new messages