Mersenne Twister


steven

unread,
Jul 10, 2005, 11:28:17 AM7/10/05
to
Hi,
Can you explain in layman terms why the MT (e.g. MT19937) isn't suitable for
encryption purposes?
TIA

Steven
(I'm not a cryptologist, nor a mathematician, so please type slowly)


Wolfgang Ehrhardt

unread,
Jul 10, 2005, 1:37:43 PM7/10/05
to
On Sun, 10 Jul 2005 15:28:17 GMT, "steven"
<steven_s...@telenet.be> wrote:

>Hi,
>Can you explain in layman terms why the MT (e.g. MT19937) isn't suitable for
>encryption purposes?

Because it is a linear, non-secure pseudorandom number generator.
Makoto Matsumoto et al. have submitted a cryptographically improved
version of MT to the ECRYPT Stream Cipher Project:

http://www.ecrypt.eu.org/stream
http://www.ecrypt.eu.org/stream/ciphers/cryptmtfubuki/cryptmtsource.zip
http://www.ecrypt.eu.org/stream/ciphers/cryptmtfubuki/cryptmtfubuki.pdf

Hope that helps

Wolfgang

--
In order to e-mail me a reply to this message, you will have
to remove PLEASE.REMOVE from the address shown in the header
or get it from http://home.netsurf.de/wolfgang.ehrhardt
(Free AES, CRC, Hash, and HMAC source for Pascal/Delphi)

steven

unread,
Jul 10, 2005, 1:55:04 PM7/10/05
to

"Wolfgang Ehrhardt" <Wolfgang.Ehrhar...@munich.netsurf.de>
wrote in message news:42d15ca9...@news.individual.net...

> On Sun, 10 Jul 2005 15:28:17 GMT, "steven"
> <steven_s...@telenet.be> wrote:
>
> >Hi,
> >Can you explain in layman terms why the MT (e.g. MT19937) isn't suitable
for
> >encryption purposes?
>
> Because it is a linear, non-secure pseudorandom number generator.
> Makoto Matsumoto et al. have submitted a cryptographically improved
> version of MT to the ECRYPT Stream Cipher Project:
>
> http://www.ecrypt.eu.org/stream
> http://www.ecrypt.eu.org/stream/ciphers/cryptmtfubuki/cryptmtsource.zip
> http://www.ecrypt.eu.org/stream/ciphers/cryptmtfubuki/cryptmtfubuki.pdf
>
> Hope that helps
>

Not really :-(
Maybe I should have posed my question differently: Why is MT non-secure
(while e.g. BBS is secure)?
Thanx for replying however.

Steven


Paul Rubin

unread,
Jul 10, 2005, 2:01:25 PM7/10/05
to
"steven" <steven_s...@telenet.be> writes:
> Maybe I should have posed my question differently: Why is MT non-secure
> (while e.g. BBS is secure)?

Um, why are oranges orange (while e.g. lemons are yellow)?

MT was not designed as a cryptographic generator. There's a
straightforward way (described in the MT paper) to reconstruct the
internal state. Remember where it says something like it's
equidistributed in 623 dimensions? That means that if you start
counting in 624 or more dimensions, all bets are off. That's
different from a cryptographic generator, which can't be distinguished
from uniform no matter how many dimensions you use.
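That linearity can be observed directly with CPython's random module,
which implements MT19937 and exposes the 624-word internal state via
getstate/setstate. Every output bit of MT is a GF(2)-linear function of
the state bits, so XORing two states XORs the corresponding output
streams; a cryptographic generator would show no such relation. A
sketch (the (3, words + (624,), None) state tuple is a CPython
internal, used here as an assumption):

```python
import random

def outputs(words, n=5):
    """Run MT19937 from a given 624-word state; index 624 forces a fresh twist."""
    r = random.Random()
    r.setstate((3, tuple(words) + (624,), None))
    return [r.getrandbits(32) for _ in range(n)]

# Two arbitrary 624-word states.
r1, r2 = random.Random(1), random.Random(2)
a = [r1.getrandbits(32) for _ in range(624)]
b = [r2.getrandbits(32) for _ in range(624)]
ab = [x ^ y for x, y in zip(a, b)]

# Linearity: outputs(a XOR b) == outputs(a) XOR outputs(b), bit for bit.
assert outputs(ab) == [x ^ y for x, y in zip(outputs(a), outputs(b))]
```

No statistical test is needed to see this structure; it holds exactly,
for every state pair.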

Herman Rubin

unread,
Jul 10, 2005, 3:41:55 PM7/10/05
to
In article <7xwtnyz...@ruckus.brouhaha.com>,

There can be no "cryptographic generator" in that sense
which uses a computer program. This includes BBS, which
to be sure has more dimensions, and is not linear.

Some of the seed programs for the Mersenne generator have
been too simple. Has anyone been able to crack it given
random seeding for all of the input, which would make it
more than 19,000 seed bits?
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

Paul Rubin

unread,
Jul 10, 2005, 3:53:46 PM7/10/05
to
hru...@odds.stat.purdue.edu (Herman Rubin) writes:
> >counting in 624 or more dimensions, all bets are off. That's
> >different from a cryptographic generator, which can't be distinguished
> >from uniform no matter how many dimensions you use.
>
> There can be no "cryptographic generator" in that sense
> which uses a computer program. This includes BBS, which
> to be sure has more dimensions, and is not linear.

Well, I mean in a feasible amount of time on a real computer. That
means there's no P-time distinguishing algorithm. Whether BBS fits
that description is an open problem.

> Some of the seed programs for the Mersenne generator have
> been too simple. Has anyone been able to crack it given
> random seeding for all of the input, which would make it
> more than 19,000 seed bits?

I believe the MT paper says how to do that.

David Wagner

unread,
Jul 10, 2005, 4:42:23 PM7/10/05
to
Herman Rubin wrote:
>Some of the seed programs for the Mersenne generator have
>been too simple. Has anyone been able to crack it given
>random seeding for all of the input, which would make it
>more than 19,000 seed bits?

Yes, I think so. I don't recall the details, but if I remember
correctly it was reasonably straightforward.
The cryptanalyst needs more than 19,000 bits of output in this case,
of course.

jsa...@ecn.ab.ca

unread,
Jul 11, 2005, 11:56:41 PM7/11/05
to
Herman Rubin wrote:
> Some of the seed programs for the Mersenne generator have
> been too simple. Has anyone been able to crack it given
> random seeding for all of the input, which would make it
> more than 19,000 seed bits?

Cracking it is trivial even in that case, provided you have just over
19,000 output bits.

John Savard

jsa...@ecn.ab.ca

unread,
Jul 12, 2005, 12:01:20 AM7/12/05
to
steven wrote:
> Can you explain in layman terms why the MT (e.g. MT19937) isn't suitable for
> encryption purposes?

> (I'm not a cryptologist, nor a mathematician, so please type slowly)

As has been noted, it's a linear shift register.

But that doesn't really answer the question. Why are linear shift
registers insecure?

It is because there is a simple relationship between the successive
32-bit words of pseudorandom data that the Mersenne Twister produces.
The relationship will allow those words to be in a sequence that does
not exactly repeat itself for a very long time, but it only takes a
short time to determine all the information you need to duplicate the
entire sequence.

The successive words of data give the entire internal state; in a
secure keystream generator, the internal state is nearly impossible to
derive from the outputs.
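That state recovery is easy to demonstrate against CPython's random
module, which is an MT19937: each output word is a state word passed
through an invertible "tempering" transform, so untempering 624
consecutive outputs recovers the whole state, after which every future
output can be predicted. A sketch (the tempering constants are
MT19937's published parameters; the setstate tuple format is a CPython
internal):

```python
import random

def _undo_rshift(z, shift):
    # Invert y ^= y >> shift by fixed-point iteration.
    x = z
    for _ in range(32 // shift + 1):
        x = z ^ (x >> shift)
    return x

def _undo_lshift(z, shift, mask):
    # Invert y ^= (y << shift) & mask by fixed-point iteration.
    x = z
    for _ in range(32 // shift + 1):
        x = z ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF

def untemper(y):
    """Recover a raw MT19937 state word from one 32-bit output."""
    y = _undo_rshift(y, 18)
    y = _undo_lshift(y, 15, 0xEFC60000)
    y = _undo_lshift(y, 7, 0x9D2C5680)
    return _undo_rshift(y, 11)

victim = random.Random(2005)
observed = [victim.getrandbits(32) for _ in range(624)]

clone = random.Random()
clone.setstate((3, tuple(untemper(y) for y in observed) + (624,), None))

# The clone now predicts the victim's output exactly, forever.
assert all(clone.getrandbits(32) == victim.getrandbits(32) for _ in range(1000))
```

This is exactly the attack the text describes: 624 observed words are
enough, with no key search at all.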

John Savard

steven

unread,
Jul 12, 2005, 4:25:53 AM7/12/05
to
<jsa...@ecn.ab.ca> wrote in message
news:1121140493.0...@g47g2000cwa.googlegroups.com...

Now that's a clear answer. Thanks John.
Silly question: what if the feedback polynomial changed after each
shift?

Steven


jsa...@ecn.ab.ca

unread,
Jul 13, 2005, 4:06:41 PM7/13/05
to
steven wrote:
> Silly question: what if the feedback polynomial would change after each
> shift?

Answer: that depends on how complicated the rule is that makes it
change.

John Savard
http://www.quadibloc.com/crypto/intro.htm

Cristiano

unread,
Jul 13, 2005, 4:40:48 AM7/13/05
to

What does "trivial" mean in this case?

Cristiano


George Marsaglia

unread,
Jul 13, 2005, 7:04:22 PM7/13/05
to

"steven" <steven_s...@telenet.be> wrote in message
news:l8bAe.141612$NN6.7...@phobos.telenet-ops.be...

I have never had much interest in cryptography,
but some of the work I have done on producing
and testing randomness seems to be of interest to people
in that field. I have seen frequent mention of the
Mersenne Twister, which does very well in tests of
randomness such as in the Diehard battery. A good part
of its appeal seems to be that it has a very long period.

Far less complicated methods can be used to get much
longer periods and as good or better results in tests of
randomness, with simpler programs, and---at least
to me---more attractive theory.

Here is an example in C, with period some 10^33000
times as long as that of the Mersenne Twister:

/* CMWC4096, assuming a 32-bit unsigned long: */
static unsigned long Q[4096], c = 362436;   /* seed Q[] (and c) before use */

unsigned long CMWC4096(void) {
  unsigned long long t, a = 18782LL, b = 4294967295LL;  /* b = 2^32 - 1 */
  static unsigned long i = 4095;
  unsigned long r = b - 1;
  i = (i + 1) & 4095;
  t = a * Q[i] + c;                 /* 64-bit multiply-with-carry step */
  c = (t >> 32);                    /* high 32 bits become the new carry */
  t = (t & b) + c;                  /* low 32 bits; finish reducing mod b */
  if (t > r) { c++; t = t - b; }
  return (Q[i] = r - t);            /* "complementary" output */
}
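For readers who want to experiment without a C compiler, here is a
line-for-line Python port of the routine above (a sketch: Python's
arbitrary-precision integers stand in for the 64-bit unsigned long
long, and the class name and seeding interface are my own):

```python
class CMWC4096:
    """Port of Marsaglia's complementary multiply-with-carry generator."""
    A = 18782
    B = 2**32 - 1            # the base b

    def __init__(self, seeds, carry=362436):
        assert len(seeds) == 4096
        self.Q = list(seeds)  # 4096 values in [0, 2^32 - 2]
        self.c = carry
        self.i = 4095

    def next(self):
        self.i = (self.i + 1) & 4095
        t = self.A * self.Q[self.i] + self.c
        self.c = t >> 32                  # high bits become the new carry
        t = (t & self.B) + self.c
        if t > self.B - 1:
            self.c += 1
            t -= self.B
        self.Q[self.i] = (self.B - 1) - t
        return self.Q[self.i]
```

Outputs lie in [0, 2^32 - 2]; the value FFFFFFFF never occurs,
consistent with the base-(2^32 - 1) digit interpretation.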

The generator produces, in reverse order, the base
b=2^32-1 "digits" in the expansion of 1/p, where
p=18782b^4096+1 and the period of this CMWC RNG is
p-1>10^39460.
It requires 4096 seeds 0 <= x_i < 2^32-1 (the Q array)
and an initial seed 0 <= c < 18782.
For any such seed set, the RNG produces, in reverse
order, the base-b digits of the expansion of k/p
for some 0<k<p.
Because b is a primitive root of p, any two such
expansions are just circular rotations of one another.
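The 1/p claim is hard to check at these sizes, but it can be verified
with toy parameters. Take b=10 and a=6, so p = a*b + 1 = 61, a prime
with 10 as a primitive root (1/61 has decimal period 60). A sketch of
the same recurrence at this scale (the lag here is r=1 rather than
4096; parameters chosen purely for illustration):

```python
def cmwc_step(x, c, a, b):
    """One step of the complementary multiply-with-carry recurrence."""
    t = a * x + c
    return (b - 1) - (t % b), t // b

a, b = 6, 10                 # p = a*b + 1 = 61
x, c = 3, 2                  # arbitrary starting state
start, outs = (x, c), []
while True:
    x, c = cmwc_step(x, c, a, b)
    outs.append(x)
    if (x, c) == start:
        break

assert len(outs) == 60       # full period p - 1

# Read in reverse, one period is a rotation of the repetend of 1/61.
digits, r = [], 1
for _ in range(60):          # long division: decimal digits of 1/61
    r *= 10
    digits.append(r // 61)
    r %= 61
rev = outs[::-1]
assert any((digits * 2)[i:i + 60] == rev for i in range(60))
```

The same check, with b = 2^32 - 1, a = 18782 and lag 4096, is what the
period claim above asserts; only the sizes differ.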

An initial seed c < 18782 and the 4096 elements
of the static array Q[4096] should be assigned
values before calls to the function CMWC4096(),
otherwise the first few thousand returned values
will be zeros. That is consistent with the view
that the choice of seeds merely chooses a random
starting point in a huge circle of over 10^39460
base-(2^32-1) digits (all possible 32-bit integers
except for FFFFFFFF). Failing to initialize the
Q[4096] array (set by default to 0's) merely
provides a starting point at a long string of
zeros, which should occasionally be encountered
in any random string.

Is it possible to predict forthcoming values,
given a string of previous values?
Yes, but you need more than 4097 previous values to do it,
(better than the Mersenne Twister, which needs < 650).

But given, say, 2000 previous values, you only know
that you are somewhere in the immense loop of over
10^39460 "digits" of the expansion of 1/p, and that
particular string will appear over 2^2096 times in the
expansion.
So you would need a lot of computer time to find just
where it appears, and thus be able to predict coming values.

(I have, thanks to help of colleagues in Hong Kong,
a java version of CMWC4096, should readers ask for it.)

An answer to your question might be framed in terms
similar to those above:
Most random number generators produce a sequence of numbers
which can be likened to the loop of digits in the expansion of 1/p above
(but with far smaller loops). Choosing random seed values (the random
part of RNG) puts you somewhere in that loop, from where successive
numbers follow. If you have a certain number of observed values you may
be able to find their place in that loop, and thus figure out the rules
of the RNG, or, knowing its type, find its parameters or rules of
production.

But if the RNG's loop is very very long, you may have to work hard to
predict future values. The Mersenne Twister requires a pretty long
observed sequence, but far short of those based on the expansions
of 1/p, for which current abilities to find primes can be expected to
provide expansions that will require some 250000 32-bit numbers
in order to find your place in the loop.

George Marsaglia


John E. Hadstate

unread,
Jul 13, 2005, 7:21:39 PM7/13/05
to

"George Marsaglia" <g...@stat.fsu.edu> wrote in message
news:1-qdnU7JG-N...@comcast.com...

>
>
> Here is an example in C, with period some 10^33000
> times as long as that of the Mersenne Twister:
>
> static unsigned long Q[4096],c=362436;
> unsigned long CMWC4096(void){
> unsigned long long t, a=18782LL,b=4294967295LL;
> static unsigned long i=4095;
> unsigned long x,r=b-1;
> i=(i+1)&4095;
> t=a*Q[i]+c;
> c=(t>>32); t=(t&b)+c;
> if(t>r) {c++; t=t-b;}
> return(Q[i]=r-t); }
>

Thank you!

Jean-Luc Cooke

unread,
Jul 14, 2005, 2:32:12 PM7/14/05
to
George Marsaglia <g...@stat.fsu.edu> wrote:

> "steven" <steven_s...@telenet.be> wrote in message
> news:l8bAe.141612$NN6.7...@phobos.telenet-ops.be...
> > Hi,
> > Can you explain in layman terms why the MT (e.g. MT19937) isn't suitable
> > for
> > encryption purposes?
> > TIA
> >
> > Steven
> > (I'm not a cryptologist, nor a mathematician, so please type slowly)
> >

> I have never had much interest in cryptography,
> but some of the work I have done on producing
> and testing randomness seems to be of interest to people
> in that field. I have seen frequent mention of the
> Mersenne Twister, which does very well in tests of
> randomness such as in the Diehard battery. A good part
> of its appeal seems to be that it has a very long period.

> Far less complicated methods can be used to get much
> longer periods and as good or better results in tests of
> randomness, with simpler programs, and---at least
> to me---more attractive theory.

> Here is an example in C, with period some 10^33000
> times as long as that of the Mersenne Twister:

Do you have proof of such statements? Can you provide us with
a paper please?

JLC

Cristiano

unread,
Jul 14, 2005, 5:06:56 PM7/14/05
to
George Marsaglia wrote:

> Here is an example in C, with period some 10^33000
> times as long as that of the Mersenne Twister:

That could be interesting, but there is a proof that MT has very good
k-distribution; for example, it is:
(623,32)-distributed
(1246,16)-distributed
(2492,8)-distributed
(9968,2)-distributed
which could be useful in some applications.
Please, could you post some values for your generator?

> static unsigned long Q[4096],c=362436;

You say that your generator requires an initial seed 0 <= c < 18782.
Can that c be changed to something like c = RND % 18782?

Thank you
Cristiano


George Marsaglia

unread,
Jul 14, 2005, 6:40:07 PM7/14/05
to

"Jean-Luc Cooke" <jlc...@engsoc.org> wrote in message
news:db6b3c$n90$1...@driftwood.ccs.carleton.ca...

Let b=2^32-1, p=18782*b^4096+1 and k=p-1.
It may take many hours to verify (and many more
hours for the original search to find p),
but b^k = 1 mod p and
b^(k/2) != 1 mod p and b^(k/9391) != 1 mod p
for 2 and 9391, the two prime divisors of 18782.

It follows that p is a prime for which b is a primitive root.
The period of the base-b expansion of 1/p is p-1;
more generally, the period of the base-b expansion of
k/p is the order of b mod p. (See, for example, Marsaglia
and Zaman, ``A new class of random number generators'' (1991),
Annals of Applied Probability, V 1, No. 3, 462--480, which merely
states what seems to have long been known in number theory.)

Thus the base-b expansion of 1/p, for b=2^32-1,
has period 18782*b^4096 = 10^39460.877333...,
and this period is more than 10^33000 times as long as
that of the Mersenne Twister.

The C function CMWC4096 is a
complementary-multiply-with-carry RNG, which generates
a sequence of pairs x_n, c_n by means of the recursions

x_n = (b-1) - ((a x_{n-r} + c_{n-1}) \bmod b),
c_n = \lfloor (a x_{n-r} + c_{n-1}) / b \rfloor

(TeX notation).

You will have to verify that the C function CMWC4096
does this for b=2^32-1.

A description of methods for producing, in reverse order,
the base-b digits of a rational k/p is in the above paper for
add-with-carry, and its extension to multiply-with-carry
is described, for example, in
http://tbf.coe.wayne.edu/jmasm/vol2_no1.pdf.

You might also want to read the postscript file
mwc1.ps in

The Marsaglia Random Number CDROM
including
The Diehard Battery of Tests of Randomness

http://www.cs.hku.hk/~diehard/.

George Marsaglia

unread,
Jul 14, 2005, 6:41:58 PM7/14/05
to

"Cristiano" <cristi...@NSquipo.it> wrote in message
news:QtABe.162196$75.69...@news4.tin.it...

Yes, ANY c in 0 <= c < 18782 will do, and ANY set of 32-bit values
in 0 <= x < 2^32-1 may be used as seeds in the Q array.

I don't know what (623,32), etc. means, but those who do should
be able to run values to see whether the 10^39460 base-b "digits"
in the expansion of 1/p seem to have those properties. As with pi, e
and expansions of most irrationals, the expansions of rationals k/p
for large primes p are expected to pass extensive tests of randomness,
in the sense that they may reasonably be considered the output of
a sequence of iid variates taking values in {0, 1, ..., b-1} with equal
probabilities.
Extensive testing using Diehard and its tough-test supplements
confirms this.
A paper verifying this for pi, e, sqrt(2) and various k/p is forthcoming.

George Marsaglia


Cristiano

unread,
Jul 15, 2005, 8:19:35 PM7/15/05
to
George Marsaglia wrote:
> An initial seed c < 18782 and the 4096 elements
> in the static array Q[4096] should be assigned
> values before calls to the function CMWC4096(),
> otherwise the first few thousand returned values
> will be zeros. (That is consistent with the view
> that the choice of seeds merely chooses a random
> starting point in a huge circle of over 10^39460
> base-(2^32-1) digits, (all possible 32-bit integers
> except for FFFFFFFF)).
> Failing to initialize the Q[4096] array (set by
> default to 0's) merely provides a starting point
> at a long string of zeros, which should be occasionally
> encountered in any random string.)

I tested your generator using Q[i]=0 for 0 <= i <= 4095 and it fails the
test, because I get many 0's followed by many 0xfffffffe's.
The test is also failed when I initialize Q with some sort of counter or
clock.
But if I discard the first 4096*10000 values or more, the generator seems to
pass the test.
Is there any method to initialize Q in the right way, or is the best thing
to always discard the first few thousand returned values?

Thank you
Cristiano


Cristiano

unread,
Jul 16, 2005, 4:57:03 AM7/16/05
to

That is good, but the k-distribution is a "stronger" property.

This is what I mean by the "(n,b)-distributed" notation:
1) consider an n-dimensional hypercube;
2) take n consecutive b-bit numbers, formed by taking the b msb of each
output;
3) these n numbers are the coordinates of a point in the hypercube;
then, over the whole period, you will see exactly the same number of points
in each cell of the hypercube.
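The definition can be checked mechanically on a toy generator. Below, a
4-bit counter emitting its top 2 bits is (1,2)-distributed (each 2-bit
value appears equally often over the period) but not (2,2)-distributed
(pairs of consecutive outputs do not fill the 4x4 cells equally). This
only illustrates the counting, not MT itself:

```python
from collections import Counter

def kdist_counts(outputs, n):
    """Count every n-tuple of consecutive outputs, cyclically, over one period."""
    period = len(outputs)
    return Counter(tuple(outputs[(i + j) % period] for j in range(n))
                   for i in range(period))

# Toy generator: 4-bit counter, output = top b=2 bits of the state.
outs = [(x >> 2) & 3 for x in range(16)]

c1 = kdist_counts(outs, 1)
c2 = kdist_counts(outs, 2)

assert set(c1.values()) == {4}      # (1,2)-distributed: each value 4 times
assert len(set(c2.values())) > 1    # not (2,2)-distributed: cells unequal
```

MT19937's figures above say the analogous counts come out exactly equal
for tuples as long as 623 words of 32 bits.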

Cristiano


George Marsaglia

unread,
Jul 16, 2005, 6:06:47 AM7/16/05
to

"Cristiano" <cristi...@NSquipo.it> wrote in message
news:roYBe.167453$75.71...@news4.tin.it...
As I indicated in the original post, choosing a random seed set:

an initial carry c in 0 <= c < 18782 and 0 <= Q[i] < b=2^32-1,

merely places you at some unique random point in the immense cycle of
over 10^39460 digits in the base-b expansion of 1/p for
the prime p=18782*b^4096+1. There are exactly 18782*b^4096
ways to choose that initial random point, and---just as, for example,
in the expansion of pi we must expect rare runs of many 0's---we should
expect that the expansion of 1/p will have rare sets of many 0's.
Choosing a certain initial seed set puts you at one of those points.
There are others that yield the sequence 1,2,3,...,9999999999999 and
others that will give, in sequence, the birthdates and current weights
of the entire readership of all Google newsgroups, in alphabetical
order. But the rarity with which we get such output---or any particular
output---is consistent with our mathematical model of randomness.

George Marsaglia


Cristiano

unread,
Jul 16, 2005, 8:42:25 AM7/16/05
to

I asked for the right seeding method because I *always* get bad results with
this seeding:
Q[i] = RDTSC() % 0xffffffff
where the RDTSC() function reads the time-stamp counter (a 64-bit
model-specific register which is incremented every clock cycle) of my
AthlonXP 3000+ processor.
I get Q's like these:
D749D65 D749E1B D749E8C D749EFD ... D7BCD19 D7BCD89 D7BCDF9 D7BCE69
or
18B090EA 18B091A0 18B09211 18B09282 ... 18B7ECC0 18B7ED30 18B7EDA0 18B7EE10
where many high-order bits are fixed.
I can seed your generator using many different sequences and
*systematically* it will fail the test.

If I use Q[i] = RDTSC() * RDTSC() % 0xffffffff,
which generates Q's like these:
8C84B61F 6287F9C5 D18A895D 408DCFD8 ... 34C63518 AF686DC2 2A0B5D4F A4AF03BD
I *always* get good results.

Why is the latter method "better" than the former?
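One common remedy (a sketch of standard practice, not something
proposed in this thread) is not to use raw counter samples at all, but
to expand a single seed through a 32-bit integer mixer, so consecutive
Q entries share no fixed bits. The mixer below applies the MurmurHash3
finalizer to a Weyl sequence, a splitmix-style construction; the
constants are that finalizer's published ones:

```python
def splitmix32(state):
    """Return (mixed 32-bit value, next state): Weyl step + MurmurHash3 finalizer."""
    state = (state + 0x9E3779B9) & 0xFFFFFFFF
    z = state
    z = ((z ^ (z >> 16)) * 0x85EBCA6B) & 0xFFFFFFFF
    z = ((z ^ (z >> 13)) * 0xC2B2AE35) & 0xFFFFFFFF
    return z ^ (z >> 16), state

def seed_q(seed, n=4096):
    """Fill the Q array for CMWC4096 from one 32-bit seed."""
    q, s = [], seed & 0xFFFFFFFF
    while len(q) < n:
        v, s = splitmix32(s)
        if v != 0xFFFFFFFF:        # CMWC seeds must lie in [0, 2^32 - 2]
            q.append(v)
    return q
```

Unlike raw RDTSC samples, successive values here differ in high-order
and low-order bits alike, because each step avalanches all 32 bits.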

Cristiano


Simon Johnson

unread,
Jul 16, 2005, 1:54:53 PM7/16/05
to
In the simplest possible terms, there is a difference between something
producing a random-looking distribution of bytes and producing a stream
of bytes that is indistinguishable from a random source.

The MT is an algorithm that passes the first test but not the second.
It is easy to distinguish MT from a random source, and this fact alone
is considered a break of the MT. Fulfilling the indistinguishability
criterion is necessary and sufficient for producing a secure
stream cipher.

Simon.

Cristiano

unread,
Jul 16, 2005, 3:00:24 PM7/16/05
to
Simon Johnson wrote:
> In the simplest possible terms, there is a difference between something
> producing a random-looking distribution of bytes and producing a stream
> of bytes that is indistinguishable from a random source.

To me, "random looking" seems = "indistinguishable from a random source".
What do you mean?

> The MT is an algorithm that passes the first test but not the second.
> It is easy to distinguish MT from a random source and this fact alone
> is considered a break of the MT.

No general purpose test can distinguish the MT sequences from a random
source.

> Fulfilling the indistinguishability
> criterion is necessary and sufficient for producing a secure
> stream cipher.

It's necessary but *not* sufficient; a crypto (P)RNG must also be
unpredictable.

Cristiano


Paul Rubin

unread,
Jul 16, 2005, 3:13:48 PM7/16/05
to
"Cristiano" <cristi...@NSquipo.it> writes:
> To me, "random looking" seems = "indistinguishable from a random source".
> What do you mean?

"indistinguishable" means you have no practical way to tell which is which.

> No general purpose test can distinguish the MT sequences from a random
> source.

I don't know what you mean by "general purpose test", but
"indistinguishable" means you can't tell the difference using ANY test,
not just general purpose tests. If, after studying the MT algorithm and
source code for a year or two, you can come up with an MT-specific
test, based on the intimate details of how MT works, that distinguishes
an MT stream from a random stream, MT is distinguishable.

That said, MT is getting a bad rap here; it was designed to be very fast
and to have reasonable statistical properties for physical simulations
and stuff like that. It was never intended to be a stream cipher and
its designers specifically advised people that it wasn't suitable as one.

> It's necessary but *not* sufficient; a crypto (P)RNG must be also
> unpredictable.

If it were predictable, you obviously could distinguish it from a
random source, so indistinguishability is sufficient.


Unruh

unread,
Jul 16, 2005, 5:09:21 PM7/16/05
to
"Simon Johnson" <simon....@gmail.com> writes:

Well, I am not sure that the second is actually a criterion at all. How,
given simply the stream, do you decide if it has been satisfied? If
tomorrow I break, say, RC4 or AES, do you then say that the test is no
longer satisfied, but today it is? I.e., this would seem to be not a
test but a hope.


>Simon.

Herman Rubin

unread,
Jul 16, 2005, 5:25:26 PM7/16/05
to
In article <roYBe.167453$75.71...@news4.tin.it>,

I would always initialize any pseudo-random generator with
the output of at least a fair physical generator.

Paul Rubin

unread,
Jul 16, 2005, 5:30:24 PM7/16/05
to
Unruh <unruh...@physics.ubc.ca> writes:
> >It is easy to distinguish MT from a random source and this fact alone
> >is considered a break of the MT. Fulfilling the indistinguishability
> >criterion is necessary and sufficient for producing a secure
> >stream cipher.
>
> Well, I am not sure that the second is actually a criterion at all. How,
> given simply the stream do you decide if it has been satisfied?

You're at the point where maybe you should look at a formal mathematical
treatment. Bellare and Rogaway's lecture notes are a good place to start:

http://www-cse.ucsd.edu/users/mihir/cse207/classnotes.html

> If tomorrow I break say RC4 or AES, do you then say that the test is
> no longer satisfied but today it is? Ie, this would seem to be not
> a test but a hope.

That's the best anyone can do with our current state of knowledge.
It's conjectured that AES can't be distinguished from a random
permutation, where "conjecture" means "unsolved math problem".
Sometimes an unsolved problem gets solved and then it's unsolved
yesterday and solved today.

So, three things can happen with AES:

1) Someone proves AES is distinguishable from random. That's
considered a break of the cipher even if it doesn't lead to practical
security compromises in fielded applications. It's happened with many
other ciphers (including RC4 and more recently SHA1) and might happen
with AES. That's bad for users but not that big a deal in a
theoretical sense. Nobody really expects this (especially for a very
fast and definite distinguisher) but it might happen.

2) Someone proves AES is indistinguishable. That's a HUGE result
because it means the P vs. NP problem is solved. Nobody expects this
and it will be a big surprise if it happens anytime soon.

3) Nobody proves anything. We keep on thinking and hoping, without
definite proof, that AES is strong. This is the most likely way
things will continue for a while.

Unruh

unread,
Jul 16, 2005, 5:35:06 PM7/16/05
to
"Cristiano" <cristi...@NSquipo.it> writes:

>Simon Johnson wrote:
>> In the simplest possible terms, there is a difference between something
>> producing a random-looking distribution of bytes and producing a stream
>> of bytes that is indistinguishable from a random source.

>To me, "random looking" seems = "indistinguishable from a random source".
>What do you mean?

>> The MT is an algorithm that passes the first test but not the second.
>> It is easy to distinguish MT from a random source and this fact alone
>> is considered a break of the MT.

>No general purpose test can distinguish the MT sequences from a random
>source.

Sure. If it can be broken then it can be distinguished from a random
source. It has patterns which a random source would not have. Note that ALL
prngs are different from random sources. They can be predicted, which a
random source cannot.

>> Fulfilling the indistinguishability
>> criterion is necessary and sufficient for producing a secure
>> stream cipher.

>It's necessary but *not* sufficient; a crypto (P)RNG must be also
>unpredictable.

A random source cannot be predicted. And no PRNG is unpredictable. Just
give me the algorithm and the (short) key and I can predict it forever.

Now, can I figure out how to predict it just given the output alone? I do
not know but I do know that there are very very strong patterns in the
output based on the fact that it is generated by a PRNG.

>Cristiano


Paul Rubin

unread,
Jul 16, 2005, 6:04:10 PM7/16/05
to
Unruh <unruh...@physics.ubc.ca> writes:
> A random source cannot be predicted. And no PRNG is unpredictable. Just
> give me the algorithm and the (short) key and I can predict it forever.

When we talk about PRNG's as stream ciphers, the idea is that you're
given the algorithm but not the key. The distinguishability criterion
means you get two streams of data. One comes from the (known) PRNG
algorithm with an unknown key, and the other is random, and you're
asked which is which. If you can find a way to tell the difference
a significant amount of the time, you've broken the cipher. Again,
see Bellare and Rogaway's notes for formal definitions.

Unruh

unread,
Jul 16, 2005, 8:04:27 PM7/16/05
to

> http://www-cse.ucsd.edu/users/mihir/cse207/classnotes.html

It is distinguishable from random. Given a tiny bit of knowledge, the key,
all output is known. No random generator has that property.
I simply need to search through a 2^128 key space (a tiny amount
compared to all strings of length a googol) and I WILL discover the key
and will discover that AES is completely predictable.


>2) Someone proves AES is indistinguishable. That's a HUGE result
>because it means the P vs. NP problem is solved. Nobody expects this
>and it will be a big surprise if it happens anytime soon.

No, it has absolutely nothing to do with P and NP.


>3) Nobody proves anything. We keep on thinking and hoping, without
>definite proof, that AES is strong. This is the most likely way
>things will continue for a while.

The strength of AES has nothing to do with either of the above.

Paul Rubin

unread,
Jul 16, 2005, 8:06:33 PM7/16/05
to
Unruh <unruh...@physics.ubc.ca> writes:
> It is distinguishable from random. Given a tiny bit of knowledge, the key,
> all output is known. No random generator has that property.
> I simply need to search through a 2^128 key space (a tiny amount
> compared to all strings of length a googol) and I WILL discover the key
> and will discover that AES is completely predictable.

If we're talking about these terms as used in cryptography, the
criterion is polynomial-time tests in the key size. A 2^128 search is
considered infeasible. "Polynomial time" is where P vs. NP comes in.

Unruh

unread,
Jul 16, 2005, 8:06:27 PM7/16/05
to
Paul Rubin <http://phr...@NOSPAM.invalid> writes:

Easy: search the key space. It is small. (It may take a while, but since
when did math care about human lifetimes?) Find the key, and show which
one is derived from the PRNG.


David Wagner

unread,
Jul 16, 2005, 8:16:34 PM7/16/05
to
Unruh wrote:
>Paul Rubin <http://phr...@NOSPAM.invalid> writes:
>>You're at the point where maybe you should look at a formal mathematical
>>treatment. Bellare and Rogaway's lecture notes are a good place to start:
>
>> http://www-cse.ucsd.edu/users/mihir/cse207/classnotes.html
>
>It is distiguishable from random. Given a tiny bit of knowledge, the key,
>all output is known.

Argh. You really need to read Bellare and Rogaway. They make it
all formal. In particular, the usual notion of indistinguishability
for cryptography is *computational* indistinguishability: no
computationally bounded attacker should be able to distinguish. (E.g.,
no attacker using less than 2^80 clock cycles should be able to
distinguish with advantage better than 2^-32.)

It's all spelled out there. You really would benefit from reading it.
Right now, all I can say is that your comments make you appear to be a
little unaware of some of the basic notions of theoretical cryptography.
Those comments are missing key points, points which have been already
addressed in the basic literature.

Bodo Moeller

unread,
Jul 16, 2005, 10:28:43 PM7/16/05
to
Paul Rubin <>:
> Unruh <unruh...@physics.ubc.ca> writes:

You are, of course, right that the 2^128 search can be considered
infeasible so that the brute-force attack does not make AES
distinguishable from a random permutation.

However, this is not really about P and NP. The term "polynomial
time" only has a very fuzzy meaning here; this is not the strict
notion from complexity theory. There is no definition of AES for
arbitrarily long keys, so it is meaningless to talk about asymptotics:
there is no security parameter that can be increased to make the
adversary's task harder. While the 2^128 step search takes a very
long time indeed, it still is a constant-time attack.


Paul Rubin

unread,
Jul 16, 2005, 10:44:38 PM7/16/05
to
Bodo Moeller <bmoe...@acm.org> writes:
> However, this is not really about P and NP. The term "polynomial
> time" only has a very fuzzy meaning here; this is not the strict
> notion from complexity theory. There is no definition of AES for
> arbitrarily long keys, so it is meaningless to talk about asymptotics:
> there is no security parameter that can be increased to make the
> adversary's task harder. While the 2^128 step search talks a very
> long time indeed, it still is a constant-time attack.

Fair enough. However, AES reduces to (say) a SAT problem with at
worst a few thousand terms, a relatively small problem. We (perhaps
without justification) use "SAT is in P" to mean "SAT is tractable"
even though the P-time algorithm for recognizing SAT could have a
horrendously large exponent.

If someone proves AES has no better distinguisher than brute force, it
means that the corresponding SAT problem also has no faster solution
than searching the AES keyspace, which goes against what we normally
pretend it means for there to be a P-time SAT solver.

Obviously there is a gap in this logic, as you point out: there could
be a P-time SAT solver which is still so slow that using it to break
AES is slower than using traditional brute force. In this flavor of
complexity theory we just handwave that issue by only caring about the
asymptotics and pretending all the constants are small. So there is a
certain amount of hope involved about the constants, but the relevance
of P vs. NP comes from the existence of that SAT reduction, which is a
definite thing.

Joe Peschel

Jul 16, 2005, 11:49:39 PM
Bodo Moeller <bmoe...@acm.org> wrote in news:dbcfor$fqf$1...@lnx107.hrz.tu-darmstadt.de:

> While the 2^128 step search talks a very
> long time indeed, it still is a constant-time attack.
>

It's a lot quicker when we get it to shut up.

J

--
__________________________________________

http://www.impeach-bush-now.org

Joe Peschel
D.O.E. SysWorks
http://members.aol.com/jpeschel/index.htm
__________________________________________

steven

Jul 17, 2005, 2:00:38 AM

"Paul Rubin" <http://phr...@NOSPAM.invalid> wrote in message
news:7xirzas...@ruckus.brouhaha.com...

> Sometimes an unsolved problem gets solved and then it's unsolved
> yesterday and solved today.

:-)


Cristiano

Jul 17, 2005, 4:24:35 AM
Paul Rubin wrote:
> "Cristiano" <cristi...@NSquipo.it> writes:
>> To me, "random looking" seems = "indistinguishable from a random
>> source". What do you mean?
>
> "indistinguishable" means you have no practical way to tell which is
> which.
>
>> No general purpose test can distinguish the MT sequences from a
>> random source.
>
> I don't know what you mean by "general purpose test", but
> "indistinguishable" means you can't tell the difference using ANY test,
> not just general purpose tests.

When I talk about generators, I always have in mind the general purpose
tests (RaBiGeTe, TESTU01, NIST, DH).
"Random looking" = "indistinguishable" = sequence which looks like random
because pass all the general purpose tests.
But with the "new" definition of "indistinguishable" ("new" for me)
obviously I totally agree with Simon Johnson.

> If, after studying the MT algorithm and
> source code for a year or two, you can come up with an MT-specific
> test based on the intimate details of how MT works, that distinguishes
> an MT stream from a random stream, MT is distinguishable.

I guess that it is always possible to come up with a specific test for *any*
non-crypto PRNG.

Cristiano
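For MT19937 such a specific test is in fact well known, and it is devastating: the tempering transform applied to each output word is invertible, so 624 consecutive 32-bit outputs reveal the entire internal state, after which every future output is predictable. A sketch in Python, whose `random` module uses MT19937 (the `setstate` tuple format `(3, state + (624,), None)` is CPython-specific, and the seed is an arbitrary example):

```python
import random

U32 = 0xFFFFFFFF

def undo_rshift_xor(y, shift):
    """Invert y ^= y >> shift for a 32-bit word (fixed-point iteration)."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x & U32

def undo_lshift_mask_xor(y, shift, mask):
    """Invert y ^= (y << shift) & mask for a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & U32

def untemper(y):
    """Recover the raw MT19937 state word from one tempered output
    by undoing the four tempering steps in reverse order."""
    y = undo_rshift_xor(y, 18)
    y = undo_lshift_mask_xor(y, 15, 0xEFC60000)
    y = undo_lshift_mask_xor(y, 7, 0x9D2C5680)
    y = undo_rshift_xor(y, 11)
    return y

# Observe 624 outputs, rebuild the state, and predict everything after.
victim = random.Random(20050710)
observed = [victim.getrandbits(32) for _ in range(624)]
state = tuple(untemper(y) for y in observed) + (624,)

clone = random.Random()
clone.setstate((3, state, None))  # CPython's MT19937 state tuple
predicted = [clone.getrandbits(32) for _ in range(100)]
actual = [victim.getrandbits(32) for _ in range(100)]
print(predicted == actual)  # prints True
```

This is precisely why MT fails as a cipher: no secret key survives 624 observed words of keystream.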


Cristiano

Jul 17, 2005, 4:28:38 AM
Sebastian Gottschalk wrote:

> Cristiano wrote:
>
>> No general purpose test can distinguish the MT sequences from a
>> random source.
>
> Just take a dimension higher than the highest dimension of
> equistribution and you'll find something non-equidistributed.

No GP test can "take" the 624th dimension of something; you need to write an
ad hoc test.

Cristiano


Paul Rubin

Jul 17, 2005, 4:31:34 AM
"Cristiano" <cristi...@NSquipo.it> writes:
> I guess that it is always possible to come up with a specific test
> for *any* non-crypto PRNG.

Yes, that's why writing crypto PRNG's is more demanding than writing
non-crypto PRNG's. Crypto means designing against an intelligent
adversary who knows what algorithms you're using and is trying to
beat you any way he can.

Cristiano

Jul 17, 2005, 4:43:12 AM
Unruh wrote:
>
> A random source cannot be predicted. And no PRNG is unpredictable.
> Just give me the algorithm and the (short) key and I can predict it
> forever.

Eh! Eh! If I use a CPRNG I will keep the key secret!
You'll have to try to break the generator without any knowledge of the key.

> Now, can I figure out how to predict it just given the output alone?
> I do not know but I do know that there are very very strong patterns
> in the output based on the fact that it is generated by a PRNG.

Are you sure? I'd say that the patterns are very very weak because it's very
hard to write a test which is able to find a pattern.
Even if you test many Gbits of a CPRNG using the most effective tests, you
don't see any pattern.

Cristiano


ro...@nospam.com

Jul 17, 2005, 6:51:38 AM
"Cristiano" <cristi...@NSquipo.it> wrote:
>George Marsaglia wrote:
>> "Cristiano" <cristi...@NSquipo.it> wrote in message
>> news:QtABe.162196$75.69...@news4.tin.it...

..... <snip> .....

>>> That could be interesting, but there is the proof that MT has a very
>>> good k-distribution, for example it is:
>>> (623,32)-distributed
>>> (1246,16)-distributed
>>> (2492,8)-distributed
>>> (9968,2)-distributed
>>> which could be useful in some application.
>>> Please, could you post some value for your generator?

What are you on about? The (623, 32) distribution implies the others
automatically. This is true whenever we compare larger with smaller word sizes
in a perfectly packed code, and binary is of course a perfectly packed code.

Secondly, you should not use this notation. There is a standard definition of
(m,k) distribution which goes back at least to the 1st edition of Knuth's
famous book, The Art of Computer Programming (1969), and it has nothing in
common with your notation.

..... <snip> .....

>That is good, but the k-distribution is a "stronger" distribution.

Nonsense.
In the book mentioned above, Knuth put a lot of emphasis on the idea that
infinite-distribution of a random variate was a strong criterion for
"randomness". This seems to have led to a widespread belief that high orders
of k-distribution are enough by themselves to say that a sequence of integers
is "as close as we can get" to random. It is easy to show that this is wrong.
(Note that the various theorems proved by Knuth applied to continuous variates
on [0,1), not to integers).

First of all, consider a pure multiplicative linear-congruential generator (MLCG)
with a modulus P. It is trivial to find the parameters such that the output
is at least k-distributed for 32 bit integers. This is true for any member of
the family of LCGs with P = 2^(32*(k+m)) + 1. In your case, the minimum
desired k = 623, and with no effort at all (my freeware primality testing
program shows run times of < 0.2 sec) we find that the smallest member of the
family is m = 2, the next member is m=10, etc. We have an infinite family of
LCGs, every one of which is exactly distributed for 32 bit integers and every
one has k > 623 -- but every one is very non-random because of the failings of
LCGs as a class. LCGs have values which cluster on hyperplanes, they fail the
spectral test and other tests, their parameters can be discovered from only a
few values from the sequence, etc etc. They are very NON-random and useless
for cryptographic purposes: no matter how high the k-distribution (and this
can be extended to infinity).

Secondly, I'm sure it's not just LCGs. LCGs fail a variety of tests due to the
high degree of pattern in the sequence, which is in turn due to the simple
generator function. I assume that we can see the same scenario
(detectable non-randomness despite good distribution properties) with other
PRNGs of simple structure: can someone else comment?

Thirdly, there is a more important way in which LCGs are non-random. If the
sequence values are integer (or more generally, quantised to q levels), a true
random sequence should show some runs, with the probability of consecutive
repeats being 1/q, 1/q^2, etc. Generators that have outputs that come close to
this distribution are behaving much more like true random numbers than ones
that don't. In this respect, full-period LCGs are the worst - their
probability of a consecutive repeat of the same number is always exactly 0,
regardless of generator period, output sequence length, and distribution order.

The upshot is that the k-distribution is much weaker evidence of randomness
than the behaviour Prof. Marsaglia mentioned.

ross
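The consecutive-repeat behaviour above is easy to check directly. A sketch using a small full-period LCG (the old BASIC-era parameters a = 25173, c = 13849, m = 2^16, chosen here only for illustration; any full-period LCG behaves the same way, since x_{n+1} = x_n would be a fixed point and would collapse the period to 1):

```python
# Full-period LCG: x' = (a*x + c) mod m. With m a power of two, the
# Hull-Dobell conditions (c odd, a congruent to 1 mod 4) guarantee that
# every value in [0, m) occurs exactly once per period.
a, c, m = 25173, 13849, 2**16

x = 0
prev = None
seen = set()
consecutive_repeats = 0
for _ in range(m):
    x = (a * x + c) % m
    if x == prev:
        consecutive_repeats += 1
    prev = x
    seen.add(x)

print(len(seen) == m)       # full period: every value appears once
print(consecutive_repeats)  # 0 -- a truly random m-value sequence would average about 1
```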

John E. Hadstate

Jul 17, 2005, 10:10:08 AM

"Joe Peschel" <jpes...@no.spam.org> wrote in message
news:Xns9695E7F0BEEFE...@216.168.3.44...

> Bodo Moeller <bmoe...@acm.org> wrote in
> news:dbcfor$fqf$1...@lnx107.hrz.tu-darmstadt.de:
>
>> While the 2^128 step search talks a very
>> long time indeed, it still is a constant-time attack.
>>
>
> It's a lot quicker when we get it to shut up.
>

All you have to do is compile it with debugging turned-off!


Cristiano

Jul 17, 2005, 10:55:48 AM
ro...@nospam.com wrote:
> "Cristiano" <cristi...@NSquipo.it> wrote:
>> George Marsaglia wrote:
>>> "Cristiano" <cristi...@NSquipo.it> wrote in message
>>> news:QtABe.162196$75.69...@news4.tin.it...
>
> ..... <snip> .....
>
>>>> That could be interesting, but there is the proof that MT has a
>>>> very good k-distribution, for example it is:
>>>> (623,32)-distributed
>>>> (1246,16)-distributed
>>>> (2492,8)-distributed
>>>> (9968,2)-distributed
>>>> which could be useful in some application.
>>>> Please, could you post some value for your generator?
>
> What are you on about? The (623, 32) distribution implies the others
> automatically. This is true whenever we compare larger with smaller
> word sizes in a perfectly packed code, and binary is of course a
> perfectly packed code.

I know that.
At first I thought of posting the entire table of the
k-distribution of MT, as in the MT paper; then I posted just a few values.
What's the problem?

> Secondly, you should not use this notation. There is a standard
> definition of (m,k) distribution which goes back at least to the 1st
> edition of Knuth's famous book, The Art of Computer Programming
> (1969), and it has nothing in common with your notation.

M. Matsumoto, in his paper, uses the correct terminology: "k-distribution
to v-bit accuracy", but for simplicity I used that "strange" notation.

>> That is good, but the k-distribution is a "stronger" distribution.
>
> Nonsense.
> In the book mentioned above, Knuth put a lot of emphasis on the idea
> that infinite-distribution of a random variate was a strong criterion
> for "randomness". This seems to have led to a widespread belief
> that high orders of k-distribution are enough by themselves to say
> that a sequence of integers is "as close as we can get" to random.
> It is easy to show that this is wrong. (Note that the various
> theorems proved by Knuth applied to continuous variates on [0,1), not
> to integers).

Nonsense? Try to understand why I wrote "stronger" distribution with the
quotation marks.
Prof. Marsaglia said:
"[...] the expansions of rationals k/p for large primes p are expected to
pass extensive tests of randomness, in the sense that they may be reasonably
considered the output of a sequence of iid variates taking values in [0,b-1)
with equal probabilities."
which should mean that in the whole period you see all the values from 0 to
b-2 the same number of times.
I said "stronger" because all the good generators output numbers which are
uniformly distributed, but there could be some generator that has a low
order k-distribution.

> The upshot is that the k-distribution is much weaker evidence of
> randomness than the behaviour Prof. Marsaglia mentioned.

Your main problem is that you think I said "high-order k-distribution =
randomness", but I never said that; the only thing I said is that a
high-order k-distribution could be useful in some applications.
My only way to see whether a sequence is random looking is to test it.


So, back to my question: can you tell me, please, the order of the
k-distribution of Prof. Marsaglia's generator?

Cristiano


Cristiano

Jul 17, 2005, 11:24:27 AM

Why? Many times it's perfectly good to initialize a big state with a simple
generator, with a counter, or with a high-resolution timer (like the P4
TSC).

Anyway, here I'm trying to understand why CMWC4096 fails systematically when
I use one particular seeding method.
As I said, extensive testing showed that with the seeding method Q[i] =
RDTSC() % 0xffffffff, CMWC4096 always fails, which means that there is an
incredibly large number of non-valid seed sequences.

Cristiano
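For reference, the generator under discussion is Marsaglia's CMWC4096 (complimentary multiply-with-carry, b = 2^32, a = 18782, lag 4096). Below is a sketch of a direct Python port of his published C routine; the seeding here is a plain 32-bit LCG used as a stand-in, not the RDTSC method being debated:

```python
# Python port (sketch) of Marsaglia's CMWC4096. State: 4096 32-bit
# words Q[], a carry c (which must stay below a = 18782), and an index.
A = 18782
R = 0xFFFFFFFE
M32 = 0xFFFFFFFF

def make_cmwc4096(seed=1):
    # Fill Q with a basic 32-bit LCG -- a stand-in seeding method.
    s = seed & M32
    Q = []
    for _ in range(4096):
        s = (69069 * s + 13579) & M32
        Q.append(s)
    state = {"Q": Q, "c": 362436, "i": 4095}

    def next_u32():
        state["i"] = (state["i"] + 1) & 4095
        t = A * state["Q"][state["i"]] + state["c"]
        state["c"] = t >> 32
        x = (t + state["c"]) & M32   # low 32 bits of t, plus carry
        if x < state["c"]:           # propagate the carry, as in the C code
            x = (x + 1) & M32
            state["c"] += 1
        state["Q"][state["i"]] = (R - x) & M32
        return state["Q"][state["i"]]

    return next_u32

rng = make_cmwc4096(20050717)
sample = [rng() for _ in range(10)]
print(all(0 <= v <= M32 for v in sample))
```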


Unruh

Jul 17, 2005, 1:03:35 PM
"Cristiano" <cristi...@NSquipo.it> writes:

Yes, I will. It is called the algorithm which is itself a test, with an
unknown element, the key. Since the key is far far shorter than the stream
I have, that key with the algorithm defines a pattern.
Now I agree that it is not easy to find that pattern. It takes perhaps
something like 2^128 attempts, but once I have found it the pattern is
completely predictable and deterministic, and can be extended to an
arbitrarily large piece of output.

Or to use the Chaitin/Kolmogorov definition of randomness, the output is
entirely non-random. The length of the shortest program (the algorithm plus
the key) is far far shorter than the output.
And no matter which sample I take, this is always true for that PRNG. I.e.,
even the average length of the program required to output a large number of
samples is far shorter than the output (which is not true of a true RNG).


>Cristiano
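The short-program-plus-short-key point can be made concrete: a few lines of code and 16 bytes of key deterministically reproduce a stream of any length. A sketch using SHA-256 in counter mode as a stand-in keystream generator (an illustrative construction, not a vetted cipher, and the key is a placeholder):

```python
import hashlib

def keystream(key: bytes, nbytes: int) -> bytes:
    """Expand a short key into an arbitrarily long deterministic stream
    by hashing key || counter (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

# The megabyte below is fully determined by 16 bytes of key: anyone
# holding the key predicts it exactly, so in the Kolmogorov sense it is
# highly non-random, yet without the key no practical test is known
# that distinguishes it from random.
stream = keystream(b"0123456789abcdef", 1_000_000)
print(len(stream))  # prints 1000000
print(stream == keystream(b"0123456789abcdef", 1_000_000))  # prints True
```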


Paul Rubin

Jul 17, 2005, 2:06:48 PM
Unruh <unruh...@physics.ubc.ca> writes:
> Or to use the Chaitin/Kolmogorov definition of randomness,

In cryptography, we pretty much don't care about that. We care about
computational tractability.

Cristiano

Jul 17, 2005, 2:37:11 PM

Although I substantially agree, I don't have any experience in testing a
generator using the generator itself.
You say that "it takes perhaps something like 2^128 attempts" (and this is
what I call "it is very hard to write a test"), but do you know whether it is
just an academic question?

Cristiano


Unruh

Jul 17, 2005, 6:07:16 PM
Paul Rubin <http://phr...@NOSPAM.invalid> writes:

And where is the conflict?

Paul Rubin

Jul 17, 2005, 10:04:22 PM
Unruh <unruh...@physics.ubc.ca> writes:
> >> Or to use the Chaitin/Kolmogorov definition of randomness,
>
> >In cryptography, we much don't care about that. We care about
> >computational tractability.
>
> And where is the conflict?

Tractability is about what you can do on real computers using a
practical amount of resources. The Kolmogorov/Chaitin complexity of
CPRNG sequence is irrelevant if you don't know the key and have no
practical way to find it.
