i have a bit of a problem and am endeavouring to track down whether
it's my tests that are at fault. ha ha, of course there are no
problems with rijndael or serpent so it must be my use of gnuplot
or my code.
see: http://lkcl.net/crypto/key_as_data.tgz.
convenient build requirements: gcc and gnuplot.
(anything else you're on your own).
the basic premise of the test i am performing is to pick an initial
key with only one bit set, to encrypt some input data, to XOR in one
bit
into the output, then to use the output as the KEY for the next
encrypt,
repeat the process N times.
the outputs from this process i am chaining together and performing a
simple bit-count and statistical test to produce a p-value: the code
is ripped wholesale from STS, using erfc().
the STS bit-count test documentation notes two things: firstly, that
you shouldn't attempt to test less than 100 bits (okay - so that means
that if testing a 128-bit block cipher, we can test 1 block. hooray).
secondly, that a p-value of less than 0.01 indicates a bit of a problem
with your data. _especially_ if you end up with large numbers of <0.01
p-values.
this process i repeat 128x128 times - for initial keys with bit 0 set
all the way through to bit 127 (assuming a 128-bit cipher) and XORing
in one bit all the way from 0 to 127...
and _that_ process i repeat say 256, 512, 2560 times, with different
input
data.
i then build up a histogram, using the key bit-index and the data
bit-index,
of the number of times that the p-values "fail" - i.e. fall below 0.01.
i then use gnuplot in weird and wonderful ways to plot these histograms
in 3D.
i make three plots: one as you might expect: the key-bit-index as x,
the
data-bit-index as y, and the "bit-bucket" - number of p-value fails -
as z.
the other two are where i treat the "bit-bucket" as x, and make the
key-bit-index
and data-bit-index y/z and z/y respectively.
the results show patterns, where if everything was hunky-dory, there
would
be none. even plotting the raw data instead of creating histograms
from
it, and rotating the graph at some speed very clearly shows up what
look
like diffraction / interference patterns.
i initially started with rijndael-128, and was sufficiently puzzled by
the irregularities - or more specifically i should say regularities -
to try using serpent as a baseline, and found that serpent had even
worse
regular patterning than rijndael-128.
could somebody _please_ go over this code, and my use of gnuplot, and
point
out what i have done wrong.
for example: is the problem that i start off in each case with an
initial
key containing only one bit set?
am i just not generating enough test results? (yes, when i was
developing
an encryption algorithm i bought 8 AMD2000+ machines and ran NIST's STS
program on literally gigabytes of output for sometimes days on end, for
several months, fine-tuning the algorithm, so i do know how important
it is to test properly for randomness.)
i am beginning to understand that the "odd" graphs - where the
histogram
data is treated as the x-axis (and is therefore sorted) will show only
a
few data points at the extremes, such that the graphs cannot be
considered
reliable - but even with histogram values in the tens of thousands,
regularities still showed up.
overnight, i just ran a 5-hour run to take 256000*128*128*5 samples: at
the
extreme ends of the histogram spectrum - bars of size 421500 and bars
of
size 426000 - there are clear lines indicating correlations between key
bits around 90, 100, 16, 24... i think they're key bits: i can't
remember
which way round i did the plots :)
i'm placing the jpeg gnuplot window captures here:
http://lkcl.net/crypto/rjd.key.5.256000.jpeg
http://lkcl.net/crypto/rjd.data.5.256000.jpeg
advice much appreciated.
l.
What is the output specifically? The entire block of ciphertext?
Pseudo code would help here. From what I see it's
1. key = 2^(rand() % 128)
2. for i = 0 to n do
2.1 out[i] = encrypt(SOMEDATA[i], key)
2.2 key = out[i] xor 2^(rand() % 128)
Well first off, 128^3 is only 2 million bits. You should be doing
testing on larger sample sets.
That said, changing one key bit should have a drastic effect on the
input even if SOMEDATA[i] = SOMEDATA[j] for all i != j. So the
observed output entropy should be fairly high for 16384 blocks.
Tom
AES is supposed to act like a random permutation if the key is random
and uniformly distributed. If the keys are special and related, you
can get structure in the output (related-key attack). AES does not
claim to still act random when the keys are non-random. I'm not sure
if that applies to what you're doing, which I didn't read very
carefully. However, if you're not using random keys, maybe nothing is
wrong with either AES or your code.
note that there is _no_ PRNG input - i abandoned the
attempts to use a PRNG because i was concerned
that a (bad) PRNG would either introduce patterns
or repeat values, so instead i decided to use "counter"
mode as the initial input.
for k = 1 to 256000
{
zero(input)
(*(int*)input) = k; /* sort-of "CTR" mode. */
for i = 0 to 127
{
zero(keybit)
set_bit(keybit, i)
for j = 0 to 127
{
zero(databit)
set_bit(databit, j)
num_fails = analyse_data(5times, input, keybit, databit)
bitbucket[i][j] += num_fails
}
}
}
analyse_data(num_times, input, keybit, databit)
{
key = copy(keybit)
data = copy(input)
for (i = 1 to num_times)
{
output = aes_encrypt(key, data)
key = copy(output)
data = xor(data, xor(output, databit))
deviation = deviation_from_average_num_bits(output)
if p_value(deviation) < 0.01
num_fails++
}
return num_fails;
}
note that i am NOT doing a test which counts the "average number of
bits" and does a histogram, like STS does.
i am taking a note of the NUMBER OF FAILURES. in other words,
i am taking a note of the number of times that the encrypt contains
far more ones or zeros than there normally would be.
i am _not_ analysing the data directly.
i am statistically analysing the p-values (lots and lots of
them. 20 billion p-values, in the case of the
128x128x5x256000 test i ran for 5 hours last night).
very different.
hence my concern.
incidentally, if i perform 20 billion p-value tests, that's
20 billion Rijndael-128 calculations, so that's 2560
billion bits being analysed: there's _probably_ enough
there to get getting on with :)
i understand your concerns.
the slightly worrying thing is that, after selecting the
initial key with only one bit set (and the rest all zeros),
i am using the encrypted output as the key for a new
encrypt.
[okay - that's what i'm _trying_ to test - whether my code
actually does test that or not is what i am trying to
establish!]
okay ... i think i've spotted a flaw in the code... the
loop sets the initial key only the first time in the
analyse loop. will get back to you on what i find...
analyse_data(input, keybit, databit)
{
test_data = 4 x 128 blocks
for i = 1 to 4 (indexing test data)
{
key = copy(keybit)
data = copy(input)
output = aes_encrypt(key, data)
key = copy(output)
data = xor(data, xor(output, databit))
test_data[i] = output /* chain the 4 encrypts together */
}
deviation = deviation_from_average_num_bits(test_data)
if p_value(deviation) < 0.01
return 1
return 0
}
that's better.
will let you know what the output looks like...
http://lkcl/net/crypto/key_as_data.1.tgz
i just ran a test generating 128x128x102400 p-values,
and analysed the results, a graph of which can be seen
here:
http://lkcl.net/crypto/data.1.rjd.2.204800.histogram.256x32x4.jpeg
using gnuplot scatter.data.dem.
remember, in the plots, i've turned the histogram on its side.
so the x-value is the histogram value (and therefore gnuplot
sorts the data by histogram size along the x-axis), the y-value
is the bit being altered in the data (i think) and the z-value
is the bit being initially set in the key (i think).
i would expect there to be at least _some_ clear randomness
even in the low end of the histogram values: what i'm very
concerned about is seeing clearly distinct curves which very
bluntly show a correlation between the key bits and the data bits.
and, more worryingly, these correlations can be seen not just
at the low and high end of the (sorted) histogram values, but
curves can quite clearly be seen in the entire first and last
THIRDS of the histogram range.
only when you get into the middle range of the histogram
values is it no longer possible to see clearly whether
there's any correlations.
so, what are the options, here? what have i done wrong?
you mention that non-random keys are known in rijndael
to be problematic.
my hypothesis is this: that rijndael's data output is not as
non-random as it seems - it's just that we don't have any
tools to yet detect this.
the hypothesis can be tested by treating data outputted
from a rijndael encrypt as the key for a second encrypt.
intuitively, it seems clear to me, that if there are flaws
in rijndael's keying, there would equally be flaws in
rijndael's encryption. by exploiting the weakness in
the keys, it may be possible to demonstrate weakness
in the data input side, as well.
the implications for the development of encryption
algorithms in general should be quite clear: block
ciphers need to be "super-random" - i.e. change
one bit of the key _or_ change one bit of the data
and on average 50% of the bits of the output change,
randomly.
but this is jumping ahead quite a bit... i want to prove
that i am on to something, here, or i want to find the
flaw in my thinking or the implementation of the tests.
don't really mind which :)
I don't have any intuition about that--I don't understand precisely
what you mean by a flaw in rijndael's keying.
> by exploiting the weakness in the keys, it may be possible to
> demonstrate weakness in the data input side, as well.
The keys are supposed to be random and secret. What weakness is there
to exploit?
> the implications for the development of encryption algorithms in
> general should be quite clear: block ciphers need to be
> "super-random" - i.e. change one bit of the key _or_ change one bit
> of the data and on average 50% of the bits of the output change, randomly.
That hasn't traditionally been a design goal of most block ciphers
including Rijndael/AES. If you want a feature like that, try running
the key through a hash function before putting it into the block cipher.
> but this is jumping ahead quite a bit... i want to prove that i am
> on to something, here, or i want to find the flaw in my thinking or
> the implementation of the tests.
I think the flaw is that you're looking for properties of the block
cipher that it was never intended to have, and then being surprised
that it doesn't have them.
sorry i don't believe i have explained myself very clearly.
i apologise for this: your comments and questions have
already proved very helpful, so i owe you a clear
explanation.
you mention in an earlier message that with rijndael,
you are supposed to utilise random keys, and that
the use of related keys can result in correlations
in the output (non-randomness).
what people are expecting - and i believe unfortunately
taking for granted - is that irrespective of this,
if your input data is truly random, then even if you have
related keys, the output data should also be truly
random.
related keys _should_ be irrelevant, if the input is random.
my hypothesis is that the opposite is true.
regarding your statement that i am looking for properties
of the cipher that aren't there, and then being surprised
that they aren't there: if i was only testing for related
keying, i agree with you: that would be the case.
but i am not testing for related keying. i am testing
for related keying also being related to the data,
through this double-encrypt process (by using the
output from one encrypt as the key for the second
encrypt):
for all i=0..127 and for all j=0..127 and for lots of input,
K = 1<<i
B = 1<<j
output = E(E(K, input), B XOR E(K, input))
compute_p_value(count_bits( output ))
p_value < 0.01 => add 1 to histogram bucket [i][j].
analyse histogram: shows problems.
your suggestion about running through a hashing
function is a very good one. i will try this:
K' = SHA-1(K)
output = E(E(K', input), B XOR E(K', input))
yes i know that i am only testing with 128 keys on
the initial encrypt. that doesn't excuse an algorithm
from being non-random on its output, even if i'm only
using 128 keys.
... not twice in a row, anyway :)
i see your point. i like it. i will try to incorporate it.
i will see what happens.
thank you.
l.
using K' = SHA-1(K) and then using K' as the key
makes the correlations _more_ stark in the output.
hmm.... i think i should try a twist: introducing more
variation in the keys - i.e. using more than 128 keys...
Which people? Not people who know what the cipher is designed to do.
> related keys _should_ be irrelevant, if the input is random.
That was never the design intention.
> my hypothesis is that the opposite is true.
Maybe so, but not indicative of anything interesting even if correct.
> your suggestion about running through a hashing
> function is a very good one. i will try this:
>
> K' = SHA-1(K)
> output = E(E(K', input), B XOR E(K', input))
OK. I will be probably off-net for a while starting this evening so
might not see your results right away.
okay, well, every 16 inputs i change the set of keys used, by doing
the following in the outer loop:
for (k = 0 to 2048000)
{
if (k % 16 == 0)
create_keyset_with_seed(k);
for i=0-127, j=0-127
{
hist[i][j] += analyse(make_input(k), keyset[i], 1<<j);
}
}
where create_keyset_with_seed(k) does this:
for i=0-127
{
char key[16];
set_bit(key, i)
sha1_update(ctx, key, 16)
sha1_update(ctx, (int*)&k, 4)
keyset[i] = sha1_final(ctx)
}
yes i know i could simplify the key create function by just
calling sha1_update with the ints 0..127 and k :) heck
even a _byte_ 0..127 would do.
testing with 2048 loops (yes, very quick, i know) and therefore
2048/16 sets of keys still shows correlations.
you implicitly ask what am i trying to prove by doing this test?
well, what i suspect is that, if my test is correct rather than
based on some mistakes or turns out to have false assumptions,
that the test could be used as the basis for an alternative
type of differential analysis.
i only have some ideas on what to try out, so far on that score:
i haven't quite thought out the full implications, yet.
Sorry, I disagree. Most well-designed protocols try to avoid relying
no such assumptions.
That said, I don't know of any successful related-key attack on Rijndael.
If you have found one, that would be publishable news. But, I haven't seen
any evidence that you have found such an attack. If you think you have such
an attack, you might start by describing your attack algorithm, and what kind
of queries it needs to make to the block cipher, and why you think that an
ideal cipher would resist such an attack but Rijndael would not.
>for all i=0..127 and for all j=0..127 and for lots of input,
>K = 1<<i
>B = 1<<j
>output = E(E(K, input), B XOR E(K, input))
>
>compute_p_value(count_bits( output ))
>p_value < 0.01 => add 1 to histogram bucket [i][j].
>
>analyse histogram: shows problems.
Have you tried instantiating E() with some other block cipher, such as
IDEA (instead of Rijndael), to see if you detect the same non-random
properties? If so, then the problem is probably not with the block
cipher, but with your strange method of using the block cipher.
Another likely source of flaws is using a buggy implementation.
(For instance, I hope you are not using rand() or random() for
generating random inputs.)
I would think you should also try to do some debugging. For instance,
does the same bias appear if you replace the above by the following?
for lots of random X values
for all j=0..127
B = 1<<j
output = E(X, B xor X)
Shouldn't you be calling create_keyset_with_seed() on every iteration
of the outer loop? As it currently stands, it sounds like you are computing
the same time 16 times in a row.
I've never used Gnuplot, so I can't help you
there, nor do I understand what you're trying to
plot. But, having looked at your source code, I
have some comments to make.
First, things like this historically tend to be
related to bad random number generators. I see
that you appear to be using AES as your RNG, but
the comment in rand.c is wrong; this isn't counter
mode; it's more like Output Feedback Mode, but
instead of continuing to encrypt the output, you
xor the output with the previous input at each
stage. The effect of this is to turn the
guaranteed (for counter mode) or expected (for OFB
mode) long period of the generator into something
much smaller. But it's still expected to be about
2^64, so that shouldn't be a problem. Anyway, in
the line that says:
block[i] ^= out[i];
I think you should take out the '^'; then you have
OFB mode and an expected period of 2^126.
Also, I have no idea why you use the very
complicated key; what's wrong with the all-zeros
key for this application? It just scares me to see
this magic.
Lastly the variables "block" and "out" should be
static to the rand.c module. I haven't yet seen
anywhere that might tread on them, but you never
quite know just from reviewing the source code.
Having got to this point, I can't find any code in
the archive that appears to have anything to do
with what you've been talking about later in this
thread. So I can't review it and can't help
directly.
From your later posting, you say:
>for all i=0..127 and for all j=0..127 and for lots of input,
>K = 1<<i
>B = 1<<j
>output = E(E(K, input), B XOR E(K, input))
So I have a question for you; forget the inner
encryptions for a second, and for lots of random
inputs, and for j=0..127, calculate:
output = E(input, B XOR input).
Does this show your bias? Since E(K, input) should
have been something like random (for random input)
I'd expect that any bias should still show up.
One last comment; it always scares me when you do
a statistical test like:
>compute_p_value(count_bits( output ))
and then try to perform statistics on the outliers
because by definition you're throwing away most of
the data. I don't know how to correctly process
stuff like that. But you should be able to find
what you're looking for in the binomial
distribution of the outputs themselves.
If I have time, I'll try to run some statistics
along the lines you've suggested. But honestly, I
think it is some sort of bug in your code.
Greg.
--
Greg Rose
232B EC8F 44C6 C853 D68F E107 E6BF CD2F 1081 A37C
Qualcomm Australia: http://www.qualcomm.com.au
Ok, I've looked over the code, and actually ported it over to Java since
that's what I have available right now, also a port should turn up gross
errors. Here's what I noticed in going through it that may explain at least
part of the problem.
Near the end of the computation you have:
/* count the number of bits, create a p-value, blah blah */
count = num_bits(test_data, sizeof(test_data));
testval = (count - ((double)sizeof(test_data)*4)) * 2;
sobs = fabs(testval) / sqrtl(sizeof(test_data));
f = sobs / sqrt2;
pval = erfc(f);
I think this is where a bit part of the problem is. In fact I went ahead and
tested just that bit
of code, there are only 9 possible values of count that will have a p-value
>=0.01 [120,128] (assuming a single block), so I strongly suspect there is
something wrong with this section, unless I ported incorrectly (a
possibility). However since this would coincide with an inordinate failure
rate I feel it's work taking a look, I didn't bother trying to hard to
understand this section of code, so I'm not certain of the problem.
Joe
many comments, and dang this google groups web interface
for not providing an "indent" system like it used to, 5 years ago.
yes i have tried using a different cipher - serpent. serpent
shows up even _more_ correlations than rijndael. i _must_
be missing something, here.
i will have to think about your suggestion to use E(X, B xor X)
because the test algorithm uses three inputs - k = 1<<i,
d = 1<<j, input = <incrementingsequenceatthemoment>.
actually, at paul rubin's suggestion, the key is sha-1(i appended
with incrementingsequence).
... and your suggestion only recommends what to use
for two of the inputs :)
i made an assumption that for sufficient input, it wouldn't
make very much difference. sha-1 is _very_ expensive
to be computing on every 128x128 loop. i'm making up
for it by running very long runs of ... oh... 2048000 outer
loops :)
i'm also assuming that there will be no correlation between
the use of k in make_input() - which simply sets the first 4 bytes
equal to k by doing memcpy(key, (char*)&key, sizeof(key) - and
the use of k in create_keyset_with_seed(k).
that might turn out to be a bad assumption.
but, equally, by changing the key on _every_ value of k, doing
such would just as equally need to be noted / taken into
consideration.
it's lucky i'm not using the truncated PRNG, then, innit :)
i'm using instead a sequential counter as input.
l.
in my intial tests, i was running with number of blocks
equal to several thousand - sufficient to produce
1 million bits.
i found that even then, there were an inordinately
high number of p-values > 0.01.
this was what alerted me to the problem in the first
place.
actually, it was NIST's STS bombing out with
a coredump because the balance of 1s and 0s was
_so_ way outside the statistical 50% balance mark
that there was a buffer overrun (inside the "runs"
test iirc - this _was_ two years ago, with version 15).
i continued to reduce the block size, and have found
that it's even possible to have only one block, and for
there to be an inordinate number of failures - i mean
we're talking 3.5% here! an average of 70,000
p-values < 0.01 out of 2048000 tests.
hmmm... *thinks*.... okay, that's coincidentally close
to the value "4.5/128". .... hmmm...
okay, what's 4.5/128 * 2048000 that's 72000.
hmmm... okay, what's a better appx median in one
of the histogram outputs: it's 71300. hmmm....
perhaps i should try increasing the number of blocks
being tested, again...
Ok, I did a stupid here, I was using a 2 block test_data, but only going to
128 instead of 256. The actual range for > 0.01 is [120,136], 17 values,
still much too small, but not what I stated before. Looking at the graph of
these values the bell curve appears to be too narrow. I'd actually suggest
that given the small number of possible counts ([0,256] for 2 blocks,
perfectly linear for larger blocks) that you generate a perfect p-val
function (e.g. all zeros has probability 1/(2^256)), this will give better
results.
Joe
take a pseudo-random function, F, which has three
inputs, i,j and k, where i and j range from 0 to 127,
and k ranges from 0 to anything-you-like.
the output from the function F should produce enough
bits to be able to perform a "count" test on it. then,
count the number of times that the output from the count
test "fails" - i.e. a p-value (as defined in the NIST STS)
is less than 0.01.
in the case of testing only 128 bits, as joe has pointed
out there are only 9 possible values of the "count" test
with such a p-value (hey, joe! that would explain the
4.5 / 128 figure! so that explains why the expected
average is in fact 72,000 on a run of 2048000,
so if you get significantly LESS than that proportion,
it hints that there's a problem. and yes, i'm getting a median
of 71400).
anyway.
you then plot the number of occurrences of p-values < 0.01
as a histogram against your input values i and j.
when using rijndael/128 in the way that i suggest, then
straight away, in the graph, you can instantly see that
something is wrong: the minumum and maximum are
nowhere near 72000 +/- something, it's more like
71400 +/- 900.
but even leaving that aside.
what i am expecting is that if you take a subset of the
data - those values that fall outside oh say above 2.5
standard deviations from the mean, and you specifically
plot those as an [i,j] 3D histogram, i would STILL
expect them to show a random distribution.
this is, after all, supposed to be random data: the
randomness should still hold, even when you take
a very specific sub-sampling of the data.
what i _don't_ expect to see is curves indicating
very striking correlations between i and j as you
examine subsections of the histogram values
that fall inside a [relatively large!] band of
standard deviations.
anyway. like paul, i'm going offline for a bit: will
be back in a few days. i would be delighted for
someone to have trashed my expectations and
pointed out all the assumptions by the time i
get back :)
hi: your analysis allowed me to do some exploring, as my
previous message shows, to work out why there was a
magic number 4.5/128 appearing: it's half of 9 / 128 which
corresponds with the p-value being < 0.01.
i too increased the number of blocks a bit - up to 8 in fact,
and have set a run going ... actually, what the hell: i'm away
for a couple of days, let's make it a run of 2048000 with 8
blocks :)
one thing i considered it worthwhile to do: to throw away
the first block, so i'm actually doing 9 encrypts.
17 values for 2 blocks - so on a run of 2048000, you'd expect
the average value in the histograms to be 2048000 * (17/2)/256
which works out at 68,000.
if you get an average _different_ from that...
remember, the use of the p-value function is purely to
look for the data blocks which contain unusually high
numbers of 0s or 1s (and then plotting the _input_ which
created those data blocks).
*thinks*... ah... waitacottonpickingminute... nope, i
don't see the connection.
i was worried for a minute that the unusually high
proportion of 0s or 1s would skew the histograms
somehow, but it's a histogram plot of the _inputs_
i and j that are being used.
need to go and think about it!!
Oh, I understand very well that you have made some assumptions. What I'm
suggesting you do is test your assumptions. It is the scientific method.
When you make some assumptions and then come up with a very
surprising/unexpected result, the first thing to do is to double- and
triple-check your assumptions. I have suggested several things to try.
In this case, I suspect it is more likely that there is something wrong
with your assumptions or your implementation or your expectations than
that there is something wrong with Rijndael.
This makes it VERY likely that the "problem" is in your code or in
the way you are using the block cipher or in misplaced expectations
-- and NOT a flaw in Rijndael itself.
I was suggesting is that a perfect p-val generation function be created
because there are only a small number of possible inputs. Just to create
very short examples:
1-bit p-val
pval(count)
{
if(count == 1) return 0.5;
if(count == 0) return 0.5;
}
2-bit
pval(count)
{
if(count == 0) return 0.25;
if(count == 1) return 0.5;
if(count == 2) return 0.25;
}
etc.
This should give exactly correct p-value numbers, although it will require
more work on your part. I believe there's also a way to use an integer
variant of the p-val, based on this, but I don't know how to do it off-hand,
and I'm not sure if it would actually improve efficiency (needed for large
block counts)
Joe
I did some checking too. I ran only a small number of runs (128), but
increased the block count to 4096, the results are very interesting, the
failure rate drops to 3.55%, I am currently doing a single run at 262144
blocks, to see is any interesting results pop up. If interesting results pop
up, I'd suggest that you alter yours to do a large number of block (4096
should suffice) and process the data byte by byte, this should show a strong
correlation in your processing going from a failure rate over 50% to a
failure rate that is very low, this casts substantial doubt on the
methodology at small values because any flaw detected in this way should
repeat itself very quickly.
As a side note, I substituted the Java Random function for the bit set
functions, only because it was easier to implement than setting a single
bit, and I was in a hurry.
Joe
A slight note of caution: Java's Random is not cryptographically secure.
(SecureRandom is.)
I am well aware of that, I only used it to replace the bit twiddling
function, in a situation where the security of the system is completely
unimportant. Since this is a monte carlo type simulation Random suffices.
Joe
Is this the actual code? Executed once per loop? I would think one
would precompute the interesting range of count.
>
> I think this is where a bit part of the problem is. In fact I went ahead and
> tested just that bit
> of code, there are only 9 possible values of count that will have a p-value
> >=0.01 [120,128] (assuming a single block), so I strongly suspect there is
> something wrong with this section, unless I ported incorrectly (a
> possibility). However since this would coincide with an inordinate failure
> rate I feel it's work taking a look, I didn't bother trying to hard to
> understand this section of code, so I'm not certain of the problem.
--Mike Amling
At the end of this post I'll include simple Python code for
computing p-values for binomial distributions.
> because there are only a small number of possible inputs. Just
> to create very short examples:
>
> 1-bit p-val
>
> pval(count)
> {
> if(count == 1) return 0.5;
> if(count == 0) return 0.5;
> }
>
> 2-bit
> pval(count)
> {
> if(count == 0) return 0.25;
> if(count == 1) return 0.5;
> if(count == 2) return 0.25;
> }
>
> etc.
>
> This should give exactly correct p-value numbers
Those are incorrect. First, let's nail down what "p-value"
means. From:
http://mathworld.wolfram.com/P-Value.html
A p-value is:
The probability that a variate would assume a value greater
than or equal to the observed value strictly by chance:
P(z >= z_(observed)).
Thus if we have two random independent bits (each with probability
0.5 of being set), and our metric is the number of set bits, the
p-values for each possible outcome are:
0 bits set: p = 1
1 bit set: p = 0.75
2 bits set: p = 0.25
Given 128 random independent bits, to reach p <= .01 we need 78
or more set bits. The actual p-value associated with 78 set bits
is 0.008335, so the probability of reaching the 1% significance
level for a random block is noticeably less than 1%.
--Bryan
# Python code for computing p-values against binomially
# distributed random variables, optimized for obviousness
# of conformance to textbook definitions.
def factorial(n):
""" Factorial function of non-negative integers.
"""
fact = 1
for i in range(1, n + 1):
fact = fact * i
return fact
def choose(n, m):
""" Return 'n-choose-m'.
"""
return factorial(n) / (factorial(m) * factorial(n - m))
def binomial(n, m, p):
""" Return the probability of exactly m hits in n independent
trials, where each trial hits with probability p.
"""
return float(choose(n, m)) * p**m * (1 - p)**(n - m)
def binomial_p(n, m, p):
""" Return the probability of m or more hits in n independent
trials, where each trial hits with probability p.
"""
sum = 0
for i in range(m, n + 1):
sum += binomial(n, i, p)
return sum
At this point, I do not really understand your tests, yet
I am convinced the anomaly you note is a simple testing
error. I expect we can reach a conclusion simply by
pinning down exactly what we are talking about.
When examining individual blocks of 128 bits, what,
precisely, is your criteria for adding a trial to the count
of those with p-values < 0.01?
> hmmm... *thinks*.... okay, that's coincidentally close
> to the value "4.5/128". .... hmmm...
>
> okay, what's 4.5/128 * 2048000 that's 72000.
>
> hmmm... okay, what's a better appx median in one
> of the histogram outputs: it's 71300. hmmm....
>
> perhaps i should try increasing the number of blocks
> being tested, again...
More data shouldn't hurt, but I suggest we focus on
clarity of method. There is no question that if random
blocks would fail with probability 0.01, and 70,000 of
2048000 did in fact fail, then the blocks were not
random.
--
--Bryan
That is a direct copy of the code in question, you're right though it is
quite possible to precompute the interesting counts, and eliminate actually
a fair number of float ops.
Joe
Thanks for the reference, to be honest I never checked. The definition
differs greatly from the computations done in the thread subject, and could
itself be a huge source of the problem.
> Given 128 random independent bits, to reach p <= .01 we need 78
> or more set bits. The actual p-value associated with 78 set bits
> is 0.008335, so the probability of reaching the 1% significance
> level for a random block is noticeably less than 1%.
> --Bryan
I'll leave the code here because having a second copy shouldn't hurt much,
it's shorter than what I've written in this thread alone.
Joe
The run just completed, it was only a single value for k, and took 13 1/2
hours (that's what I get for using the original Rijndael submission Java
code), but out of 16384 tests, 4553 failed, for a failure rate of 27%,
which is very suspiciously high. This used the original code for generating
p-values, so there may be errors.
Joe
Wanted the bits, counted the bytes.
Joe, was this bug in Ikcl's original code?
> f = sobs / sqrt2;
>
> pval = erfc(f);
For tests on one or several blocks, use the exact binomial;
the code above uses the normal approximation. As Mike Amling
pointed out, in this experiment we need only calculate the
interesting values once, so we can handle reasonably large
sizes with the exact distribution.
If you do need the normal approximation, re-write the code
above to not suck so much. Maybe like:
n_trials = sizeof(test_data) * 8;
/* Assume binomial distribution with p = q = 0.5. */
expected = n_trials * 0.5;
variance = n_trials * 0.5 * 0.5;
stdev = sqrtl(variance);
z_score = fabs(count - expected) / stdev;
pval = erfc(z_score / sqrt2);
--
--Bryan
It is exactly the original code, copy and pasted from the distributed
tarball.
Joe
I made some alterations based on the input from others and personal
observations. The biggest change is that I moved from a pval to a straight
count, from there the p-value can be computed any way you choose, I'll even
make the program available to anyone that wants it but there's no way I'm
sending anyone a multi-GB csv file. The numbers look to very quickly
converge, I've got a graph of the first 9000 or so entries, at 1 block
intervals. The maximum setBits/totalBits starts at 70.83% and drops to
51.12%, the minimum 36.67% to 48.93%, and average 53.31% to 50.01% so it
still has quite a few statistical anomalies to work out, at 20626, the
numbers move to; max 70.83% - 51.12%, min 35.00%- 48.91%, and average
53.28%-50.01%. Interestingly, although the rounding error hides it, the
final average actually rose from 50.006... to 50.0098, probably just a
statistical anomaly, I do expect these to over time approach the perfect
numbers of 100, 0, 50. I also think I've got too much compute time to spare
right now.
Joe
Thanks; then I agree with:
[Joe:]
: I think this is where a bit part of the problem is. [...]
I had originally thought there was a typo in that, and you
meant either "a big part of the problem" or maybe "a bit of the
problem". Since the problem in that code was using bytes
where it should have used bits, I now see that you had it right.
--
--Bryan
You lost me. Are readers supposed to be able to tell what the numbers
mean?
--
--Bryan
Sorry, it was late. That's the percentage of set bits; maximum, minimum, and
average.
Joe
>>You lost me. Are readers supposed to be able to tell what the numbers
>>mean?
>
>
> Sorry, it was late. That's the percentage of set bits; maximum,
minimum, and
> average.
I'm still missing something. A maximum that starts high and
drops doesn't conform to my idea of what a maximum is. Can
you explain out how you generate the block and how you
take the counts?
As Paul Rubin pointed out, related-key issues could cause
non-random-like statistics, but I'd be surprised if a simple test
could distinguish AES from a random cipher.
--
--Bryan
Sorry again. The counts are from sample lengths starting at 16-bytes,
extending to 4096-bytes. The critical code looks like:
count = 0;
for(i = 0; i < numBlocks; i++)
{
count+=count_bits(outputBlock)
fullcounts[i] = count;
}
........
The csv generation process is:
for(int i = 0; i < numBlocks; i++)
{
System.out.print(fullcounts[i]+",");
}
System.out.println("");
I'm willing to share the entire source code if you'd like it, at 53KB it's
not that large. It currently generates 2 files each 32769 lines of csv, one
of the straight counts (with a sample size header) and the second the
percent of bits set. The straight count csv is about 840MB, the percentage
one is ~2.4GB. I will also warn you that these files take a long time to
load in OpenOffice, in 1.1.5 it was on the order of half an hour to get the
graph I was looking at, in 2.0 it's 5-10 minutes, I would expect that MS
Office is in the 5-10 minutes range as well.
> As Paul Rubin pointed out, related-key issues could cause
> non-random-like statistics, but I'd be surprised if a simple test
> could distinguish AES from a random cipher.
I don't see any evidence that this distinguishes it from random, in fact I
see the opposite. I see it behaving very much like a random function should
under this test. The average converges very quickly towards 50%, the minimum
converges very quickly toward the expected minimum for the sample size, the
maximum converges very quickly towards the maximum expected for the sample
size. I only published the max and min per sample size because they are more
meaningful. The only meaningful information that I can see coming from this
is if the cycle length is notably different from expected, but this test
doesn't appear to give any indications about those statistics.
Joe
brian, hi, thanks for responding.
i'm so sorry: i don't understand what you mean by this.
it's terribly terribly straightforward: if the number of
bits set in the 128-bit block goes above or below
a certain number (determined by the nist.gov sts
"frequency" test) then i increase the count in the
histogram from which that 128-bit block was generated.
and i made sure that i cover _all_ bases - without
bias - in the histogram:
for k = 0 to 512000
for i = 0 to 127
for j = 0 to 127
if (analyse(k,i,j) == fail)
histogram[i][j]++
where analyse uses i to set 1 bit in the key, j is used to XOR 1
bit in the data in the second encrypt, and k is used to generate
the initial input (and i chose to use something similar to CTR
mode to do that).
i've since discovered an error in my p-value calculating function,
and a correction in that function results in less values being
classified as "fails" - i.e. it's not so much a "fail" but it's that
the deviation from the expected average 50% frequency
(ratio) of 0s and 1s is now far more stringent than before - but
that mistake doesn't actually affect the way the graphs look.
so, this is an analysis of the instances where output data
contains extreme numbers of 0s and 1s.
irrespective of the fact that this is detailed analysis of such
instances, i still would expect that there be no correlation
between key, data and input.
and there are _definitely_ correlations going on, as the following
graphs show:
http://lkcl.net/crypto/rjd.1.512000.64x64.rangelimited.jpeg
http://lkcl.net/crypto/rjd.1.512000.32x32.jpeg
> More data shouldn't hurt, but I suggest we focus on
> clarity of method. There is no question that if random
> blocks would fail with probability 0.01, and 70,000 of
> 2048000 did in fact fail, then the blocks were not
> random.
i should re-emphasise: i made a slight descriptive
error in calling p_value < 0.01 a "fail". what i am simply
using p_value < 0.01 to detect is the instances where
an output block contains an extreme number of 0s or
1s.
i appreciate your concern that you believe that if there
is an extreme number of 1s or 0s in the output, and that
there are visible correlations in the input from which the
output is generated, then therefore the blocks (the input)
must not have been random.
if my understanding of what you believe is correct
(and please do tell me if it's not!) then i have to point
out that your logic is slightly flawed.
firstly, the input is definitely not random: it's uniformly
and very deliberately "comprehensive". if the input
was not complete or was not uniform, then there
would _definitely_ be biases in the histograms
introduced as a result of missing test data.
i test all cases i=0..127, j=0..127 at anything
up to 2048000 times - two million times!!!
a number of things i should point out.
firstly - i'm not _using_ a random function anywhere
in the code: i am deliberately using something
similar to CTR mode: (*(int*)input) = k and running
k from 0 .. 2048000.
secondly, as i pointed out in a reply to bryan, i made
a (irrelevant, as it turns out :) mistake in the p_value
function: i divide by the sqrt(sizeof(test_data)) whereas
this should in fact be divide by sqrt(sizeof(test_data)*8))
because it's the sqrt of the number of bits not bytes.
thirdly, again as i pointed out to bryan: i made a mistake
in calling this a "failure" count. what i am doing is
simply creating a histogram of the input that causes
outputs to have an extreme number of 0s or 1s in the
data.
the hypothesis is that there should be ABSOLUTELY NO
correlations in the input that causes output blocks with
above-average occurrences of 0s or 1s.
therefore if you're going to use a PRNG for the input,
it needs to be a really good one.
yes, your adaptation i believe to be quite a sensible one,
because it makes clear that this isn't about "failure"
testing (p_v < 0.01).
now pick a setBits/totalBits threshold, ABOVE which you
increase the histogram bucket [i][j] by one.
then, plot the histogram using this:
"set dgrid3d 32,32,1"
what this does is it takes the histogram i=0..127 and j=0..127
and "groups" every 4 bits, producing an "average" of
the histogram values in i=0..3,j=0..3, i=4..7,j=0..3
all the way up to i=124..127,j=0..3 then the next row
i=0..3,j=4..7, i=4..7, j=4..7 etc.
i repeat: if there were no problems with rijndael-128, then
the histogram would be basically flat, to within expected
tolerances.
okay.
what is the probability that, if the average count is supposed
to be 5250 in a set of 512000x128x128 tests, that the
input i=0 and j=0 produces a count of only 5177????
that value is, at a guess, about TEN, maybe TWENTY standard
deviations away from the mean, given that 99.9% of the
128x128 histogram values are 5250 +/- 5 !!!
> Thanks for the reference, to be honest I never checked. The definition
> differs greatly from the computations done in the thread subject, and could
> itself be a huge source of the problem.
hiya joe,
no, a correct implementation of "p-value" is not required,
however for a correct understanding it should be
emphasised (this is the third time in this thread :)
that i am simply using the function to set a threshold
of the nist.gov "frequency" test - deviation of the
difference in the number of 0s and 1s from an
expected 50% distribution.
so yes, i made a mistake in implementing "p-value"
for the "frequency" test. and no, it's not relevant.
sobs = fabs(testval) / sqrtl(sizeof(test_data) * 8);
but other than that yes.
joe, by using num_bits
(or a java equivalent) directly as the raw data,
makes it clearer that doing this p-value thing
isn't important, it's the num_bits > {SOMEVALUE}
that's important.
fourth time pointing this out :) just in case you're
busy and don't want to go through the _entire_ thread :)
the mistake i made (and you also spotted)
simply changes the threshold point. in the case
of doing a 128-bit "bit-count" test, it changes
the average count of the histograms (reduces
the occurrences from 3.5-ish % down to
1.3%-ish)
what it DOESN'T affect is the validity of the test
methodology.
and the graphs STILL show correlations:
http://lkcl.net/crypto/rjd.1.512000.64x64.rangelimited.jpeg
http://lkcl.net/crypto/rjd.1.512000.32x32.jpeg