Yes, but it's not a one-time pad.
What you are describing is a STREAM CIPHER and has none of the
properties that make an OTP an OTP.
Tom
No. If you only care about secure communications, not about the OTP,
then yes.
--
Kristian Gjøsteen
That would only work for an OTP of 256 bits or fewer. If you extend
the keystream any longer than that, you've got at best a stream cipher,
not an OTP.
--Mike Amling
However, the OP might be interested in knowing that there are several very
lovely algorithms which can take a 256 bit key and are considered very
secure, like AES for example.
--
LTP
:)
That is NOT a one time pad. That may be a stream cypher, but it is not a one
time pad. It does not have the features of a one time pad (i.e., provable
security) but has almost all of its disadvantages (NeverNeverNever reuse the
stream), except of course that much less data need be exchanged.
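Why reusing the stream is fatal can be shown in a few lines. This is a minimal Python sketch, not tied to any particular cipher: random bytes stand in for the keystream, and the two messages are made up. When the same keystream encrypts two messages, it cancels out of the XOR of the two ciphertexts.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-in keystream: random bytes play the role of the stream cipher's
# output. The point holds for any cipher that XORs a keystream into data.
p1 = b"ATTACK AT DAWN"
p2 = b"RETREAT AT TEN"
keystream = os.urandom(len(p1))

c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)   # same keystream reused: the mistake

# The keystream cancels: an eavesdropper holding only c1 and c2
# learns p1 XOR p2 without knowing anything about the key.
leak = xor_bytes(c1, c2)
assert leak == xor_bytes(p1, p2)

# A correct guess of one plaintext hands over the other.
print(xor_bytes(leak, p1))  # b'RETREAT AT TEN'
```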
>Thanks
Sure it does. It also can only be used One Time (so it shares the OT part).
Which of course shares NONE of the features of a one time pad. It is not
even a stream cypher, although of course a stream cypher can be made using
AES. A better example might be RC4, which is a stream cypher and seems to
be very secure (if properly used).
I'm afraid I'm going to continue to disagree with
this statement every time you make it. The
keystream biases in RC4 cannot be avoided and are
too severe to tolerate in this day and age. I and
others have posted examples before, they should be
easily findable.
Greg.
--
Greg Rose
232B EC8F 44C6 C853 D68F E107 E6BF CD2F 1081 A37C
Qualcomm Australia: http://www.qualcomm.com.au
>In article <eauflh$43n$3...@nntp.itservices.ubc.ca>,
>Unruh <unruh...@physics.ubc.ca> wrote:
>>... A better example might be RC4, which is a stream cypher and seems to
>>be very secure (if properly used).
>I'm afraid I'm going to continue to disagree with
>this statement every time you make it. The
>keystream biases in RC4 cannot be avoided and are
>too severe to tolerate in this day and age. I and
>others have posted examples before, they should be
>easily findable.
Oh come on. AFAIK you have never demonstrated that those biases (which require
gigabytes of data to even see) can be used to launch any kind of an attack
on an encrypted stream. While in general one would rather not see biases in
a stream, one KNOWS that they are there anyway, since the whole thing is
generated from a very very small key and by a very simple algorithm. I.e.,
ALL stream cyphers are by definition biased extremely strongly for some
correlation function, and have very very very low
entropy. The question is whether or not that low entropy can be used to
attack the cypher. Demonstrate that and one might worry.
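Not every RC4 bias needs gigabytes to see. As a hedged illustration, the sketch below measures the Mantin-Shamir second-byte bias (a well-documented result not cited in this thread, and not the 32MB distinguisher of [1]): over many independent random keys, the second keystream byte is 0 roughly twice as often as the ideal 1/256. The key length, key count and seed are arbitrary choices for the demo.

```python
import random

def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n bytes of RC4 keystream for the given key."""
    S = list(range(256))
    j = 0
    for i in range(256):                      # key scheduling (KSA)
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):                        # keystream generation (PRGA)
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def second_byte_zero_ratio(num_keys: int, seed: int = 1) -> float:
    """Frequency of a 0 second keystream byte, relative to the ideal 1/256."""
    rng = random.Random(seed)                 # fixed seed: repeatable run
    zeros = 0
    for _ in range(num_keys):
        key = rng.randbytes(16)               # fresh random 128-bit key
        if rc4_keystream(key, 2)[1] == 0:
            zeros += 1
    return zeros / (num_keys / 256)

ratio = second_byte_zero_ratio(20000)
print(f"second byte is 0 about {ratio:.2f}x as often as it should be")
```

Each key contributes only two bytes of output, so the whole experiment uses about 40 kilobytes of keystream.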
I'm with Greg Rose on this. Calling RC4 "very secure" feels like a
bit of an overstatement to me, given the known biases in RC4's output.
Whether you think the weaknesses in RC4 are theoretical or not, "very
secure" seems like a case of grade inflation. (What would you call AES?
"Super duper ultra secure"?)
Whether or not the cipher can be practically broken with the
stated bias is kind of irrelevant.
There's a maxim that attacks only get better, they never get worse. On
the basis of this, why use a cipher that has a discovered bias with
potentially practical ramifications? It's not like there aren't a
myriad of alternatives that don't have such a strong bias.
The bias also acts as a signature for data encrypted with that cipher.
In a situation where the attacker doesn't even know what cipher is being
used he can determine that the encryption method is RC4, provided
enough gigabytes are recorded.
This in itself is a perfectly good reason not to use RC4.
RC4 is not a standard encryption algorithm of any country that I know
of. Its name is also protected by trademark. That's a good reason not
to use it.
RC4 is designed by the "introduce enough muddle to confuse the
attacker" doctrine of cipher design. It is not designed with resistance
to any specific attacks in mind. The lack of science in its
construction is a good reason not to use it.
There is simply no justifiable reason to use RC4. In every possible
deployment scenario, AES in CTR is a better choice than RC4.
The only reason to use RC4 is its simplicity but even this is a cheap
argument as there is an AES implementation in practically every
language you can write code in.
Simon.
>> Oh come on. AFAIK you have never demonstrated that those biases (which require
>> gigabytes of data to even see) can be used to launch any kind of an attack
>> on an encrypted stream. While in general one would rather not see biases in
>> a stream, one KNOWS that they are there anyway, since the whole thing is
>> generated from a very very small key and by a very simple algorithm. I.e.,
>> ALL stream cyphers are by definition biased extremely strongly for some
>> correlation function, and have very very very low
>> entropy. The question is whether or not that low entropy can be used to
>> attack the cypher. Demonstrate that and one might worry.
>Whether or not the cipher can be practically broken with the
>stated bias is kind of irrelevant.
Of course it is not irrelevant. As I said, ALL cyphers have biases. There
exists a 600-bit correlation function that is unity. For any stream
cypher, the 300-bit pattern at position n is followed by a
unique 300-bit pattern. There is perfect correlation of the 300-bit
pattern at position n with the next 300 bits. Knowing this is of no help
in breaking the cypher. I.e., whether or not the cypher can be broken with
the given correlations is the whole question of cypher use and design.
>There's a maxim that attacks only get better, they never get worse. On
There IS no attack. There is an observation of a bias if gigabytes are
used.
>the basis of this, why use a cipher that has a discovered bias with
>potentially practical ramifications? It's not like there aren't a
>myriad of alternatives that don't have such a strong bias.
Tell me: what are the potentially practical ramifications? I am not saying
that RC4 is the only secure (AFAWK) stream cypher. It is certainly the best
known and is also probably the fastest known.
>The bias also acts as a signature for data encrypted with that cipher.
>In a situation where the attacker doesn't even know what cipher is being
>used he can determine that the encryption method is RC4, provided
>enough gigabytes are recorded.
Whether or not the bias can be seen when the cypher is used for encryption
is unclear. It can be seen if you are encrypting GB of the same bit (e.g., all
the bits in those GB are zero). But if the data is semi-random, you will not see the
biases. And whether or not you are using RC4 is by all assumptions known to
the attacker. Thinking you can hide that from an attacker is the most
obvious form of "security by obscurity".
>This in itself is a perfectly good reason not to use RC4.
>RC4 is not a standard encryption algorithm of any country that I know
>of. Its name is also protected by trademark. That's a good reason not
>to use it.
So don't use it.
>RC4 is designed by the "introduce enough muddle to confuse the
>attacker" doctrine of cipher design. It is not designed with resistance
>to any specific attacks in mind. The lack of science in its
>construction is a good reason not to use it.
Yes, and it has proven to be amazingly secure despite that. Not sure what
science of attacking stream cyphers you would have had it designed
against.
>There is simply no justifiable reason to use RC4. In every possible
>deployment scenario, AES in CTR is a better choice than RC4.
Way slower AFAIK.
>The only reason to use RC4 is its simplicity but even this is a cheap
>argument as there is an AES implementation in practically every
>language you can write code in.
???
If you like using AES in CTR mode go ahead. This does not address the
question which started this argument-- is RC4 secure AFAIK?
There's an observed defect. We can detect AES from random after 2^64
blocks, or 2^68 bytes (about 295 exabytes). In comparison we can tell RC4
from random after only 32 *MEGABYTES*. [1]
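The size of that gap is easy to check. A back-of-envelope sketch using only the figures in this post, and taking a megabyte as 2^20 bytes (an assumption; with decimal megabytes the numbers barely move):

```python
AES_BLOCK_BYTES = 16           # AES has a 128-bit block
aes_blocks = 2**64             # figure from the post: the birthday bound for a 128-bit block
aes_bytes = aes_blocks * AES_BLOCK_BYTES
rc4_bytes = 32 * 2**20         # "only 32 megabytes", per [1]

print(f"AES: 2^{aes_bytes.bit_length() - 1} bytes = {aes_bytes / 10**18:.0f} EB")
print(f"RC4: 2^{rc4_bytes.bit_length() - 1} bytes")
print(f"ratio: 2^{(aes_bytes // rc4_bytes).bit_length() - 1}")
# AES: 2^68 bytes = 295 EB
# RC4: 2^25 bytes
# ratio: 2^43
```

The ratio is the point: on these figures the RC4 distinguisher needs about 2^43 (roughly nine trillion) times less data.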
> Tell me what the potentially practical ramifications are? I am not saying
> that RC4 is the only secure (AFAWK) stream cypher. It is certainly the best
> known and is also probably the fastest known.
It's not the fastest. SEAL batters it. [2]
>
> >The bias also acts as a signature for data encrypted with that cipher.
> >In situation where the attacker doesn't even know what cipher is being
> >used he can determine that the encryption method is RC4, provided
> >enough gigabytes are recorded.
>
> Whether or not the bias can be seen when the cypher is used for encryption
> is unclear. It can be seen if you are encrypting GB of the same bit (e.g., all
> the bits in those GB are zero).
Kerckhoffs' maxim dictates that all the security should reside in the
secrecy of the key. Claiming that it is secure provided that you choose
your plaintexts wisely is a direct violation of this maxim.
>But if the data is semi-random, you will not see the
> biases. And whether or not you are using RC4 is by all assumptions known to
> the attacker. Thinking you can hide that from an attacker is the most
> obvious form of "security by obscurity".
This is nonsense. First off, if I were encrypting many megabytes of
English text then it would not take all that many megabytes to be able
to see the bias. The entropy of English text is 1.3 bits per byte,
after all.
Secondly, you're correct in saying that in our *threat model* we MUST
assume the attacker knows the details of the cipher, protocol, hardware
used etc and the construction must remain secure even with such
detailed knowledge.
In reality, the attacker may not know some of these details. I'd prefer
not to communicate anything to such an attacker if I possibly can.
I consider the bias to be a serious weakness in this sense.
>
> >This in itself is a perfectly good reason not to use RC4.
>
> >RC4 is not a standard encryption algorithm of any country that I know
> >of. Its name is also protected by trademark. That's a good reason not
> >to use it.
>
> So don't use it.
Well, the question I have is why recommend a cipher to somebody that
isn't a standard?
> Yes, and it has proven to be amazingly secure despite that. Not sure what
> science of attacking stream cyphers you would have had it designed
> against.
RC4 is not "amazingly secure" - it is misleading to say so.
It turns out you can distinguish RC4 from random with only *32
megabytes* of ciphertext. Not the gigabytes you claim. This attack
doesn't disappear even if you dump the first 256 outputs. This is an
absolutely shocking result. [1]
In light of the evidence, my view seems more credible than yours. If
anything, the lack of scientific thought when constructing RC4 has
led to these types of attacks. At the rate things are going, I expect there
to be a full break eventually.
If you want a stream cipher that deserves such an accolade I would put
forward BBS for the title. It has a concrete proof that it reduces to
factoring; a classical mathematics problem that has foxed the great
minds of antiquity and the modern era alike.
I think it is a beautiful cipher, even if it is too slow for most
purposes.
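For readers who have not met BBS, the generator itself is tiny: square modulo M = pq, where p and q are primes congruent to 3 mod 4, and output the low bit of each state. The toy sketch below uses the small textbook parameters p = 11, q = 23 and seed 3, which are hopelessly insecure; the reduction to factoring only means something when M is large enough that factoring it is infeasible. It also shows where the slowness comes from: in this one-bit-per-step form, every output bit costs a modular squaring of a large number.

```python
import math

def bbs_states(seed: int, p: int, q: int, count: int) -> list[int]:
    """Blum Blum Shub states: x0 = seed^2 mod M, then x_{i+1} = x_i^2 mod M."""
    assert p % 4 == 3 and q % 4 == 3, "p and q must be 3 mod 4 (Blum primes)"
    M = p * q
    assert math.gcd(seed, M) == 1, "seed must be coprime to M"
    x = seed * seed % M
    states = []
    for _ in range(count):
        states.append(x)
        x = x * x % M            # one modular squaring per step
    return states

def bbs_bits(seed: int, p: int, q: int, count: int) -> list[int]:
    """Output the least significant bit of each state."""
    return [x & 1 for x in bbs_states(seed, p, q, count)]

# Toy parameters only; real use needs a modulus hundreds of digits long.
print(bbs_states(3, 11, 23, 6))  # [9, 81, 236, 36, 31, 202]
print(bbs_bits(3, 11, 23, 6))    # [1, 1, 0, 0, 1, 0]
```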
> >There is simply no justifiable reason to use RC4. In every possible
> >deployment scenario, AES in CTR is a better choice than RC4.
>
> Way slower AFAIK.
It's about half as fast with the figures I found on a Google search.
None too shabby. [2]
> If you like using AES in CTR mode go ahead. This does not address the
> question which started this argument-- is RC4 secure AFAIK?
I don't think it's wise to recommend RC4 to anyone these days. You are
of course free to use what you like; however, if I wanted to keep my data
secret, I'd use AES and I'd recommend others do the same.
Simon.
[1] - http://www.cosic.esat.kuleuven.be/publications/article-40.pdf
[2] - http://www.eskimo.com/~weidai/benchmarks.html
???? What has this to do with what I said? You still have not said how
those biases will be used to break the cypher, whether they are recognized
or not. I am NOT claiming that security resides in the selection of the
plaintext. That would be silly. I am saying that the recognition that the
stream cypher has biases is made much more difficult by the plaintext it is
xored against. Clearly if RC4 is xored against a truly random stream the
biases are completely invisible. If it is xored against a stream of 0, they
are completely visible. Normal plain text falls somewhere between, but I
suspect closer to the "random" case.
>>But if the data is semi-random, you will not see the
>> biases. And whether or not you are using RC4 is by all assumptions known to
>> the attacker. Thinking you can hide that from an attacker is the most
>> obvious form of "security by obscurity".
>This is nonsense. First off, if I were encrypting many megabytes of
>English text then it would not take all that many megabytes to be able
>to see the bias. The entropy of English text is 1.3 bits per byte,
>after all.
No it is not. And furthermore, you need to demonstrate that that 1.3 bits
per byte is not sufficient to hide the biases. Those biases are about 1 bit
per 10^7 bytes of lessened entropy and 1.3 bits per byte would be expected
to hide that completely.
>Secondly, you're correct in saying that in our *threat model* we MUST
>assume the attacker knows the details of the cipher, protocol, hardware
>used etc and the construction must remain secure even with such
>detailed knowledge.
>In reality, the attacker may not know some of these details. I'd prefer
>not to communicate anything to such an attacker if I possibly can.
>I consider the bias to be a serious weakness in this sense.
Fine, I have yet to see a convincing argument that it is any worse than the
known biases of all cyphers in having only 256 bits (keylength) of
randomness.
>>
>> >This in itself is a perfectly good reason not to use RC4.
>>
>> >RC4 is not a standard encryption algorithm of any country that I know
>> >of. Its name is also protected by trademark. That's a good reason not
>> >to use it.
>>
>> So don't use it.
>Well, the question I have is why recommend a cipher to somebody that
>isn't a standard?
?? I suggested it as a possible example of a stream cypher. It is also a
widely (the most widely) used stream cypher.
>> Yes, and it has proven to be amazingly secure despite that. Not sure what
>> science of attacking stream cyphers you would have had it designed
>> against.
>RC4 is not "amazingly secure" - it is misleading to say so.
Tell me how to break it.
>It turns out you can distinguish RC4 from random with only *32
>megabytes* of ciphertext. Not the gigabytes you claim. This attack
>doesn't disappear even if you dump the first 256 outputs. This is an
>absolutely shocking result. [1]
So what? Show me how to use that to attack the cypher. If I reveal one bit
every 10^8 bits, it is not going to cause me or anyone to lose sleep. In
fact I know that my attacker knows far more about the message than that
already (e.g., that the word "the" appears in the message with overwhelming
probability). The question is how that knowledge helps him to break the
cypher. That is an issue you have absolutely refused to address.
If I had to choose between an unbiased stream cypher and a biased one, where
both had had equal amounts of study without a break, would I take the
unbiased one? Sure. Why not. But would using the biased one cause me to
lose sleep about the security? No. Not until I see even the ghost of a
suggestion as to how that bias could be used to attack the cypher.
>In light of the evidence, my view seems more credible than yours. If
>anything, the lack of scientific thought when constructing RC4 has
>led to these types of attacks. At the rate things are going, I expect there
>to be a full break eventually.
>If you want a stream cipher that deserves such an accolade I would put
>forward BBS for the title. It has a concrete proof that it reduces to
>factoring; a classical mathematics problem that has foxed the great
>minds of antiquity and the modern era alike.
>I think it is a beautiful cipher, even if it is too slow for most
>purposes.
Since speed is one of the essential attributes of a cypher expected to
encrypt 32MB of data, this is a pretty damning admission. After all, an OTP
is by far the best cypher around. It has the drawback of speed, especially
speed of key exchange. Why do you use any stream cypher at all in
comparison with the provable security of the OTP? Because provable security
is NOT the most important attribute of a cypher.
>> >There is simply no justifiable reason to use RC4. In every possible
>> >deployment scenario, AES in CTR is a better choice than RC4.
>>
>> Way slower AFAIK.
>It's about half as fast with the figures I found on a Google search.
>None too shabby. [2]
>> If you like using AES in CTR mode go ahead. This does not address the
>> question which started this argument-- is RC4 secure AFAIK?
>I don't think it's wise to recommend RC4 to anyone these days. You are
>of course free to use what you like; however, if I wanted to keep my data
>secret, I'd use AES and I'd recommend others do the same.
Fine. And it is certainly worthwhile pointing that out to the OP. However,
the diatribes against RC4 which this has produced are not, I believe,
justified at present. Maybe you are right that the biases in RC4 are an
indication of an eventual break. But there is no evidence of that as yet,
and it is just as likely that any other cypher, including AES will be
discovered to be broken in the future. What about the attacks against
reduced round AES? Do they not worry you more than the biases in RC4?
In fact virtually all of the eSTREAM candidates
beat RC4 by a large margin, as well as being more
secure, as well as having an interface that allows
nonces.
<>>
<>> >The bias also acts as a signature for data encrypted with that cipher.
<>> >In situation where the attacker doesn't even know what cipher is being
<>> >used he can determine that the encryption method is RC4, provided
<>> >enough gigabytes are recorded.
<>>
<>> Whether or not the bias can be seen when the cypher is used for encryption
<>> is unclear. It can be seen if you are encrypting GB of the same bit (e.g., all
<>> the bits in those GB are zero).
<
<>Kerckhoffs' maxim dictates that all the security should reside in the
<>secrecy of the key. Claiming that it is secure provided that you choose
<>your plaintexts wisely is a direct violation of this maxim.
<
<???? What has this to do with what I said? You still have not said how
<those biases will be used to break the cypher, whether they are recognized
<or not. I am NOT claiming that security resides in the selection of the
<plaintext. That would be silly. I am saying that the recognition that the
<stream cypher has biases is made much more difficult by the plaintext it is
<xored against. Clearly if RC4 is xored against a truly random stream the
<biases are completely invisible. If it is xored against a stream of 0, they
<are completely visible. Normal plain text falls somewhere between, but I
<suspect closer to the "random" case.
So, I have in the past given an example that
convinced me that this bias was too much. Here it
is again:
You break into a cave in Afghanistan and find a
laptop (I used to say "big server", but laptops
now have sufficiently large disks). It has 160GB
of encrypted files on it. The RC4 bias is
sufficiently large that it is possible for you to
tell the difference between Arabic and Hebrew
character sets in the encrypted data.
This, to me, is more information than you should
be able to get from the encrypted data. No other
reputable algorithm would suffer this problem.
<
<>>But if the data is semi-random, you will not see the
<>> biases. And whether or not you are using RC4 is by all assumptions known to
<>> the attacker. Thinking you can hide that from an attacker is the most
<>> obvious form of "security by obscurity".
<
<>This is nonsense. First off, if I were encrypting many megabytes of
<>English text then it would not take all that many megabytes to be able
<>to see the bias. The entropy of English text is 1.3 bits per byte,
<>after all.
<
<No it is not. And furthermore, you need to demonstrate that that 1.3 bits
<per byte is not sufficient to hide the biases. Those biases are about 1 bit
<per 10^7 bytes of lessened entropy and 1.3 bits per byte would be expected
<to hide that completely.
Wrong. The bias is reliably *detectable* in 10^7
bytes of output. That means the bias itself is of
the order of 10^-3.5, or about .0003. Which it
is.
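The step from "detectable in 10^7 bytes" to "a bias of about 10^-3.5" is the usual square-root rule: the sampling noise on an observed frequency shrinks like 1/sqrt(n), so a bias eps only stands out once n is on the order of 1/eps^2. A rough sketch of that arithmetic, ignoring constant factors (the function names are illustrative):

```python
import math

def bias_detectable_with(n_samples: float) -> float:
    """Order-of-magnitude size of a bias eps visible in n_samples.

    The noise on an estimated frequency is about 1/sqrt(n), so a bias
    stands out once it is a small multiple of that. Constants ignored.
    """
    return 1 / math.sqrt(n_samples)

def samples_needed(eps: float) -> float:
    """Inverse rule of thumb: about 1/eps**2 samples to see a bias of eps."""
    return 1 / eps**2

eps = bias_detectable_with(1e7)
print(f"{eps:.5f}")                   # 0.00032, i.e. about 10^-3.5
print(f"{samples_needed(eps):.0e}")   # 1e+07
```

Read in the other direction, this is also why a bias of one part in 10^7 per byte would need on the order of 10^14 bytes to detect, not 10^7.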
<>Secondly, you're correct in saying that in our *threat model* we MUST
<>assume the attacker knows the details of the cipher, protocol, hardware
<>used etc and the construction must remain secure even with such
<>detailed knowledge.
<
<>In reality, the attacker may not know some of these details. I'd prefer
<>not to communicate anything to such an attacker if I possibly can.
<
<>I consider the bias to be a serious weakness in this sense.
<
<Fine, I have yet to see a convincing argument that it is any worse than the
<known biases of all cyphers in having only 256 bits (keylength) of
<randomness.
I tried to follow your argument before and failed
utterly. The RC4 bias is there and detectable no
matter what key you use. But for all other
reputable ciphers, there are only two ways to
detect what you are calling "known biases": either
enumerate all the keys until you find the right
one, or generate so much keystream that you have
covered all possible states. They're both variants
of brute force.
<>> >This in itself is a perfectly good reason not to use RC4.
<>>
<>> >RC4 is not a standard encryption algorithm of any country that I know
<>> >of. Its name is also protected by trademark. That's a good reason not
<>> >to use it.
<>>
<>> So don't use it.
<
<>Well, the question I have is why recommend a cipher to somebody that
<>isn't a standard?
<
<?? I suggested it as a possible example of a stream cypher. It is also a
<widely (the most widely) used stream cypher.
<
<
<>> Yes, and it has proven to be amazingly secure despite that. Not sure what
<>> science of attacking stream cyphers you would have had it designed
<>> against.
<
<>RC4 is not "amazingly secure" - it is misleading to say so.
<
<Tell me how to break it.
<
<>It turns out you can distinguish RC4 from random with only *32
<>megabytes* of ciphertext. Not the gigabytes you claim. This attack
<>doesn't disappear even if you dump the first 256 outputs. This is an
<>absolutely shocking result. [1]
<
<So what? Show me how to use that to attack the cypher. If I reveal one bit
<every 10^8 bits, it is not going to cause me or anyone to lose sleep. In
<fact I know that my attacker knows far more about the message than that
<already (e.g., that the word "the" appears in the message with overwhelming
<probability). The question is how that knowledge helps him to break the
<cypher. That is an issue you have absolutely refused to address.
No, we (and most cryptographers working on stream
ciphers) already consider it to be broken merely
because it is easily distinguishable from random.
You're the one who's out of step here... insisting
that it isn't broken unless we can recover
plaintext in bulk amounts.
[rest snipped.]
Great. Hard to see "large margin" since RC4 uses about 6 simple commands,
and I do not understand how one could make a cypher with one command or
even two, but I will believe you.
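For reference, RC4 as publicly described really is that small. This Python sketch is for study only, not for use; the check uses the widely published test vector for key "Key" and plaintext "Plaintext". The numbered comments mark the few operations done per output byte.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the same operation both ways)."""
    # Key scheduling: permute S under control of the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    # Keystream generation: the handful of operations per output byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256              # 1. step i
        j = (j + S[i]) % 256           # 2. step j
        S[i], S[j] = S[j], S[i]        # 3. swap
        k = S[(S[i] + S[j]) % 256]     # 4. look up the keystream byte
        out.append(byte ^ k)           # 5. XOR it into the data
    return bytes(out)

ct = rc4(b"Key", b"Plaintext")
print(ct.hex())                        # bbf316e8d940af0ad3
assert rc4(b"Key", ct) == b"Plaintext"
```

Counting instructions is not the same as measuring speed, though; the benchmark figures cited in [2] are the better guide.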
This I do not understand. a) I do not think that there is 160GB of Arabic
written in the world. Most big files are binary-- pictures, etc, and
certainly not text. Text files tend to be tiny. And they had sure as hell
better have encrypted all those files with different keys. Are you claiming
that 10000 files all encrypted with different keys, whose size adds up to
say 160GB, have biases? I believe that the biases have been found in a
SINGLE stream with a single key.
And I also have no idea by what means you used the biases to detect that it
was Arabic rather than Hebrew (that being the most likely laptop to be found
in a cave in Afghanistan I am sure).
I would be interested to see your argument that even with one 160GB file
encrypted with RC4 you could see the difference.
>This, to me, is more information than you should
>be able to get from the encrypted data. No other
>reputable algorithm would suffer this problem.
Two bits of data? Hmm, I think I could have gotten that without running any
tests.
><
><>>But if the data is semi-random, you will not see the
><>> biases. And whether or not you are using RC4 is by all assumptions known to
><>> the attacker. Thinking you can hide that from an attacker is the most
><>> obvious form of "security by obscurity".
><
><>This is nonsense. First off, if I were encrypting many megabytes of
><>English text then it would not take all that many megabytes to be able
><>to see the bias. The entropy of English text is 1.3 bits per byte,
><>after all.
><
><No it is not. And furthermore, you need to demonstrate that that 1.3 bits
><per byte is not sufficient to hide the biases. Those biases are about 1 bit
><per 10^7 bytes of lessened entropy and 1.3 bits per byte would be expected
><to hide that completely.
>Wrong. The bias is reliably *detectable* in 10^7
>bytes of output. That means the bias itself is of
>the order of 10^-3.5, or about .0003. Which it
>is.
OK, I still state that the entropy in English text vastly vastly overwhelms
this and would make the bias undetectable in English text.
><>Secondly, you're correct in saying that in our *threat model* we MUST
><>assume the attacker knows the details of the cipher, protocol, hardware
><>used etc and the construction must remain secure even with such
><>detailed knowledge.
><
><>In reality, the attacker may not know some of these details. I'd prefer
><>not to communicate anything to such an attacker if I possibly can.
><
><>I consider the bias to be a serious weakness in this sense.
><
><Fine, I have yet to see a convincing argument that it is any worse than the
><known biases of all cyphers in having only 256 bits (keylength) of
><randomness.
>I tried to follow your argument before and failed
>utterly. The RC4 bias is there and detectable no
>matter what key you use. But for all other
>reputable ciphers, there are only two ways to
>detect what you are calling "known biases": either
>enumerate all the keys until you find the right
>one, or generate so much keystream that you have
>covered all possible states. They're both variants
>of brute force.
All cyphers are known to have very very low entropy. It is not
hypothesised, there is no need to measure it, it is there. Yet as far as we
know the only way to make use of it in breaking the cypher is exhaustive
search. A completely provable weakness is in practice irrelevant. You claim
a bias which decreases the entropy per bit by about 10^-4 or 10^-5. I.e.,
instead of a raw entropy in a 30MB stream of 10^8.5 bits it has 10^8.499. Now,
how are you going to use that to break the cypher? I hand you a 1000-page
book encrypted with RC4 (that is about 1MB of text). How will you use that
bias to break the cypher and tell me the contents of the book? I would
suggest that again exhaustive search is the only way.
It isn't, just as MD5 is not broken for preimage collision resistance. (Mind
you, MD5 is a lot shakier than RC4 is, and I would not suggest its use for
any cryptographic purpose.)
Cryptographers might feel that they can do better than RC4, and they
probably can (although I would like to see 10 years high profile use
without a break first).
Can the contents of a file encrypted with RC4 be read by someone without
access to the key?
The answer is AFAIK no, and is nowhere near being in danger of being so
read.
Can cryptographers do better? Probably yes. Have they done better
already? Maybe. But that does not make RC4 insecure.
Anyway, I think that we have all made our points many times over.
Well, you believe wrong, and clearly don't
understand why the biases exist, or how they can
be detected.
>And I also have no idea by what means you used the biases to detect that it
>was Arabic rather than Hebrew (that being the most likely laptop to be found
>in a cave in Afghanistan I am sure).
Different Unicode character sets.
>I would be interested to see your argument that even with one 160GB file
>encrypted with RC4 you could see the difference.
RC4 is biased towards producing two zero bytes in
a row more frequently than it should (among other
biases). So, keep counts of what ciphertext looks
like Hebrew Unicode, and what looks like Arabic
Unicode, and after enough gigabytes one will start
to win pretty clearly.
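Unruh's objection above is that the files would be under different keys. A toy model suggests that this alone does not save you, because some RC4 biases hold across keys. Loud assumptions: this uses the second-byte bias (the second keystream byte is 0 about twice as often as it should be), not the two-zero-bytes bias Greg mentions; each file is UTF-16LE with no byte-order mark and no compression, so its second byte is the high byte of the first character (0x06 for the Arabic block, 0x05 for Hebrew); and the file counts, seeds and function names are made up for the illustration.

```python
import random

def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n bytes of RC4 keystream for the given key."""
    S = list(range(256))
    j = 0
    for i in range(256):                      # key scheduling (KSA)
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):                        # keystream generation (PRGA)
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

ARABIC_HIGH, HEBREW_HIGH = 0x06, 0x05         # U+06xx vs U+05xx blocks

def encrypt_files(high_byte: int, num_files: int, seed: int) -> list[bytes]:
    """Hypothetical corpus: num_files files, each under its own random key.

    Only the first UTF-16LE character of each file is modelled (an
    assumption): a random low byte followed by the script's high byte.
    """
    rng = random.Random(seed)
    files = []
    for _ in range(num_files):
        key = rng.randbytes(16)
        first_char = bytes([rng.randrange(256), high_byte])
        ks = rc4_keystream(key, 2)
        files.append(bytes(p ^ k for p, k in zip(first_char, ks)))
    return files

def guess_script(files: list[bytes]) -> str:
    """Vote on which high byte appears more often at offset 1."""
    # Ciphertext byte 1 = keystream byte 1 XOR plaintext high byte. Since
    # keystream byte 1 is biased towards 0, the true high byte shows up
    # about twice as often as a wrong guess does.
    arabic_votes = sum(f[1] == ARABIC_HIGH for f in files)
    hebrew_votes = sum(f[1] == HEBREW_HIGH for f in files)
    return "Arabic" if arabic_votes > hebrew_votes else "Hebrew"

files = encrypt_files(ARABIC_HIGH, 16000, seed=7)
print(guess_script(files))
```

With 16000 files the correct guess collects roughly twice the votes of the wrong one; the attacker never recovers a key or a single character of plaintext, only the script.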
>OK, I still state that the entropy in English text vastly vastly overwhelms
>this and would make the bias undetectable in English text.
And you're flatly wrong. For one thing, English
text all fits into seven bits, and the bias is
easily detectable in the eighth bit alone.
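The eighth-bit point needs no cryptanalysis to check. In this sketch random bytes stand in for RC4 output (an assumption; it shows the mechanism, not the size of RC4's bias in that bit): when every plaintext byte has a 0 top bit, XOR passes the keystream's top bit straight through to the ciphertext, whatever the entropy of the other seven bits.

```python
import os

# Any 7-bit ASCII text: the top bit of every byte is 0.
plaintext = b"It was the best of times, it was the worst of times."
keystream = os.urandom(len(plaintext))   # stand-in for RC4 output
ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))

# XOR with a 0 bit leaves the keystream bit unchanged, so the ciphertext's
# top bits ARE the keystream's top bits. Any bias there shows through
# untouched; the English text in the low seven bits cannot mask it.
top_bits_ct = [c >> 7 for c in ciphertext]
top_bits_ks = [k >> 7 for k in keystream]
assert top_bits_ct == top_bits_ks
print("top bits of ciphertext equal top bits of keystream")
```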
>Anyway, I think that we have all made our points many times over.
Indeed.
>In article <eb3otq$p7g$1...@nntp.itservices.ubc.ca>,
>Unruh <unruh...@physics.ubc.ca> wrote:
>>>You break into a cave in Afghanistan and find a
>>>laptop (I used to say "big server", but laptops
>>>now have sufficiently large disks). It has 160GB
>>>of encrypted files on it. The RC4 bias is
>>>sufficiently large that it is possible for you to
>>>tell the difference between Arabic and Hebrew
>>>character sets in the encrypted data.
>>
>>This I do not understand. a) I do not think that there is 160GB of Arabic
>>written in the world. Most big files are binary-- pictures, etc, and
>>certainly not text. Text files tend to be tiny. And they had sure as hell
>>better have encrypted all those files with different keys. Are you claiming
>>that 10000 files all encrypted with different keys, whose size adds up to
>>say 160GB, have biases? I believe that the biases have been found in a
>>SINGLE stream with a single key.
>Well, you believe wrong, and clearly don't
>understand why the biases exist, or how they can
>be detected.
>>And I also have no idea by what means you used the biases to detect that it
>>was Arabic rather than Hebrew (that being the most likely laptop to be found
>>in a cave in Afghanistan I am sure).
>Different Unicode character sets.
The question is whether or not that difference in Unicode character sets
can be distinguished via that bias. Let's take your English example below,
and assume that no compression was used (under your philosophy that a
good crypto system should hide everything). It is of course not true that
English could be distinguished, since most of the European languages share
that characteristic (that the 8th bit is 0 in the usual ASCII encoding),
but even then it would depend on the biases in the 8th bit. That the bytes
have biases does not mean that the 8th bit has biases. But it is also true
that in English the 8th bit has zero entropy. So the lessened
entropy in the stream can be detected. However, the fact that Arabic and
Hebrew use different Unicode ranges does not mean that I can detect that
difference. There is still the fact that each of the characters has a large
entropy itself (compared to the small deficiency of entropy in the stream).
So the question is, will that large entropy per byte (let me give you your
1.3 bits per byte, which is however highly variable across texts) hide
that small lack of entropy per byte which you claim for RC4? You claim it
will not. I am having a hard time seeing why it will not.
Note that AES, your preference, is severely broken from your standpoint,
since any block cypher reveals repeats in its blocks (128 bits for AES)
within the text. There is the nice Wikipedia example of encrypting the Tux
penguin. So for you AES is completely broken and should not be used? I
suspect you would say no, simply use one of the chaining modes. And I would
say "precompress the files" to use with RC4. I.e., why do you call RC4
broken due to a bias which is easily overcome, when for AES you accept a
weakness which can be easily overcome by a different mode of operation?
>>I would be interested to see your argument that even with one 160GB file
>>encrypted with RC4 you could see the difference.
>RC4 is biased towards producing two zero bytes in
>a row more frequently than it should (among other
>biases). So, keep counts of what ciphertext looks
>like Hebrew Unicode, and what looks like Arabic
>Unicode, and after enough gigabytes one will start
>to win pretty clearly.
I do not know what "keep count of what ciphertext looks like Hebrew
Unicode" means. You are presumably measuring some correlation function
which is biased in Hebrew Unicode, but that bias is small per byte (ie,
there is still a large randomness in it and randomness in that
correlation) and that randomness will hide the biases in the RC4 stream.
Ie, if that 160 GB were known to be exactly the same 1000 byte Hebrew or
Arabic text repeated 10^8 times, then I might be able to believe that you
could distinguish them, but not encryption of 160GB of "random" text.
Don't be silly. No one recommends use of ECB mode. If a user
is ignorant enough to use AES-ECB, the problem isn't with AES; the
problem is with ECB mode.
>I suspect you
>would say -- no simply use one of the chaining modes. And I would say
>"precompress the files" to use with RC4.
Apples and oranges. General-purpose encryption algorithms have to be
able to encrypt any kind of input safely. AES-CBC meets that test.
RC4 does not. (Neither does AES-ECB.)
And precompression often isn't an option, in many real-world settings.
I think Greg Rose has made some pretty convincing arguments.
And I still think calling RC4 "very secure" is an overstatement;
RC4 seems borderline at best.
?? What is the relationship between "preimage collision resistance"
and the industry-standard terms "collision resistance", "(first)
preimage resistance" and "second preimage resistance"? Are you trying to
confuse your readers?
> Cryptographers might feel that they can do better than RC4, and they
> probably can (although I would like to see 10 years high profile use
> without a break first).
>
> Can the contents of a file encrypted with RC4 be read by someone without
> access to the key?
Which is riskier, ARC4 or AES-CTR? I think Greg Rose is asking "Why
take chances?" and I wonder if you actually have an answer to that question.
Nothing personal, Dr. Unruh. Cryptographers operate in an atmosphere
of fear, and under those conditions people just naturally lash out at
those who refuse to conform. :)
--Mike Amling
You are conflating two different examples that I
gave.
>and assume that no compression was used (under your philosophy that
>a good crypto system should hide everything). It is of course not true that
>English could be distinguished, since most of the European languages share
>that characteristic-- that the 8th bit is 0 in the usual ASCII encoding--
>but even then it would depend on the biases in the 8th bit. That the bytes
>have biases does not mean that the 8th bit has biases. But it is also true
>that in English the 8th bit has zero entropy per bit. So the lessened
>entropy in the stream can be detected. However, the fact that Arabic and
>Hebrew use different Unicode ranges does not mean that I can detect that
>difference. There is still the fact that each of the characters has a large
>entropy itself (compared to the small deficiency of entropy in the stream).
>So the question is, will that large entropy per byte (let me give you your
>1.3 bits per byte-- which is however highly variable across texts) hide
>that small lack of entropy per byte which you claim for RC4? You claim it
>will not. I am having a hard time seeing why it will not.
Hey, I wasn't the one who said 1.3 bits per byte
(although who am I to argue with Claude Shannon?).
But later in my posting I did tell you how to
distinguish.
>Note that AES, your preference, is severely broken from your standpoint
Again, you are confusing me with someone else.
>since any block cypher reveals repeats in the 64 bit blocks within the
AES has 128-bit blocks, and it only shows these
repeats when used in ECB mode, which no-one
would use in practice.
>text. There is the nice Wikipedia example of encrypting the Tux penguin.
>So for you AES is completely broken and should not be used? I suspect you
>would say -- no simply use one of the chaining modes. And I would say
>"precompress the files" to use with RC4. Ie, why do you call RC4 broken due
>to a bias which is easily overcome, when for AES you accept the weakness
>which can be easily overcome by a different mode of operation?
Avoiding using ECB mode is simply "good crypto
hygiene" (as is not reusing the output of a stream
cipher). So, used properly, AES alone suffices.
(Note: it still wasn't me who was advocating using
AES.)
One of your arguments *for* RC4 was its simplicity
and efficiency. But to use it properly, you're
saying that we also need to include a compression
library, and to make up for its lack of support
for a nonce, we have to include a hash function
too (Rivest's recommendation)? Neither so simple
nor so efficient any more.
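To make the "neither so simple nor so efficient" point concrete, here is a rough Python sketch of what RC4 "used properly" might look like once a nonce is added. It is a hypothetical construction, not a vetted one: the per-message key is SHA-256(key || nonce), following the key-hashing idea the post attributes to Rivest, and the first 256 keystream bytes are discarded. The function names, the use of SHA-256, and the 256-byte drop are all assumptions chosen for illustration.

```python
import hashlib


def rc4_keystream(key: bytes, n: int, drop: int = 0) -> bytes:
    """Return n bytes of textbook RC4 keystream, after discarding `drop` bytes."""
    # Key-scheduling algorithm (KSA)
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA)
    out = bytearray()
    i = j = 0
    for _ in range(drop + n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out[drop:])


def per_message_key(long_term_key: bytes, nonce: bytes) -> bytes:
    # Hypothetical nonce handling: hash key and nonce together so that
    # related nonces do not give related RC4 keys.
    return hashlib.sha256(long_term_key + nonce).digest()


def encrypt(long_term_key: bytes, nonce: bytes, plaintext: bytes, drop: int = 256) -> bytes:
    ks = rc4_keystream(per_message_key(long_term_key, nonce), len(plaintext), drop)
    return bytes(p ^ k for p, k in zip(plaintext, ks))


# XOR with the same keystream is its own inverse.
decrypt = encrypt
```

Each message then costs a hash and 256 discarded keystream bytes on top of RC4's own key schedule; a compression step, if one were added, would come on top of that.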
>>>I would be interested to see your argument that even with one 160GB file
>>>encrypted with RC4 you could see the difference.
>
>>RC4 is biased towards producing two zero bytes in
>>a row more frequently than it should (among other
>>biases). So, keep counts of what ciphertext looks
>>like Hebrew Unicode, and what looks like Arabic
>>Unicode, and after enough gigabytes one will start
>>to win pretty clearly.
>
>I do not know what "keep count of what ciphertext looks like Hebrew
>Unicode" means. You are presumably measuring some correlation function
>which is biased in Hebrew Unicode, but that bias is small per byte (ie,
>there is still a large randomness in it and randomness in that
>correlation) and that randomness will hide the biases in the RC4 stream.
>Ie, if that 160 GB were known to be exactly the same 1000 byte Hebrew or
>Arabic text repeated 10^8 times, then I might be able to believe that you
>could distinguish them, but not encryption of 160GB of "random" text.
I'm not sure whether or not you're being
deliberately obtuse, so I'll give you the benefit
of the doubt and spell it out. Two-byte Unicode
characters can be considered as if the first byte
specifies the page of a book of characters, and
the second byte specifies the character within the
page. Hebrew and Arabic are on different pages,
and (IIRC) not all of the character spaces are
filled in. So, for every pair of bytes of the
ciphertext, there is a slightly higher than random
chance that they were actually enciphered by
XORing with two consecutive zero bytes... that is,
they are actually plaintext. So, for each pair of
ciphertext bytes, classify them into one of three
categories:
1. looks like valid Arabic character
2. looks like valid Hebrew character
3. everything else.
Category 3 will swamp the other two categories,
but with enough ciphertext, and assuming the
plaintext was indeed one of the two character
sets, either category 1 or 2 will start to
dominate more than can be explained by random
chance. As soon as the probability of their
deviance becomes statistically significant to,
say, 99.9%, you have your answer.
I'm not going to elucidate this any more. Go read
Fluhrer & McGrew's paper to understand the biases,
and think about Unicode; there's no rocket science
here.
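Greg's three-way classification can be sketched as a simple counter. The Python below is a hypothetical illustration, not code from the thread: it assumes UTF-16BE plaintext aligned on even byte offsets, treats a ciphertext pair as "looks Arabic" if it falls in the Unicode Arabic block (U+0600 to U+06FF) and "looks Hebrew" if it falls in the Hebrew block (U+0590 to U+05FF), and lumps everything else into category 3. Because the two blocks have different sizes, each count is compared against its own uniform-keystream baseline and reported as a z-score; a one-sided 99.9% threshold corresponds to a z-score of roughly 3.09. The block boundaries, the alignment, and the threshold are assumptions.

```python
import math

# Assumed Unicode block boundaries (code points), for illustration only.
HEBREW = range(0x0590, 0x0600)  # 112 code points
ARABIC = range(0x0600, 0x0700)  # 256 code points


def classify_pairs(ciphertext: bytes) -> dict:
    """Sort aligned byte pairs, read as UTF-16BE code units, into 3 categories."""
    counts = {"arabic": 0, "hebrew": 0, "other": 0}
    for k in range(0, len(ciphertext) - 1, 2):
        cp = (ciphertext[k] << 8) | ciphertext[k + 1]
        if cp in ARABIC:
            counts["arabic"] += 1
        elif cp in HEBREW:
            counts["hebrew"] += 1
        else:
            counts["other"] += 1
    return counts


def excess_z_scores(counts: dict) -> dict:
    """Standard deviations above the count a uniform keystream would give
    (normal approximation to the binomial)."""
    n = sum(counts.values())
    z = {}
    for name, block in (("arabic", ARABIC), ("hebrew", HEBREW)):
        p = len(block) / 65536  # chance a uniformly random pair lands in the block
        expected = n * p
        sd = math.sqrt(n * p * (1 - p))
        z[name] = (counts[name] - expected) / sd if sd > 0 else 0.0
    return z


def verdict(z: dict, threshold: float = 3.09) -> str:
    """Name the script that is significantly over-represented, if either is."""
    best = max(z, key=z.get)
    return best if z[best] > threshold else "undecided"
```

Against real RC4 the excess would be tiny and would need gigabytes of ciphertext to show up; the sketch only shows the bookkeeping, and it compares two hypotheses rather than identifying an arbitrary language.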
Alright, I will alter that to "secure" from "very secure".
I have a hard time seeing how it is borderline. There is absolutely no
indication that breaking it is better than exhaustive search. The bias
seems to me a pretty weak hook to hang "borderline" onto.
> ?? What is the relationship between "preimage collision resistance"
>and the industry-standard terms "collision resistance", "(first)
>preimage resistance" and "second preimage resistance"? Are you trying to
>confuse your readers?
Preimage resistance is resistance to finding a collision with a given
hash. Collision resistance is resistance to finding any collision (which
MD5 has miserably failed at).
Both are collisions.
>> Cryptographers might feel that they can do better than RC4, and they
>> probably can (although I would like to see 10 years high profile use
>> without a break first).
>>
>> Can the contents of a file encrypted with RC4 be read by someone without
>> access to the key?
> Which is riskier, ARC4 or AES-CTR? I think Greg Rose is asking "Why
>take chances?" and I wonder if you actually have an answer to that question.
That is what people who double and triple encrypt also say. That is what
people who use 10,000 bit keys for RSA also say. And they also get dumped
on from the other side.
Again is there any indication at all that the contents of a file encrypted
with RC4 can be read by someone without access to the key?
In the case of RC4 vs AES-CTR, one reason is that the latter is
much slower. And speed is a critical feature with stream
cyphers, since they tend to be used to encrypt huge amounts of data in
speed-critical situations (e.g., sending stuff over the web).
Of course if the OP wants to encrypt 100 bytes, then probably AES is much
faster.
Besides this whole discussion began because I suggested to the OP that RC4
was an EXAMPLE of a very secure stream cypher, in contrast to the very
insecure one he was advocating. It is that adjective "very" that has caused
all of the argy-bargy.
> Nothing personal, Dr. Unruh. Cryptographers operate in an atmosphere
>of fear, and under those conditions people just naturally lash out at
>those who refuse to conform. :)
I am certainly not taking it personally.
I am just nonplussed by the vehemence that that adjective caused.
>--Mike Amling
Not true. If I hash some secret message x and give you the result
y = h(x) (but not x), and then you can find a preimage of y, then you
have definitely broken one-wayness (i.e., first preimage resistance),
but you might not have found a collision (i.e., if you found x itself,
rather than some other x').
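A toy experiment makes the distinction concrete. The sketch below truncates SHA-256 to 16 bits (a deliberately weak, hypothetical hash `h16`) so that brute force is cheap. A first-preimage search for y = h16(x) may simply return x itself, which breaks one-wayness without producing a collision; only a search that must return some x' different from x (a second preimage) yields a collision. The candidate encoding and search order are arbitrary choices for illustration.

```python
import hashlib
from typing import Optional


def h16(data: bytes) -> int:
    """Toy 16-bit hash: the first two bytes of SHA-256 (deliberately weak)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def find_preimage(y: int, exclude: Optional[bytes] = None, limit: int = 1 << 20) -> Optional[bytes]:
    """Brute-force search over b"0", b"1", b"2", ... for an input hashing to y.
    If `exclude` is given, that input is skipped (a second-preimage search)."""
    for i in range(limit):
        candidate = str(i).encode()
        if candidate != exclude and h16(candidate) == y:
            return candidate
    return None


secret = b"0"
y = h16(secret)

first = find_preimage(y)                   # here it is the secret itself: no collision
second = find_preimage(y, exclude=secret)  # any hit here is a genuine collision
```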
>In article <eb54av$955$1...@nntp.itservices.ubc.ca>,
>Bill Unruh <un...@physics.ubc.ca> wrote:
>>g...@qualcomm.com (Gregory G Rose) writes:
>>
>>>In article <eb3otq$p7g$1...@nntp.itservices.ubc.ca>,
>>>Unruh <unruh...@physics.ubc.ca> wrote:
>[snipped, including some unattributed quoting]
>>
>>>>And I also have no idea by what means you used the biases to detect that it
>>>>was Arabic rather than Hebrew (that being the most likely laptop to be found
>>>>in a cave in Afghanistan I am sure).
>>
>>>Different Unicode character sets.
>>
>>The question is whether or not that difference in Unicode character sets
>>can be distinguished via that bias. Let's take your English example below,
>You are conflating two different examples that I
>gave.
Yes, because the English one is more clear-cut.
I still have trouble seeing how you can use the bias to tell the difference
between Arabic and Hebrew.
>>and assume that no compression was used (under your philosophy that
>>a good crypto system should hide everything). It is of course not true that
>>English could be distinguished, since most of the European languages share
>>that characteristic-- that the 8th bit is 0 in the usual ASCII encoding--
>>but even then it would depend on the biases in the 8th bit. That the bytes
>>have biases does not mean that the 8th bit has biases. But it is also true
>>that in English the 8th bit has zero entropy per bit. So the lessened
>>entropy in the stream can be detected. However, the fact that Arabic and
>>Hebrew use different Unicode ranges does not mean that I can detect that
>>difference. There is still the fact that each of the characters has a large
>>entropy itself (compared to the small deficiency of entropy in the stream).
>>So the question is, will that large entropy per byte (let me give you your
>>1.3 bits per byte-- which is however highly variable across texts) hide
>>that small lack of entropy per byte which you claim for RC4? You claim it
>>will not. I am having a hard time seeing why it will not.
>Hey, I wasn't the one who said 1.3 bits per byte
>(although who am I to argue with Claude Shannon?).
>But later in my posting I did tell you how to
>distinguish.
You said that the 8th bit is always zero, since ASCII is assumed. In that case,
the 8th bit carries zero entropy, and thus that deficit of 1 bit per 10^3
or 10^4 of entropy in RC4 could perhaps be seen. (Mind you, no one has shown
that the 8th bit is biased. Just because the bytes are biased does not
mean that there is a bias in any individual bit.) Of course if the text
carries zero entropy, one can see the bias in the stream. But if one
operates on a byte basis, then English text with its 1.3, 2.3, ... bits per
byte of entropy will hide the bias in the stream.
>>Note that AES, your preference, is severely broken from your standpoint
>Again, you are confusing me with someone else.
No, I am saying that by the same arguments you use here you would say that
AES is broken because of the ECB weakness.
>>since any block cypher reveals repeats in the 64 bit blocks within the
>AES has 128-bit blocks, and it only shows these
>repeats when used in ECB mode, which no-one
>would use in practice.
So use chaining for RC4.
Ci= Ri^Mi^C(i-8)
or a hundred other techniques one could imagine for hiding the tiny biases.
>>text. There is the nice Wikipedia example of encrypting the Tux penguin.
>>So for you AES is completely broken and should not be used? I suspect you
>>would say -- no simply use one of the chaining modes. And I would say
>>"precompress the files" to use with RC4. Ie, why do you call RC4 broken due
>>to a bias which is easily overcome, when for AES you accept the weakness
>>which can be easily overcome by a different mode of operation?
>Avoiding using ECB mode is simply "good crypto
>hygiene" (as is not reusing the output of a stream
>cipher). So, used properly, AES alone suffices.
>(Note: it still wasn't me who was advocating using
>AES.)
I was not claiming it was.
Thank you.
Well, I don't think you have understood Greg Rose's point yet.
Greg Rose's arguments would say that AES-ECB is broken, not that AES
is broken. And I think that's exactly the right conclusion to draw.
By a similar argument, if you use RC4 in the standard mode of operation
for RC4 (namely, xor-ing its keystream output against the plaintext),
then you get a scheme that has some known security weaknesses.
The consequences of those weaknesses on a real application will depend
on the application context, but there are plausible scenarios where the
known security weaknesses in RC4 could potentially pose a problem.
>So use chaining for RC4.
>Ci= Ri^Mi^C(i-8)
>or a hundred other techniques one could imagine for hiding the tiny biases.
You keep jumping to conclusions. This "chaining" idea doesn't help.
The adversary can undo the effect of the chaining by replacing Ci with
C'i = Ci^C(i-8) (note that then you have C'i = Ri^Mi), and then analyzing
C'i just as he would if there were no "chaining". Cryptosystem design
requires careful thought, not blind trial and error.
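The reversal is easy to check mechanically. In the Python sketch below, `keystream` stands in for RC4 output (here just random bytes, an assumption that does not affect the algebra), `chain_encrypt` implements the proposed C_i = R_i xor M_i xor C_(i-8) with C_(i-8) taken as zero for the first eight bytes (another assumption), and `unchain` is the adversary's keyless transform C'_i = C_i xor C_(i-8). Its output equals the plain keystream-XOR ciphertext R_i xor M_i, so any keystream bias survives untouched.

```python
import os

LAG = 8  # the lag in the proposed C_i = R_i ^ M_i ^ C_(i-8)


def chain_encrypt(keystream: bytes, message: bytes) -> bytes:
    """Proposed 'chaining': C_i = R_i ^ M_i ^ C_(i-LAG), with C_(i-LAG) = 0 for i < LAG."""
    c = bytearray()
    for i, (r, m) in enumerate(zip(keystream, message)):
        prev = c[i - LAG] if i >= LAG else 0
        c.append(r ^ m ^ prev)
    return bytes(c)


def unchain(ciphertext: bytes) -> bytes:
    """Adversary's transform C'_i = C_i ^ C_(i-LAG); no key needed."""
    return bytes(
        c ^ (ciphertext[i - LAG] if i >= LAG else 0)
        for i, c in enumerate(ciphertext)
    )


message = b"The quick brown fox jumps over the lazy dog"
keystream = os.urandom(len(message))  # stand-in for RC4 output
plain_xor = bytes(r ^ m for r, m in zip(keystream, message))  # R_i ^ M_i
```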
>Unruh wrote:
>>So use chaining for RC4.
>>Ci= Ri^Mi^C(i-8)
>>or a hundred other techniques one could imagine for hiding the tiny biases.
>You keep jumping to conclusions. This "chaining" idea doesn't help.
>The adversary can undo the effect of the chaining by replacing Ci with
>C'i = Ci^C(i-8) (note that then you have C'i = Ri^Mi), and then analyzing
>C'i just as he would if there were no "chaining". Cryptosystem design
>requires careful thought, not blind trial and error.
You're right.
> So, for each pair of ciphertext bytes, classify them into one of three
> categories:
> 1. looks like valid Arabic character
> 2. looks like valid Hebrew character
> 3. everything else.
>
> Category 3 will swamp the other two categories, but with enough ciphertext,
> and assuming the plaintext was indeed one of the two character sets, either
> category 1 or 2 will start to dominate more than can be explained by random
> chance. As soon as the probability of their deviance becomes statistically
> significant to, say, 99.9%, you have your answer.
I don't get it.
Even if there were squintillions of 1's, and no 2's at all, how could
you conclude from this, that the plaintext is in Arabic?
AFAICS, you could only conclude that *if* the plaintext *is* in either
Arabic or Hebrew, *then*, it is more likely to be in Arabic than in
Hebrew.
But that's a long way from saying that you can exploit the bias to tell
what language the plaintext is in. What if the plaintext was in German,
and German happens to have lots of 1's and not many 2's?
What am I missing?
TC (MVP MSAccess)
http://tc2.atspace.com
> >> RC4 is biased towards producing two zero bytes in a row
> >> more frequently than it should (among other biases).
Could you avoid that by enciphering twice with different keys?
C = RC4 ( RC4 ( P, K1 ) , K2 )
K1 and K2 could be the first & second half of the key; or the key, and
the byte-reversed key; or any other different values that you could
derive from a single key.
You might say, "Yes, but that's no longer simple RC4". But similarly, a
block cipher used in other-than ECB mode, is no longer a simple
application of that block cipher.
Yes? No?
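The double encryption proposed above can be checked mechanically. Since each RC4 layer is an XOR with a keystream, RC4(RC4(P, K1), K2) is the same as P xor (Z1 xor Z2), where Z1 and Z2 are the two keystreams; the construction is still a single keystream XOR, just with a different keystream. The Python sketch below confirms that identity with textbook RC4. Splitting one key into halves for K1 and K2 is just one of the options the post lists, picked here as an assumption.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """n bytes of textbook RC4 keystream (KSA followed by PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def rc4(data: bytes, key: bytes) -> bytes:
    return xor(data, rc4_keystream(key, len(data)))


def double_rc4(plaintext: bytes, key: bytes) -> bytes:
    # Assumed key split: K1 = first half of the key, K2 = second half.
    k1, k2 = key[: len(key) // 2], key[len(key) // 2 :]
    return rc4(rc4(plaintext, k1), k2)
```

Whether the combined keystream Z1 xor Z2 is less biased than Z1 alone is a separate, statistical question; the identity only shows that the answer depends entirely on that combined keystream.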
>But that's a long way from saying that you can exploit the bias to tell
>what language the plaintext is in. What if the plaintext was in German,
>and German happens to have lots of 1's and not many 2's?
>
>What am I missing?
Not much, at the current state of analysis, the bias doesn't give you that
much. You get enough RC4 ciphertext and you can tell that you are probably
looking at RC4 ciphertext, that's it. Since knowing that you have RC4 output
doesn't actually help you with the decryption or even with telling what
language you have, it's an interesting but essentially useless property. For
one thing you are supposed to be assuming that your attacker already knows
you're using RC4; the only things you can assume are hidden are the secret key
and the plain text.
All I claimed originally was that you'd be able to
distinguish between Hebrew and Arabic... reading
back I guess I could have phrased it better, since
you think I was claiming that one could determine
the language a priori. That wasn't what I meant,
although it's still at least feasible. The bias
toward zero bytes from RC4 would allow you to build up a
frequency histogram that would eventually begin to
resemble the classical English (or German)
frequency tables.
But it's always easier, statistically, to test
between two hypotheses (eg. German vs. not German,
or Hebrew vs. Arabic) than to try to classify the
input.
Hope that helps.
Yes, but that's no longer simple RC4. :-)
>You might say, "Yes, but that's no longer simple RC4". But similarly, a
>block cipher used in other-than ECB mode, is no longer a simple
>application of that block cipher.
I don't agree. Block ciphers, no matter how good,
require the correct mode of operation to be
secure, and that mode of operation isn't any less
efficient than ECB (Counter Mode for example can
be more efficient). ECB isn't "simple
application", it's "incorrect application".
Stream ciphers that don't have RC4's biases don't
need this double encryption to make them secure,
and the double encryption obviously halves the
efficiency. So I don't see these things as being
particularly similar.
Another thing to note is that your construction
doesn't *eliminate* the bias, since the second
encryption's bias will still allow detection of
the first encryption's bias. What it does is
square the bias. So, RC4's major bias is about 2^-16
(from memory... I haven't looked up the paper 'cos
my office is all in boxes) so the bias of
double-rc4 is still about 2^-32; this is, by
today's standards, still way too high.
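The "square the bias" arithmetic is an instance of the piling-up lemma from linear cryptanalysis. If a keystream bit is 0 with probability (1 + c)/2 (call c the correlation), then the XOR of two independent such bits is 0 with probability (1 + c1*c2)/2. The exact-fraction Python sketch below checks this for single bits. Treating RC4's byte-pair bias as a single-bit correlation of 2^-16, and treating the two keystreams as independent, are simplifications taken from the from-memory figure above, not measured values.

```python
from fractions import Fraction


def xor_p_zero(p1: Fraction, p2: Fraction) -> Fraction:
    """P(b1 xor b2 == 0) for independent bits with P(b1 == 0) = p1, P(b2 == 0) = p2."""
    return p1 * p2 + (1 - p1) * (1 - p2)


def correlation(p_zero: Fraction) -> Fraction:
    """Correlation c defined by P(bit == 0) = (1 + c) / 2."""
    return 2 * p_zero - 1


def from_correlation(c: Fraction) -> Fraction:
    """Inverse of correlation(): P(bit == 0) for a given c."""
    return (1 + c) / 2


# Simplified model: one keystream layer with correlation 2^-16.
c_single = Fraction(1, 2**16)
p_single = from_correlation(c_single)

# Double encryption XORs two independent keystreams together.
c_double = correlation(xor_p_zero(p_single, p_single))
```

In this idealized model `c_double` comes out as exactly 2^-32, the square of the single-layer figure.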
If you go look at the E-stream archives, you'll
find a couple of ciphers that were either rejected
or tweaked (our NLS among them) because of biases
of about 2^-32. There's no consensus yet on how
much is too much. Many of the ciphers state a
limit of 2^80 (bytes or words) of output, and
consider any bias detectable with less than that
amount of output to be too high. (Detectable means
a bias of about one over the square root of the
amount of output, so a bias of 2^-40 in this case.)
Note that this is already higher than the
threshold for any 128-bit block cipher, where you
can distinguish it from random after 2^64 blocks.
So even double-RC4 would probably be rejected from
E-stream as insecure by today's standards.
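The rule of thumb behind these numbers is that a bias of size eps takes on the order of 1/eps^2 samples to detect, so an output limit of 2^80 corresponds to a detectable bias of 2^-40. The short Python sketch below just does that log2 arithmetic for the figures quoted in this post; it ignores the constant factors a real distinguisher would need, which is an assumption.

```python
def samples_needed_log2(bias_log2: float) -> float:
    """log2 of the ~1/bias^2 samples needed to detect a bias of 2**bias_log2."""
    return -2 * bias_log2


def detectable_bias_log2(output_limit_log2: float) -> float:
    """Smallest bias (as log2) detectable within 2**output_limit_log2 outputs."""
    return -output_limit_log2 / 2


OUTPUT_LIMIT_LOG2 = 80  # the 2^80 output limit many eSTREAM submissions stated

for name, bias_log2 in [("RC4 (estimate above)", -16), ("double RC4", -32)]:
    need = samples_needed_log2(bias_log2)
    verdict = "too high" if need < OUTPUT_LIMIT_LOG2 else "acceptable"
    print(f"{name}: bias 2^{bias_log2}, ~2^{need:g} samples, {verdict}")
```

Both cases need far fewer than 2^80 outputs (about 2^32 and 2^64 respectively), which is why even double-RC4 would fall on the wrong side of that limit.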
I still have difficulty. From the small bias toward zero (is it a bias
toward zero, or a bias toward "if one is zero then the next will also be
zero"?), I still find it difficult to believe that any such frequency
histogram would show anything, since the bias would be buried in the natural
variability of the languages.
And the difference between languages lies not so much in the relative
frequencies of the different letters, but in the distribution of digrams
and trigrams in the language. And those distributions are highly dependent
on the texts. They will be different in Shakespeare than in a treatise on
botany. Ie, again I believe that the natural variability of the language
will swamp the biases. In the case of Unicode Arabic vs Hebrew, the zero
entropy in the first code page character would of course let you
distinguish, since it has zero entropy. If you know some feature has zero
entropy, then small changes in the bias entropy can eventually be
distinguished.
....
>If you go look at the E-stream archives, you'll
>find a couple of ciphers that were either rejected
>or tweaked (our NLS among them) because of biases
>of about 2^-32. There's no consensus yet on how
>much is too much. Many of the ciphers state a
>limit of 2^80 (bytes or words) of output, and
>consider any bias detectable with less than that
>amount of output to be too high. (Detectable means
>a bias of about one over the square root of the
>amount of output, so a bias of 2^-40 in this case.)
>Note that this is already higher than the
>threshold for any 128-bit block cipher, where you
>can distinguish it from random after 2^64 blocks.
>So even double-RC4 would probably be rejected from
>E-stream as insecure by today's standards.
I am sorry, but this is abusing the word insecure. Caesar is an insecure
stream cypher. RC4 or those E-stream cyphers are not insecure. They may have
less security than is ideal, but to lump them in with insecure ones is
bastardising the language. Is the use of SSL for secure web transactions,
which uses RC4, insecure? The answer is almost certainly no. Ie, no-one,
including the NSA, will be able to figure out what the message sent was.
That is what secure means.
By your definition of secure, you are simply encouraging all those people
who argue that one should double, triple or hexatuple encrypt "because you
never know". Security is a matter of balance, not a matter of absolutes.
In particular, it is a matter of spending one's resources on the weakest, not
the strongest, link in the chain, and for any encryption system, even one
using RC4, the encryption algorithm is by far the strongest link in the chain
(assuming it is properly used).
Your belief (or lack thereof) is irrelevant to a cryptanalysis.
Answering this question requires more than belief; it requires analysis.
I would encourage you to do the calculation.
If you'd like to know how to do the calculation, here is how to do it.
Let D_H denote the distribution of Hebrew bigrams (say), D_A the
distribution of Arabic bigrams (say), and D_RC4 the distribution of
RC4 keystream bigrams (which is biased towards double-zero bytes).
(By a "bigram", I mean something like a "pair of bytes in their UTF16
encoding".) Compute the distribution D_0 by convolving D_H and D_RC4.
D_0 is the distribution of the random value P xor Z, where P is drawn from
D_H and Z is drawn from D_RC4. Similarly, compute the distribution D_1 by
convolving D_A and D_RC4. Now compute the variation distance between
D_0 and D_1. That will enable you to calculate the number of bytes of
ciphertext needed to distinguish between D_0 and D_1, or in other words,
to distinguish RC4-encrypted Hebrew from RC4-encrypted Arabic.
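The recipe can be written out directly. The Python sketch below uses toy 3-bit symbols instead of 16-bit UTF-16 bigrams so that the XOR convolution stays small (the bigram version is the same code over 65536 symbols, at quadratic cost). The three distributions are made-up numbers, not measured Hebrew, Arabic, or RC4 statistics. It follows the same steps: convolve each plaintext distribution with the keystream distribution, take the total variation distance between D_0 and D_1, and apply a rough 1/distance^2 rule of thumb for the number of samples a distinguisher needs.

```python
BITS = 3
N = 1 << BITS  # toy alphabet: symbols 0..7


def xor_convolve(d, e):
    """Distribution of X ^ Y for independent X ~ d, Y ~ e (lists indexed by symbol)."""
    out = [0.0] * N
    for x, px in enumerate(d):
        for y, py in enumerate(e):
            out[x ^ y] += px * py
    return out


def tv_distance(d, e):
    """Total (statistical) variation distance: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(d, e))


# Made-up toy distributions (each sums to 1); NOT real language or RC4 data.
D_H = [0.30, 0.20, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05]
D_A = [0.10, 0.10, 0.30, 0.20, 0.10, 0.10, 0.05, 0.05]

# Toy keystream: nearly uniform, with a small excess at symbol 0.
eps = 0.01
D_RC4 = [1 / N + eps * (N - 1) / N] + [1 / N - eps / N] * (N - 1)

D_0 = xor_convolve(D_H, D_RC4)  # "encrypted Hebrew"
D_1 = xor_convolve(D_A, D_RC4)  # "encrypted Arabic"
dist = tv_distance(D_0, D_1)
samples = 1 / dist**2  # rough rule of thumb, constants ignored
```

Because this toy keystream is a mixture of (1 - eps) times the uniform distribution and eps times a point mass at 0, the computed distance works out to exactly eps times the distance between the two plaintext distributions, 0.01 x 0.3 = 0.003, suggesting on the order of 10^5 samples in this made-up setting.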
I don't know what answer to expect for the specific example of Hebrew vs
Arabic, but it is clear that there are plausible scenarios where the
resulting distributions have enough remaining bias that one can distinguish
between two hypotheses at the input distribution, given a large (but not
impractically large) amount of ciphertext.
>In the case of Unicode
>Arabic vs Hebrew, the zero entropy in the first code page character would
>of course let you distinguish since it has zero entropy. If you know some
>feature has zero entropy, then small changes in the bias entropy can
>eventually be distinguished.
I am having trouble understanding what position you are trying to take.
Up until here, it sounded to me like you disbelieve that one can
distinguish RC4-encrypted Hebrew from RC4-encrypted Arabic (in Unicode);
but then this sentence pops along and seems to be completely at odds
with that impression.
Have you now conceded that Greg Rose is right and it is indeed possible
to distinguish RC4-encrypted Hebrew from RC4-encrypted Arabic (assuming
Unicode encoding)? If so, will you withdraw your advocacy of RC4?
Yes, I expect this would dramatically reduce the bias
that Greg Rose pointed out.
(Caveat: You must never reuse such a key.)
It would take some analysis to be certain, but I expect this
would suffice.
However, just replacing RC4 with AES would also dramatically reduce
the bias.
Personally, I would recommend just using AES in a reasonable mode of
operation, rather than going to contortions to rescue RC4.
>Unruh wrote:
>>I still find it difficult to believe that [...]
>Your belief (or lack thereof) is irrelevant to a cryptanalysis.
Of course. It is an expression of my state of knowledge, not of the world.
>Answering this question requires more than belief; it requires analysis.
>I would encourage you to do the calculation.
Of course and my statement was expressing doubt that the calculation had
actually been done.
>If you'd like to know how to do the calculation, here is how to do it.
>Let D_H denote the distribution of Hebrew bigrams (say), D_A the
>distribution of Arabic bigrams (say), and D_RC4 the distribution of
>RC4 keystream bigrams (which is biased towards double-zero bytes).
>(By a "bigram", I mean something like a "pair of bytes in their UTF16
>encoding".) Compute the distribution D_0 by convolving D_H and D_RC4.
>D_0 is the distribution of the random value P xor Z, where P is drawn from
>D_H and Z is drawn from D_RC4. Similarly, compute the distribution D_1 by
>convolving D_A and D_RC4. Now compute the variation distance between
>D_0 and D_1. That will enable you to calculate the number of bytes of
>ciphertext needed to distinguish between D_0 and D_1, or in other words,
>to distinguish RC4-encrypted Hebrew from RC4-encrypted Arabic.
Yes. I agree. Has the calculation been done?
Note that what you and Rose continually state is that it is the excess
number of double zeros that constitutes at least part of the bias, not that
there are excess zeros.
>I don't know what answer to expect for the specific example of Hebrew vs
>Arabic, but it is clear that there are plausible scenarios where the
>resulting distributions have enough remaining bias that one can distinguish
>between two hypotheses at the input distribution, given a large (but not
>impractically large) amount of ciphertext.
Of course there are. If one text is always aaaaaaaaaa for 160GB and the
other is always bbbbbbb for 160 GB, then the bias in the distribution of
RC4 would become obvious in 30MB. Both of those streams have zero entropy.
The small change in entropy of the stream can thus be measured.
Since the bias was discovered by encrypting a string of 30MB of 0 bytes,
it is possible to discover the bias in such a zero entropy plaintext
stream.
What I am arguing is that in the presence of the huge entropy of text,
this small decrease in entropy of the stream is undetectable. Now, IF every
second byte of the text has zero entropy, and IF there were a bias in the
individual bytes (not byte pairs), then clearly one can use that zero
entropy input to discover the stream bias (and to differentiate a bias
toward a 0 byte from one toward a 0x5a byte). On the other hand, IF the bias in
RC4 is one of pairs, then the entropy in those pairs in the input would be
expected to hide the entropy deficit in the stream.
>>In the case of Unicode
>>Arabic vs Hebrew, the zero entropy in the first code page character would
>>of course let you distinguish since it has zero entropy. If you know some
>>feature has zero entropy, then small changes in the bias entropy can
>>eventually be distinguished.
>I am having trouble understanding what position you are trying to take.
I am trying to understand. I am not taking a position except trying to
discover and understand the truth.
>Up until here, it sounded to me like you disbelieve that one can
>distinguish RC4-encrypted Hebrew from RC4-encrypted Arabic (in Unicode);
>but then this sentence pops along and seems to be completely at odds
>with that impression.
It indicates a potential way in which I could see that one could
distinguish.
>Have you now conceded that Greg Rose is right and it is indeed possible
No, not yet, but I have offered a way of convincing me.
>to distinguish RC4-encrypted Hebrew from RC4-encrypted Arabic (assuming
>Unicode encoding)? If so, will you withdraw your advocacy of RC4?
No, because that one bit of data-- Hebrew vs Arabic-- is just silly. If you
find a laptop in a cave in Afghanistan, you already know what language is
on that 160GB. Can one regard the revealing of one bit of information as a
breaking of the cypher? No. That is just silly. It is not the ideal state
of affairs, but then it never is. RC4 is still far far far far far stronger
than any other link in the chain of secrecy. It is like saying that
RSA/AES is broken because 2000-bit RSA is far weaker than 128-bit AES
against the best attacks.
Of course it is, but it is still far stronger than anything in the chain.
To ask a different question-- does anyone understand what it is about RC4
that leads to the bias?
Not to my knowledge. I certainly agree that until the calculation
is done, we're only guessing.
There are some rules of thumb that make me think that it is not
implausible that Greg Rose's attack will work. Here is one rule
of thumb. Suppose D, D' are two distributions, and let U denote the
uniform distribution. Let D * D' denote the convolution of D and D'
(i.e., the distribution on the random variable Y obtained by sampling X
<- D and X' <- D' and then setting Y = X xor X'). Let Dist(D,U) denote
the statistical variation distance between D and U. Note that
Dist(D,U) is a measure of the bias in D; a large value of Dist(D,U)
means that D is measurably non-uniform, while a small value means
that it will require many samples from D to distinguish D from uniform.
With that background, here are the rules of thumb.
Rule of thumb #1: "Matsui's rule of thumb": When Dist(D,U) is small,
the number of samples that will be needed to distinguish D from U is
~ 1/Dist(D,U)^2.
Rule of thumb #2: "Biases multiply": When Dist(D,U) and Dist(D',U)
are small, Dist(D * D', U) is ~ Dist(D,U) * Dist(D',U).
These are very rough estimates. They are certainly not something that
can be mathematically proven. They are not always true; there are important
exceptions. Nonetheless, I have found them helpful as first approximations.
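Both rules can be poked at numerically, with Dist taken as the total variation distance defined above. The Python sketch below, using made-up parameters, does two things. First, for single bits it computes Dist(D * D', U) exactly and shows it equals 2 * Dist(D,U) * Dist(D',U), i.e. the product of the biases up to a constant, as rule #2 says. Second, it simulates bits with bias d and checks that with a modest constant times 1/d^2 samples the biased bit's empirical frequency sits well away from what an unbiased bit produces, as rule #1 says. The constant 16, the bias 0.05, and the seed are arbitrary choices.

```python
import random


def bit_dist_to_uniform(p_zero: float) -> float:
    """Dist(D, U) for a single bit: total variation distance |P(0) - 1/2|."""
    return abs(p_zero - 0.5)


def xor_p_zero(p1: float, p2: float) -> float:
    """P(X xor X' == 0) for independent bits with P(X == 0) = p1, P(X' == 0) = p2."""
    return p1 * p2 + (1 - p1) * (1 - p2)


def empirical_p_zero(p_zero: float, n: int, rng: random.Random) -> float:
    """Fraction of zeros among n simulated bits with P(0) = p_zero."""
    return sum(1 for _ in range(n) if rng.random() < p_zero) / n


# Rule of thumb #2, single-bit case: Dist(D * D', U) = 2 * Dist(D,U) * Dist(D',U).
p1, p2 = 0.53, 0.52  # made-up biased bits
lhs = bit_dist_to_uniform(xor_p_zero(p1, p2))
rhs = 2 * bit_dist_to_uniform(p1) * bit_dist_to_uniform(p2)

# Rule of thumb #1: about (small constant) / d^2 samples separate biased from unbiased.
d = 0.05
n = round(16 / d**2)  # 6400 samples; round() avoids float truncation
rng = random.Random(1)
biased = empirical_p_zero(0.5 + d, n, rng)
unbiased = empirical_p_zero(0.5, n, rng)
```

With 6400 samples the standard deviation of each empirical frequency is about 0.006, so a gap of 0.05 between the biased and unbiased bits is roughly eight standard deviations; with only a few hundred samples it would be lost in the noise.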
My wild-eyed guess is that Dist(Arabic,U) and Dist(RC4,U) are large
enough that Dist(Arabic * RC4, U) is detectably larger than zero.
Moreover, my guess is that Dist(Arabic,U) and Dist(Hebrew,U) and
Dist(Arabic,Hebrew) and Dist(RC4,U) are large enough that
Dist(Arabic * RC4, Hebrew * RC4) is detectably larger than zero.
These are only guesses. I haven't actually done the calculations
that would be necessary, so I'm only guessing.
>>I don't know what answer to expect for the specific example of Hebrew vs
>>Arabic, but it is clear that there are plausible scenarios where the
>>resulting distributions have enough remaining bias that one can distinguish
>>between two hypotheses at the input distribution, given a large (but not
>>impractically large) amount of ciphertext.
>
>Of course there are. If one text is always aaaaaaaaaa for 160GB and the
>other is always bbbbbbb for 160 GB, then the bias in the distribution of
>RC4 would become obvious in 30MB.
That's an extreme example. There are also less extreme examples that,
I believe, are still plausible.
>No, because that one bit of data-- Hebrew vs Arabic-- is just silly. If you
>find a laptop in a cave in Afghanistan, you already know what language is
>on that 160GB. Can one regard the revealing of one bit of information as a
>breaking of the cypher? No. That is just silly. It is not the ideal state
>of affairs, but then it never is. RC4 is still far far far far far stronger
>than any other link in the chain of secrecy.
Ok. That makes sense to me. My only concern is that it is hard to
be _sure_ that these one or two bits can't possibly cause any harm.
I have seen enough surprising ways that even a little bit of information
leakage can cause harm that I have become pretty cautious about this.
At the very least, it often requires a lot of hard thinking to convince
yourself that this is harmless in any particular application, and you have
to do that thinking once for each new application you want to use RC4 in.
It's a minor headache, frankly. If there are alternative solutions that
don't have even theoretical weaknesses -- that don't require this kind of
hard thought -- and if those better solutions don't cost anything more,
then I know which one I'm choosing.
I'm a follower of the no-aspirin principle for cryptosystem design: anything
that reduces the amount of hard thinking I have to do to convince myself
that the system is secure -- anything that minimizes the number of these
headaches one has to put up with -- is a good thing, IMHO.
You're right that it doesn't immediately follow; I
hadn't realised that until you questioned it. But
if you read the original Fluhrer & McGrew paper,
it is clear that the second zero is in some sense
an "extra" zero; there are indeed more zeros in
the keystream than there should be and they occur
in somewhat predictable places.
Sheesh, I've mentioned the original paper at least
three times. Google finds it (the paper itself,
not just references) within the first result page.
http://www.mindspring.com/~dmcgrew/rc4-03.pdf
I'm glad you made me look; there are some other
papers there that I hadn't read.
>In article <eb8gie$bie$2...@nntp.itservices.ubc.ca>,
>Unruh <unruh...@physics.ubc.ca> wrote:
>>To ask a different question-- does anyone understand what it is about RC4
>>that leads to the bias?
>Sheesh, I've mentioned the original paper at least
>three times. Google finds it (the paper itself,
>not just references) within the first result page.
>http://www.mindspring.com/~dmcgrew/rc4-03.pdf
I understood that the paper reveals the biases. The question was whether or
not the reason for the biases is understood.
I am reading the paper to see if they explain it.
If the cave is in Lebanon, you certainly might want to distinguish
between the data being in Arabic or being in Persian, and the one bit
could make the difference between getting into a war and not getting
into one. So I'd consider it a break of the cipher.
You have a laptop encoded with RC4 (who in the world uses a stream cypher
to encode a laptop?), and it has 160 GB of data on it (and where do you find
a laptop with a 160GB hard drive. In a cave with no electricity!), and you
are going to go to war based on whether or not the contents are Persian or
Arabic. Well, the world is a crazy place, but I am sure glad that you are not
in charge of anything important.
The cypher is NOT the weakest link in any chain.
Even with the bias.
You'd use it for hard disk encryption, by re-keying it for each
cluster of 8 kbytes or whatever. I've seriously considered this
method since it should be much faster than AES.
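For reference, textbook RC4 (key scheduling followed by output generation) and the per-cluster use proposed here can be sketched in Python as below. The `drop` parameter implements the common practice of discarding the initial keystream bytes (256 is assumed here); `CLUSTER` and `encrypt_cluster` are illustrative names, and where each cluster's key comes from is left open. Pure Python is far too slow for real disk encryption; this only shows the mechanics.

```python
def rc4_keystream(key: bytes, length: int, drop: int = 0) -> bytes:
    """Textbook RC4: KSA then PRGA, discarding the first `drop` output bytes."""
    # Key-scheduling algorithm (KSA): permute S under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): one output byte per step.
    out = bytearray()
    i = j = 0
    for _ in range(drop + length):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out[drop:])


CLUSTER = 8192  # hypothetical cluster size ("8 kbytes or whatever")


def encrypt_cluster(cluster_key: bytes, data: bytes) -> bytes:
    """XOR one cluster with a freshly keyed RC4-drop[256] keystream."""
    ks = rc4_keystream(cluster_key, len(data), drop=256)
    return bytes(a ^ b for a, b in zip(data, ks))


print(rc4_keystream(b"Key", 9).hex())  # standard test vector for key "Key"
```

Because encryption is a plain XOR, calling `encrypt_cluster` again with the same key decrypts.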
> and it has 160 GB of data on it (And where do you find
> a laptop with a 160GB hard drive. In a cave with no electricity!)
http://www.newegg.com/Product/Product.asp?Item=N82E16822148073
> and you
> are going to go to war based on whether or not the contents are Persian or
> arabic. Well the world is a crazy place but I am sure glad that you are not
> in charge of anything important.
We're in a war right now for stupider reasons than that.
>Unruh <unruh...@physics.ubc.ca> writes:
>> You have a laptop encoded with RC4 ( who in the world uses a stream cypher
>> to encode a laptop),
>You'd use it for hard disk encryption, by re-keying it for each
>cluster of 8 kbytes or whatever. I've seriously considered this
>method since it should be much faster than AES.
And as soon as you rekey, the bias is no longer there. You HAVE to encrypt
the whole of the 160 GB with a single key to take advantage of the biases
as far as I understand the attack.
By the way, how in the world do you keep track of 10,000,000 keys?
>> and it has 160 GB of data on it (And where do you find
>> a laptop with a 160GB hard drive. In a cave with no electricity!)
>http://www.newegg.com/Product/Product.asp?Item=N82E16822148073
Does not solve the electricity problem. (Note the first review as well).
Remember that you have to have all that 160GB filled with the
Persian/Arabic stuff (and no mixing in of English or French or the other
language either). I'm not sure 160GB of Persian or Arabic, or of any
language, has ever been written. That is a lot of text.
Ie, the scenario is silly. I understand the point-- that sometimes even one
bit of data could be useful. But in general it is not. Would I choose an
encryption which did not leak that one bit in 160GB? Probably. Do I
consider RC4 insecure because of it? No. In part precisely because of the
way in which RC4 must be used-- you cannot reuse a key, which means that if
you change anything, you must use a different key, which AFAIK destroys the
bias. IF that bias could be used to break the key faster than exhaustive
search, then RC4 would be broken and insecure in my opinion as well.
As you point out, AES, the suggested alternative, is slow.
>> and you
>> are going to go to war based on whether or not the contents are Persian or
>> arabic. Well the world is a crazy place but I am sure glad that you are not
>> in charge of anything important.
>We're in a war right now for stupider reasons than that.
Stupider, maybe, but much more extensive.
No, it does not. An OTP has Kolmogorov complexity equal to its length.
One could also measure its entropy, à la Shannon.
A stream cipher generated by a PRNG that uses a 256-bit seed only has
256 bits of entropy. It can be uniquely generated by a deterministic
algorithm with only 256 bits of entropy. To break it, one need only
guess 256 bits of information. To break a true OTP, one must guess
the entire pad. They are NOT the same.
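To illustrate the entropy point: below, a 256-bit seed is stretched into a 1 MiB keystream with SHAKE-256, used here purely as a stand-in PRG (the post does not name one). Everything about that keystream is determined by the 32-byte seed, whereas the one-time pad is 1 MiB of independent random bytes with no shorter description.

```python
import hashlib
import secrets


def expand_seed(seed: bytes, length: int) -> bytes:
    """Stretch a 32-byte seed into `length` keystream bytes.

    SHAKE-256 is only an illustrative stand-in for "a PRNG seeded with the key".
    """
    if len(seed) != 32:
        raise ValueError("expected a 256-bit seed")
    return hashlib.shake_256(seed).digest(length)


pad_length = 1 << 20                    # 1 MiB
seed = secrets.token_bytes(32)          # the only secret: 256 bits
keystream = expand_seed(seed, pad_length)
otp = secrets.token_bytes(pad_length)   # a true pad: 2^23 independent bits

# Whoever knows (or guesses) the 32-byte seed regenerates every keystream byte.
assert expand_seed(seed, pad_length) == keystream
```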
False, as I have already stated. The bias is
independent of the key. Even rekeying every 8KB,
the attack as I have described it still works.
>Ie, the scenario is silly.
I agree that the scenario is, ummm, contrived.
Still, it was sufficient to flip me from "don't
care" to "do care". There are ciphers that don't
give away anything like that amount of
information, so we should recommend them, and not
RC4.
>Unruh wrote:
>> "Tom St Denis" <tomst...@gmail.com> writes:
>>
>>
>> >Dave -Turner wrote:
>> >> Alice wants to give Bob her one time pad so they can use it for secure
>> >> communications, but as it's a large padfile she'd rather not give it to him
>> >> as it is. Can she give him a key of say 256 bits which Bob can then plug
>> >> into an algorithm such as a PRNG which uses the key as a seed to create the
>> >> pad?
>>
>> >Yes, but it's not a one-time pad.
>>
>> >What you are describing is a STREAM CIPHER and has none of the
>> >properties that make an OTP an OTP.
>>
>> Sure it does. It also can only be used One Time (so it shares the OT part).
>No, it does not. An OTP has Kolmogorov complexity equal to its length.
>One could also measure its entropy, à la Shannon.
Yes, it does. You can only use the same key once for a stream cypher.
Exactly the same as for a One Time Pad. They share that property. Of course
they do not share the property of the entropy in the key. I never said they
shared ALL properties. Just the fact that each can only be used once. Or do
you deny that the key for a stream cypher and the pad for a one time pad
can only be used once?
>A stream cipher generated by a PRNG that uses a 256-bit seed only has
>256 bits of entropy. It can be uniquely generated by a deterministic
>algorithm with only 256 bits of entropy. To break it, one need only
>guess 256 bits of information. To break a true OTP, one must guess
>the entire pad. They are NOT the same.
Of course. That does not allow you to reuse them however. And of course
they are not the same. I never said they were. The statement was that a
stream cypher and a one time pad share NONE of the properties. I pointed
out one property that they do share. (In fact it is one of the major
problems with both a one time pad and a stream cypher.)
> >No, it does not. An OTP has Kolmogorov complexity equal to its length.
> >One could also measure its entropy, à la Shannon.
>
> Yes, it does. You can only use the same key once for a stream cypher.
> Exactly the same as for a One Time Pad. They share that property.
Oy vay. For an OTP, the key IS the pad. They are the SAME LENGTH.
For a stream cipher they are not. They possess different amounts of
entropy.
>In article <ebaa0p$som$1...@nntp.itservices.ubc.ca>,
>Bill Unruh <un...@physics.ubc.ca> wrote:
>>Paul Rubin <http://phr...@NOSPAM.invalid> writes:
>>
>>>Unruh <unruh...@physics.ubc.ca> writes:
>>>> You have a laptop encoded with RC4 ( who in the world uses a stream cypher
>>>> to encode a laptop),
>>
>>>You'd use it for hard disk encryption, by re-keying it for each
>>>cluster of 8 kbytes or whatever. I've seriously considered this
>>>method since it should be much faster than AES.
>>
>>And as soon as you rekey, the bias is no longer there. You HAVE to encrypt
>>the whole of the 160 GB with a single key to take advantage of the biases
>>as far as I understand the attack.
>False, as I have already stated. The bias is
>independent of the key. Even rekeying every 8KB,
>the attack as I have described it still works.
I am sorry, where do you get this from? It is certainly not in the paper.
The state of the permutation is not random in the first 8K bytes. The state of i is not known, since
you have no idea what the block size actually is.
>>Ie, the scenario is silly.
>I agree that the scenario is, ummm, contrived.
>Still, it was sufficient to flip me from "don't
>care" to "do care". There are ciphers that don't
>give away anything like that amount of
>information, so we should recommend them, and not
>RC4.
The question is not should it be recommended, but is it secure.
I take "secure" to mean-- is it the most difficult way of getting the
information, and can the message be derived from the encrypted text, even
if some plain text is known. I have absolutely no objection to you
recommending something else, as long as it is as fast and is as resistant to
deriving the messages from the encrypted text as is RC4.
If it is faster and more resistant, so much the better.
It was you that objected to me calling RC4 "very secure" which I have
modified to "secure". The "one bit per 10^7 bytes" insecurity I find hard
to take terribly seriously as converting RC4 from secure to insecure when
there are far far far more serious insecurities in almost all cryptosystems.
Yes. And? Are you arguing that you can reuse the key in either case, no
matter what the length? Of course they have different amounts of entropy.
But despite that difference they are both ONE TIME.
To share one property does NOT mean they share all properties. Both horses
and humans have two legs. That does not mean that horses and humans are the
same ( not least because horses actually have more than two legs).
But both DO have two legs.
The previous poster (who I snipped) postulated 8k
blocks for disk encryption. While the state is
never in fact random, since it is
deterministically derived from the key, it is
reasonable to consider it to be so for analysis
purposes. So the bias is as likely to be present
in the post-keying state as it is anywhere else.
The conclusion follows, that the bias is present
in collections of independently generated streams,
as well as in single streams.
Personally, I think it is obvious, if you claim to
have understood the paper. I can't be bothered to
argue for the sake of argument, so you should go
ahead and have the last word on this.
>Tom's statement was a joke, guys. He was saying
^^^^ Bill
>that a stream cipher is a little like a one-time
>pad, because they could both only be used one
>time.
Agreed.
>In article <ebalki$312$2...@nntp.itservices.ubc.ca>,
>Unruh <unruh...@physics.ubc.ca> wrote:
>>g...@qualcomm.com (Gregory G Rose) writes:
>>>False, as I have already stated. The bias is
>>>independent of the key. Even rekeying every 8KB,
>>>the attack as I have described it still works.
>>
>>I am sorry, where do you get this from. It is certainly not in the paper.
>>The state of the permutation is not random in the first 8K bytes. The state of i is not known, since
>>you have no idea what the block size actually is.
>The previous poster (who I snipped) postulated 8k
>blocks for disk encryption. While the state is
>never in fact random, since it is
>deterministically derived from the key, it is
>reasonable to consider it to be so for analysis
>purposes. So the bias is as likely to be present
>in the post-keying state as it is anywhere else.
>The conclusion follows, that the bias is present
>in collections of independently generated streams,
>as well as in single streams.
>Personally, I think it is obvious, if you claim to
>have understood the paper. I can't be bothered to
>argue for the sake of argument, so you should go
>ahead and have the last word on this.
Obvious and true are not necessarily the same thing. Have experiments been
carried out to see if the biases are there if the stream is rekeyed every
8K times for example (using the usual "throw away the first 256 bytes" )?
The state of the permutation is an assumption, which certainly need not be
true early in the process ( and we know it is not true, which is why one
throws away the first bunch of bytes).
In particular, you, I believe, carried out experiments to make sure that the
biases actually were there, and did not simply accept the conclusions of
the paper.
I'm not sure what your definition of "slow" is, but I wouldn't
characterize AES as "slow". It seems to be good enough for most
purposes.
More importantly it's "fast enough" for most purposes.
The C reference code gets ~16 cycles per byte on an AMD64. At 2.6GHz
(my sweet, sweet workstation's clock rate) that's 154MiB/sec. My network
is 100Mbit and my RAID-5 can only sustain ~30MiB/sec. So even if AES
is "slow" it's not the bottleneck in ANYTHING I'd be doing here other
than artificial in cache benchmarks.
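The arithmetic behind the ~154MiB/sec figure is just clock rate divided by cycles per byte. A tiny sketch, assuming the 2.6GHz and 16 cycles/byte figures above and 1 MiB = 2^20 bytes (the function name is illustrative):

```python
def throughput_mib_per_s(clock_hz: float, cycles_per_byte: float) -> float:
    """Bytes per second = clock rate / cycles per byte, expressed in MiB/s."""
    return clock_hz / cycles_per_byte / 2**20


aes = throughput_mib_per_s(2.6e9, 16)  # just under 155 MiB/s
raid = 30.0                            # MiB/s sustained by the RAID-5 array
print(f"AES {aes:.1f} MiB/s vs RAID {raid:.0f} MiB/s")
```

With the cipher about five times faster than the array, the disk rather than AES sets the pace.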
If you were maintaining a multi-gigabit switch or something you'd be
using hardware AES anyways. In that case, multi-gigabit AES already
exists.
So Unruh clearly needs to be beaten with a sufficiently large foam clue
bat.
Tom
You're confusing the biases that are present in
the entire keystream with the other result, of
Fluhrer, Mantin and Shamir, where the biases that
are present in the first few output bytes allow
recovery of the key.
>In particular you I believe carried out experiments to make sure that the
>biases actually were there, and did not simply accept the conclusions of
>the paper.
The purpose of my experiments wasn't to see
whether the biases were present or not, since
others had already experimentally verified them.
What I wanted to show was that an off-the-shelf
statistics program, one that wasn't tuned to look
for the specific characteristics of RC4, would
also claim that the output was not believably
random. And it did, although it required a few
hundred megabytes rather than the few megabytes
that a tuned program could work with. The actual
detection mechanism that first declares the output
to be non-random is doing a binomial test on the
frequency of zero bytes... there are too many of
them, and too few FFs.
I did demonstrate that the bias was detectable on
multiple keystreams of a megabyte each. I didn't
go down to 8KB keystreams... but I have no doubt
at all that the result will hold.
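The binomial test described above can be sketched as follows. This is not the statistics program from the post, only an illustration of the same idea: under the uniform hypothesis each byte equals 0x00 (or 0xFF) with probability 1/256, so its count over n bytes is Binomial(n, 1/256), and the reported pattern is a positive z for 0x00 together with a negative z for 0xFF. Counts from several independently keyed streams simply add, which is why the test applies to many 1MB keystreams as well as to one long one. The demo data is `os.urandom`, a stand-in, so the z-scores should stay small; per the post, a few hundred megabytes of real RC4 output are needed before an untuned test like this one flags it.

```python
import math
import os


def byte_z_score(streams: list[bytes], value: int) -> float:
    """Binomial z-score for how often `value` occurs across all streams.

    Under the uniform hypothesis each byte equals `value` with p = 1/256,
    so the pooled count is Binomial(n, p) with n = total number of bytes.
    """
    n = sum(len(s) for s in streams)
    observed = sum(s.count(value) for s in streams)
    p = 1.0 / 256
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (observed - mean) / sd


# Stand-in data (os.urandom, not RC4): four independent 1 MiB "keystreams".
streams = [os.urandom(1 << 20) for _ in range(4)]
print(byte_z_score(streams, 0x00), byte_z_score(streams, 0xFF))
```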
Rubin stated that he was contemplating replacing AES on his disk encryption
with RC4 because of the speed advantage of RC4.
Beat Rubin. He was the one suggesting replacing AES with RC4 for disk
encryption.
What is the rate you can get for RC4? That was the comparison, not whether
AES was fast enough for you. Speed was always RC4's claimed advantage.
>Tom
>In article <ebanqh$3oc$2...@nntp.itservices.ubc.ca>,
>Unruh <unruh...@physics.ubc.ca> wrote:
>>Obvious and true are not necessarily the same thing. Have experiments been
>>carried out to see if the biases are there if the stream is rekeyed every
>>8K times for example (using the usual "throw away the first 256 bytes" )?
>>The state of the permutation is an assumption, which certainly need not be
>>true early in the process ( and we know it is not true, which is why one
>>throws away the first bunch of bytes).
>You're confusing the biases that are present in
>the entire keystream with the other result, of
>Fluhrer, Mantin and Shamir, where the biases that
>are present in the first few output bytes allow
>recovery of the key.
No, I am not. I am simply stating what I mean: when I say "run RC4", I mean
run RC4 in its standard mode, where those bytes are thrown away because of
the key leakage. Ie, do not measure the biases when those bytes are
included.
>>In particular you I believe carried out experiments to make sure that the
>>biases actually were there, and did not simply accept the conclusions of
>>the paper.
>The purpose of my experiments wasn't to see
>whether the biases were present or not, since
>others had already experimentally verified them.
>What I wanted to show was that an off-the-shelf
>statistics program, one that wasn't tuned to look
>for the specific characteristics of RC4, would
>also claim that the output was not believably
>random. And it did, although it required a few
>hundred megabytes rather than the few megabytes
>that a tuned program could work with. The actual
>detection mechanism that first declares the output
>to be non-random is doing a binomial test on the
>frequency of zero bytes... there are too many of
>them, and too few FFs.
>I did demonstrate that the bias was detectable on
>multiple keystreams of a megabyte each. I didn't
>go down to 8KB keystreams... but I have no doubt
>at all that the result will hold.
OK, thanks.
>Paul Rubin <http://phr...@NOSPAM.invalid> writes:
>
>>Unruh <unruh...@physics.ubc.ca> writes:
>>> You have a laptop encoded with RC4 ( who in the world uses a stream cypher
>>> to encode a laptop),
>
>>You'd use it for hard disk encryption, by re-keying it for each
>>cluster of 8 kbytes or whatever. I've seriously considered this
>>method since it should be much faster than AES.
>
>And as soon as you rekey, the bias is no longer there. You HAVE to encrypt
>the whole of the 160 GB with a single key to take advantage of the biases
>as far as I understand the attack.
>
>By the way, how in the world do you keep track of 10,000,000 keys?
Key + IV, with the IV being based on which 8KB block you are working on?
But I'm saying it doesn't matter. Except for low latency applications.
But in terms of bulk throughput you have to look at your bottlenecks.
Let's see.
~6.4GiB/sec memory bus which usually averages ~4.5GiB/sec sustained
154MiB/sec for AES in the cache
132MiB/sec top for a PCI bus which you will likely top out at
~60MiB/sec.
30MiB/sec for the RAID array
10MiB/sec for networking
1MiB/sec for Internet, etc...
...
It doesn't matter if AES is slower than RC4. In fact, so long as AES
clocks in faster than 30MiB/sec it isn't really noticeable as far as
throughput is concerned.
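The list above describes a pipeline, and a pipeline's sustained throughput is set by its slowest stage. A small sketch using the figures from the list, treating the PCI bus at its practical ~60MiB/sec and the memory bus at ~4.5GiB/sec sustained; the stage names and the `bottleneck` helper are illustrative:

```python
# Sustained rates in MiB/s, taken from the list above.
stages = {
    "memory bus": 4.5 * 1024,
    "AES (in cache)": 154,
    "PCI bus": 60,
    "RAID array": 30,
    "network": 10,
}


def bottleneck(rates: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage, which caps the whole pipeline's throughput."""
    name = min(rates, key=rates.get)
    return name, rates[name]


# A local disk-to-disk copy skips the network, so the RAID array is the limit;
# AES would only start to matter if it dropped below that.
local = {k: v for k, v in stages.items() if k != "network"}
print(bottleneck(local))  # ('RAID array', 30)
```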
Suppose RC4 clocked in at 300MiB/sec. Would that mean you could
encrypt faster than the 30MiB/sec your RAID array can sustain?
Tom
Generate each one on the fly by encrypting the sector number with AES
using a single master key. Doing one AES operation per sector (in
addition to the RC4 bulk encryption) is much less expensive than doing
the bulk encryption with AES.
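A minimal sketch of that key derivation follows. Python's standard library has no AES, so this swaps in HMAC-SHA256 as the keyed function; the 32-byte master key, the 8-byte big-endian sector encoding and the 16-byte truncation are all assumptions for illustration, and `sector_key` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


def sector_key(master_key: bytes, sector: int) -> bytes:
    """Derive a per-sector stream-cipher key from the sector number.

    Stand-in PRF: HMAC-SHA256 (the proposal above uses AES under the master key).
    """
    msg = sector.to_bytes(8, "big")
    return hmac.new(master_key, msg, hashlib.sha256).digest()[:16]


master = secrets.token_bytes(32)
k0 = sector_key(master, 0)
k1 = sector_key(master, 1)
# Only the master key is stored; any sector's key is recomputed on demand.
assert k0 != k1 and sector_key(master, 0) == k0
```

As written, the derived key depends only on the sector number, so a rewritten sector would get the same keystream again.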
> >http://www.newegg.com/Product/Product.asp?Item=N82E16822148073
> Does not solve the electricity problem. (Note the first review as well).
Portable generators have made their way to Lebanon.
I see on my 850 MHz P3 laptop, "openssl speed aes" gets about 19
MB/sec for 8K-sized blocks. The hard disk transfer rate is maybe
10 MB/sec copying from the internal HD to an external USB2 HD. If
both disks are encrypted with AES, that means the CPU can't keep up
with decrypting one drive and encrypting the other at full speed, much
less leave any cycles left over for other stuff while the file copy
happens in the background.
"openssl speed rc4" for 8K blocks gets over 120 MB/sec, so it's
much more able to keep up.
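The CPU load in this scenario is easy to estimate. A sketch assuming crypto cost scales linearly with bytes processed on a single core, using the 19 and 120 MB/sec openssl figures and the ~10 MB/sec disk rate above; `cpu_fraction` is a hypothetical helper.

```python
def cpu_fraction(io_mb_s: float, crypto_mb_s: float, passes: int = 1) -> float:
    """Fraction of one CPU spent on crypto to keep up with `io_mb_s`.

    `passes` counts how many times each byte is encrypted or decrypted;
    copying between two encrypted disks means decrypt plus re-encrypt (2).
    """
    return io_mb_s * passes / crypto_mb_s


print(cpu_fraction(10, 19, passes=2))   # AES: above 1.0, the CPU cannot keep up
print(cpu_fraction(10, 120, passes=2))  # RC4: about a sixth of the CPU
```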
Oh I get it. In the future, everyone will be running Pentium 3s and
nothing any newer. Therefore, clearly, we should scrap AES and stick
to RC4. I mean I can't possibly think of any reason why that's a
ludicrous notion. You have a contrived situation for which you can
show your choice is faster [note I don't specifically grant that using
RC4 is better as it's a stupid cipher anyways].
However, people who are fielding crypto won't be using Pentium 3s. So
making your decisions based on that technology is a bit stupid.
If you really need gigabyte/sec speed you need an accelerator. Ideally
one that is inline with your media controller to avoid bus traffic. If
you're targeting current desktops, AES is so ridiculously fast enough
that it isn't an issue.
Tom
Lets see, you castigate him for using his machine as an example, and then
use your example as the defining standard. Do you know that disk access
will not get better? And net access is now Gigabit standard, not 100Mbit.
>Tom
I don't see drives getting up to the 100MiB/sec speed anytime soon.
The only way you'll get that is with PCI-X devices, fiber channel and a
decent size RAID. Oddly enough, people make crypto network storage
devices... Imagine that. Net access is not gigabit standard. The
average cable modem uploads less than 1Mbit/sec and downloads less than
6Mbit/sec. The vast majority of crap you can get at Best Buy is
100Mbit gear.
So effectively SATA/SCSI drives are nowhere near 100MiB/sec now, and
the only way to get that is with expensive RAID gear. So chances are
you're well served by hardware crypto.
So instead of proposing people use non-standard inferior crypto, we
could be professionals here and recommend standards based proper
crypto. There are ways of getting fast encryption. Using non-standard
algos is not the way.
Tom
Did I say that?
> Therefore, clearly, we should scrap AES and stick to RC4.
Did I say that?
> I mean I can't possibly think of any reason why that's a ludicrious
> notion.
Did I say it wasn't?
> You have a contrived situation for which you can show your choice is
> faster [note I don't specfically grant that using RC4 is better as
> it's a stupid cipher anyways].
I have a real situation with a real laptop.
> However, people who are fielding crypto won't be using Pentium 3s. So
> making your decisions based on that technology is a bit stupid.
I'm fielding crypto and I have a Pentium 3 that I use every day. In
fact I still have a Pentium MMX (133 MHz) that I use from time to
time.
> If you really need gigabyte/sec speed you need an accelerator. Ideally
> one that is inline with your media controller to avoid bus traffic. If
> you're targetting current desktops, AES is so ridiculously fast enough
> that it isn't an issue.
I just want to be able to do ordinary day-to-day operations without
the crypto slowing things down. Ordinary day-to-day operations
include accessing or copying files on the hard drive.
In reality, for a high security system I'd be more concerned about key
leakage than randomness distinguishers, so I'd want to use hardware
crypto to keep the keys safe. For a personal laptop whose contents
aren't tremendously valuable or interesting but which need to be
protected in case the laptop is stolen, the AES/RC4 hybrid still looks
attractive to me, despite the RC4 security blemish.
> But I'm saying it doesn't matter. Except for low latency applications.
> But in terms of bulk throughput you have to look at your bottlenecks.
>
> Let's see.
>
> ~6.4GiB/sec memory bus which usually averages ~4.5GiB/sec sustained
>
> 154MiB/sec for AES in the cache
>
> 132MiB/sec top for a PCI bus which you will likely top out at
> ~60MiB/sec.
>
> 30MiB/sec for the RAID array
>
> 10MiB/sec for networking
>
> 1MiB/sec for Internet, etc...
>
> ...
>
> It doesn't matter if AES is slower than RC4. In fact, so long as AES
> clocks in faster than 30MiB/sec it isn't really noticeable as far as
> throughput is concerned.
Would it matter for the slower symmetric algorithms, 3DES and
Serpent, or are they also not the bottleneck?
TIA,
vedaal
Serpent and 3DES are not standards [recall DES was finally
decommissioned].
Serpent is a decent choice, not nearly as fast as Rijndael. RC6 is a
close runner up if you want to play this game. People look, there is
more to crypto than picking neato algorithms and saying "hack the
planet" as you skateboard through a subway system ... [I like that
movie hehehehe].
You start recommending to people they use non-standard algorithms and
then they start fielding stuff based on it. Then they get in shit when
they eventually have to work with others or get their stuff certified.
Tom
>Unruh wrote:
>> >If you really need gigabyte/sec speed you need an accelerator. Ideally
>> >one that is inline with your media controller to avoid bus traffic. If
>> >you're targetting current desktops, AES is so ridiculously fast enough
>> >that it isn't an issue.
>>
>> Lets see, you castigate him for using his machine as an example, and then
>> use your example as the defining standard. Do you know that disk access
>> will not get better? And net access is now Gigabit standard, not 100Mbit.
>I don't see drives getting up to the 100MiB/sec speed anytime soon.
>The only way you'll get that is with PCI-X devices, fiber channel and a
>decent size RAID. Oddly enough, people make crypto network storage
>devices... Imagine that. Net access is not gigabit standard. The
>average cable modem uploads less than 1Mbit/sec and downloads less than
>6Mbit/sec. The vast majority of crap you can get at Best Buy is
>100Mbit gear.
>So effectively SATA/SCSI drivers are nowhere near 100MiB/sec now, and
>the only way to get that is with expensive RAID gear. So chances are
>you're well served by hardware crypto.
?????? You castigate him for his standard laptop, and then start quoting
Best Buy to justify your choice? Sheesh. Gigabit IS the standard now in any
kind of high-performance location. Our University department is even
converting now to Gigabit.
And you do not want to have your computer use 100% of its processing power
to do the encryption. Or even 20.
>So instead of proposing people use non-standard inferior crypto, we
>could be professionals here and recommend standards based proper
>crypto. There are ways of getting fast encryption. Using non-standard
>algos is not the way.
RC4 IS a "standard" algo, just as Windows is a "standard" operating system.
Most of the world's crypto on the web, by volume, uses RC4, I am sure.
Serpent might be faster than AES-128 if you use a bit-slice
implementation with x86 XMM instructions. I don't know if anyone has
tried that yet. Probably not worthwhile. Some of the Ecrypt stream
cipher candidates are way faster than AES and look promising in terms
of security. Block ciphers have a property (i.e., invertibility)
which is not necessary for many uses of crypto, and which appears to
be somewhat expensive. We're in the habit of using them because of
history, starting with DES, but it's sort of as if public-key crypto
had been invented earlier than secret-key, and we got in the habit of
using RSA even for symmetric encryption, despite its slow speed. But
we should be looking to switch to stream ciphers, now that we have a
better understanding of what we want our primitives to do.
You are missing a crucial point here though. I'm not a standards
fanboi or somesuch. I truly believe that AES is both efficient and can
be used in a secure fashion. If Ecrypt produces a winner [or
winners] which then become standardized, well documented and are both
secure and fast then sure go for it.
For example, NESSIE produced Whirlpool, Anubis and Khazad. I have no
problems recommending them if the circumstances warrant. They're all
decently efficient, [Anubis basically being an optimized Rijndael],
secure, etc... Serpent and RC4 are not standards. Maybe if AES were
shown to be weak I'd consider recommending the other AES finalists.
Tom
>Bill Unruh <un...@physics.ubc.ca> writes:
>> By the way, how in the world do you keep track of 10,000,000 keys?
>Generate each one on the fly by encrypting the sector number with AES
>using a single master key. Doing one AES operation per sector (in
>addition to the RC4 bulk encryption) is much less expensive than doing
>the bulk encryption with AES.
Sorry, that does not work. A stream cypher key can never be used to encrypt
different material. So what do you do when you read a sector, change it and
write it back? You cannot reuse the same stream key. But your proposal
would force you to. So you still have to keep a database of 10,000,000 keys
-- one for each sector.
And changing the master key on each write means having to decrypt and
reencrypt the whole 160GB drive even if only one byte had changed.
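The objection rests on the classic two-time-pad problem: if a sector is rewritten under the same keystream, XORing the old and new ciphertexts cancels the keystream and leaves old XOR new. A minimal demonstration, with `os.urandom` standing in for one sector's keystream and made-up 16-byte plaintexts:

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


keystream = os.urandom(16)        # stand-in for one sector's keystream
old = b"pay Bob  100 USD"
new = b"pay Bob  900 USD"
c_old = xor(old, keystream)
c_new = xor(new, keystream)       # same key reused after the rewrite

# Someone holding both ciphertexts learns old XOR new without the key.
leak = xor(c_old, c_new)
assert leak == xor(old, new)
print(leak.hex())                 # zero wherever the two plaintexts agree
```

Here the leak is zero everywhere except the one byte that changed, so an attacker sees exactly which part of the sector was edited.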
Hehehe, oh this explains your attitude. You haven't got a real job
yet.
Universities are NOT THE REAL WORLD.
Look up where I work, the workstations JUST moved to 100mbit [the HPC
centre has much higher bandwidth stuff though].
You'd probably be surprised at the amount of 10 and 100mbit stuff still
lying around.
> And you do not want to have your computer use 100% of its processing power
> to do the encryption. Or even 20.
But you wouldn't be. It doesn't take 100% of my processor to max out a
hard drive's sustained write speed. Just because your SATA port is
3Gb/sec doesn't mean the drive can sustain that. A single 7200RPM
drive usually can sustain a max of 10MiB/sec. My raid-5 array can get
about 30MiB/sec.
I doubt your typical laptop or home computer will have a NAS device for
your precious home videos or whatever... So even at 30MiB/sec that's
1/5th the top throughput of the AES routine. With overhead and all
that jazz you'd probably be taking 30-40% of the processor time at most
[hard for me to say because I have four cores in my workstation...
hehehe :-)]
Point is, if you design cryptosystems for a Pentium 3 then you have a
very small niche market. On my box RC4 gets 223390K/sec and AES-128
gets 142393K/sec. Sure RC4 is faster, but I'd be damned to see my RAID
array sustain 142393K/sec.
> >So instead of proposing people use non-standard inferior crypto, we
> >could be professionals here and recommend standards based proper
> >crypto. There are ways of getting fast encryption. Using non-standard
> >algos is not the way.
>
> RC4 IS a "standard" algo, just as Windows is a "standard" operating system.
> Most of the world's crypto by volume uses RC4 I am sure on the web.
No, TLS uses RC4. TLS didn't specify RC4 [e.g. create it] therefore
it's not a standard.
But this line of thinking is ludicrous. Windows standard? O RLY? So
you mean my Win3.11 applications will work flawlessly in Vista? I
certainly have X11 applications from the early 90s that still work in
2006.
Tom
>Unruh wrote:
>> "Tom St Denis" <tomst...@gmail.com> writes:
>> ?????? You castigate him for his standard laptop, and then start quoting
>> Best Buy to justify your choice? Sheesh. Gigabit IS the standard now in any
>> kind of high-performance location. Our University department is even
>> converting now to Gigabit.
>Hehehe, oh this explains your attitude. You haven't got a real job
>yet.
>Universities are NOT THE REAL WORLD.
Yes, they are usually more cash strapped than the real world in many ways.
>Look up where I work, the workstations JUST moved to 100mbit [the HPC
>centre has much higher bandwidth stuff though].
>You'd probably be surprised at the amount of 10 and 100mbit stuff still
>lying around.
Of course there is a lot of 10 to 100Mbit stuff around. There are also
millions and millions of 500MHz PIII and Pentium machines around. You took
the highest level processor, linked it to slow outdated network and disks,
and said-- See AES can more than keep up. IF you want to use old network
and disk speeds, use old processor speeds as well-- that is my point.
>> And you do not want to have your computer use 100% of its processing power
>> to do the encryption. Or even 20.
>But you wouldn't be. It doesn't take 100% of my processor to max out a
>hard drive's sustained write speed. Just because your SATA port is
>3Gb/sec doesn't mean the drive can sustain that. A single 7200RPM
>drive usually can sustain a max of 10MiB/sec. My raid-5 array can get
>about 30MiB/sec.
>I doubt your typical laptop or home computer will have a NAS device for
>your precious home videos or whatever... So even at 30MiB/sec that's
>1/5th the top throughput of the AES routine. With overhead and all
>that jazz you'd probably be taking 30-40% of the processor time at most
>[hard for me to say because I have four cores in my workstation...
>hehehe :-)]
Sheesh. 30-40% to encrypt, 30-40% to decrypt, and you have your whole
processor in use just to copy a file. That level should be down at a max of
1%.
>Point is, if you design cryptosystems for a Pentium 3 then you have a
>very small niche market. On my box RC4 gets 223390K/sec and AES-128
>gets 142393K/sec. Sure RC4 is faster, but I'd be damned to see my RAID
>array sustain 142393K/sec.
No, if you design it for quad core processors you have a very small niche
market. 500MHz or slower processors are by far the majority out there.
>> >So instead of proposing people use non-standard inferior crypto, we
>> >could be professionals here and recommend standards based proper
>> >crypto. There are ways of getting fast encryption. Using non-standard
>> >algos is not the way.
>>
>> RC4 IS a "standard" algo, just as Windows is a "standard" operating system.
>> On the web, I am sure most of the world's crypto by volume uses RC4.
>No, TLS uses RC4. TLS didn't specify RC4 [i.e. create it], therefore
>it's not a standard.
>But this line of thinking is ludicrous. Windows standard? O RLY? So
>you mean my Win3.11 applications will work flawlessly in Vista? I
>certainly have X11 applications from the early 90s that still work in
>2006.
There are standards which are set by standards bodies and standards which
are set by the marketplace. Both are important.
It's a lot cheaper to get a fast processor than a fast disk array.
> Sheesh. 30-40% to encrypt, 30-40% to decrypt, and you have your whole
> processor in use just to copy a file. That level should be down at a max of
> 1%.
How often are you moving files from encrypted volumes? Maybe you
should reconsider your crypto deployment? Also if they are encrypted
with the same master password/key you could just literally move the
file. Why would it not be decryptable on the other end?
Of course I also disagree with the general deployment of encrypted
volumes. People over-deploy it, then think "I can do anything I want
[such as run Windows] because obviously I'm secure now."
> >Point is, if you design cryptosystems for a Pentium 3 then you have a
> >very small niche market. On my box RC4 gets 223390K/sec and AES-128
> >gets 142393K/sec. Sure RC4 is faster, but I'd be damned to see my RAID
> >array sustain 142393K/sec.
>
> No, if you design it for quad core processors you have a very small niche
> market. 500MHz or slower processors are by far the majority out there.
Um, the only places you see <500MHz processors in wide deployment are
ARM and MIPS, in which case you can work in a crypto core to get faster
than RC4 rates anyway. Besides, you don't need a 2P 285 Opteron box
to get this. Any 2.6GHz AMD64 will get those rates. Certainly, it's
easier and cheaper to buy a high end Athlon64 than it is a disk array.
So if you were really worried about performance you could easily move
the bottleneck to the I/O subsystem.
> >But this line of thinking is ludicrous. Windows standard? O RLY? So
> >you mean my Win3.11 applications will work flawlessly in Vista? I
> >certainly have X11 applications from the early 90s that still work in
> >2006.
>
> There are standards which are set by standards bodies and standards which
> are set by the marketplace. Both are important.
Yes, and no. If you are making a NEW design generally it's best to
take from standards bodies as much as possible.
Oh I get it, I'm making a new product totally unrelated to TLS and
Windows, but because TLS has RC4 [as well as AES] and the NT hash is
MD4 I should base my product on RC4/MD4. Clearly that's a well thought
out design decision.
:-)
Face it, RC4 had its time but it's just moot for all but the simplest
homebrew applications.
Tom
The threat model involves the laptop being stolen and not returned.
It doesn't encompass the situation of someone stealing the laptop,
reading the hard drive, returning the laptop, waiting for the HD to
change, then stealing the laptop again and re-reading the disk. You
-do- have to worry about the HD sparing out a sector, so that there's
some old ciphertext visible to forensic data recovery, and so the
attacker does get to see the xor of some two plaintexts. Again, the
security-vs-speed tradeoff may favor speed for some users and
applications (certainly not all of them). Remember most users use no
encryption whatsoever, so it's not as if things become any worse.
> So you still have to keep a database of 10,000,000 keys -- one for
> each sector.
Keeping a separate IV per sector isn't all that infeasible, but even
if you do that, it's still not totally satisfactory from the viewpoint
of trying to defeat an all-seeing adversary who can read your HD
between your usage sessions an unlimited number of times. There's no
realistic way to prevent some info from leaking to such an attacker.
All you can do is apply some physical security to the laptop or its
storage media, so that repeated reading isn't an issue. For example,
keep your sensitive stuff on a USB key that you carry in your pocket
when not in use.
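A per-sector IV can be as simple as a write counter per sector mixed into the keystream. The sketch below assumes that design; `keystream` and `SectorStore` are hypothetical names, and the SHA-256 counter-mode keystream is a stand-in chosen only because it is in Python's standard library, not a recommendation over a vetted cipher mode such as AES-CTR. It also shows the residual leak described above: an attacker who reads the disk twice still sees which sectors changed.

```python
# Sketch: one write counter per sector (the "per-sector IV"), mixed into the
# keystream so that rewriting a sector never reuses keystream.
# ASSUMPTION: keystream() is a hypothetical SHA-256 counter-mode stand-in.
import hashlib

SECTOR_SIZE = 512  # bytes


def keystream(key: bytes, sector: int, write_count: int, length: int) -> bytes:
    """Keystream bytes unique to (key, sector, write_count)."""
    out = bytearray()
    block = 0
    while len(out) < length:
        nonce = sector.to_bytes(8, "big") + write_count.to_bytes(8, "big")
        out += hashlib.sha256(key + nonce + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(out[:length])


class SectorStore:
    """Hypothetical toy encrypted disk; counters[] is the per-sector IV table."""

    def __init__(self, key: bytes, sectors: int):
        self.key = key
        self.counters = [0] * sectors               # 0 means "never written"
        self.disk = [bytes(SECTOR_SIZE)] * sectors  # what an attacker can read

    def write(self, sector: int, plaintext: bytes) -> None:
        if len(plaintext) != SECTOR_SIZE:
            raise ValueError("plaintext must fill exactly one sector")
        self.counters[sector] += 1                  # fresh (sector, counter) pair
        ks = keystream(self.key, sector, self.counters[sector], SECTOR_SIZE)
        self.disk[sector] = bytes(p ^ k for p, k in zip(plaintext, ks))

    def read(self, sector: int) -> bytes:
        if self.counters[sector] == 0:
            return bytes(SECTOR_SIZE)               # unwritten sectors read as zeros
        ks = keystream(self.key, sector, self.counters[sector], SECTOR_SIZE)
        return bytes(c ^ k for c, k in zip(self.disk[sector], ks))


if __name__ == "__main__":
    store = SectorStore(b"hypothetical master key", sectors=4)
    old, new = b"A" * SECTOR_SIZE, b"B" * SECTOR_SIZE
    store.write(2, old)
    snapshot = list(store.disk)                     # attacker's first look
    store.write(2, new)
    c_xor = bytes(x ^ y for x, y in zip(snapshot[2], store.disk[2]))
    p_xor = bytes(x ^ y for x, y in zip(old, new))
    print("old XOR new leaked:", c_xor == p_xor)
    print("sectors that changed:", [i for i in range(4) if snapshot[i] != store.disk[i]])
```

With 8-byte counters, a table for 10,000,000 sectors takes about 80 MB, which is why it is "not all that infeasible"; it still does nothing to hide which sectors were rewritten between two reads.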
Interesting. I still have a 200MHz ARM-based Sharp Zaurus PDA around
here somewhere. Where can I get a crypto core for it?
And how often are you sending hundreds of MiB/sec through it?
I have a wrist watch running [probably] some PIC. I can clearly see
the argument for RC4 forming.
Tom
It's unlikely that a wristwatch would use a PIC. It probably uses a
4-bit processor, or even random logic. For the 4-bit processor case
I'd seriously consider using the old GOST cipher. RC4 would be
completely unsuitable because of its large RAM consumption.
I guess I find this weird. This thread has mostly been about RC4 possibly
leaking one bit of information per MB of data. And here a stream cypher is
being advocated with a reused key which can leak bits of data per byte
encrypted.
I realise that you are not one of those who were excited by the possibility
of RC4 leaking that bit, and that your threat model is different from the
usual threat model for stream cyphers, but it still makes me feel dizzy.
What I don't understand is this.
As far as I understand it - AES does nothing but shuffle bits around - and
in the case of hard drive encryption, shuffle bits around inside of a
sector.
But what if a TLA/law enforcement didn't need to break the encryption of
your drive - they only needed to see if a file existed (some form of
contraband)?
For a large file, and guessing drive geometry, isn't it reasonable that a
large file could be detected on a drive with a very high level of certainty
without actually breaking AES?
--
LTP
:)
Sorry, there's been some topic drift in the thread. A generic
cryptography system for wide deployment should follow best practices,
which means don't leak that bit. A specific user with some data to
secure, based on knowledge of exactly how interesting or
non-interesting the data is, might decide that it's ok to accept
leaking the bit for performance reasons. Remember that 99% of users
with this type of data (i.e. the contents of a random personal
computer) don't encrypt it at all, so leaking one bit isn't so bad by
comparison.
> I realise that you are not one of those who were excited by the possibility
> of RC4 leaking that bit, and that your threat model is different from the
> usual threat model for stream cyphers, but it still makes me feel dizzy.
I consider RC4 to be broken because of the leaked bit, just like MD5
is broken because we can now find collisions in it. Nonetheless those
algorithms are still useful for some purposes. I'm not going berserk
over the fact that most https web sessions still use RC4. But I
wouldn't choose RC4 for a new application that needed high security.
I'm not sure I understand this question. A well-designed HD
encryption system using AES should fill the HD with data that's
indistinguishable from random. Using RC4 the same way would both be
distinguishable from random and leak a tiny amount of information
about the plaintext. Both of those leaks are bad, though some users
might be willing to tolerate them for performance reasons.
No, all that could be detected was that a file of that size existed
(assuming that the inode or file allocation table was not also encrypted).
Ie, there is nothing about the bits for a really good encryption program
that would give a clue as to the contents of the file.
Even if they could detect, say, the size of the file, until owning files of a
certain size became illegal, they would have nothing to go on.
Note that AES, RC4, ... do not just shuffle bits around, they replace bits
by other bits, so that the outcome looks entirely random.
Ie there is no way to tell the difference between two files each of which
had been encrypted. (Well, as my opponents in the argument about RC4 would
say, with RC4 there is a danger that 1 bit in a few thousand would not be
entirely random, and that perhaps that could be used to tell, for a large
enough file, some characteristic of that file. I believe that this
probability is low enough that it is inconsequential. They believe that any
difference from purely random output, no matter how small, is a
fatal flaw that should relegate that encryption scheme to the dustbin of history.)
>Unruh <unruh...@physics.ubc.ca> writes:
>> I guess I find this weird. This thread has mostly been about RC4 possibly
>> leaking one bit of information per MB of data. And here a stream cypher is
>> being advocated with a reused key which can leak bits of data per byte
>> encrypted.
>Sorry, there's been some topic drift in the thread. A generic
>cryptography system for wide deployment should follow best practices,
>which means don't leak that bit. A specific user with some data to
>secure, based on knowledge of exactly how interesting or
>non-interesting the data is, might decide that it's ok to accept
>leaking the bit for performance reasons. Remember that 99% of users
>with this type of data (i.e. the contents of a random personal
>computer) don't encrypt it at all, so leaking one bit isn't so bad by
>comparison.
Yes, I was just bemused by the direction of the drift. I understand it, but
still believe that the dangers of reusing a key on a stream cypher are much,
much, much greater than the danger presented by that leaked bit.
>> I realise that you are not one of those who were excited by the possibility
>> of RC4 leaking that bit, and that your threat model is different from the
>> usual threat model for stream cyphers, but it still makes me feel dizzy.
>I consider RC4 to be broken because of the leaked bit, just like MD5
>is broken because we can now find collisions in it. Nonetheless those
Hardly an adequate comparison. MD5 is completely broken for collision
resistance. It is trivial to find collisions. It was supposed to be hard.
Now it is still hard to find second preimages (i.e., find a second text
with the same hash as a given text), but the weakness against collisions
does not give one great faith. Ie, this would be the same as finding RC4 to
be breakable with chosen plaintext, but still hard to break with unknown
plaintext. I would certainly consider it broken. Just as I consider any
stream cypher broken if the key is reused, no matter what arguments are
made that in the special case used it is OK.
Ie, the single bit leakage is nowhere near being in the same league as the
MD5 break.
>algorithms are still useful for some purposes. I'm not going berserk
>over the fact that most https web sessions still use RC4. But I
>wouldn't choose RC4 for a new application that needed high security.
So what would you choose for something where you wanted the speed and the
known resistance to key derivation? AES is slow. The E-stream cyphers are very
very new apparently. Balancing the known weakness of RC4, its relatively
long study without cracking, and its speed against the newer cyphers
without the known weakness (but without much study, potentially much more
serious weaknesses), which would you choose and why?
Only if the plaintext had very low entropy could you distinguish the RC4
encryption from random. Knowing what the weakness in RC4 is, it might, or
might not, be useable to leak a tiny amount of information. (one bit in
10^7 reliably if the plaintext were very low entropy in some feature)
Ie, I think you are regarding the same weakness as two separate weaknesses.
To make it clear, if the plaintext being encrypted were entirely random,
could you distinguish an encryption with RC4 (or with a stream cypher
which produced all 0s) from random? No.
A typical hard drive is mostly empty and initialized with zeros. That
entropy is pretty low ;-).
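That low entropy is exactly what makes RC4 distinguishable here: over zero plaintext the ciphertext is the keystream itself, so keystream biases show through. The sketch below uses one well-known example, the second-byte bias reported by Mantin and Shamir (the second RC4 output byte is 0 with probability close to 2/256 instead of 1/256). It is a plain-Python RC4; the 16-byte key length, the seed, and the 20,000-trial count are arbitrary choices, and Python's `random` module is used only for reproducibility, never as a source of real keys.

```python
# Sketch of a distinguisher for RC4 over zero-filled sectors. Each
# ciphertext equals the raw keystream, so we just look at keystream byte 2.
import random


def rc4_first_bytes(key: bytes, n: int) -> bytes:
    """Return the first n RC4 keystream bytes for the given key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def second_byte_zero_rate(trials: int, seed: int = 1) -> float:
    """Fraction of random 16-byte keys whose 2nd ciphertext byte over a zero sector is 0."""
    rng = random.Random(seed)  # reproducible, illustration only (not a CSPRNG)
    zeros = 0
    for _ in range(trials):
        key = bytes(rng.randrange(256) for _ in range(16))
        if rc4_first_bytes(key, 2)[1] == 0:
            zeros += 1
    return zeros / trials


if __name__ == "__main__":
    rate = second_byte_zero_rate(20_000)
    print(f"observed {rate:.4f}, ideal {1/256:.4f}, RC4 predicts about {2/256:.4f}")
```

If the sectors held random data instead of zeros, the same count would sit near 1/256, which is the point made above about high-entropy plaintext.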
I'd certainly use AES for this type of application. I'd consider it
irresponsible to use anything else, except maybe 3DES.
How about RSA? That is even slower.
> g...@qualcomm.com (Gregory G Rose) writes:
>
> ....
> >If you go look at the E-stream archives, you'll
> >find a couple of ciphers that were either rejected
> >or tweaked (our NLS among them) because of biases
> >of about 2^-32. There's no consensus yet on how
> >much is too much. Many of the ciphers state a
> >limit of 2^80 (bytes or words) of output, and
> >consider any bias detectable with less than that
> >amount of output to be too high. (Detectable means
> >a bias of about the square root of the amount of
> >output needed, so bias of 2^-40 in this case.)
> >Note that this is already higher than the
> >threshhold for any 128-bit block cipher, where you
> >can distinguish it from random after 2^64 blocks.
>
> >So even double-RC4 would probably be rejected from
> >E-stream as insecure by today's standards.
>
> I am sorry, but this is abusing the word insecure. Caesar is an insecure
> stream cypher. RC4 or those E-stream cyphers are not insecure. They may have
> smaller security than is ideal, but to lump them in with insecure is
> bastardising the language. Is the use of SSL for secure web transactions
> which uses RC4 insecure? The answer is almost certainly no. Ie, no-one,
> including the NSA, will be able to figure out what the message sent was.
> That is what secure means.
I sit firmly in Greg and David's camp. My definition of secure is a
stream-cipher that can't be feasibly distinguished from random.
What is feasible is still open to debate. My personal opinion is that
any bias should be around the 2^-128 mark. That said, I think a bias
around 2^-64 is acceptable, but anything larger should be treated with
suspicion.
The bias RC4 demonstrates is unacceptable.
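The E-stream rule of thumb quoted here (a bias of size ε takes on the order of 1/ε² outputs to detect) turns these bias thresholds into data volumes. The few lines of Python below only redo that arithmetic, plus the 2^(n/2) birthday bound for an n-bit block cipher; they are a sketch of the rule of thumb, not a precise statistical bound.

```python
# Rule of thumb: a bias of 2**b is detectable after roughly 2**(-2*b) outputs.
# An n-bit block cipher is distinguishable after about 2**(n/2) blocks.


def outputs_to_detect(bias_log2: float) -> float:
    """log2 of the number of outputs needed to detect a bias of 2**bias_log2."""
    return -2 * bias_log2


def block_cipher_distinguish_blocks(block_bits: int) -> float:
    """log2 of blocks after which an n-bit block cipher is distinguishable."""
    return block_bits / 2


if __name__ == "__main__":
    for bias in (-32, -40, -64, -128):
        print(f"bias 2^{bias}: about 2^{outputs_to_detect(bias):.0f} outputs")
    print(f"128-bit block cipher: 2^{block_cipher_distinguish_blocks(128):.0f} blocks")
```

So a 2^-40 bias matches the 2^80 output limit many E-stream candidates state, and a 2^-64 bias would need about 2^128 outputs, well past the 2^64-block point for a 128-bit block cipher.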
> By your definition of secure, you are simply encouraging all those people
> who argue that one should double, triple or hexatuple encrypt "because you
> never know".
> Security is a matter of balance, not a matter of absolutes.
Agreed.
> Especially, it is a matter of spending one's resources on the weakest, not the
> strongest, link in the chain, and for any encryption system, even one using RC4, the
> encryption algorithm is by far the strongest link in the chain (assuming it is properly
> used).
Nobody can seriously disagree with this statement. The point is there
are ciphers that are faster and do not exhibit this bias. So that begs
the question: Why use RC4 at all?
Answer me this: Why deliberately choose a slower, less secure cipher in
a new deployment?
Simon.
Well, anything faster than RC4 hasn't resisted nearly as much
cryptanalysis, at least in public.
>Nobody can seriously disagree with this statement. The point is there
>are ciphers that are faster and do not exhibit this bias. So that begs
>the question: Why use RC4 at all?
>Answer me this: Why deliberately choose a slower, less secure cipher in
>a new deployment?
I ask again, what cypher do you suggest that is faster, and more secure. A
cypher invented last weekend does not fall into the latter class, as one of
the criteria of security is that the cypher has received a wide variety of
study. Knapsack was a great public key system, until it was shown to be
easily breakable.
Rubin suggested AES or 3DES which probably satisfy the second criterion,
but certainly not the first.
While the bias is unfortunate, no one knows how to use it to break RC4
(i.e., read a substantial portion of the plaintext).
To suggest a cypher that does not have this particular bias, but has
received virtually no study does not seem to be a responsible tactic (not
that you have suggested any cypher).
>Simon.