
Successful remote AES key extraction


D. J. Bernstein

Apr 14, 2005, 10:58:05 PM
``This paper reports successful extraction of a complete AES key from a
network server on another computer. The targeted server used its key
solely to encrypt data using the OpenSSL AES implementation on a Pentium
III.''

http://cr.yp.to/papers.html#cachetiming

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago

Roger Schlafly

Apr 14, 2005, 11:36:17 PM
"D. J. Bernstein" <d...@cr.yp.to> wrote:
> ``This paper reports successful extraction of a complete AES key ...

Was it a weak key? <g>


sammy

Apr 15, 2005, 3:49:49 AM

Eye-opening. Wow.

Tom St Denis

Apr 15, 2005, 5:28:37 AM

D. J. Bernstein wrote:
> ``This paper reports successful extraction of a complete AES key from a
> network server on another computer. The targeted server used its key
> solely to encrypt data using the OpenSSL AES implementation on a Pentium
> III.''
>
> http://cr.yp.to/papers.html#cachetiming

You talk in your paper that to avoid the attack you have to completely
avoid cache stalls. Couldn't you just insert noise in the process to
lower the bias?

E.g. have a 16-byte counter which you initially set to random data
(anything really). Then at the start of the AES encrypt you do an "AES
round" on it. Heck, even XOR in the normal block after half of the
rounds.

Net result is you roughly add the cost of a round and a bit to the
design. However, it makes the state of the cache harder to model as 16
random accesses to the four tables would have been made. Later on
today I want to try out the effect.
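A quick Python sketch of the tweak being proposed (the function names and the counter-update rule here are my own illustration, not from the post or the paper): one "round" of dummy lookups touches the four tables at counter-driven positions before the real encryption runs.

```python
def masked_encrypt(encrypt_block, tables, counter, block):
    # One "AES round" worth of dummy lookups: 16 table accesses whose
    # indices depend on a private 16-byte counter, not on the key or the
    # input.  The hope is that the extra cache lines blur the attacker's
    # model of which lines the real encryption touched.
    evolved = []
    for i, c in enumerate(counter):
        v = tables[i % 4][c]             # dummy access into one of 4 tables
        evolved.append((c ^ v) & 0xFF)   # counter evolves on every call
    counter[:] = evolved
    return encrypt_block(block)          # the real encryption is unchanged
```

As the replies in this thread point out, this only adds noise, which averaging over more packets can remove.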

My hunch is it doesn't stop the attack but it may make it harder to
exploit and thus less practical (than it already isn't).

Tom

Tom St Denis

Apr 15, 2005, 5:50:14 AM

D. J. Bernstein wrote:
> ``This paper reports successful extraction of a complete AES key from a
> network server on another computer. The targeted server used its key
> solely to encrypt data using the OpenSSL AES implementation on a Pentium
> III.''
>
> http://cr.yp.to/papers.html#cachetiming

As another note... it seems you implemented the very attack I PROPOSED
LAST YEAR [*] but you never tried to attack it.

So were these two computers "over the network" just the same box with
an alias? Or were they cross-over connected? or ...?

Two major flaws with your attack (in case you're wondering why reaction
is minimal).

1. It requires a large amount of chosen plaintext. How many times do
you repeatedly encrypt the same block anyways?

2. Very timing sensitive. Between my box and your box are probably 18
hops, so a guaranteed packet latency is not likely.

So while you have found an "implementation" flaw in AES implementations,
it is entirely impractical (for the time being at least). It also
requires protocols to do things they're not meant to do.

[*]
http://groups-beta.google.com/group/sci.crypt/browse_frm/thread/420baaa88b1ffd39/6bae5d5b96abc0eb?q=aes+attack+udp&rnum=2#6bae5d5b96abc0eb

BRG

Apr 15, 2005, 6:03:06 AM
D. J. Bernstein wrote:

As always an interesting piece of work.

There is an issue in this paper on which I would like to hear comments
from those who are expert in the inner workings of modern processors
(x86 in particular).

For AES, the use of very large tables is driven by the need to avoid the
speed penalty involved in accessing 32-bit values that are not aligned
on natural 32-bit boundaries. A large reduction in the size of the AES
tables is possible if misaligned 32-bit accesses do not involve
significant speed penalties. This is not only a gain in timing attack
terms but is also a big gain in terms of cache loading overhead for AES
more generally.

But I have tried this several times (although not in the last couple of
years) and I have always seen a large speed penalty. I may be reading
DJB's paper wrongly but he seems to be saying that this is no longer a
problem with several current (x86) processors. Is this right?

If this is true then there is very good reason to change the design of
most fast AES code (including my own), since compressed tables would no
longer involve a speed penalty.

Brian Gladman

David A. Scott

Apr 15, 2005, 7:47:12 AM
"D. J. Bernstein" <d...@cr.yp.to> wrote in
news:slrnd5ubei....@stoneport.math.uic.edu:

Very good work. You are the rare type of crypto person I admire, so I fear
that you must have many jealous enemies. Keep up the good work.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

Tom St Denis

Apr 15, 2005, 7:54:13 AM

David A. Scott wrote:
> "D. J. Bernstein" <d...@cr.yp.to> wrote in
> news:slrnd5ubei....@stoneport.math.uic.edu:
>
> > ``This paper reports successful extraction of a complete AES key from a
> > network server on another computer. The targeted server used its key
> > solely to encrypt data using the OpenSSL AES implementation on a Pentium
> > III.''
> >
> > http://cr.yp.to/papers.html#cachetiming
> >
> > ---D. J. Bernstein, Associate Professor, Department of Mathematics,
> > Statistics, and Computer Science, University of Illinois at Chicago
>
> Very good work. You are the rare type of crypto person I admire, so I fear
> that you must have many jealous enemies. Keep up the good work.

What you fail to realize is that he didn't break the actual cipher.
Just how it's implemented. Even then the attack [which does work] is
entirely impractical.

First off, MAC YOUR DATA. His attack says

"Hey server, encrypt BLAH for me!"

The server should reply

"This request wasn't MAC'ed, session closed, key destroyed".

He wouldn't even get 1 plaintext let alone the 2^25 or so required.

Second, you have to be VERY close to the box to attack it. Chances are
he had two computers attached to a switch in the same room. That's not
realistic since the box you're going to attack is likely not even in
the same country let alone state let alone county let alone campus let
alone office...

Third, I'm about the only one going on about how the attack while cool
[and useful knowledge] isn't something to throw a wrench in the gears
over. As far as I know Prof. Bernstein is [and should be] well
respected in the community.

You have to also realize that two ADULTS can disagree about something
and not become mortal enemies for life. I disagree with the gravity of
the situation but I respect the work (because to be honest I never
thought it would be that effective in even the contrived case) and the
person.

Unlike someone who will rename nameless civilized discourse doesn't
require everyone to kiss ass to "get along". Disagreements ARE useful
AND commonplace.

Tom

Tom St Denis

Apr 15, 2005, 7:58:30 AM
Tom St Denis wrote:
> Unlike someone who will rename nameless civilized discourse doesn't
> require everyone to kiss ass to "get along". Disagreements ARE useful
> AND commonplace.

s/rename/remain/

Early morning usenet'ing while sick with a cold is not a good idea
apparently ;-)

Tom

David A. Scott

Apr 15, 2005, 8:53:34 AM
"Tom St Denis" <tomst...@gmail.com> wrote in
news:1113566053.8...@o13g2000cwo.googlegroups.com:

>
> Unlike someone who will rename nameless civilized discourse doesn't
> require everyone to kiss ass to "get along". Disagreements ARE useful
> AND commonplace.
>
>

You of all people should speak of "civilizes dicourse". Your rants
seldom produce anything useful. I don't kiss ass to get along. I think
you should take a good look in the mirror. Ask Wagner or Mr BS who is
the better ass kisser. I lately have read some of your posts. You seem
to greatly over rate your importance but I guess thats nothing new.

Tom St Denis

Apr 15, 2005, 8:56:27 AM

David A. Scott wrote:
> "Tom St Denis" <tomst...@gmail.com> wrote in
> news:1113566053.8...@o13g2000cwo.googlegroups.com:
>
> >
> > Unlike someone who will rename nameless civilized discourse doesn't
> > require everyone to kiss ass to "get along". Disagreements ARE useful
> > AND commonplace.

> You of all people should speak of "civilizes dicourse". Your rants
> seldom produce anything useful. I don't kiss ass to get along. I think
> you should take a good look in the mirror. Ask Wagner or Mr BS who is
> the better ass kisser. I lately have read some of your posts. You seem
> to greatly over rate your importance but I guess thats nothing new.

Did you just insult Wagner and Schneier in the process of telling me
that I should "learn about civilized discourse" (note spelling)?

YO POT! HEY KETTLE!

Tom

D. J. Bernstein

Apr 15, 2005, 12:51:49 PM
Tom St Denis wrote:
> It requires a large amount of chosen plaintext.

No. It requires a large amount of _known_ plaintext where the AES inputs
are fairly random: e.g., known plaintext in CBC mode. Page 11 mentions
one idea for extending the attack to counter mode, although I'm not
planning to investigate this.

Tom St Denis

Apr 15, 2005, 1:04:34 PM

D. J. Bernstein wrote:
> Tom St Denis wrote:
> > It requires a large amount of chosen plaintext.
>
> No. It requires a large amount of _known_ plaintext where the AES inputs
> are fairly random: e.g., known plaintext in CBC mode. Page 11 mentions
> one idea for extending the attack to counter mode, although I'm not
> planning to investigate this.

Sorry, the point still holds. Last time I saw your attack you cycled
through different values for a given byte position to see which took
the longest...

That implies fixing 15 bytes and cycling through 1 of them. Not a
scenario that is likely to happen outside of CTR mode [interestingly
enough], but even then you won't get more than the first couple of LS
bytes... I'm not trying to take away from the result other than to say
it's not a practical threat.

Just trying to provide "the other side of the coin" here... If you can't
stand a little scrutiny ... :-(

Tom

Paul Rubin

Apr 15, 2005, 1:07:49 PM
"D. J. Bernstein" <d...@cr.yp.to> writes:
> No. It requires a large amount of _known_ plaintext where the AES inputs
> are fairly random: e.g., known plaintext in CBC mode. Page 11 mentions
> one idea for extending the attack to counter mode, although I'm not
> planning to investigate this.

Got any thoughts about Serpent in the "bit slice" implementation style?
Should it have become the AES instead of Rijndael? Would it have been
an adequate AES with 16 rounds instead of 32?

D. J. Bernstein

Apr 15, 2005, 1:12:06 PM
Tom St Denis wrote:
> You talk in your paper that to avoid the attack you have to completely
> avoid cache stalls.

No. What I said was that to _retain confidence in AES_ we need
_constant-time software_. ``Avoid the attack'' and ``retain confidence''
are not the same thing; see page 13. ``Take constant time'' is also an
extremely difficult task; see Sections 10 through 15.

> Couldn't you just insert noise in the process to lower the bias?

Adding noise doesn't lower the bias. The attacker sees through the noise
by averaging over a larger number of packets. See page 5. See also [7].

The number of packets needed is proportional to the timing variance (the
square of the timing deviation). I was already dealing with a variance
over a thousand squared cycles; your specific look-up-a-few-entries
tweak wouldn't have noticeably affected the variance. Hiding behind a
cable modem is a much better way to add variance, although the attack is
still doable with enough effort.
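The scaling argument here is easy to check numerically (a Python sketch; the specific sigma, bias, and sample numbers are illustrative, not from the paper): the standard error of an N-sample mean is sigma/sqrt(N), so holding the resolvable signal fixed forces N to grow with the variance sigma^2.

```python
import random

def packets_needed(sigma, target_se):
    # To resolve a timing signal, the standard error sigma/sqrt(N) of the
    # averaged timings must fall below the target, so N ~ (sigma/target)^2:
    # the sample count is proportional to the timing variance.
    return (sigma / target_se) ** 2

# A seeded simulation: a 1-cycle bias buried in sigma = 50 cycles of noise
# is still recovered by averaging 100000 samples (standard error ~ 0.16).
random.seed(2005)
bias, sigma, n = 1.0, 50.0, 100_000
estimate = sum(random.gauss(bias, sigma) for _ in range(n)) / n
```

Doubling the deviation quadruples the data requirement, which is why adding a round's worth of lookup noise barely moves the needle against an existing deviation of a thousand cycles.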

Paul Rubin

Apr 15, 2005, 1:25:56 PM
"D. J. Bernstein" <d...@cr.yp.to> writes:
> No. What I said was that to _retain confidence in AES_ we need
> _constant-time software_. ``Avoid the attack'' and ``retain confidence''
> are not the same thing; see page 13. ``Take constant time'' is also an
> extremely difficult task; see Sections 10 through 15.

How about using the CPU cycle counter to add enough delay to make
the total time constant? You'd do that on entire packets, not AES
blocks; that is, for a 1600 byte packet (100 blocks) the mean encryption
time might be 32000 cycles with SD=(say) 500 cycles. So you'd use
the cycle counter to delay until 34000 cycles (= mean + 4 sigma) giving
you about 6% average slowdown and almost-always-constant time.
Such accurate timing would need OS support, most likely, and someone
with access to the physical hardware might still be able to distinguish
encryption from timer delays through power consumption analysis.
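A sketch of this padding idea in Python (using perf_counter_ns as a stand-in for the cycle counter; the budget of mean + 4 sigma is the one suggested above, and the function names are my own):

```python
import time

def encrypt_padded(encrypt, packet, budget_ns):
    # Run the encryption, then busy-wait on the clock until a fixed time
    # budget (e.g. mean + 4*SD of the measured encryption time) has
    # elapsed, so the externally visible total is almost always constant.
    start = time.perf_counter_ns()
    out = encrypt(packet)
    while time.perf_counter_ns() - start < budget_ns:
        pass  # burn time; nothing observable is released early
    return out
```

Note that this pads only the *total* time; it does nothing about what an interrupt arriving mid-encryption can observe.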

Brownout

Apr 15, 2005, 2:10:45 PM
Tom St Denis wrote:
> As another note... it seems you implemented the very attack I PROPOSED
> LAST YEAR [*] but you never tried to attack it.
Publish or perish.

Brownout

Alan

Apr 15, 2005, 3:16:51 PM
"D. J. Bernstein" wrote in message...

> Adding noise doesn't lower the bias. The attacker sees through the noise
> by averaging over a larger number of packets. See page 5. See also [7].

At the risk of making a fool of myself...

Instead of adding the noise, why not just replace the underlying timing with
the noise? In other words, if the current task would be completed natively
at time T, don't deliver the results at time T + t-random, but instead at
t-random. Of course, this would substantially slow the overall process,
since the shortest time t-random must be no less than the longest task
completion time T, to avoid leaking information about T.

Another thought: Is there a way to perform the AES subtasks in parallel
(and synchronized) such that it would be impossible to tell which atomic
step determined the completion time for the parallelled set of tasks?

It seems that any attempt to mask the timing will slow things down. Speed
was one of the advantages that led to Rijndael winning out as AES. Maybe
that speed comes with a cost...

For what it's worth...

Alan


D. J. Bernstein

Apr 15, 2005, 5:38:21 PM
Tom St Denis wrote:
> That implies fixing 15 bytes and cycling through 1 of them.

Nope. The attack used random packets. You should read the description of
the attack, or simply read the code in Appendix B.

This does mean that variance in each byte acted as noise for other bytes
(as mentioned on page 11), but that noise was averaged out, just like
all the other noise, producing separate results for each byte.

Tom St Denis

Apr 15, 2005, 5:46:44 PM

D. J. Bernstein wrote:
> Tom St Denis wrote:
> > That implies fixing 15 bytes and cycling through 1 of them.
>
> Nope. The attack used random packets. You should read the description of
> the attack, or simply read the code in Appendix B.
>
> This does mean that variance in each byte acted as noise for other bytes
> (as mentioned on page 11), but that noise was averaged out, just like
> all the other noise, producing separate results for each byte.

Oops I missed this. I'm sorry. I read the paper fairly early this
morning... Hehehe.

I think the safe move here is for me to duck out of this thread.

Adios.

Tom

D. J. Bernstein

Apr 15, 2005, 6:27:02 PM
Paul Rubin wrote:
> How about using the CPU cycle counter to add enough delay to make
> the total time constant.

Extremely expensive, and inadequate.

Extremely expensive: The time _can_ be huge---consider uncached tables.
Slowing the average case down to the worst case is a terrible penalty.

Inadequate: The AES process can be interrupted, so making the _total_
time constant isn't sufficient to stop timing leaks. See Section 13 of
the paper.

D. J. Bernstein

Apr 15, 2005, 6:42:36 PM
Paul Rubin wrote:
> Got any thoughts about Serpent in the "bit slice" implementation style?

The Serpent speed records are from Dag Osvik. For PPro (similar to PII
etc.), his paper says 770 cycles to encrypt each block (compared to ~225
for Rijndael), or 1876 including key setup (compared to ~400 for
Rijndael). He mentioned that he has better results now; I'm not sure how
much better. Presumably all these timings are independent of key/input.

> Should it have become the AES instead of Rijndael?

I wasn't involved in the AES design/selection process, but it's clear
that Serpent flunked many contributors' performance requirements.

> Would it have been an adequate AES with 16 rounds instead of 32?

That modification of Serpent wasn't one of the AES submissions.

Henrick Hellström

Apr 15, 2005, 6:58:54 PM
D. J. Bernstein wrote:
> Tom St Denis wrote:
>
>>That implies fixing 15 bytes and cycling through 1 of them.
>
>
> Nope. The attack used random packets. You should read the description of
> the attack, or simply read the code in Appendix B.
>
> This does mean that variance in each byte acted as noise for other bytes
> (as mentioned on page 11), but that noise was averaged out, just like
> all the other noise, producing separate results for each byte.

FWIW it seems as if the timing differences are gone if the plaintext is
RC4 encrypted before it is AES-CBC encrypted. It would be interesting to
see some results for AES-OCB as well.

D. J. Bernstein

Apr 15, 2005, 7:58:25 PM
Roger Schlafly wrote:
> Was it a weak key? <g>

It is now! The only way you can guarantee that the attack doesn't find
your key 43 74 84 a9 89 b1 62 a7 bb 7f d0 3d 9e ec 3d 35 is to exclude
43 74 84 a9 89 b1 62 a7 bb 7f d0 3d 9e ec 3d 35 during key generation!
:-)

Matt Mahoney

Apr 15, 2005, 9:38:16 PM

Actually in appendix A, the server code sends the timing information
back to the client by calling timestamp() before and after the AES
encryption. timestamp() uses an RDTSC instruction to get the time with
a resolution of one CPU cycle. (RDTSC stores a 64 bit count in
%edx:%eax).

unsigned int timestamp(void)
{
  unsigned int bottom;
  unsigned int top;
  /* ".byte 15;.byte 49" emits 0x0F 0x31, the RDTSC opcode */
  asm volatile(".byte 15;.byte 49" : "=a"(bottom), "=d"(top));
  return bottom;
}

But as the paper points out, an attack is still possible with less
accurate timing information, it just takes more effort. Excellent
work, IMHO.

-- Matt Mahoney

D. J. Bernstein

Apr 15, 2005, 10:03:48 PM
BRG wrote:
> But I have tried this several times (although not in the last couple of
> years) and I have always seen a large speed penalty.

Which CPUs? Did you avoid crossing 8-byte boundaries?

My own public-domain AES software (see http://cr.yp.to/mac.html) uses
compressed tables---the usual 8-byte T,S,S,S^T,T,S,S,0---for aes_athlon,
aes_ppro, aes_aix, and aes_macos. It's as fast as any other published
code, so the penalty is obviously not terrible.

Dag Osvik pointed out that the Athlon does one unaligned load per cycle,
as opposed to two normally; but I'm limiting myself to one load per
cycle anyway, so that isn't a problem.

> This is not only a gain in timing attack terms but is also a big gain
> in terms of cache loading overhead for AES more generally.

Yup. There's definitely a speedup in real-world applications, although
there's also an unfortunate tendency of cryptographic papers to focus on
small-scale benchmarks that can't see the speedup.
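The compressed layout is easy to illustrate (a Python sketch of the byte pattern only; real implementations read these entries with unaligned 32-bit loads, and the helper names are my own): each 8-byte entry T,S,S,S^T,T,S,S,0 contains all four classic round-table entries as overlapping 4-byte windows, because the classic tables are byte rotations of one another.

```python
def xtime(b):
    # multiply by 2 in GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1
    return ((b << 1) ^ 0x1B) & 0xFF if b & 0x80 else (b << 1) & 0xFF

def compressed_entry(s):
    # 8-byte entry for one S-box output s: T,S,S,S^T,T,S,S,0
    # where T = 2*s and S^T = 3*s in GF(2^8)
    t = xtime(s)
    return bytes([t, s, s, s ^ t, t, s, s, 0])

def load32(entry, offset):
    # a (possibly misaligned) 4-byte load at the given byte offset
    return entry[offset:offset + 4]
```

The window at offset 0 is (2s, s, s, 3s), and offsets 1 through 3 give its byte rotations, so one 2KB table replaces the usual four.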

David Wagner

Apr 15, 2005, 11:15:23 PM
D. J. Bernstein wrote:
>``This paper reports successful extraction of a complete AES key from a
>network server on another computer. The targeted server used its key
>solely to encrypt data using the OpenSSL AES implementation on a Pentium
>III.''

Fascinating work, as always. Scary stuff.

Do you have any estimate of the workload of this type of attack in a
setting that is more representative of typical deployments? I noticed
that your experiment involved server code, network topology, and other
parameters that were chosen to make the attacker's life as easy as
possible, and the attack's workfactor was around 2^35 bytes of data. If,
for instance, it turns out that real applications have a signal-to-noise
ratio that is three orders of magnitude higher than what you saw in your
test, then the workfactor would go up to 2^55 bytes of data. Do you
have any sense of what the workfactor would be in a more common setup?
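The arithmetic in this estimate can be made explicit (a Python check; this takes the "three orders of magnitude" to apply to the timing deviation, with the sample count scaling as its square per the variance argument earlier in the thread):

```python
import math

base = 2 ** 35            # bytes of data in the reported experiment
deviation_factor = 1000   # three orders of magnitude more timing noise
scaled = base * deviation_factor ** 2   # samples scale with the variance
bits = math.log2(scaled)                # ~54.9, i.e. roughly 2^55 bytes
```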

I guess the next question for research is whether we can come up with
countermeasures that don't require abandoning AES.

D. J. Bernstein

Apr 16, 2005, 12:24:24 AM
David Wagner wrote:
> If, for instance, it turns out that real applications have a
> signal-to-noise ratio that is three orders of magnitude higher than
> what you saw in your test,

You're talking about a timing _deviation_ on the scale of 100000 CPU
cycles---like Tom's 2GHz Athlon 64 stuck behind a cable modem. That's an
extreme situation, certainly not representative of typical servers.

A deviation of a few thousand CPU cycles---for example, a network
latency of 400 +- 5 microseconds to reach a 600MHz server on the other
side of campus---is a much more common attack scenario.

> then the workfactor would go up to 2^55 bytes of data.

No, it wouldn't. Upper bounds are not lower bounds! For example, it was
massive overkill to narrow the key set down to just 2^23 possibilities.
See page 11 of the paper for many other ways to save time in the attack.
See also the subsequent paragraph explaining why I didn't bother.

> I guess the next question for research is whether we can come up with
> countermeasures that don't require abandoning AES.

The paper devotes 12 pages to this question, and comes up with answers
to all of the known obstacles, although some of the answers are rather
painful for AES implementors and OS designers.

Paul Rubin

Apr 16, 2005, 12:32:50 AM
d...@taverner.cs.berkeley.edu (David Wagner) writes:
> I guess the next question for research is whether we can come up with
> countermeasures that don't require abandoning AES.

The real fix is to put AES hardware into all Pentium-class CPU's. In
such large cpu's, it's not much extra silicon. Smaller embedded CPU's
tend to not use cache memory (they run at speeds compatible with
uncached ram), so they're less vulnerable to timing attacks.

Via x86 processors (made in Taiwan) already have AES, but I guess
there are political obstacles against US chipmakers adding it, so
security suffers.

BRG

Apr 16, 2005, 4:22:33 AM
D. J. Bernstein wrote:

> BRG wrote:
>
>>But I have tried this several times (although not in the last couple of
>>years) and I have always seen a large speed penalty.
>
> Which CPUs? Did you avoid crossing 8-byte boundaries?

Pentium Pro and, IIRC, P2. I used the table layout that you use and
tested with all the different table alignments to actually measure the
effect this had.

> My own public-domain AES software (see http://cr.yp.to/mac.html) uses
> compressed tables---the usual 8-byte T,S,S,S^T,T,S,S,0---for aes_athlon,
> aes_ppro, aes_aix, and aes_macos. It's as fast as any other published
> code, so the penalty is obviously not terrible.

I don't accept your speed comparisons with other published code because
you are not comparing like with like. My comparisons were all coded in
assembler and all had the same AES functionality with only the table
formats and alignments varying.

> Dag Osvik pointed out that the Athlon does one unaligned load per cycle,
> as opposed to two normally; but I'm limiting myself to one load per
> cycle anyway, so that isn't a problem.

>>This is not only a gain in timing attack terms but is also a big gain
>>in terms of cache loading overhead for AES more generally.
>
> Yup. There's definitely a speedup in real-world applications, although
> there's also an unfortunate tendency of cryptographic papers to focus on
> small-scale benchmarks that can't see the speedup.

It's certainly possible that the cost in unaligned loads is now more than
offset by improved cache utilisation.

I am hence inclined to add table compression as an option in my code but
I am interested in hearing wider views about the processor features
under discussion before doing this.

I agree that timing information can easily be misinterpreted.

Brian Gladman

BRG

Apr 16, 2005, 4:34:29 AM
D. J. Bernstein wrote:

> Paul Rubin wrote:
>
>>Got any thoughts about Serpent in the "bit slice" implementation style?
>
> The Serpent speed records are from Dag Osvik. For PPro (similar to PII
> etc.), his paper says 770 cycles to encrypt each block (compared to ~225
> for Rijndael), or 1876 including key setup (compared to ~400 for
> Rijndael). He mentioned that he has better results now; I'm not sure how
> much better. Presumably all these timings are independent of key/input.

For Serpent in a bit-slice implementation I report 570 cycles/block for
encryption here:

http://fp.gladman.plus.com/cryptography_technology/serpent/index.htm

This was a 'two block parallel' bit-slice implementation but on a modern
processor we could do 'four block parallel' and gain more speed for
applications that can exploit simultaneous multiple block encryption.

[snip]

Brian Gladman

jsa...@ecn.ab.ca

Apr 16, 2005, 9:44:43 AM
D. J. Bernstein wrote:
> Tom St Denis wrote:
> > You talk in your paper that to avoid the attack you have to completely
> > avoid cache stalls.

> No. What I said was that to _retain confidence in AES_ we need
> _constant-time software_. ``Avoid the attack'' and ``retain confidence''
> are not the same thing; see page 13. ``Take constant time'' is also an
> extremely difficult task; see Sections 10 through 15.

It would seem there are two choices. Either use programs, or hardware,
that take exactly the same amount of time regardless of the key or the
data, or ensure that information about the time taken to encrypt data
is not accessible externally.

If I compose an E-mail on a computer, and encrypt it using a utility
like PGP, and later transmit it, I need not be concerned about timing
attacks by people who only have access to the encrypted E-mail.

One possibility is to have a "black box" with some memory inside
processors to execute encryption programs, and then release the result
at a specified (worst-case) time - so that whether it works, or is
idle, no resources are used that are shared with other programs.

John Savard

Ari Silversteinn

Apr 16, 2005, 3:04:17 PM
On 15 Apr 2005 21:32:50 -0700, Paul Rubin wrote:

> The real fix is to put AES hardware into all Pentium-class CPU's. In
> such large cpu's, it's not much extra silicon. Smaller embedded CPU's
> tend to not use cache memory (they run at speeds compatible with
> uncached ram), so they're less vulnerable to timing attacks.
>
> Via x86 processors (made in Taiwan) already have AES, but I guess
> there are political obstacles against US chipmakers adding it, so
> security suffers.

Is there a purchasable chip in the USA that could be added or built around
for a PC?
--
Drop the alphabet for email

Paul Rubin

Apr 16, 2005, 3:09:28 PM
Ari Silversteinn <abcarisi...@yahoo.comxyz> writes:
> > Via x86 processors (made in Taiwan) already have AES, but I guess
> > there are political obstacles against US chipmakers adding it, so
> > security suffers.
>
> Is there a purchasable chip in the USA that could be added or built around
> for a PC?

Um, sure, there are all kinds of AES chips, and as mentioned the Via
x86 processors have AES built in. You can buy the Via chips in the
US. They just don't make them here.

D. J. Bernstein

Apr 16, 2005, 3:32:54 PM
BRG wrote:
> I don't accept your speed comparisons with other published code because
> you are not comparing like with like.

At a lower level I am. For example, timings show that each round---
including 16 table loads, of which 12 are unaligned, and 4 key loads---
is taking 23 Pentium M cycles. Everyone conjectures that it's impossible
to beat 20 Pentium M cycles.

At an even lower level: Agner Fog says that ``misaligned operands
smaller than 16 bytes that do not cross a 32-byte boundary give no
penalty'' for the PPro/PII/PIII and that ``there is little or no penalty
for misaligned operands smaller than 16 bytes if reads do not occur
shortly after writes to the same address'' for the P4.

Of course, cycle counts are complicated; this is why timing attacks on
AES are so easy. Even if you have carried out extensive experiments and
read all available documentation, you will occasionally be surprised by
some undocumented timing variability that didn't show up in previous
experiments. But it's clear that these CPUs can handle compressed AES
tables without serious penalties.

BRG

Apr 16, 2005, 5:25:12 PM
D. J. Bernstein wrote:
> BRG wrote:
>
>>I don't accept your speed comparisons with other published code because
>>you are not comparing like with like.
>
> At a lower level I am. For example, timings show that each round---
> including 16 table loads, of which 12 are unaligned, and 4 key loads---
> is taking 23 Pentium M cycles. Everyone conjectures that it's impossible
> to beat 20 Pentium M cycles.

Nevertheless you compare your assembler code with other people's C code,
which is unsound since compiler performance becomes a significant factor
in any such comparison.

Moreover comparing AES code designed for high key agility (on the fly
key expansion) with code designed for static keying (pre computed key
schedule) is bound to be misleading if the testing regime favours one
design approach rather than the other.

[snip]
Brian Gladman

Ari Silversteinn

Apr 16, 2005, 6:20:10 PM
On 16 Apr 2005 12:09:28 -0700, Paul Rubin wrote:

>> Is there a purchasable chip in the USA that could be added or built around
>> for a PC?
>
> Um, sure, there are all kinds of AES chips, and as mentioned the Via
> x86 processors have AES built in. You can buy the Via chips in the
> US. They just don't make them here.

Thanks, I had the wrong interpretation; I thought that importing AES
might have been an issue.

D. J. Bernstein

Apr 16, 2005, 6:47:54 PM
BRG wrote:
> Nevertheless you compare your assembler code with other people's C code,
> which is unsound since compiler performance becomes a significant factor
> in any such comparison.

The objectives are (1) top speed and (2) no timing leaks. It's hard
enough to reach these objectives with a sensible choice of programming
tools. If the objectives are even more difficult to reach with a bad
choice of programming tools, that's something that should be emphasized,
not covered up.

> Moreover comparing AES code designed for high key agility (on the fly
> key expansion) with code designed for static keying (pre computed key
> schedule) is bound to be misleading if the testing regime favours one
> design approach rather than the other.

One problem is to compute AES_k(n) from n and k. Another problem is to
compute AES_k(n) from n and an expanded version of k. Both problems show
up in practice. If the optimal solutions are different, that's again
something that should be emphasized, not covered up.

Anyway, compressed tables are beneficial for both problems. Since I
haven't seen your code, I don't know why you were seeing big slowdowns
from compressed tables.

BRG

Apr 16, 2005, 8:03:56 PM
D. J. Bernstein wrote:
> BRG wrote:
>
>>Nevertheless you compare your assembler code with other people's C code,
>>which is unsound since compiler performance becomes a significant factor
>>in any such comparison.
>
> The objectives are (1) top speed and (2) no timing leaks. It's hard
> enough to reach these objectives with a sensible choice of programming
> tools. If the objectives are even more difficult to reach with a bad
> choice of programming tools, that's something that should be emphasized,
> not covered up.

Whether C or assembler is the better choice for implementation will be
determined by the applications context.

I am simply pointing out that in comparing the speed of your assembler
code with the speed of other people's C code you are not comparing like
with like. It's not a great revelation to find that assembler turns out
to be faster than C.

>>Moreover comparing AES code designed for high key agility (on the fly
>>key expansion) with code designed for static keying (pre computed key
>>schedule) is bound to be misleading if the testing regime favours one
>>design approach rather than the other.
>
> One problem is to compute AES_k(n) from n and k. Another problem is to
> compute AES_k(n) from n and an expanded version of k. Both problems show
> up in practice. If the optimal solutions are different, that's again
> something that should be emphasized, not covered up.

I have always seen this as obvious and I don't know of any attempts to
cover this up.

> Anyway, compressed tables are beneficial for both problems. Since I
> haven't seen your code, I don't know why you were seeing big slowdowns
> from compressed tables.

Brian Gladman

Louis Scheffer

Apr 17, 2005, 3:52:04 AM4/17/05
to
"D. J. Bernstein" <d...@cr.yp.to> writes:

>Tom St Denis wrote:

>> Couldn't you just insert noise in the process to lower the bias?

>Adding noise doesn't lower the bias. The attacker sees through the noise
>by averaging over a larger number of packets. See page 5. See also [7].

>The number of packets needed is proportional to the timing variance (the
>square of the timing deviation). I was already dealing with a variance
>over a thousand squared cycles; your specific look-up-a-few-entries
>tweak wouldn't have noticeably affected the variance. Hiding behind a
>cable modem is a much better way to add variance, although the attack is
>still doable with enough effort.

This is not clear to me, in practice. A session of ping -s shows at least
1 ms rms deviation on all servers I tried. According to the chart in your
paper, you are looking at 10 cycle deviations, or about 10^{-8} sec. So
making this work over typical inter-institution networks would require
roughly 10^10 trials. This is months at 1000 packets/second, even
assuming you can arrange for 10 billion known plaintexts.
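Scheffer's 10^10 figure follows from the scaling rule stated earlier in the thread (samples needed are proportional to the timing variance, i.e. to (noise/signal)^2). A quick back-of-envelope check, taking the 1 ms and 10-cycle figures from the post and assuming a ~1 GHz clock:

```python
# Samples needed to average the noise down below the signal scale
# grow as (noise/signal)^2.
noise_rms = 1e-3      # ~1 ms rms network deviation (from ping)
signal = 10e-9        # 10-cycle signal at an assumed ~1 GHz clock
trials = (noise_rms / signal) ** 2   # ~1e10 samples
days = trials / 1000 / 86400         # at 1000 packets/second
print(f"{trials:.2e} trials, ~{days:.0f} days")
```

which reproduces the "months at 1000 packets/second" conclusion.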

On a more theoretical note, beating down the noise by factors of 1000s is
not always possible. To do so requires the noise to have some nice
statistical properties, which are not guaranteed in real life. (For
example, router delays are definitely not stationary - they vary with the
time of day, and the day of week, among other factors.) So the
routers between you and the target may introduce timing variances that
cannot be averaged out. Routers have CPUs and caches, use table lookup,
are subject to delays due to other traffic, and so on. I'd guess the
data dependency (and delays from other traffic, etc) is much worse for
routers than crypto algorithms. This may well get worse as routers do
more "deep inspection" for viruses, worms, etc.

Exploiting this in practice, against an opponent at least one hop distant
through the public internet, would seem very difficult. It would be an
good experiment to try, though.

If you do need to fix this at the CPU level, there are many possibilities.
For example, memory is cheap, so you could perhaps provide a "fast array"
which is a few thousand bytes of dedicated memory, not shared with
anything else, so all array accesses ARE constant time. (It would be
saved and restored, when needed, as a unit to avoid traffic conflicts.)
This could also be useful for DSP coefficients, FFT tables, and so on.
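The "fast array" above is a hardware fix; a purely software analogue with the same goal is to make every lookup touch the whole table and select the wanted entry arithmetically, so the memory access pattern is independent of the secret index. A minimal Python sketch of that idea (a real implementation would do this branch-free in C or assembly over the 256-entry AES tables):

```python
def ct_lookup(table, index):
    """Select table[index] while reading every entry, so the access
    pattern (and, in a native implementation, the cache behavior)
    does not depend on the secret index."""
    result = 0
    for i, value in enumerate(table):
        mask = -(i == index)   # -1 (all ones) when i == index, else 0
        result |= value & mask
    return result
```

The cost is a full table scan per lookup, which is exactly the speed-versus-leakage trade-off this thread is arguing about.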

Lou Scheffer

D. J. Bernstein

Apr 17, 2005, 3:53:50 AM4/17/05
to
Louis Scheffer wrote:
> A session of ping -s shows at least 1 ms rms deviation on all servers
> I tried.

That's excessive even for a cable modem. Did you forget to eliminate
outliers?

When I send pings several miles from UIC to the University of Chicago,
the ping program claims a deviation of 0.51 ms, but the bottom half of
the times have a deviation of only 0.01 ms---just 6000 CPU cycles on a
typical 600MHz server. The noise within UIC is even smaller.
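The crude preprocessing described here (keep only the fastest half of the samples, then take the deviation) is easy to reproduce; a sketch with made-up RTT data:

```python
import statistics

def trimmed_deviation(rtts):
    """Deviation of the fastest half of the round-trip times --
    crude removal of queueing-delay outliers."""
    fastest = sorted(rtts)[: len(rtts) // 2]
    return statistics.pstdev(fastest)

# Made-up data: a tight cluster of "clean" RTTs (ms) plus a few
# congested outliers.
rtts = [10.0 + 0.001 * (i % 5) for i in range(20)] + [25.0, 40.0, 90.0, 31.0]
```

Here `statistics.pstdev(rtts)` is dominated by the outliers, while `trimmed_deviation(rtts)` recovers the tight sub-millisecond spread of the clean samples, mirroring the 0.51 ms vs. 0.01 ms observation above.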

Furthermore, it's possible to increase the signal to hundreds of CPU
cycles, as explained on page 11. This would require the attacker to look
at the target software more carefully than I did, but you're kidding
yourself if you think that it's particularly difficult.

> On a more theoretical note, beating down the noise by factors of 1000s is
> not always possible.

It is always possible, and my paper gives an example of doing it. Figure
5.2 shows noise around 0.05 cycles; the actual signal is flat for 8-byte
stretches. The original deviations were over 100 cycles, as mentioned on
page 5 and on page 8. That's a reduction factor of 2000.

> To do so requires the noise to have some nice
> statistical properties, which are not guaranteed in real life. (For
> example, router delays are definitely not stationary - they vary with the
> time of day, and the day of week, among other factors.)

No, that's completely wrong. Changes in the noise over time are
irrelevant. There's no correlation between the noise and the AES input
bytes, and therefore no correlation between the noise and the signal.
The only important feature of the noise is its volume.
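This point can be checked with a toy simulation: give the noise a drifting (nonstationary) mean plus large jitter, interleave queries for two inputs whose true timings differ by a few cycles, and the difference of averages still converges on the signal. All numbers below are invented for illustration:

```python
import random
random.seed(1)

def measure(t, slow):
    """Toy timing measurement: a 3-cycle input-dependent signal buried
    in noise whose mean drifts over time (nonstationary) but is
    uncorrelated with which input we query."""
    drift = 50.0 * t / 100000      # slow upward drift in the noise floor
    jitter = random.gauss(0, 30)   # large per-sample jitter
    return 1000.0 + (3.0 if slow else 0.0) + drift + jitter

n = 200000
fast = [measure(t, False) for t in range(0, n, 2)]  # interleaved queries
slow = [measure(t, True) for t in range(1, n, 2)]
diff = sum(slow) / len(slow) - sum(fast) / len(fast)  # ~3 cycles recovered
```

Because both inputs are sampled throughout, the drift cancels in the difference of averages; only the noise volume sets how many samples are needed.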

D. J. Bernstein

Apr 17, 2005, 4:41:57 AM4/17/05
to
BRG wrote:
> I am simply pointing out that in comparing the speed of your assembler
> code with the speed of other people's C code you are not comparing like
> with like.

Speed is speed. I'm simply reporting all the timings I found; I'm not
excluding any particular programming language. When I say that your AES
code reportedly takes 482 Pentium-III cycles (202 for expansion, 280 for
encryption), I'm making no comments about how that speed was achieved.

If someone wants to explain bad performance or timing leaks by saying
``I decided to use C and couldn't do better in C,'' that's useful data
for implementors choosing programming tools. But saying ``I decided to
use C and it's unfair to be compared to asm'' is silly.

Anyway, all I was saying was that the published aes_ppro code takes 23
Pentium III cycles per round with compressed tables, and nobody claims
that uncompressed tables can do better than 20 cycles, so obviously
there's not a serious slowdown from compressed tables. In fact, I see
no reason to disbelieve Agner Fog's statement that there's zero penalty
for arbitrary alignment within lines on the PPro/PII/PIII.

Rob Warnock

Apr 18, 2005, 4:53:48 AM4/18/05
to
D. J. Bernstein <d...@cr.yp.to> wrote:
+---------------

| Louis Scheffer wrote:
| > A session of ping -s shows at least 1 ms rms deviation on all servers
| > I tried.
|
| That's excessive even for a cable modem.
+---------------

Doesn't seem at all excessive to me. I have a really good symmetric DSL
connection and I see much worse than 1ms *all* the time, e.g., picking
a server at random:

20 packets transmitted, 20 packets received, 0% packet loss
round-trip min/avg/max/stddev = 71.346/73.763/84.398/2.595 ms

It just depends on whether you happen to go through a "busy" peering
point or not. When things get busy, variance gets *BAD*, e.g., the same
remote host as above, a half-minute later:

20 packets transmitted, 20 packets received, 0% packet loss
round-trip min/avg/max/stddev = 71.286/82.635/140.136/19.699 ms

This sort of variation is quite typical, actually.

+---------------


| When I send pings several miles from UIC to the University of Chicago,

+---------------

*HAH!* You're probably not going through more than a couple of routers.
[Use "traceroute" to see exactly how many.] Try a path that goes halfway
across the country, something with over a dozen hops that goes through
at least two major peering points, and you'll see what I mean.


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

BRG

Apr 18, 2005, 6:11:09 AM4/18/05
to
D. J. Bernstein wrote:

> BRG wrote:
>
>>I am simply pointing out that in comparing the speed of your assembler
>>code with the speed of other people's C code you are not comparing like
>>with like.
>
> Speed is speed. I'm simply reporting all the timings I found; I'm not
> excluding any particular programming language. When I say that your AES
> code reportedly takes 482 Pentium-III cycles (202 for expansion, 280 for
> encryption), I'm making no comments about how that speed was achieved.

My point is not about your comments on the speed of my code, it's a more
general point (I do think that your commentary on my code is wrong but
that's not an issue that worries me).

> If someone wants to explain bad performance or timing leaks by saying
> ``I decided to use C and couldn't do better in C,'' that's useful data
> for implementors choosing programming tools. But saying ``I decided to
> use C and it's unfair to be compared to asm'' is silly.

My point is not about fairness, it's about accuracy and completeness
given the purpose for which you are presenting comparisons.

> Anyway, all I was saying was that the published aes_ppro code takes 23
> Pentium III cycles per round with compressed tables, and nobody claims
> that uncompressed tables can do better than 20 cycles, so obviously
> there's not a serious slowdown from compressed tables. In fact, I see
> no reason to disbelieve Agner Fog's statement that there's zero penalty
> for arbitrary alignment within lines on the PPro/PII/PIII.

I might look at this on current processors if I find the time.

Brian Gladman

D. J. Bernstein

Apr 18, 2005, 6:56:20 AM4/18/05
to
In response to a claim of ``at least 1 ms rms deviation on all servers,''
I said ``That's excessive even for a cable modem. Did you forget to
eliminate outliers?'' I then spent a paragraph giving a typical example
where crude preprocessing reduced the deviation by a factor of 50 while
dividing the number of samples by just 2.

Rob Warnock wrote:
> I see much worse than 1ms *all* the time

Instead of quoting irrelevant ``stddev'' numbers from ping, try graphing
a bunch of round-trip times in increasing order, figuring out what the
shape means, and then calculating the deviation _without_ outliers.

> You're probably not going through more than a couple of routers.

Well, yes, as an attacker I think I can get myself within several miles
of a typical Internet target. If the AES marketing slogan is

AES. Speedy. Standard. Safe for several months of continuous use if
everyone within six router hops is trustworthy.

then I think it's time to review our notion of cryptographic security.

Michael Amling

Apr 18, 2005, 9:46:06 AM4/18/05
to
Rob Warnock wrote:
>
> *HAH!* You're probably not going through more than a couple of routers.
> [Use "traceroute" to see exactly how many.] Try a path that goes halfway
> across the country, something with over a dozen hops that goes through
> at least two major peering points, and you'll see what I mean.

What is your point here? That attackers can never attack from a few
miles (or a few blocks or a few meters) away? That you need not defend
against attackers that are less than halfway across the country?

--Mike Amling

Vernon Schryver

Apr 18, 2005, 11:09:43 AM4/18/05
to
In article <ySO8e.1056$s12...@newssvr31.news.prodigy.com>,
Michael Amling <nos...@nospam.com> wrote:

How about "the sky has not yet fallen"?

Attackers that are less than halfway across the country are in
fact more worrisome. They are far more likely to use other kinds
of cryptanalysis including key loggers, rubber hoses, bribery, and
extortion. Attackers that are within a couple of high speed router
hops from the target will often be able to walk over to the target
system and look for PostIt Notes with keys.

What about the variance in the timings from other software on the same
system? Real systems run a lot of other stuff besides AES code. There
are reasons why real systems have MBytes of cache, and those reasons
will introduce a *lot* of noise into the measurements needed for this
attack. Extracting remote timing in general is not as easy as many
sci.crypt contributors claim, whether it is fingerprinting a system's
clock or this application. Anyone who wants to make claims about the
topic should have substantial experience with schemes such as NTP, the
network time protocol.

These results are quite interesting, but not nearly as interesting as
the headings suggested. Remote extraction of an AES key in an academic
setting using victim software specially crafted to be vulnerable is
interesting but not fatal, particularly given the work factor involved.
Extracting even a few bits of AES key will look like a denial of service
attack to all but the biggest systems. The giant systems that won't
notice the number of operations required, particularly when the
encryptions are not purely artificial but part of a real application,
probably ought to be using special purpose encryption hardware. It's
not exactly news that encryption hardware must not leak bits via timing.

One reasonable response is
Either use an encryption system that works in constant time or limit
the total number of encryptions done per day.

Isn't it a good practice in general to limit the number of encryptions
used for any key?

This thread is a lot like the SHA-and-MD5-are-broken threads, interesting
but grossly overstated. Just as a secure system cannot be built by
ignoring everything but the encryption scheme, a system cannot be
broken by looking only at the encryption scheme. What matters is the
(in)security of the entire system. It is silly to concentrate entirely
on any single part of a system, including the encryption scheme.

I've been stewing about another recent sci.crypt sloppy claim that
cryptographic hashes are "collision resistant" in the context of
ordinary, bit-rot data integrity. That is balderdash, because it
implies that other hash functions such as ordinary CRCs of the same
lengths are not "collision resistant." In fact, it is easier to
support a claim that a simple CRC is more "collision resistant" than
SHA-256 by considering the distribution of simple products and sums
of uniformly (or otherwise) distributed random variables. A cryptographic
hash has a particular kind of collision resistance which is important,
but not the only important kind.


Vernon Schryver v...@rhyolite.com

thomas...@gmail.com

Apr 18, 2005, 11:18:44 AM4/18/05
to
> Well, yes, as an attacker I think I can get myself within several
miles of a typical Internet target.

To drive this point home further, readers should consider the new
threat model this implies: basic network access anywhere topologically
close to the target translates to losing key information. "Basic
network access" doesn't imply supervisor privileges, continuous
access, or even knowledge of your "stepping stone" targets. Previously
stupid attacks that, under this model, could now be catastrophic:

- Outlook viruses

- "Nobody" privileges from PHP injection attacks

- Compromised "decoy" systems

I think I'm reiterating points that smarter people have already made
here, but if putting this issue in a "practical" context (like Tom St
Denis and Rob Warnock are trying to do) helps increase attention to
side-channel attacks, it's probably worth the extra noise.

D. J. Bernstein

Apr 18, 2005, 7:45:11 PM4/18/05
to
I always love seeing corporate-style responses to security problems.
``A known-plaintext attack? Ha! You can't even start your attack until
you learn our secret plaintext! Where do you think you'll get _that_?''

Vernon Schryver wrote:
> Isn't it a good practice in general to limit the number of encryptions
> used for any key?

``You used millions and millions of blocks? Ha! We've used Dynamic Key
Modification technology to ensure that no key is ever used for more than
one block!''

I must have missed the NIST memo saying ``WARNING: ALWAYS SWITCH AES
KEYS AFTER B BLOCKS.'' What's the number B, Vernon? How many blocks
should an implementor allow under one AES key before switching keys?

Presumably you're aware that naming a small B---demanding very fast key
switches---will display a shocking ignorance of real-world cost issues.
Presumably you're also aware that naming a large B---allowing a key to
be used for a while---will embarrass you as soon as someone carries out
a timing attack on the B blocks that you claimed were safe.

So what's the number B, Vernon? Is there any actual content to your
``limit the number of encryptions'' comment? Try to think about this
from a real-world perspective. What is the implementor supposed to do?

> What about the variance in the timings from other software on the same
> system? Real systems run a lot of other stuff besides AES code.

``You need accurate timings? Ha! Our operating system is MultiThreaded,
MultiProcessing, MultiUser, and MultiModal, making its actual timings
completely unpredictable! And we've turned off the clock!''

I must have missed the NIST memo saying ``WARNING: ALWAYS HIDE AES
TIMINGS BEHIND C CYCLES OF NOISE.'' What's the number C, Vernon? How
much noise should an implementor make sure to put into his system?

As above, presumably you're aware of the risks of naming a number.
You'll make a fool of yourself if you demand a noise level that's
expensive to generate, and you'll also make a fool of yourself if you
don't demand enough noise to stop the attack.

> Extracting even a few bits of AES key will look like a denial of
> service attack to all but the biggest systems.

That's even dumber than your previous comments. There's no requirement
for the attacker to generate _any_ packets---let alone to send them at
any particular speed---as long as he can watch the packet response
times. In fact, for many common protocols, the attacker doesn't even
need to know the plaintexts.

Next I suppose you'll claim that cryptography doesn't have to be secure
against attackers who can see legitimate packets. Those attackers can
``walk over to the target system and look for PostIt Notes with keys,''
so it's pointless to try to defend against them, right?

Tom St Denis

Apr 18, 2005, 8:07:22 PM4/18/05
to

D. J. Bernstein wrote:
> I always love seeing corporate-style responses to security problems.
> ``A known-plaintext attack? Ha! You can't even start your attack
until
> you learn our secret plaintext! Where do you think you'll get
_that_?''
>
> Vernon Schryver wrote:
> > Isn't it a good practice in general to limit the number of
encryptions
> > used for any key?
>
> ``You used millions and millions of blocks? Ha! We've used Dynamic
Key
> Modification technology to ensure that no key is ever used for more
than
> one block!''
>
> I must have missed the NIST memo saying ``WARNING: ALWAYS SWITCH AES
> KEYS AFTER B BLOCKS.'' What's the number B, Vernon? How many blocks
> should an implementor allow under one AES key before switching keys?

<snip>

I know I said I was staying out of this ... but you need a bitchslap
back to reality. Your attack *really* is impractical in many
situations.

Take my low bandwidth relatively high latency SSH session. ... oh and
it re-keys every hour [I can change that]. Oh wait, your attack
doesn't apply there.

Ok, wait wait wait. Take my GPG encrypted emails... oh wait your
attack is online only.

Ok, take a network switch with say GCM-AES ... oh wait that's in
hardware and your specific timing attack doesn't apply.

This is silly, I know. Let's take my SSL connection to banks. ... you
can work with say 100KB of data right? .. I mean with 2^14 AES blocks
over a cable modem the attack works more than 1/2^50 of the time?

... fine, Prof. Bernstein, indulge my curiosity. To what did you imagine
applying your attack today that would actually pose a serious threat?

Tom

David Wagner

Apr 18, 2005, 11:08:04 PM4/18/05
to
D. J. Bernstein wrote:
>Vernon Schryver wrote:
>> Isn't it a good practice in general to limit the number of encryptions
>> used for any key?
>
>I must have missed the NIST memo saying ``WARNING: ALWAYS SWITCH AES
>KEYS AFTER B BLOCKS.'' What's the number B, Vernon? How many blocks
>should an implementor allow under one AES key before switching keys?

Sarcasm aside, I think Vernon is asking a reasonable question. I see
Vernon as asking whether there is a reasonable choice of B that makes
AES secure enough for use in (at least some) networked environments.
Seems like a fair question to me. I don't think Vernon is required to
know the answer to that question before asking it.

I noticed you explicitly say in your paper that you are not interested
in studying this question. That's your prerogative. It still strikes
me as a relevant question. I'm curious about the answer myself.

(If this isn't the question he meant to ask, I hope he will correct me.)

Henrick Hellström

Apr 19, 2005, 1:53:01 AM4/19/05
to
David Wagner wrote:

> D. J. Bernstein wrote:
>
>>Vernon Schryver wrote:
>>
>>>Isn't it a good practice in general to limit the number of encryptions
>>>used for any key?
>>
>>I must have missed the NIST memo saying ``WARNING: ALWAYS SWITCH AES
>>KEYS AFTER B BLOCKS.'' What's the number B, Vernon? How many blocks
>>should an implementor allow under one AES key before switching keys?
>
>
> Sarcasm aside, I think Vernon is asking a reasonable question.

Indeed. For instance, the SecSH (aka SSH) wg is already recommending
that transports be rekeyed at least every 2^32 blocks, due to the
security bounds of the cipher modes.

Rob Warnock

Apr 19, 2005, 2:37:14 AM4/19/05
to
Henrick Hellström <henrick....@telia.com> wrote:
+---------------

| > Sarcasm aside, I think Vernon is asking a reasonable question.
|
| Indeed. For instance, the SecSH (aka SSH) wg is already recommending
| that transports are rekeyed at least every 2^32 blocks, due to the
| security bounds of the cipher modes.
+---------------

And as Tom pointed out, the standard distribution of OpenSSH defaults
to "KeyRegenerationInterval 3600" (seconds). So if your network connection
is faster than ~152.7 Mbit/s [2^32 AES blocks/hr], it might be wise to
lower that parameter accordingly.

[Hmmm... That *does* have serious implications for "Internet 2"!]
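The threshold quoted above is easy to verify, assuming 16-byte AES blocks sent within the 3600-second rekey interval:

```python
# 2^32 AES blocks of 16 bytes each within one 3600 s rekey interval:
blocks = 2 ** 32
rate_mbit_s = blocks * 16 * 8 / 3600 / 1e6   # sustained rate in Mbit/s
```

Any link sustaining more than this rate exhausts the 2^32-block bound before the hourly rekey fires.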

D. J. Bernstein

Apr 19, 2005, 3:37:15 AM4/19/05
to
David Wagner wrote:
> I see Vernon as asking whether there is a reasonable choice of B that makes
> AES secure enough for use in (at least some) networked environments.

This is supposed to be a cipher standard for everyone to use. What are
we supposed to do with the applications where your frequent expensive
key changes still aren't sufficient to stop timing attacks?

When I say that the AES time variability is a huge security threat, I'm
not saying that _every_ application is in trouble. I'm saying that
_many_ applications are in trouble. Ignoring the problem of creating
high-speed constant-time cryptographic software means ignoring those
applications. That's violating the basic promise of a cipher standard.

David Wagner

Apr 19, 2005, 4:21:45 AM4/19/05
to
D. J. Bernstein wrote:
>When I say that the AES time variability is a huge security threat, I'm
>not saying that _every_ application is in trouble. I'm saying that
>_many_ applications are in trouble.

I understand. Still, there are practical reasons why, in the short
term, we might like to understand a little better which applications
are in trouble, and how much trouble they are in.

If you want to tell me this is hard to estimate, or that it is hard to
have confidence in one's estimate, or that such estimates are dangerous
because attacks always get better, or that it is far preferable to have
a system that is provably secure against timing attacks instead of one
with only heuristic "no one has figured out how to break it yet" security
-- well, I'll agree with you 100%. That doesn't really make the question
go away, though.

>Ignoring the problem of creating
>high-speed constant-time cryptographic software means ignoring those
>applications. That's violating the basic promise of a cipher standard.

Understood.

BRG

Apr 19, 2005, 4:36:00 AM4/19/05
to
D. J. Bernstein wrote:

> David Wagner wrote:
>
>>I see Vernon as asking whether there is a reasonable choice of B that makes
>>AES secure enough for use in (at least some) networked environments.
>
> This is supposed to be a cipher standard for everyone to use. What are
> we supposed to do with the applications where your frequent expensive
> key changes still aren't sufficient to stop timing attacks?
>
> When I say that the AES time variability is a huge security threat, I'm
> not saying that _every_ application is in trouble. I'm saying that
> _many_ applications are in trouble. Ignoring the problem of creating
> high-speed constant-time cryptographic software means ignoring those
> applications. That's violating the basic promise of a cipher standard.

We can all spend a lot of time talking this threat either up or down but
there is no substitute for taking an _information security_ approach to
what is a _potential_ problem in a low level cryptographic component.

Practical information systems are open to a large number of potential
threats, this being just one of many that have to be considered in
producing an appropriate design for any given applications context.

In that proportion of AES applications where potential key leakage as a
result of this timing attack is an issue, those responsible for the
security of the system will need to decide whether this attack is an
issue for them. If it is they will then have to decide which of a number
of possible countermeasures is available to them and which is the most
effective in their context.

This attack is interesting and the work you have done on it is first
class. Moreover the effort you have made to ensure that the attack is
visible is laudable.

But none of this is new in information security terms so there is no
need to promote a panic response by suggesting that there is something
seriously wrong with AES or that the whole cryptographic security world
is about to come crashing down on our heads. There isn't and it won't.

When you suggest that _many_ AES applications are in trouble, what
evidence do you have for this? Are you talking in absolute numbers or
in proportional terms?

My initial reaction has been that only a small proportion of AES
applications are likely to be in trouble. But I am certainly willing to
reflect on _objective_ evidence that suggests otherwise.

Brian Gladman

Stefan Tillich

Apr 19, 2005, 5:26:53 AM4/19/05
to

D. J. Bernstein schrieb:

[snip]


>
>
> Well, yes, as an attacker I think I can get myself within several miles
> of a typical Internet target. If the AES marketing slogan is
>
> AES. Speedy. Standard. Safe for several months of continuous use if
> everyone within six router hops is trustworthy.
>
> then I think it's time to review our notion of cryptographic security.

Well I think the concept of side-channel attacks has been around long
enough so that we don't need to do that. Any serious implementor of
cryptographic algorithms should be aware of that issue.

But of course, your work shows interesting possibilities.

Regards

Stefan Tillich

Tom St Denis

Apr 19, 2005, 7:18:54 AM4/19/05
to

D. J. Bernstein wrote:
> David Wagner wrote:
> > I see Vernon as asking whether there is a reasonable choice of B
that makes
> > AES secure enough for use in (at least some) networked
environments.
>
> This is supposed to be a cipher standard for everyone to use. What
are
> we supposed to do with the applications where your frequent expensive
> key changes still aren't sufficient to stop timing attacks?

I had this conversation with my buddy Dan Kaminsky and he came up with
a few threat scenarios for you [that you can use to push back on people
like me].

One of them was a simple wifi access point. If the client does AES in
software chances are they're just using the ref-alg-fast code that
everyone [including myself] uses.

Another scenario is the case of a HTTPS file share [webdav?] where you
upload a large file [known plaintext] and watch when people download it
along with other files you don't have access to.

> When I say that the AES time variability is a huge security threat,
I'm
> not saying that _every_ application is in trouble. I'm saying that
> _many_ applications are in trouble. Ignoring the problem of creating
> high-speed constant-time cryptographic software means ignoring those
> applications. That's violating the basic promise of a cipher
standard.

But you've got it all wrong [and you're hitting on a good point]. The
AES specification specifies only the algorithm, not its implementation.
In fact it's perfectly legal to mix in [say] CAST5. So long as the
output is right for the inputs it doesn't matter how you get there.

However, there should be a standard for implementations in software
and hardware that addresses this, e.g. the maximum allowed standard
deviation in cycle timing/power consumption/etc. for a given
platform/configuration.

This also makes it a nightmare since now you have ...

"Why yes I'm NIST certified [*] and yes my AES implementation has been
individually certified on a 386, 486, 586, ppro, pii, piii, p4, k6,
k6-2, ..., sparc, alpha, G3, G4, G5, ARM7, MIPS4k, ..., intel 8051,
ATMEL AVR, Dallas 80C32, ..., motorola 6809, ..."

But we also have to be reasonable. The attack really doesn't apply to
"most" applications of AES [or any other crypto].

And besides the real way to use your attack is to make AES queries to
the server and watch the power meter at the side of the house. Over a
few 2^60 blocks or so you'll notice a signal over the noise and voila.
Your key.

Don't even need timing data...

Tom

Vernon Schryver

Apr 19, 2005, 9:04:33 AM4/19/05
to
In article <d41smk$j8q$1...@agate.berkeley.edu>,

David Wagner <daw-u...@taverner.cs.berkeley.edu> wrote:
>D. J. Bernstein wrote:
>>Vernon Schryver wrote:
>>> Isn't it a good practice in general to limit the number of encryptions
>>> used for any key?
>>
>>I must have missed the NIST memo saying ``WARNING: ALWAYS SWITCH AES
>>KEYS AFTER B BLOCKS.'' What's the number B, Vernon? How many blocks
>>should an implementor allow under one AES key before switching keys?
>
>Sarcasm aside, I think Vernon is asking a reasonable question. I see
>Vernon as asking whether there is a reasonable choice of B that makes
>AES secure enough for use in (at least some) networked environments.
>Seems like a fair question to me. I don't think Vernon is required to
>know the answer to that question before asking it.

That is part of my point. Isn't it true that for every application
of every encryption scheme, there exists such a B even without timing
or power attacks? For example, shouldn't B be related to the entropy
in the key? Or the range of key values? Should I announce that AES
is broken and useless because keys can be limited to 1 or 2 English
words padded with blanks? How many AES installations use such keying?--I
bet plenty and far more than are vulnerable to network timing attacks.

You are ignoring my larger point that such a timing attack involves
more than enough traffic to be a denial of service attack against
almost all systems. As far as I can tell from
http://cr.yp.to/antiforgery/cachetiming-20050414.pdf at least 2^27 or
100M network round trips are required with D.J.Bernstein's attacker-friendly
network and software. Add the timing variance that is inevitiable
with real networks and real applications on systems large enough and
so concurrently serving enough other users to handle more thn 100 M
attacking transactions, and you're talking about billions of attacking
HTTP server hits. There are systems that might not notice a billion
timing attack transactions, but AES timing attacks are unlikely to be
their biggest vulnerability.

Set aside the real world timing variance issues, and consider the time
and bandwidth required for 2^27 network operations. Unless you have
a foothold inside your target's network, a network round trip is likely
to be at least 50 milliseconds. 2^27 50 ms round trips for 400 byte
blocks require about 1800 hours or 75 days and 50 GByte. If you can
insert a trojan horse or other system close enough to do D.J.Bernstein's
"few thousand packets per second" (page 9) you might do such timing
attacks, but they won't be the main event and you won't do them until
after you don't care about being noticed, at least not today.

The work is quite interesting, but its importance is as a warning
about a potential problem, one that can be fixed.


Vernon Schryver v...@rhyolite.com

Roger Schlafly

unread,
Apr 19, 2005, 11:16:14 AM4/19/05
to
"D. J. Bernstein" <d...@cr.yp.to> wrote:
> When I say that the AES time variability is a huge security threat, I'm
> not saying that _every_ application is in trouble. I'm saying that ...

ISTM that the only real solution is to put AES directly on the chip,
and count on Intel to execute it in a constant number of cycles.
A current Pentium has about 1000x as many gates as a 486 chip,
so surely there is plenty of space for AES, SHA-1, and a few
other popular crypto algorithms.


Tim Smith

unread,
Apr 19, 2005, 12:17:42 PM4/19/05
to
In article <slrnd69d9u...@stoneport.math.uic.edu>,

"D. J. Bernstein" <d...@cr.yp.to> wrote:
> When I say that the AES time variability is a huge security threat, I'm
> not saying that _every_ application is in trouble. I'm saying that
> _many_ applications are in trouble. Ignoring the problem of creating
> high-speed constant-time cryptographic software means ignoring those
> applications. That's violating the basic promise of a cipher standard.

Suppose the cryptographic software takes between 1k and 20k cycles
(making up numbers for illustrative purposes), averaging 2k cycles.
Making it constant-time would make it always take 20k, and would make it
run an average of an order of magnitude slower.

Question: if, instead of making it constant-time, it were made "rounded
up to the next 4k"-time (e.g., add a delay at the end to bring the time
up to the next multiple of 4k, so it can take 4k, 8k, 12k, 16k, or 20k
cycles), how much would that help against your attack? Is there some
way along these lines to reduce the timing information enough to foil
your attack without making every operation take the maximum time?
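The "round up to the next 4k" idea is just integer ceiling arithmetic on the measured cycle count; a minimal sketch (the function name and the 4096 quantum are illustrative, not from any real implementation):

```c
/* Round an elapsed cycle count up to the next multiple of `quantum`.
   The externally observable duration then takes one of a handful of
   quantized values (4k, 8k, ..., 20k in the example above) instead of
   leaking the exact cycle count. */
unsigned long quantize_time(unsigned long cycles, unsigned long quantum)
{
    return ((cycles + quantum - 1) / quantum) * quantum;
}
```

The delay to insert at the end of the operation is then quantize_time(elapsed, 4096) - elapsed; the open question above is how many key bits the quantized value still leaks.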

--
--Tim Smith

Paul Rubin

unread,
Apr 19, 2005, 1:18:02 PM4/19/05
to
"Tom St Denis" <tomst...@gmail.com> writes:
> I had this conversation with my buddy Dan Kaminsky and he came up with
> a few threat scenarios for you [so you can use to push back people like me].
>
> One of them was a simple wifi access point. ...

Another is host security modules (HSM's). These are sealed processors
that run trusted code and contain encapsulated keys and ciphers.
Often they sit directly on an untrusted server's PCI bus, so there's
not even network latency between the attacker and the cipher.

RSA blinding to prevent timing attacks is normal and expected practice
on these types of devices. It sounds completely reasonable to expect
to also have some countermeasures against timing attacks on the
symmetric ciphers.

A lot of these modules contain DES hardware engines but not AES.
Timing attacks against AES might be a reason to choose 3DES (using the
module's DES hardware) instead of AES (using software in the module)
to secure certain types of applications.

Paul Rubin

unread,
Apr 19, 2005, 1:25:18 PM4/19/05
to
Tim Smith <reply_i...@mouse-potato.com> writes:
> Suppose the cryptographic software takes between 1k and 20k cycles
> (making up numbers for illustrative purposes), averaging 2k cycles.
> Making it constant-time would make it always take 20k, and would make it
> run an average of an order of magnitude slower.

I think slowing it down artificially doesn't mean having to slow it
down to the worst case. That 20k figure presumably involves a lot of
cache misses, so if you encrypt several blocks, all the blocks after
the first are likely to be much faster than 20k cycles.

Let's say your application uses counter mode. You can pre-encrypt 100
blocks at a time. There will be some variation in the time per block,
which might follow something like a normal distribution with a few
outliers. So the average time per block (just for the AES operation)
might be 400 cycles (25 cycles/byte) with a standard deviation of 20
cycles. That means that encrypting 100 blocks and delaying the output
until 48000 cycles after the operation starts (100 x (400 + 4 sigma)) should
completely mask any timing effects 99.99+% of the time.

As Dan Bernstein mentions, there are other effects like interrupted
operations that have to be taken into account when trying to insert
delays. It may have to be done at a low OS level.

Gregory G Rose

unread,
Apr 19, 2005, 5:13:47 PM4/19/05
to
In article <1113869242.0...@o13g2000cwo.googlegroups.com>,

Tom St Denis <tomst...@gmail.com> wrote:
>... fine Prof. Bernstein indulge my curiosity. To what did you imagine
>applying your attack today that would actually impose a serious threat?

IPsec traffic tunneled through a security gateway?
(Note that the cellphone industry in Europe has
standardized this.)

Greg.
--
Greg Rose
232B EC8F 44C6 C853 D68F E107 E6BF CD2F 1081 A37C
Qualcomm Australia: http://www.qualcomm.com.au

Tom St Denis

unread,
Apr 19, 2005, 5:32:31 PM4/19/05
to

Gregory G Rose wrote:
> In article <1113869242.0...@o13g2000cwo.googlegroups.com>,
> Tom St Denis <tomst...@gmail.com> wrote:
> >... fine Prof. Bernstein indulge my curiosity. To what did you imagine
> >applying your attack today that would actually impose a serious threat?
>
> IPsec traffic tunneled through a security gateway?
> (Note that the cellphone industry in Europe has
> standardized this.)

Use hardware AES or custom craft some code.

I work at a hardware firm and I know that "cellphone bandwidth" capable
AES logic isn't that large or gate costly. It's probably cheaper to
just use AES hardware than the space required to store the software and
a processor efficient enough to get the bandwidth [not to mention the
power usage].

Yes, there are specific cases where this is a problem [e.g. wifi access
point] but those are specific cases. In those particular instances you
can just use an implementation that is immune to timing attacks.

Tom

D. J. Bernstein

unread,
Apr 19, 2005, 6:54:04 PM4/19/05
to
David Wagner wrote:
> Still, there are practical reasons why, in the short
> term, we might like to understand a little better which applications
> are in trouble, and how much trouble they are in.

Some key positions had a signal as small as 1 cycle for 8 of the 256
input bytes; see Figure 5.1. Other key positions had larger signals; see
Figure 5.2 and Figure 5.3.

Let me emphasize at this point that any serious attacker will construct
packets that knock selected server AES table entries out of L2 cache,
increasing the signal to _hundreds_ of cycles. Large target servers (and
their surrounding operating systems) are actually _easier_ to attack in
this way than a tiny sample server. You're deluding yourself if you
think that this is particularly difficult.

But I didn't want to bother constructing those packets. I wanted to see
whether AES could survive an embarrassingly simple attack. (Answer: No.)

Anyway, because the signal strength varied, some key bits appeared long
before others. I was reminded of the cryptanalysis in typical movies
such as Wargames, where tension mounted as key digits were locked in,
one after another.

For example, here were the top k[15] choices, as measured by the
correlations explained in my paper, after a limited number of packets:

65536 packets: 34 (correlation 124), 130 (107), 191 (103), 53 (101), ...
131072 packets: 53 (126), 129 (106), 130 (101), 34 (93), 135 (82), ...
262144 packets: 53 (147), 54 (108), 49 (75), 130 (60), 15 (57), ...
524288 packets: 53 (175), 54 (109), 15 (73), 49 (72), 115 (65), ...

Of course, 53 later turned out to be the correct key byte, and anyone
who understands noise calculations will find it completely unsurprising
that 53 appeared at this point.

Here was the situation for all key bytes after 1048576 packets:

* The lists for k[3] and k[15] were topped by the numbers that later
turned out to be correct.

* The lists _ignoring bottom bit_ for k[9] and k[14] were topped by
the numbers that later turned out to be correct. Before the attack
I had already determined that these packets wouldn't identify the
bottom bit.

* The lists _ignoring bottom three bits_ for k[1], k[2], k[4], k[5],
k[7], k[8], k[10], k[11], and k[13] were topped by the numbers that
later turned out to be correct. Before the attack I had already
determined that these packets wouldn't identify those bits.

* The lists ignoring bottom three bits for k[0] and k[6]---which,
before the attack, I had identified as trouble areas having
relatively weak signals---had the right numbers in third place.
As for k[12], I knew that its signal was so weak that the list
would be completely uninformative at this point.

In other words, out of the 75 key bits that were likely to have appeared
at this point, every single one of them had, in fact, appeared. The
obvious brute-force search through 2^53 keys---obvious meaning that I
knew before the attack that it was the right search---would have
successfully identified the correct key.

Another packet size (600 bytes) added different information: the missing
bits of k[4] and k[8], and all the bits of k[12]. At this point the
obvious brute-force search through 2^39 keys would have successfully
identified the correct key.

The search would have taken me a couple of CPU days with my own fast AES
software; and it was still possible, from what I knew after this number
of packets, for the search to fail. But you're deluding yourself if you
think that the key was some sort of secret at this point.

Anyway, I kept going for 32 times as many packets, and for another
packet size with even more packets. This was massive overkill, of
course. I was again reminded of the typical Hollywood cryptanalysis,
where the cryptanalyst never seems to switch over to brute-force search,
even when it will obviously finish the job very quickly.

At the end, a casual automated analysis---see Appendix D in my paper---
told me that it was utterly inconceivable for my final search through
2^23 keys to fail. A crude searching program finished in a minute,
finding the correct key about halfway through. A serious program would
have taken two seconds.

You ask for an estimate of how much trouble AES users are in. Answer:
Against typical Internet servers running pretty much any AES software,
by forcing several selected lines out of L2 cache (which means sending
packets constructed to attack particular server software), I'd expect to
achieve a signal on the scale of 0.1 microseconds for at least 10% of
all inputs. This signal has to compete with, typically, 10-microsecond
network noise; but typical protocols allow AES keys to be used for
millions of packets, more than enough for the attacker to find the key.
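The candidate ranking described above can be sketched as a correlation between observed timings and a reference timing profile. This is only an illustration of the shape of the computation (synthetic arrays, and not the paper's exact statistic; see the paper's appendices for the real one):

```c
/* Rank candidate key bytes: score[k] correlates the observed average
   timing for each input byte b against a reference timing profile
   indexed by b XOR k.  The correct key byte tends to produce the
   strongest correlation as more packets are averaged in. */
void score_candidates(const double timing[256],   /* observed: avg time per input byte */
                      const double profile[256],  /* reference timing profile */
                      double score[256])          /* out: one score per candidate */
{
    for (int k = 0; k < 256; k++) {
        double s = 0.0;
        for (int b = 0; b < 256; b++)
            s += timing[b] * profile[b ^ k];
        score[k] = s;
    }
}
```

Sorting score[] in descending order yields candidate lists like the "53 (175), 54 (109), ..." rankings above.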

Tom St Denis

unread,
Apr 19, 2005, 7:03:43 PM4/19/05
to

D. J. Bernstein wrote:
> But I didn't want to bother constructing those packets. I wanted to see
> whether AES could survive an embarrassingly simple attack. (Answer: No.)

<snip>

WTF? ....

Are you incapable of seeing the distinction between DESIGN and
IMPLEMENTATION?

Yes, you're right it's an attack that we have to keep in the back of
our minds. Yes we were largely duped by the fast code as to the AES
software speeds. That being said... You have yet to find a flaw in AES.
What you are breaking are instantiations of AES in software. It's
possible to use AES in a way that is immune to your attack [in software
and hardware].

Your attack also applies to other ciphers because it's an
IMPLEMENTATION ISSUE.

I hate to say it but your continual misunderstanding between design and
implementation is really putting you up for a crank award. You're not
the first to come up with side channel attacks. So take your 15 mins
of fame and sit the f!@# down.

And your Salsa10 cipher/prng/whatever is not the answer. How did you
pick the operations? The rotation values? The order of the operands?
What's the branch number of the transform? Is it complete? ...

You can do bitsliced wide-trail designs. Noekeon for instance which is
slower than AES but should be constant time. RC6 is also constant time
on most processors, etc.

blah blah blah....

Tom

D. J. Bernstein

unread,
Apr 19, 2005, 7:11:51 PM4/19/05
to
Tom St Denis wrote:
> The AES specification specifies only the algorithm, not its implementation.

The AES designers and official evaluators

* considered timing attacks in detail,
* claimed that table lookup was ``not vulnerable to timing attacks,''
* claimed that Rijndael gained a ``major speed advantage over its
competitors'' for software protected against timing attacks,
* made the same comment in its summaries of the finalists, and
* made the same comment in its paragraph explaining the selection of
Rijndael.

The problem is that, when they stated ``Table lookup: not vulnerable to
timing attacks,'' they were simply wrong. Table lookup _is_ vulnerable
to timing attacks.

All of this is in the published record of the AES selection process. See
Section 7 of my paper for citations.

Tom St Denis

unread,
Apr 19, 2005, 7:36:55 PM4/19/05
to

D. J. Bernstein wrote:
> Tom St Denis wrote:
> > The AES specification specifies only the algorithm, not its implementation.
>
> The AES designers and official evaluators
>
> * considered timing attacks in detail,
> * claimed that table lookup was ``not vulnerable to timing attacks,''
> * claimed that Rijndael gained a ``major speed advantage over its
> competitors'' for software protected against timing attacks,
> * made the same comment in its summaries of the finalists, and
> * made the same comment in its paragraph explaining the selection of
> Rijndael.
>
> The problem is that, when they stated ``Table lookup: not vulnerable to
> timing attacks,'' they were simply wrong. Table lookup _is_ vulnerable
> to timing attacks.

Dude, no one is denying the attack works.

Find where in the AES specification it says "the implementation must
use four 8x32 tables in a local cache that is flushable".

The attack splits problems into two categories: those to which it applies
and those to which it doesn't.

You're trying to suggest that these sets are the same and therefore AES
as a whole is broken. I'm trying to suggest that doesn't follow
reality whatsoever. Side channel attacks are ... surprisingly not new.


So my advice as the "lovable sci.crypt psychotic fool" is to tell you
"plug your attack properly". Why not play devil's advocate a bit? ...
Or why not do what "industry" actually would want to see... a
constant-time AES that is remotely fast.

Anyways, as it stands I don't see the immediate threat to any of the
crypto activities I do.... So I'll smile ;-)

Tom

D. J. Bernstein

unread,
Apr 19, 2005, 7:42:32 PM4/19/05
to
Vernon Schryver wrote:
> You are ignoring my larger point that such a timing attack involves
> more than enough traffic to be a denial of service attack against
> almost all systems.

It's amazing how much confusion you can pack into a single sentence.

You are, first of all, wildly overestimating the amount of traffic
needed by this embarrassingly simple timing attack. Yes, I sent 200
million packets (and used packet sizes much larger than the Internet
average), but that was also massive overkill.

Furthermore, you are wildly underestimating the amount of traffic
handled by today's web servers, VPN routers, and other machines that can
reasonably be expected to use AES.

Furthermore, you are confusing traffic _volume_ with traffic _rate_. A
large volume of traffic doesn't become a ``denial of service attack''
unless it's sent at high speed. I was working with a 100Mbps local
network, so my 3600 packets/second weren't a noticeable load.

Finally, you are fundamentally misunderstanding the nature of a timing
attack. A timing attack doesn't require the attacker to _generate_ any
packets. He simply has to _see_ the packet timings.

Your subsequent comment about ``2^27 50 ms round trips'' betrays further
confusion between the concepts of latency and throughput. Ever heard of
TCP windows, Vernon?

> > I must have missed the NIST memo saying ``WARNING: ALWAYS SWITCH AES
> > KEYS AFTER B BLOCKS.'' What's the number B, Vernon? How many blocks
> > should an implementor allow under one AES key before switching keys?

> Isn't it true that for every application of every encryption scheme,
> there exists such a B even without timing or power attacks? For
> example, shouldn't B be related to the entropy in the key?

No. Modern ciphers, such as AES, are designed to remain secure (i.e.,
indistinguishable from a uniform random permutation) for _any_ number
of blocks. If you look at Table 1 in the final AES report then you'll
see explicit consideration of attacks using all 2^128 inputs.

There are _some_ higher-level protocols that impose smaller limits for
various reasons. But your question said ``every.'' The answer is no.

The limits imposed by those protocols are nowhere near small enough to
stop timing attacks, by the way.

D. J. Bernstein

unread,
Apr 19, 2005, 8:17:18 PM4/19/05
to
Roger Schlafly wrote:
> ISTM that the only real solution is to put AES directly on the chip,
> and count on Intel to execute it in a constant number of cycles.

My paper makes quite a few recommendations to CPU designers, and that's
one of them. But this doesn't help people running AES on computers that
are purchased today, or that were purchased five years ago.

Another partial solution is to write constant-time low-speed AES
software for current CPUs by turning an AES circuit into a sequence of
|&^ etc. But this doesn't help people who need high speed.
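A middle ground between S-box tables and a full |&^ circuit, used by some constant-time implementations, is a masked table scan: read every table entry and select the wanted one without a secret-dependent address. A sketch (slow, and not the circuit translation described above; a careful implementation must also keep the compiler from reintroducing a branch):

```c
#include <stdint.h>

/* Constant-time 256-entry table lookup: touch every entry and keep
   only the one whose index matches, via a branch-free all-ones or
   all-zeros mask.  The memory access pattern is independent of the
   secret index, at the cost of 256 loads per lookup. */
uint32_t ct_lookup(const uint32_t table[256], uint8_t index)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < 256; i++) {
        /* mask is 0xFFFFFFFF iff i == index, 0 otherwise */
        uint32_t mask = (uint32_t)0 - (uint32_t)(i == index);
        result |= table[i] & mask;
    }
    return result;
}
```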

My paper spends more than ten pages discussing the problem of building
constant-time high-speed AES software for current CPUs. This _appears_
to be possible, given OS support and enough CPU-specific effort by AES
implementors; see Sections 8 through 15 of my paper for more details.
Confidence might be achievable if Intel et al. stop hiding crucial CPU
documentation. There's some slowdown, but not nearly as bad as the |&^
approach.

Of course, another solution is to admit that there was a serious error
in the AES design and evaluation process (``Table lookup: not vulnerable
to timing attacks''), and to switch to a cipher that avoids this error.

Michael Amling

unread,
Apr 19, 2005, 8:42:08 PM4/19/05
to
Paul Rubin wrote:
>
> RSA blinding to prevent timing attacks is normal and expected practice
> on these types of devices. It sounds completely reasonable to expect
> to also have some countermeasures against timing attacks on the
> symmetric ciphers.

Speaking of blinding, IIRC, OCB applies a different secret mask to
each ciphertext block before decrypting it. My understanding is that the
timing attacker has to know the (unmasked) ciphertext to extract the
key. If so, just using OCB would be protection.

--Mike Amling

Michael Amling

unread,
Apr 19, 2005, 8:57:34 PM4/19/05
to
D. J. Bernstein wrote:
>
> Of course, another solution is to admit that there was a serious error
> in the AES design and evaluation process (``Table lookup: not vulnerable
> to timing attacks''), and to switch to a cipher that avoids this error.

Do any of the five AES finalists meet that criterion?

--Mike Amling

David Wagner

unread,
Apr 19, 2005, 10:50:58 PM4/19/05
to

Twofish doesn't. Serpent might, in appropriate bitsliced mode, but it
is noticeably slower than AES. MARS doesn't. I'm not sure whether it
will be possible to write a constant-time implementation of RC6; that may
depend on the target architecture, but given the data-dependent rotation,
I'm worried it may be hard to achieve constant-time status on at least
some architectures.

D. J. Bernstein

unread,
Apr 20, 2005, 1:59:53 AM4/20/05
to
Tom St Denis wrote:
> It's possible to use AES in a way that is immune to your attack

AES can certainly be computed in constant time---with painfully slow
software that, right now, nobody uses. Cryptography would be much easier
if there weren't any speed constraints.

I'm not saying that _all_ applications have speed constraints. Some
applications could tolerate AES software that doesn't use S-boxes. (Hint
to anyone who has written such software: now is a good time to publish.)

But many other applications do have speed constraints. As far as those
applications are concerned, the software you have in mind doesn't exist.

D. J. Bernstein

unread,
Apr 20, 2005, 2:16:40 AM4/20/05
to
Michael Amling wrote:
> Speaking of blinding, IIRC, OCB applies a different secret mask to
> each ciphertext block before decrypting it.

But it obtains its first mask by encrypting the nonce! That first
encryption is the obvious target of a timing attack.

The attacker doesn't need to see the output of those encryptions. He
just needs to see the timings. The timing attack identifies a small set
of likely keys; and then a small amount of sniffed ciphertext, combined
with a brute-force search, identifies the victim's key.

D. J. Bernstein

unread,
Apr 20, 2005, 2:28:48 AM4/20/05
to
A variable-distance rotation can be written as five constant-distance
rotations and some extra operations; see below. This isn't as painful as
the AES slowdown, although it's not pleasant.

There are also timing leaks in integer multiplication on, for example,
the Motorola PowerPC 7450 (G4e).

---D. J. Bernstein, Associate Professor, Department of Mathematics,
Statistics, and Computer Science, University of Illinois at Chicago


#include <stdio.h>
#include <stdlib.h>

unsigned int rotate(unsigned int x,unsigned int n)
{
  /* mask keeps both shift counts in range; n = 0 or n >= 32 would
     otherwise shift by 32, which is undefined in C */
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

/* same result using only constant-distance rotations plus masks */
unsigned int rotate2(unsigned int x,unsigned int n)
{
  unsigned int bits;
  bits = -((n >> 4) & 1);
  x = ((rotate(x,16)) & bits) | (x & ~bits);
  bits = -((n >> 3) & 1);
  x = ((rotate(x,8)) & bits) | (x & ~bits);
  bits = -((n >> 2) & 1);
  x = ((rotate(x,4)) & bits) | (x & ~bits);
  bits = -((n >> 1) & 1);
  x = ((rotate(x,2)) & bits) | (x & ~bits);
  bits = -(n & 1);
  x = ((rotate(x,1)) & bits) | (x & ~bits);
  return x;
}

int main()
{
  unsigned int x;
  unsigned int n;
  unsigned int loop;

  x = 1;
  for (loop = 0;loop < 1000;++loop) {
    n = random() & 31;
    printf("%08x\n%08x\n",rotate(x,n),rotate2(x,n));
    x = rotate(x,n);
    x += random();
  }
  return 0;
}

David Wagner

unread,
Apr 20, 2005, 3:13:54 AM4/20/05
to
Paul Rubin wrote:
>Let's say your application uses counter mode. You can pre-encrypt 100
>blocks at a time. There will be some variation in the time per block,
>which might follow something like a normal distribution with a few
>outliers. So the average time per block (just for the AES operation)
>might be 400 cycles (25 cycles/byte) which a standard deviation of 20
>cycles. That means that encrypting 100 blocks and delaying the output
>til 48000 cycles after the operation starts (400 + 4 sigma) should
>completely mask any timing effects 99.99+% of the time.

Right. I'd been thinking about something like this. When doing many
encryptions, it seems like this might be effective at fairly low overhead.
I don't see anything specific to counter mode here, either; I would think
this would be applicable to CBC mode and any other type of operation that
involves doing many encryptions. In effect, you are amortizing the overhead
of the semi-worst-case delay over many encryptions.

Presumably one would tell how much to delay the output by using the cycle
counter to measure exactly how many cycles the encryption took. Given this,
I wonder whether we could further strengthen your defense by re-trying (or
aborting) if the encryptions took too long. Something like:

loop:
    start = get_timestamp();
    ... do 100 encryptions ...
    stop = get_timestamp();
    if (stop - start > threshold)
        goto loop;
    delay(threshold - stop + start);

This is not constant-time, but I'm inclined to conjecture that it
doesn't reveal very much information. Basically, it reveals one bit of
information: Did the 100 encryptions take more than "threshold" cycles?
Moreover, this is a highly biased bit: the bit is zero 99.99+% of the
time, and one 0.01-% of the time. Consequently, one would expect the
information capacity carried by this channel to be exceptionally small.
It may even be possible to characterize this.
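The policy in the loop above separates cleanly from the cycle counter; a pure-logic sketch (the type and function names are illustrative):

```c
/* What the deadline discipline does with one batch of encryptions:
   re-run the batch if it overran the threshold, otherwise report how
   long to stall so the observable time is always exactly `threshold`.
   The only leak is the single retry bit discussed above. */
typedef struct {
    int retry;           /* 1: re-run the batch, 0: release the output */
    unsigned long delay; /* cycles to stall before releasing */
} deadline_action;

deadline_action deadline_policy(unsigned long elapsed, unsigned long threshold)
{
    deadline_action a;
    if (elapsed > threshold) {
        a.retry = 1;
        a.delay = 0;
    } else {
        a.retry = 0;
        a.delay = threshold - elapsed;
    }
    return a;
}
```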

Another possible idea is to pre-load the cache before starting the
timer and doing the 100 encryptions. This idea was suggested in Dan
Bernstein's paper, but might further reduce the information leakage.

Do you think there is any chance this might work? Do you see any
subtleties I am missing? One thing I took away from Bernstein's paper
is that this subject is vastly more subtle than one might have thought
on first impression. Consequently, I'm a little scared to be proposing
ideas like this. Still, if there is any way we can strengthen AES
against these attacks, that seems worthy of further investigation.

Rob Warnock

unread,
Apr 20, 2005, 6:22:38 AM4/20/05
to
D. J. Bernstein <d...@cr.yp.to> wrote:
+---------------

| Roger Schlafly wrote:
| > ISTM that the only real solution is to put AES directly on the chip,
| > and count on Intel to execute it in a constant number of cycles.
|
| My paper makes quite a few recommendations to CPU designers, and that's
| one of them. But this doesn't help people running AES on computers that
| are purchased today, or that were purchased five years ago.
+---------------

Actually, depending on the processor, that might not be the case.
There are a number of CPUs [especially ones designed for embedded use]
that permit data to be selectively "locked" into cache lines. So on
those CPUs the AES implementation could pre-load the needed tables
into the cache and lock those lines.

This is not necessarily a performance problem either, if the cache
is multi-way set-associative, since with those you can usually lock
lines selectively into only one set, leaving the others available
for general use.

In some cases this approach may require the AES implementation to be
kernel-resident, since the locking functions might be privileged.


-Rob

-----
Rob Warnock <rp...@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

Rob Warnock

unread,
Apr 20, 2005, 6:26:27 AM4/20/05
to
D. J. Bernstein <d...@cr.yp.to> wrote:
+---------------
| A variable-distance rotation can be written as five constant-distance
| rotations and some extra operations; see below. This isn't as painful as
| the AES slowdown, although it's not pleasant.
+---------------

Plus, on many architectures [MIPS, AMD 29k, ARM(?), others] a
constant-time "barrel shifter" is used for variable-distance rotations.

D. J. Bernstein

unread,
Apr 20, 2005, 12:26:11 PM4/20/05
to
Rob Warnock wrote:
> There are a number of CPUs [especially ones designed for embedded use]
> that permit data to be selectively "locked" into cache lines.

My paper already mentions this idea and gives a detailed description of
a subtly different instruction that wouldn't require privilege.

But, as I said, this doesn't help people running AES on computers that
are purchased today, or that were purchased five years ago.

> In some cases this approach may require the AES implementation to be
> kernel-resident, since the locking functions might be privileged.

As far as I know, lock-into-cache instructions are always privileged.

thomas...@gmail.com

unread,
Apr 20, 2005, 9:08:01 PM4/20/05
to

D. J. Bernstein wrote:
> Finally, you are fundamentally misunderstanding the nature of a timing
> attack. A timing attack doesn't require the attacker to _generate_ any
> packets. He simply has to _see_ the packet timings.

Not to mention the fact that he's conflating an attack that compromises
availability with an attack that compromises confidentiality and
integrity.

Vernon Schryver

unread,
Apr 21, 2005, 8:00:44 PM4/21/05
to
In article <slrnd6b5rt...@stoneport.math.uic.edu>,

D. J. Bernstein <d...@cr.yp.to> wrote:

>> You are ignoring my larger point that such a timing attack involves
>> more than enough traffic to be a denial of service attack against
>> almost all systems.
>
>It's amazing how much confusion you can pack into a single sentence.
>
>You are, first of all, wildly overestimating the amount of traffic
>needed by this embarrassingly simple timing attack. Yes, I sent 200
>million packets (and used packet sizes much larger than the Internet
>average), but that was also massive overkill.
>
>Furthermore, you are wildly underestimating the amount of traffic
>handled by today's web servers, VPN routers, and other machines that can
>reasonably be expected to use AES.
>
>Furthermore, you are confusing traffic _volume_ with traffic _rate_. A
>large volume of traffic doesn't become a ``denial of service attack''
>unless it's sent at high speed. I was working with a 100Mbps local
>network, so my 3600 packets/second weren't a noticeable load.
>
>Finally, you are fundamentally misunderstanding the nature of a timing
>attack. A timing attack doesn't require the attacker to _generate_ any
>packets. He simply has to _see_ the packet timings.
>
>Your subsequent comment about ``2^27 50 ms round trips'' betrays further
>confusion between the concepts of latency and throughput. Ever heard of
>TCP windows, Vernon?

- if 200 M packets are not needed, how many are needed?
I made no wild overestimates but merely took D. J. Bernstein's
published numbers. It seems reasonable to assume that he did
not use more than 10 times too many packets.

- given my professional history, I probably have as good a handle on
how much traffic real machines handle as D.J.Bernstein. However,
our respective personal professional histories are irrelevant.
The relevant questions are the number of packets required for a
probably successful attack and whether that number of packets
is so high that it would be impractical against most target
systems. There are some very busy systems, but they are not
single processor, general purpose x86 boxes like D. J. Bernstein's
test system.

- Of course a DoS attack requires a high rate, which was my point
about estimating how much time 200 M packets might require on a
real network. 20 M packets would be a DoS attack in any useful
time against any but the largest systems. Even 1% of 200 M packets
would be a big deal for most HTTP servers.

Note also that the more legitimate traffic a target handles, the
more noise there will be. You can't assume that a system that
will not notice 200 M packets will be devoted to timing attacks.
Besides network noise, noise in busy systems includes effects on
caches.

- Whether the attack generates the timing probes or merely observes
them is somewhat irrelevant. The traffic must still exist and
must be visible to the attacker. To measure the timing of 200
M packets, you'll probably need either to be dangerously close
to the target or to generate them.

If you don't generate the packets, you will have trouble forcing
lines out of L1 caches or picking the patterns to probe. You
will need to rely on the targets of the attack making the probes
you need. How much will that increase the traffic you will have
to watch, 10X, 100X, 1,000,000X or some other number?

If you don't generate the traffic, then you may need to rely on
the accuracy and precision of the TCP timestamps supplied by the
target of the attack. As with other recent excitements
about the dangers of precise RFC 1323 timestamps, a target can
easily make those timestamps far less precise. RFC 1323 recommended
resolutions between 1 millisecond and 1 second. How do you get
0.1 microsecond signals out of a tolerable number of timestamps
with 10 millisecond resolution? You can increase the number of
measurements but multiply 200 M by a few powers of 10 and pretty
soon you're talking about a lot of packets.
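The arithmetic behind that multiplication can be sketched. If a coarse timestamp tick is treated as adding uniform quantization noise (a simplification; real timestamp noise is messier), averaging N samples shrinks the standard error by sqrt(N), and the sample count needed to resolve a 0.1 microsecond signal is roughly (sigma/signal)^2. The constants below are illustrative:

```python
import math

# Back-of-envelope model: a timestamp quantized to 'tick' seconds adds
# uniform noise with std. dev. tick/sqrt(12); averaging N samples
# shrinks the standard error of the mean by sqrt(N).
tick = 10e-3            # 10 ms timestamp resolution
signal = 0.1e-6         # 0.1 microsecond timing signal to resolve
sigma = tick / math.sqrt(12)
n = (sigma / signal) ** 2   # samples until the std. error ~ signal
print(f"sigma = {sigma*1e3:.2f} ms, samples needed ~ {n:.1e}")
```

Under this toy model the answer is on the order of 10^9 samples, which is the flavor of inflation the paragraph above is pointing at.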

- Yes, I'm somewhat wrong about RTTs, but right in spirit about the
problem of getting 200 M measurements (or however many) in a short enough
time to make the attack a generally significant problem.

When you're talking about real traffic, it seems good to talk
about real applications. How do you limit the ratio of timing
probes to overhead such as TCP SYNs? How does your target HTTP
server respond when the first 199,999,999 of those 200 M probes
don't decrypt into something that makes sense? Do you need to
send another TCP SYN and wait an RTT for the SYN-ACK before
sending another probe? (Yes, you might piggyback data with
the SYN or the SYN-ACK, but that didn't used to work with all
TCP implementations. Has that changed?)

What does your VPN box do in response to those first 199,999,999
probes that don't decrypt?


> ...


>The limits imposed by those protocols are nowhere near small enough to
>stop timing attacks, by the way.

It's hard to miss the consistent absence of numbers in D. J. Bernstein's
notes after his initial announcement.

I really like the idea of cryptographic CRCs, but claiming that AES
is generally broken and useless and suggesting that the choice of AES
was based on chicanery are not good ways to advance cryptographic CRCs.


Vernon Schryver v...@rhyolite.com

Vernon Schryver

unread,
Apr 21, 2005, 7:14:08 PM4/21/05
to
In article <slrnd6b312...@stoneport.math.uic.edu>,

D. J. Bernstein <d...@cr.yp.to> wrote:

>packets constructed to attack particular server software), I'd expect to
>achieve a signal on the scale of 0.1 microseconds for at least 10% of
>all inputs. This signal has to compete with, typically, 10-microsecond
>network noise; but typical protocols allow AES keys to be used for
>millions of packets, more than enough for the attacker to find the key.

Which networks have only 10 microseconds of noise? The wire occupancy
for a 1500 byte packet at 10 Mbit/sec is 1,200 microseconds and 120
microseconds at 100 Mbit/sec. Each full MTU packet that happens to
already be on the wire or get on the wire ahead of an AES timing probe
introduces 100 times as much noise as D. J. Bernstein evidently sees
on his test network. There are wide (or metro) area networks that run
at 10 Mbit/sec, but most are a lot slower, and those that exist are
not devoted to AES timing probes.
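The wire-occupancy figures quoted above are plain serialization delay, bytes times 8 divided by the bit rate:

```python
# Serialization (wire-occupancy) delay of a packet: bytes * 8 / bit-rate.
def wire_time_us(nbytes, bits_per_sec):
    return nbytes * 8 / bits_per_sec * 1e6

print(wire_time_us(1500, 10e6))    # ~1200 us at 10 Mbit/sec
print(wire_time_us(1500, 100e6))   # ~120 us at 100 Mbit/sec
```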

Or in other words, if it were so easy to detect 0.1 microsecond
signals, then NTP would routinely deliver at 0.1 microsecond.


Vernon Schryver v...@rhyolite.com

Vernon Schryver

unread,
Apr 21, 2005, 11:20:39 PM4/21/05
to
In article <1114045681.0...@f14g2000cwb.googlegroups.com>,
<thomas...@gmail.com> wrote:

>Not to mention the fact that he's conflating an attack that compromises
>availability with an attack that compromises confidentiality and
>integrity.

Of course availability differs from confidentiality and both differ
from integrity. But if the latter two are not irrelevant while the
system is unavailable and drowning in millions of HTTP hits/hour,
then they become irrelevant as soon as the system's owners notice that
something is going on. This isn't like an ISN attack where you need
to blind one system for tens of seconds; you're talking about millions
of HTTP hits per hour for days. 3000 packets/second to a special
purpose application differs from millions of hits per hour on an HTTP
server more than availability differs from confidentiality and integrity.
(Note nearby comments about looking at the timing of only the first
bits in a session.)

Try putting this attack into the context of a real application protocol.
To do that, you must:

- Extract 0.1 microsecond or smaller signals from networks with
distinct sources of noise ranging from tens of nanoseconds to
tens of seconds without spending more time than you can afford.

- Probably get the target to cooperate in making your measurements.
RFC 1323 recommends timestamps with precisions 1000 times coarser
than the timestamps NTP uses, but getting 0.1 microsecond ticks
from NTP is too much fun.

- Not need to have too many other packets for each of your 200 M probes:

+ If you rely on passive observations of the target, how much
additional traffic can you expect to need to see and ignore
before you see your 200 M interesting values?

+ If you start your own TCP sessions and do active probing, how do
you avoid spending weeks or months waiting for SYNs to make
round trips before you can send each probe? (You probably
can't keep 10,000 or even 1000 SYNs in flight without triggering
SYN-bomb alarms).

+ If you corrupt passing TCP segments into probes to use existing
TCP sessions, how do you keep the target from dropping the
connection and wasting lots of your time on SYNs and other
chatter? How do you do this 200 M times in a reasonable span
and without being noticed?


Vernon Schryver v...@rhyolite.com

D. J. Bernstein

unread,
Apr 22, 2005, 12:52:20 AM4/22/05
to
Vernon Schryver wrote:
> Which networks have only 10 microseconds of noise?

Lots. For example, the same network connection that I mentioned before,
several miles through five routers from a UIC host to a UChicago host.

> Each full MTU packet that happens to
> already be on the wire or get on the wire ahead of an AES timing probe
> introduces 100 times as much noise

Only for a stunningly ignorant attacker who doesn't understand how to
eliminate outliers.

D. J. Bernstein

unread,
Apr 22, 2005, 1:40:01 AM4/22/05
to
Vernon Schryver wrote:
> I made no wild overestimates but merely took D. J. Bernstein's
> published numbers. It seems reasonable to assume that he did
> not use more than 10 times too many packets.

Your ``taking'' of my packet count consisted of stupidly misrepresenting
a casual upper bound as a lower bound. Adding fudge factors doesn't make
this reasonable. Your new assumption is flat-out wrong.

Once again: The 200 million packets were massive overkill. Not only does
the paper explicitly say that I used more packets than necessary, but it
also conjectures that 1 million packets would have been sufficient, and
it points out several ways to do even better. In a subsequent posting I
gave more numbers and explicitly said that 2 million packets, plus the
obvious 2-CPU-day brute-force search, would have worked.

> If you don't generate the packets, you will have trouble forcing
> lines out of L1 caches or picking the patterns to probe. You
> will need to rely on the targets of the attack making the probes
> you need. How much will that increase the traffic you will have
> to watch, 10X, 100X, 1,000,000X or some other number?

It won't increase the traffic at all. Normal CBC traffic already has a
complete spread of input bytes. No ``picking'' is necessary.

As for going to extra effort to force lines out of cache, that would
_reduce_ the number of packets needed. This is something I didn't do.
It should be straightforward with chosen packets (which don't have to be
the encrypted packets); perhaps known packets will do it too, but that
would take more effort to figure out.

> If you don't generate the traffic, then you may need to rely on
> the accuracy and precision of the TCP timestamps supplied by the
> target of the attack.

Your comments about the precision of TCP timestamps are a red herring.
The attacker watches the packets as they go by, and times them with his
own highly accurate local clock.

> claiming that AES is generally broken and useless

My statements stand on their own without your incompetent paraphrases.

> and suggesting that the choice of AES was based on chicanery

There was a blatant error in the AES design and evaluation process.
(``Table lookup: not vulnerable to timing attacks.'') Nobody said that
this error was deliberate. Perhaps you don't understand what the word
``chicanery'' means.

> are not good ways to advance cryptographic CRCs.

You're confused. Encrypted CRCs, and some similar state-of-the-art MACs
that are even faster than HMAC-MD5, aren't promoted by attacks on AES.
They normally _use_ AES! Fortunately, they can use any other cipher just
as easily. So you're talking about a completely unrelated topic.

thomas...@gmail.com

unread,
Apr 22, 2005, 9:31:06 AM4/22/05
to
Vernon Schryver wrote:
> In article <1114045681.0...@f14g2000cwb.googlegroups.com>,

> Of course availability differs from confidentiality and both differ
> from integrity. But if the latter two are not irrelevant while the
> the system is unavailabe and drowning in millions of HTTP hits/hour,
> then they become irrelevant as soon as the system's owners notice
that
> something is going on. This isn't like an ISN attack where you need

So your point is, for each of the 200+ significant DDOS attacks that
occur per DAY at every major ISP, operators should be changing keys?
I'm glad to see that you're at least conceding that they need to do
something. Sounds like a practical attack to me.

I guess I could address the rest of your message (TCP timestamps? Come
on.) But I think you're deliberately obscuring the point. You don't
even need to generate traffic to carry this attack out: you just need
to be able to see it. The rest of your message dances around ideas like
"reasonable cost" and "reasonable amount of time". I don't know what
these words mean. The objective of the attack is a catastrophic loss
of integrity and confidentiality. What's that worth? I guess it depends
on the target.

I actually really like the idea of talking through the details of how
to optimize this attack and working out the cost tradeoffs. From
hearing you talk about timestamps, in-flight packet corruption, and SYN
alarms, it's obvious that you do too. But I'm not sure sci.crypt is the
place to have that conversation, and I'm completely sure that it's
beside the point right now.

---
Thomas H. Ptacek
Arbor Networks

Vernon Schryver

unread,
Apr 22, 2005, 10:11:48 AM4/22/05
to
In article <slrnd6h0oo...@stoneport.math.uic.edu>,

D. J. Bernstein <d...@cr.yp.to> wrote:

>> Which networks have only 10 microseconds of noise?
>
>Lots. For example, the same network connection that I mentioned before,
>several miles through five routers from a UIC host to a UChicago host.
>
>> Each full MTU packet that happens to
>> already be on the wire or get on the wire ahead of an AES timing probe
>> introduces 100 times as much noise
>
>Only for a stunningly ignorant attacker who doesn't understand how to
>eliminate outliers.

Both of those statements are misleading.

In absolute numbers, there are plenty of networks with only 10
microseconds of noise, but they are rare as a fraction of the total
Internet. You are unlikely to find such a path to your chosen target
without making special arrangements such as placing your attacking software
close to your target.

Of course packets that happen to spend any time in a queue are outliers,
but what you call them is irrelevant. What is relevant is how often
they happen and what you do about them. How often will they happen
if your target is 7 hops away (implied by 5 routers) and each of the
14 networks in the round trip are 30% loaded? What are the odds that
a probe and its result will escape being delayed on any of those 14
hops? It looks like about 0.7% to me. It would be "stunningly ignorant"
to naively increase your 2M probes by a factor of 14,000.

If the links are only 10% loaded, the inflation goes down to 5X. At
1% loading, it goes away, which explains your atypical ping variances
(atypical for the Internet as opposed to a private network). That
suggests attacking only at quiet network times, which might cost less
than a factor of 3.
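The load arithmetic above can be checked directly. The model, each of the 14 networks independently delaying a packet with probability equal to its load, is of course a simplification, so its numbers are only indicative:

```python
# Probability that a probe and its reply cross all 14 networks of a
# 7-hop round trip without queueing, assuming each link independently
# delays a packet with probability equal to its load (a simplification).
def escape_probability(load, links=14):
    return (1 - load) ** links

for load in (0.30, 0.10, 0.01):
    p = escape_probability(load)
    print(f"load {load:.0%}: escape {p:.2%}, inflation ~{1/p:.0f}x")
```

At 30% load the escape probability is about 0.68%, matching the ~0.7% figure above; at 1% load the inflation essentially disappears, matching the point about lightly loaded paths.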

Wire occupancy is only the simplest sort of network noise. If any
of those networks is not a simple FDX link but a shared medium
LAN that uses CSMA/CD, tokens, or some other mechanism for resolving
contention among hosts, the statistics change. Even without the
least contention, the delay statistics on networks like FDDI rings
are messy in the microsecond range.


The notion of a genuinely remote timing attack is hyperbole. You must
get close enough to your target not to be bothered by Internet noise.
If you are going to snoop on legitimate traffic, you must also get
close. AES timing attacks should be considered, but exposing your
HTTP server to the Internet is not in the same league as exposing your
Windows boxes. (Yes, securing all of your Windows boxes, perhaps in
concrete at the bottom of the nearest lake, sounds like a good
contribution toward securing your HTTP servers.)


Vernon Schryver v...@rhyolite.com

Vernon Schryver

unread,
Apr 22, 2005, 10:38:05 AM4/22/05
to
In article <slrnd6h3i4...@stoneport.math.uic.edu>,

D. J. Bernstein <d...@cr.yp.to> wrote:

>Once again: The 200 million packets were massive overkill. Not only does
>the paper explicitly say that I used more packets than necessary, but it
>also conjectures that 1 million packets would have been sufficient, and
>it points out several ways to do even better. In a subsequent posting I
>gave more numbers and explicitly said that 2 million packets, plus the
>obvious 2-CPU-day brute-force search, would have worked.

Ok, how many HTTP servers will not notice an extra 2 M hits over less
than a month? Of those that won't, how many are vulnerable to timing
attacks because their L1 caches are small enough and they don't use
the off-board encryption hardware sold to deal with the CPU costs of
handling lots of HTTPS hits/day? Enough to justify your claims about
the seriousness of this issue?

Your mention of a "typical 600MHz server" suggests that your notion
of typical HTTP servers that won't notice 2 M HTTPS hits differs from
mine. (See <slrnd645h2....@stoneport.math.uic.edu>)

>> You
>> will need to rely on the targets of the attack making the probes
>> you need. How much will that increase the traffic you will have
>> to watch, 10X, 100X, 1,000,000X or some other number?
>
>It won't increase the traffic at all. Normal CBC traffic already has a
>complete spread of input bytes. No ``picking'' is necessary.

That is an interesting claim. The question is not whether the traffic
will have a complete spread of input bytes, but whether, and if so how
often, probing values will be combined into single packets so that they
are harder to use--i.e., require more traffic to resolve the resulting
relations.


>> If you don't generate the traffic, then you may need to rely on
>> the accuracy and precision of the TCP timestamps supplied by the
>> target of the attack.
>
>Your comments about the precision of TCP timestamps are a red herring.
>The attacker watches the packets as they go by, and times them with his
>own highly accurate local clock.

Ok, I was giving you the benefit of the doubt about your ridiculous
claims about the ease of detecting 0.1 microsecond signals over
typical Internet paths.


>> and suggesting that the choice of AES was based on chicanery
>
>There was a blatant error in the AES design and evaluation process.
>(``Table lookup: not vulnerable to timing attacks.'') Nobody said that
>this error was deliberate. Perhaps you don't understand what the word
>``chicanery'' means.

You gave the clear, and presumably intentional, impression that chicanery
was involved.


Vernon Schryver v...@rhyolite.com

D. J. Bernstein

unread,
Apr 22, 2005, 4:25:43 PM4/22/05
to
Vernon Schryver wrote:
> How often will they happen if your target is 7 hops away (implied by 5
> routers) and each of the 14 networks in the round trip are 30% loaded?

That's a crazy premise. Network load simply doesn't work that way in the
real world. Packets come in waves and bursts and spikes, forcing almost
all hops to have far more capacity than they usually need.

Anne & Lynn Wheeler

unread,
Apr 22, 2005, 4:50:09 PM4/22/05
to

"D. J. Bernstein" <d...@cr.yp.to> writes:
> That's a crazy premise. Network load simply doesn't work that way in the
> real world. Packets come in waves and bursts and spikes, forcing almost
> all hops to have far more capacity than they usually need.

this is one of the issues raised with the original slow start stuff
... some of the prototyping had been done on direct ethernet
connection between two hosts.

I believe that the same month the slow-start presentation was made at
an IETF meeting ... there was a paper published in ACM SIGCOMM
proceedings about slow-start not being stable in real-world
environments (because of things like significant bursty trends).

for a little side drift ... one of the issues has been communication
protocols using windowing algorithms to coordinate the number of
outstanding sent packets with the amount of buffers at receiving nodes
(originally in direct point to point links involving latency either in
transmission and/or at end-point processing).

Windowing tends to only be indirectly related to amount of
intermediate node congestion. Intermediate node congestion tends to be
things like the arrival rate and stuff like back-to-back packet
arrivals. One of the issues raised in the early SIGCOMM proceedings
paper is that returning ACK-packets (which open windows) can get
bunched up ... which results in opening multiple windows at the
sending side. The sending side then tends to transmit multiple
back-to-back packets (for the size represented by the multiple bunched
ACKs). The multiple back-to-back transmission by the sender then tends
to aggravate saturation and congestion at intermediate nodes (this
bursty characteristic is one of the things contributing to non-stable
slow-start operation).

If the multiple arriving back to back packets contribute to
intermediate node saturation ... a possible solution is to have
time-delays between packet transmission to address intermediate node
congestion. However, it is possible to create an infrastructure that
dynamically adjusts the intermediate transmission delay to address
intermediate node saturation ... and totally ignore window-based
strategies.

for HSDT in the mid-80s, we had done rate-based pacing w/o having to
resort to windowing oriented algorithms for long latency network
connections. a problem at the time of slow-start work was that there
were a whole class of machines and software with primitive timing
facilities that made it difficult to implement rate-based pacing using
a dynamically adaptive inter-packet transmission delay (directly
controlling the rate of packet production in order to influence the
rate of packet arrival at intermediate nodes).
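The rate-based idea above, directly adjusting an inter-packet transmission delay from congestion feedback instead of opening and closing a window, can be sketched like this (the adjustment rule is invented for illustration; it is not the HSDT algorithm):

```python
# Toy rate-based pacing: the sender controls an inter-packet delay and
# adapts it from congestion feedback, with no window at all. The
# multiplicative constants here are arbitrary illustration values.
class Pacer:
    def __init__(self, delay_s=0.01, min_s=0.001, max_s=1.0):
        self.delay = delay_s
        self.min, self.max = min_s, max_s

    def on_ack(self):     # no congestion signal: speed up slightly
        self.delay = max(self.min, self.delay * 0.95)

    def on_loss(self):    # congestion signal: back off sharply
        self.delay = min(self.max, self.delay * 2.0)

p = Pacer()
for event in ["ack", "ack", "loss", "ack"]:
    p.on_ack() if event == "ack" else p.on_loss()
print(f"inter-packet delay: {p.delay*1000:.2f} ms")
```

The point of the sketch is only that the sender smooths its own transmission rate, avoiding the back-to-back bursts that bunched ACKs produce in window-based schemes.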

misc. hsdt
http://www.garlic.com/~lynn/subnetwork.html#hsdt

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

D. J. Bernstein

unread,
Apr 22, 2005, 4:56:00 PM4/22/05
to
Vernon Schryver wrote:
> You gave the clear and presumably intentionally impression that
> chicanery was involved.

You're a nutcase, Vernon.

I said that NIST failed to recognize that table lookups do not take
constant time. I pointed to various incorrect comments on this topic
from Rijmen, Daemen, and NIST. I traced the effects of this error.

There's no reason to believe that anybody was trying to trick anybody
else. Everyone was making the same mistake! Fourteen AES submissions
from dozens of cryptographers used table lookups with secret indices.
RC6 didn't, but that was explained solely as a way to reduce costs, not
as a way to avoid timing attacks.

Paul Rubin

unread,
Apr 22, 2005, 6:08:09 PM4/22/05
to
"D. J. Bernstein" <d...@cr.yp.to> writes:
> There's no reason to believe that anybody was trying to trick anybody
> else. Everyone was making the same mistake! Fourteen AES submissions
> from dozens of cryptographers used table lookups with secret indices.
> RC6 didn't, but that was explained solely as a way to reduce costs, not
> as a way to avoid timing attacks.

Serpent also didn't, in one of the implementation modes it was designed for.

Vernon Schryver

unread,
Apr 22, 2005, 8:00:02 PM4/22/05
to
In article <slrnd6inet...@stoneport.math.uic.edu>,

D. J. Bernstein <d...@cr.yp.to> wrote:

>> How often will they happen if your target is 7 hops away (implied by 5
>> routers) and each of the 14 networks in the round trip are 30% loaded?
>
>That's a crazy premise. Network load simply doesn't work that way in the
>real world. Packets come in waves and bursts and spikes, forcing almost
>all hops to have far more capacity than they usually need.

Yes, there are the famous "network traffic is self-similar" papers,
but no, "waves and bursts and spikes" and the recommended (but sadly
not universal on the commercial Internet) over-provisioning do not
imply that "far more capacity than they usually need" means that
median, average, and other relevant notions of packet queuing delays
are 0. Besides, some people think that 70% over-provisioning or 30%
loading is far more capacity than usually needed.

See http://www.caida.org/ for some numbers.

Or look at the some of the graphs found with
http://www.google.com/search?q=bandwidth+graph
and
http://www.google.com/search?q=bandwidth+mrtg
A lot of people seem to be running above 30% average loads a lot of
the time at the resolution of their graphs.

Just as "not all of the world is a VAX," not all of the world is a
very lightly loaded academic, lab, or interior commercial network.

Or to move from academic handwaving toward the real world, if it is so
easy to detect 0.1 microsecond signals on say 25% of 8 million attempted
probes by naively discarding the outliers, why is it so much fun to try
to make NTP discipline clocks to within microseconds? See
http://www.google.com/search?q=ntp+performance

Or if packet delays are so well behaved, why is VoIP, which needs
merely 10s of milliseconds instead of microseconds, so interesting to
implement and still so much different from the classical notion of
"toll quality"?

If the goal is not to show that AES is completely useless and broken,
then why push 0.1 microsecond measurements over the ordinary Internet
so hard? If only some AES implementations are vulnerable to network
timing attacks and only from nearby hosts, then that's still a
vulnerability that should be considered. That it would not be the end
of cryptography as we know it should not matter except to people
buying or selling something.


Vernon Schryver v...@rhyolite.com

Vernon Schryver

unread,
Apr 22, 2005, 8:02:45 PM4/22/05
to
In article <slrnd6ip7m...@stoneport.math.uic.edu>,

D. J. Bernstein <d...@cr.yp.to> wrote:

>> You gave the clear and presumably intentionally impression that
>> chicanery was involved.
>
>You're a nutcase, Vernon.

Why concentrate on me instead of addressing more interesting things
such as this?:

> >It won't increase the traffic at all. Normal CBC traffic already has a
> >complete spread of input bytes. No ``picking'' is necessary.
>
> That is an interesting claim. It's not that the traffic won't have a
> complete spread of input bytes, but whether and if so how often probing
> values will be combined into single packets so that they are harder
> to use--i.e. require more traffic to resolve the resulting relations.

It seems intuitively clear that significantly more traffic would be
required for a passive timing attack, but how much?


Vernon Schryver v...@rhyolite.com

Richard Outerbridge

unread,
Apr 22, 2005, 11:21:35 PM4/22/05
to
In article <slrnd6ip7m...@stoneport.math.uic.edu>,

"D. J. Bernstein" <d...@cr.yp.to> wrote:

> I said that NIST failed to recognize that table lookups do not take
> constant time. I pointed to various incorrect comments on this topic
> from Rijmen, Daemen, and NIST. I traced the effects of this error.
>
> There's no reason to believe that anybody was trying to trick anybody
> else. Everyone was making the same mistake! Fourteen AES submissions
> from dozens of cryptographers used table lookups with secret indices.

It would be mildly interesting to see the susceptibility of 3DES, the
only official alternative to AES at this point, to this attack. Most
implementations use S-box tables of various sizes, typically either
8*64*4 or 2*64*4*4 (both totaling 2^11 or 2048 bytes). Would the
"bigger sized" version (two tables of 1024 bytes) be less vulnerable
than the "smaller sized" version (eight tables of 256 bytes), since
most of the time any part of the bigger table is less likely to be in
cache, or is the damage more-or-less the same?

outer

Tom St Denis

unread,
Apr 22, 2005, 11:54:44 PM4/22/05
to

Last I checked Skipjack was valid in the US and CAST5 was a suitable
choice in Canada. Though CAST5 is just like AES I guess in the use of
8x32 tables.

Noekeon is a good choice I guess then ;-)

Tom

D. J. Bernstein

unread,
Apr 23, 2005, 2:37:05 AM4/23/05
to
Vernon Schryver wrote:
> Or to move from academic handwaving toward the real world, if it is so
> easy to detect 0.1 microsecond signals on say 25% of 8 million attempted
> probes by naively discarding the outliers, why is it so much fun to try
> to make NTP discipline clocks to within microseconds?

You are, once again, fundamentally confused.

An NTP client is trying to produce an accurate measurement of the time
of day---the time since midnight, for example---from a relatively small
number of packets. This has no relevance to the problem of producing an
accurate measurement of a _small_ time interval from _many_ packets.

> Or if packet delays are so well behaved, why is VoIP, which needs
> merely 10s of milliseconds instead of micorseconds, so interesting to
> implement and still so much different from the classical notion of
> "toll quality"?

Another laughably ignorant analogy.

VoIP needs _constant_ service. When VoIP users bump into the occasional
half-second network overload, they're justifiably annoyed. When an
attacker bumps into the occasional half-second network overload, he
simply waits and tries again.

Vernon Schryver

unread,
Apr 23, 2005, 11:52:49 AM4/23/05
to
In article <slrnd6jr94....@stoneport.math.uic.edu>,

D. J. Bernstein <d...@cr.yp.to> wrote:

>> Or to move from academic handwaving toward the real world, if it is so
>> easy to detect 0.1 microsecond signals on say 25% of 8 million attempted
>> probes by naively discarding the outliers, why is it so much fun to try
>> to make NTP discipline clocks to within microseconds?
>
>You are, once again, fundamentally confused.
>
>An NTP client is trying to produce an accurate measurement of the time
>of day---the time since midnight, for example---from a relatively small
>number of packets. This has no relevance to the problem of producing an
>accurate measurement of a _small_ time interval from _many_ packets.

Perhaps there is some confusion between NTP and the protocols on ports
13 and 37. Never mind that NTP is not strictly a client-server but
almost a peer-to-peer protocol. An NTP peer does need to know the
time of day claimed by its peers, but the substance of the system is
measuring the differences in the rate at which its local clock ticks
and the rates at which the remote peers tick as precisely as possible
so that the rate of the local clock can be adjusted. When a local
application needs to know the time, it checks the local clock instead
of chattering over the network. The point of NTP is to make the local
clock keep good time by measuring microseconds or less of clock skew.

The main sources of error for those measurements of clock ticking
rates are the same as the signal and the noise in an AES timing attack,
hiccups in either host including interrupts and stalling for memory
(e.g. cache delays) and variations in network delays. The basic NTP
transaction consists of
- local system notes the time and sends a packet to the peer.
- peer answers as quickly as possible with its own time
- local system receives answer and notes the local time.
The purpose of those transactions is to measure the difference between
the local clock and the remote clock that has accumulated since the
previous transaction. When NTP is ticking at microsecond synchronization,
those measurements are of microseconds or less, albeit multiplied by
delays among transactions.
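The transaction described above yields the standard NTP on-wire estimates (RFC 5905 notation: t1 is the local send time, t2 and t3 are the peer's receive and send stamps, t4 is the local receive time; the example numbers are illustrative):

```python
# Standard NTP clock-offset and round-trip-delay calculation for the
# transaction described above (RFC 5905 on-wire protocol).
def ntp_offset_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2   # remote clock minus local clock
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay
    return offset, delay

# Illustrative numbers: peer clock 250 us ahead, 4 ms each way,
# 100 us peer turnaround time.
off, rtt = ntp_offset_delay(0.0, 0.004250, 0.004350, 0.008100)
print(f"offset {off*1e6:.0f} us, delay {rtt*1e3:.1f} ms")
```

The offset formula cancels only the symmetric part of the path delay; any asymmetry goes straight into the offset estimate, which is part of why disciplining a clock to microseconds over the wild Internet is "a whole lot more fun."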

On idle networks, NTP readily disciplines local ticks to peers very
closely. On the wild Internet, it's a whole lot more fun, which is
why NTP implementations have fancy math filtering that is a lot more
sophisticated than just discarding outliers.

Those transactions are repeated continually at a tiny rate compared
to an AES timing attack. The reported AES attacks ran at about 3000
packets/sec or 10 Mbit/sec, but over the real Internet to a target
that won't invoke DoS defenses and stop answering entirely, you can
only hope for 100 Kbit/sec or 30 pps for more than seconds. (Never mind
how you reach 30 pps when each probe probably requires a round trip for a
TCP 3-way handshake.) 30 pps also seems like an optimistic rate outside
the target's own network for a passive attack. The eventual NTP rate
is about 1 packet every 90 seconds or 0.03% of 30 pps.


>> Or if packet delays are so well behaved, why is VoIP, which needs
>> merely 10s of milliseconds instead of micorseconds, so interesting to
>> implement and still so much different from the classical notion of
>> "toll quality"?
>
>Another laughably ignorant analogy.
>
>VoIP needs _constant_ service. When VoIP users bump into the occasional
>half-second network overload, they're justifiably annoyed. When an
>attacker bumps into the occasional half-second network overload, he
>simply waits and tries again.

VoIP users also retry when too many of their packets are lost or delayed
too much. Details of error recovery protocols matter a lot less than
the frequency at which those network overloads occur. What matters
more is that VoIP requires de-jitter buffering not for network overloads
but because network delays vary a lot. A VoIP implementation must not
have too much buffering, because it is better to have a drop-out than
to go into satellite-phone mode with 0.5 second delays. Perhaps the
problem of variations in network delays in real networks is more
visible to those who have worked on Internet video conferencing systems.


I notice a lack of response to evidence showing that typical Internet
links are not practically idle.

Perhaps it is time to tally unaddressed issues:

- whether round-trip delays would be required for active attacking
because each probe might need a new TCP connection.

- how many additional packets a passive attack needs compared to the
2 million required for an active attack.

- how close an attacker must be to the target to see enough traffic
mount a passive attack

- whether an active attack could be mounted over the Internet and
collect sufficient data before ordinary DoS defenses stopped it.

- whether typical Internet paths are mostly idle, and so whether
variations in network delays make either active or passive attacks
practical over the Internet as opposed to from within the target's
own network.


Vernon Schryver v...@rhyolite.com
