Announce: Timing cryptanalysis of RSA, DH, DSS

Paul C. Kocher

unread,

Dec 11, 1995, 3:00:00 AM12/11/95

to

I've just released details of an attack many of you will find
interesting since quite a few existing cryptography products and
systems are potentially at risk. The general idea of the attack is
that secret keys can be found by measuring the amount of time used to
to process messages. The paper describes attacks against RSA, fixed-
exponent Diffie-Hellman, and DSS, and the techniques can work with
many other systems as well.

My research on the subject is still in progress and the current paper
does not include many of my findings. I will eventually publish a full
paper, but am releasing a preliminary draft now to alert the community
as quickly as possible. A copy of the abstract is attached at the end
of this message and the full text can be downloaded in PostScript
format from:

ftp://ftp.cryptography.com/pub/kocher_timing_attack.ps
ftp://ftp.cryptography.com/pub/kocher_timing_attack.ps.gz

I've also made an HTML version which is accessible at:

http://www.cryptography.com

(The HTML uses subscripts and superscripts which aren't supported
in older web browsers. The PostScript version is the "official"
one and looks nicer.)

The results have already been seen by Matt Blaze, Martin Hellman, Ron
Rivest, Bruce Schneier, and many others. While the full significance
of the attack is not yet known, I think everyone who has seen it
considers it important (including Netscape who awarded me a $1000
bugs bounty prize).

ABSTRACT. Cryptosystems often take slightly different
amounts of time to process different messages. With network-
based cryptosystems, cryptographic tokens, and many other
applications, attackers can measure the amount of time used
to complete cryptographic operations. This abstract shows
that timing channels can, and often do, leak key material.
The attacks are particularly alarming because they often
require only known ciphertext, work even if timing
measurements are somewhat inaccurate, are computationally
easy, and are difficult to detect. This preliminary draft
outlines attacks that can find secret exponents in Diffie-
Hellman key exchange, factor RSA keys, and find DSS secret
parameters. Other symmetric and asymmetric cryptographic
functions are also at risk. A complete description of the
attack will be presented in a full paper, to be released
later. I conclude by noting that closing timing channels
is often more difficult than might be expected.

Cheers,
Paul Kocher

*********************************************************************
VERY IMPORTANT: If you send me e-mail, please understand that I
probably won't have time to respond to all who write. Please keep
messages SHORT and send them to p...@cryptography.com (**not** my
netcom address -- misdirected messages will be ignored). PGP when
used for e-mail is not vulnerable to the attack. Please state in
your note whether you would like a reply.
********************************************************************

__________________________________________________________________________
Paul C. Kocher Independent cryptography/data security consultant
E-mail: p...@cryptography.com (please see above before replying)

Joel M. Rubin

unread,

Dec 11, 1995, 3:00:00 AM12/11/95

to

I just saw a small article with your name on page 1 of the N.Y. Times
Fax 8-page Internet Edition. (Monday, December 11, 1995)

They change the edition at about 10:30-11 P.M. Eastern Standard Time
(0330-0400 the next GMT day) so if you read this before then, you might
want to download http://nytimesfax.com/times.pdf.

It is in Adobe Acrobat format.

Of course, there is probably a larger article in the paper edition.

Jim Walters

unread,

Dec 11, 1995, 3:00:00 AM12/11/95

to

Paul C. Kocher wrote:

Nice piece of work. Have you considered the possibility of measuring thermal
fluctuations of the CPU casing?

Jim Walters
try...@halcyon.com

Markus Kuhn

unread,

Dec 11, 1995, 3:00:00 AM12/11/95

to

p...@netcom.com (Paul C. Kocher) writes:

>I've just released details of an attack many of you will find
>interesting since quite a few existing cryptography products and
>systems are potentially at risk. The general idea of the attack is
>that secret keys can be found by measuring the amount of time used to
>to process messages. The paper describes attacks against RSA, fixed-
>exponent Diffie-Hellman, and DSS, and the techniques can work with
>many other systems as well.

I know of at least one cryptographic European pay-TV access control
system, where such a technique has around a year ago been used to
attack a proprietary cryptographic message digest algorithm
implemented on a security smart card processor. Unfortunately, in the
next card generation, the manufacturer not only exchanged the key, but
also the secret proprietary algorithm and this way the timing attack
against the first algorithm became only a academic curiosity. This was
published on the tv-crypt mailing list, but unfortunately never in the
scientific literature.

Although these pitfalls have been documented before in the computer
security literature (especially for password string compare
routines!), ignoring timing leaks IMHO seems to be certainly among the
top three programming errors which smart card operating system
implementor still make today. Others are pitfalls like ignoring CPU
current leaks and EEPROM cell temperature leaks.

Markus

--
Markus Kuhn, Computer Science student -- University of Erlangen,
Internet Mail: <msk...@cip.informatik.uni-erlangen.de> - Germany
WWW Home: <http://wwwcip.informatik.uni-erlangen.de/user/mskuhn>

Hal

unread,

Dec 11, 1995, 3:00:00 AM12/11/95

to

p...@netcom.com (Paul C. Kocher) writes:

>I've just released details of an attack many of you will find
>interesting since quite a few existing cryptography products and
>systems are potentially at risk. The general idea of the attack is
>that secret keys can be found by measuring the amount of time used to
>to process messages. The paper describes attacks against RSA, fixed-
>exponent Diffie-Hellman, and DSS, and the techniques can work with
>many other systems as well.

This is a fascinating result. I have a couple of questions, though.
Focus on the modular exponentiation attack as in Diffie-Hellman.

The attack seems to depend on the fact that some modular
multiplications take longer than others. It is not enough that some
exponents do more multiplications than others. If that were the only
fact used, the timing information would tell only how many
multiplications were done, which would tell how many 1 bits were in the
exponent. Paul Kocher's attack is much more powerful because it uses
the fact that some modular multiplications take longer, and since the
attacker knows the input into the multiply, he can look for
correlations between extra-long overall times and multiplicands which
lead to slow modular multiplies. I think this is the gist of the
attack.

Given this, I am puzzled by one comment in the paper: "Computing
optional Ri+1 calculations regardless of whether the exponent bit is
set does not work and can actually make the attack easier; the
computations still diverge but attackers no longer have to identify the
lack of a correlation for adjacent zero exponent bits."

It seems to me that computing all the Ri+1 calculations (that is, doing
all the multiplies that might be done in a modular exponentiation, but
discarding those which are not needed) would in fact protect against
the attack. I recognize that there may still be some slight time
differences between doing a modular multiply and keeping the result
versus doing it and discarding the result (due to slightly different
lengths in the assembly code). But the amount of variation
should be several orders of magnitude reduced, which should make the
gathering of adequate statistics more difficult.

More importantly, the time variation should no longer depend on the
values of the multiplicands, and as I wrote above it seems to me that
his attack does depend on this.

Perhaps I am missing the point of the attack?

Hal Finney
hfi...@shell.portal.com

Ron Rivest

unread,

Dec 11, 1995, 3:00:00 AM12/11/95

to ip...@ans.net

The simplest way to defeat Kocher's timing attack is to ensure that the
cryptographic computations take an amount of time that does not depend on the
data being operated on. For example, for RSA it suffices to ensure that
a modular multiplication always takes the same amount of time, independent of
the operands.

A second way to defeat Kocher's attack is to use blinding: you "blind" the
data beforehand, perform the cryptographic computation, and then unblind
afterwards. For RSA, this is quite simple to do. (The blinding and
unblinding operations still need to take a fixed amount of time.) This doesn't
give a fixed overall computation time, but the computation time is then a
random variable that is independent of the operands.
-
==============================================================================
Ronald L. Rivest 617-253-5880 617-253-8682(Fax) riv...@theory.lcs.mit.edu
==============================================================================

Joel M. Rubin

unread,

Dec 11, 1995, 3:00:00 AM12/11/95

to

In case you don't already know, there is an article about your work in
Monday's N.Y. Times. I just read a very small version of it in
http://nytimesfax.com/times.pdf. (Adobe Acrobat-format 8-page edition)

The N.Y. Times Fax on the web changes edition about 10:30 or 11 P.M. New
York time so if you want it, get it before then.

Hal

unread,

Dec 11, 1995, 3:00:00 AM12/11/95

to

hfi...@shell.portal.com (Hal) writes:
>It seems to me that computing all the Ri+1 calculations (that is, doing
>all the multiplies that might be done in a modular exponentiation, but
>discarding those which are not needed) would in fact protect against
>the attack.

>[...]

>More importantly, the time variation should no longer depend on the
>values of the multiplicands, and as I wrote above it seems to me that
>his attack does depend on this.

I was mistaken about (the first part of) this - although you can take
the same amount of time on the first iteration by doing this, you
nevertheless have different values being multiplied on subsequent
iterations depending on the exponent bits. Some of those values may
take longer to multiply than others, and so the attacker can still
deduce the value of each bit.

Hal Finney
hfi...@shell.portal.com

Hal

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

Although I was wrong to suggest that always doing the "optional"
multiplies would help in the discrete log case, I wonder if the same
thing is true for the RSA decryption attack.

That attack seems much weaker because decryption is usually done using the
Chinese Remainder Theorem. Given n=p*q, we want to raise ciphertext c
to the d power. So the first step is to do cp = c mod p and cq = c mod q
and then raise cp to the d mod p-1 power and cq to the d mod q-1 power. The
problem with applying Paul's attack to the exponentiations is that, not
knowing p and q, we can't know cp and cq. So Paul suggests trying to
attack the first modular reductions by probing with c's about the same
size as p and q and to try to detect the extra time which will be taken
to do c mod p if c is less than p versus greater than p.

I call this attack weaker because the amount of time difference to do a
single modular reduction is not that great when you consider that maybe
1500 modular reductions will be done as part of the exponentiation, with
a 1000 bit key. So this attack is going to require much more accurate
timing than the discrete log one, I think.

In many cases this attack can be recognized because such small c values
are essentially impossible. For example, the Secure Socket Layer
system currently being used by Netscape for secure WWW connections does
an RSA decryption, but the plaintext is a PKCS style padded session
key, and the chance of that encrypting to a 512 bit number is basically
0. Likewise Mark Twain Bank is doing RSA decryptions of blinded
electronic cash as part of the withdrawal protocol, and such small c
values could be recognized there as well. Rejecting small c's should
perhaps be a good precaution in future RSA implementations.

Alternatively if the initial modular reduction can be made approximately
constant time (say, always subtract p, but discard or keep the result),
that small time variation can be greatly reduced.

This is all in the context of the chosen ciphertext attack. I haven't
really considered the known ciphertext attack, which looks harder to
defend against.

(Latecomers, see http://www.cryptography.com.)

Hal Finney

Paul Rubin

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

In article <4aivt6$5...@cantaloupe.srv.cs.cmu.edu>,
Ralf Brown <ra...@cs.cmu.edu> wrote:
>In article <pckDJG...@netcom.com>, Paul C. Kocher <p...@netcom.com> wrote:

>}Jim Walters <try...@halcyon.com> wrote:
>}>Nice piece of work. Have you considered the possibility of measuring thermal
>}>fluctuations of the CPU casing?
>}

>}No. Is this a significant, measurable effect that somehow reflects
>}what's going on inside the CPU?
>
>With modern CMOS-technology processors, current draw (and thus heat loading)
>depends primarily on the number of times gates change state. While this
>is measurable in the aggregate, I doubt time scales much finer than 100
>milliseconds could be achieved even using thermocouples directly attached
>to the chip itself, never mind the outside of the casing.

But it might be worthwhile to measure the instantaneous current going
into the chip via its power supply pins. It might even be possible to
do this without removing the chip from the circuit, by using an
inductive probe. By knowing the exact code being run, you could
figure out something about what is happening at each clock pulse.
A really secure design would be a secret cipher in a tamper- and
reverse-engineering resistant package, whose hardware implementation
was carefully designed to balance out all these effects of current
consumption, gate delays (i.e., avoid phase jitter on output pins
depending on which logic path a particular input went thru), etc.
If the NSA was on the ball they did all these things for Clipper,
plus other stuff that we don't know about.

Padgett 0sirius

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

While I *think* I understand what is going on (the significance of a 17 usec
delta in a process that has a 188 usec standard deviation of the cycle time
eludes me though), it seems to me that it would require direct connection to
the CPU, preferably with a logic analyzer, possibly at a directly connected
terminal and with knowlege of exactly when the key is being processed.
Certainly not over a packet switched network.

Given that level of access, why not just ask the CPU what the key is ?

A. Padgett Peterson, P.E.
Cybernetic Psychophysicist
Totally Obsessed with TransOceanics
My other car is a Pontiac too
We also walk dogs
PGP 2.7 Public Key Available

Donald T. Davis

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

Daniel G. Pouzzner writes:
> [ kocher's] insight has particularly dire implications for several mainstream
> online cryptosystems. Examples are those which are Kerberos-based
> (including Transarc AFS and OSF DCE security) and at least one other,
> DEC Sphinx. These systems make it feasible, in fact quite trivial, to
> precisely time the enciphering of arbitrarily many messages each
> different by intent (timestamping). Moreover, this timing can be
> performed anonymously from essentially anywhere on the internet, ...

this is not correct. kocher's cryptanalysis requires
microsecond-resolution timings, which you can't reliably
get across the net, because network latencies vary by
milliseconds. to remove the latency variability, you'd
have to get the server to perform the same encrytion
many times, so as to average the latencies away. but,
the kerberos server will not encrypt the same message
many times for you, for several reasons.

also, it's worth noting that if we have to make every
encryption take a fixed "worst-case" execution time,
then des' performance is harmed less than rsa, because
des' encryption- times are less variable than rsa's.

kocher's attack is not a practical one for kerberos or
most other cryptographic servers. This is not to belittle
the importance and ingenuity of his work; i take my hat
off to him.
-don davis, boston

Jim Walters

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

Paul C. Kocher wrote:
>
> Jim Walters <try...@halcyon.com> wrote:
> >Nice piece of work. Have you considered the possibility of measuring thermal
> >fluctuations of the CPU casing?
>
> No. Is this a significant, measurable effect that somehow reflects
> what's going on inside the CPU?

Sorry.
I guess my thought is is that to get the precision necessary in the timing information
an intrusive technique with far more obvious security implications is necessary. Is
this true?

Jim Walters
try...@halcyon.com

Markus Kuhn

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

v...@calcite.rhyolite.com (Vernon Schryver) writes:

>Given the attack seems to involve detecting 17 microsecond and smaller
>differences in the duration of computations, wouldn't it be sufficient
>to do your cyphering on machines to which opponents do not have on-line
>direct? Detecting 17 microsecond differences over a network sounds hard.

Who is talking only about networks? RSA, DH, DSS, etc. are widely used
in smart cards. For smart cards, you even have to provide the CPU
clock signal, so you can count the number of bus clock cycles required
to do the calculation. You get all the precision you need and you can
identify easily even one single conditional branch that has been taken
(2 clk cycles) or not taken (1 clk cycle) this way. Then there are
integer multiplication instructions, where the number of 1 bits in one
of the operands determines the number of clock cycles required to
execute the machine code instruction. All this provides you a lot of
information that can be evaluated during an attack.

Theoretically, you can even get high timing precision over network
channels with a noisy timing by averaging a lot of measurements.
However the experience with NTP suggests that even with very careful
averaging, not much more precision than a little bit below 1 ms is
possible on Ethernet LANS and no precision below 10 ms on the global
Internet. Especially slotted bus protocols (ATM, DQDB, etc.)
effectively destroy timing information and not even averaging can
recover this.

Bob O'Hara

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

In article <4agcf3$e...@ixnews2.ix.netcom.com>, jmr...@ix.netcom.com says...

>
>I just saw a small article with your name on page 1 of the N.Y. Times
>Fax 8-page Internet Edition. (Monday, December 11, 1995)

[snip]

>Of course, there is probably a larger article in the paper edition.
>

There was a larger story in the paper edition, prominently on page one.

-Bob

Paul C. Kocher

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

Jim Walters <try...@halcyon.com> wrote:
>Nice piece of work. Have you considered the possibility of measuring thermal
>fluctuations of the CPU casing?

No. Is this a significant, measurable effect that somehow reflects
what's going on inside the CPU?

Cheers,
Paul Kocher

__________________________________________________________________________
Paul C. Kocher Independent cryptography/data security consultant

E-mail: p...@netcom.com Voicemail: 415-354-8004, FAX: 415-321-1483

Paul C. Kocher

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

Hal <hfi...@shell.portal.com> wrote:
>This is a fascinating result. I have a couple of questions, though.
>Focus on the modular exponentiation attack as in Diffie-Hellman.
>
>The attack seems to depend on the fact that some modular
>multiplications take longer than others. It is not enough that some
>exponents do more multiplications than others. If that were the only
>fact used, the timing information would tell only how many
>multiplications were done, which would tell how many 1 bits were in the
>exponent. Paul Kocher's attack is much more powerful because it uses
>the fact that some modular multiplications take longer, and since the
>attacker knows the input into the multiply, he can look for
>correlations between extra-long overall times and multiplicands which
>lead to slow modular multiplies. I think this is the gist of the
>attack.
>
>Given this, I am puzzled by one comment in the paper: "Computing
>optional Ri+1 calculations regardless of whether the exponent bit is
>set does not work and can actually make the attack easier; the
>computations still diverge but attackers no longer have to identify the
>lack of a correlation for adjacent zero exponent bits."

You'll still see timing differences later because to the decision
whether to use or throw away previous multiplication results
affects the calcuation. In particular, with this altered-but-
breakable system, the value of R in the *next* iteration of the
loop depends on the previous loop's decision. There are
two possibilities: either R still has its previous value
(which is known to the attacker) or R has the new value (which
is also known to the attacker). Timing measurements can
show which of these two possibilities is right.

David Feustel

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

Hal (hfi...@shell.portal.com) wrote:

: p...@netcom.com (Paul C. Kocher) writes:

: >I've just released details of an attack many of you will find
: >interesting since quite a few existing cryptography products and
: >systems are potentially at risk. The general idea of the attack is
: >that secret keys can be found by measuring the amount of time used to
: >to process messages. The paper describes attacks against RSA, fixed-
: >exponent Diffie-Hellman, and DSS, and the techniques can work with
: >many other systems as well.

: This is a fascinating result. I have a couple of questions, though.

: Focus on the modular exponentiation attack as in Diffie-Hellman.

: The attack seems to depend on the fact that some modular
: multiplications take longer than others. It is not enough that some
: exponents do more multiplications than others. If that were the only
: fact used, the timing information would tell only how many
: multiplications were done, which would tell how many 1 bits were in the
: exponent. Paul Kocher's attack is much more powerful because it uses
: the fact that some modular multiplications take longer, and since the
: attacker knows the input into the multiply, he can look for
: correlations between extra-long overall times and multiplicands which
: lead to slow modular multiplies. I think this is the gist of the
: attack.

: Given this, I am puzzled by one comment in the paper: "Computing
: optional Ri+1 calculations regardless of whether the exponent bit is
: set does not work and can actually make the attack easier; the
: computations still diverge but attackers no longer have to identify the
: lack of a correlation for adjacent zero exponent bits."

: It seems to me that computing all the Ri+1 calculations (that is, doing

: all the multiplies that might be done in a modular exponentiation, but
: discarding those which are not needed) would in fact protect against

: the attack. I recognize that there may still be some slight time

: differences between doing a modular multiply and keeping the result
: versus doing it and discarding the result (due to slightly different
: lengths in the assembly code). But the amount of variation
: should be several orders of magnitude reduced, which should make the
: gathering of adequate statistics more difficult.

: More importantly, the time variation should no longer depend on the
: values of the multiplicands, and as I wrote above it seems to me that
: his attack does depend on this.

: Perhaps I am missing the point of the attack?

: Hal Finney
: hfi...@shell.portal.com

The NY Times article on Kocher's result left out some text at the top
of the 2nd column on the front page. Is the complete article available
on the net?
--
feu...@netcom.com *** Note New Web Page URL ***
Dave Feustel N9MYI For PGP Public Key, finger feu...@netcom.com
Fort Wayne, IN Or else access http://www.mixi.net/~feustel/
219-483-1857

Mutatis Mutantdis

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

p...@netcom.com (Paul C. Kocher) writes:

>I've just released details of an attack many of you will find
>interesting since quite a few existing cryptography products and
>systems are potentially at risk. The general idea of the attack is
>that secret keys can be found by measuring the amount of time used to
>to process messages. The paper describes attacks against RSA, fixed-

Wow..... I'm impressed, novice that I am...

Any more work with symmetric algorithms or hash functions? DES comes
to mind: especially if "approved" (?) implementations are in hardware
only...

William Unruh

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

In <4ai3jt$6...@GRAPEVINE.LCS.MIT.EDU> Ron Rivest <rivest> writes:

]The simplest way to defeat Kocher's timing attack is to ensure that the

]cryptographic computations take an amount of time that does not depend on the
]data being operated on. For example, for RSA it suffices to ensure that
]a modular multiplication always takes the same amount of time, independent of
]the operands.

]A second way to defeat Kocher's attack is to use blinding: you "blind" the
]data beforehand, perform the cryptographic computation, and then unblind
]afterwards. For RSA, this is quite simple to do. (The blinding and
]unblinding operations still need to take a fixed amount of time.) This doesn't
]give a fixed overall computation time, but the computation time is then a
]random variable that is independent of the operands.

Any chance of these being implimentw\ed into American PGP? Since it uses
RSAREF, and since RSADSI and PGP are not on the best of terms, what is the
chance that PGP can alter RSAREF or use an altered version to impliment
the above? (Mind you for encrypting email, if the attacker can time your
decryption to ms accuracy, he can probably read the exponent from your
memory.)
--
Bill Unruh
un...@physics.ubc.ca

Vernon Schryver

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

In article <4ai3jt$6...@GRAPEVINE.LCS.MIT.EDU> Ron Rivest <rivest> writes:
>The simplest way to defeat Kocher's timing attack is to ensure that the
>cryptographic computations take an amount of time that does not depend on the
>data being operated on. For example, for RSA it suffices to ensure that
>a modular multiplication always takes the same amount of time, independent of
>the operands.
>
>A second way to defeat Kocher's attack is to use blinding: you "blind" the
>data beforehand, perform the cryptographic computation, and then unblind
>afterwards. For RSA, this is quite simple to do. (The blinding and
>unblinding operations still need to take a fixed amount of time.) This doesn't
>give a fixed overall computation time, but the computation time is then a
>random variable that is independent of the operands.

Given the attack seems to involve detecting 17 microsecond and smaller
differences in the duration of computations, wouldn't it be sufficient
to do your cyphering on machines to which opponents do not have on-line
direct? Detecting 17 microsecond differences over a network sounds hard.

Yes, I've heard of NTP, DTS, TSP, and other network time protocols.
NTP is pushing for microsecond accuracy, but that does not seem relevant.

Yes, networks are getting faster, but computers are getting quicker
even faster, so those differences will only get harder to see remotely.

Vernon Schryver v...@rhyolite.com

Ralf Brown

unread,

Dec 12, 1995, 3:00:00 AM12/12/95

to

In article <pckDJG...@netcom.com>, Paul C. Kocher <p...@netcom.com> wrote:

}Jim Walters <try...@halcyon.com> wrote:
}>Nice piece of work. Have you considered the possibility of measuring thermal
}>fluctuations of the CPU casing?
}
}No. Is this a significant, measurable effect that somehow reflects
}what's going on inside the CPU?

With modern CMOS-technology processors, current draw (and thus heat loading)

depends primarily on the number of times gates change state. While this
is measurable in the aggregate, I doubt time scales much finer than 100
milliseconds could be achieved even using thermocouples directly attached
to the chip itself, never mind the outside of the casing.

A brief explanation of CMOS: A CMOS gate contains two complementary
(that's the C in CMOS) metal-oxide-semiconductor (the MOS) transistors in
a totem-pole arrangement. In one state, the "upper" transistor is
conducting but not the "lower" one; vice-versa in the other state. Since
MOS transistors are a type of field-effect transistor, no (OK, negligible)
current is required to drive the input of the next gate. Thus, in a
steady state, essentially no current flows.

|
|
|_|
||
_||_ A CMOS gate
| | |
| |
in ------+ +---- out
| |
|_ |_|
||
||_
| |
|
|

When the gate is switched from one state to the other, however, there are
two sources of current flow: first, both transistors will be partially
conductive for a brief moment. Secondly, the (small) capacitance
represented by the wires connected to the output and the inputs of the
following transistors must be charged/discharged.

--
My employer will | I'net: ra...@telerama.lm.com Fido: Ralf Brown 1:129/26.1
deny knowing of | "Man is the only kind of varmint sets his own trap, baits
this message... | it, then steps in it." -- John Steinbeck, _Sweet_Thursday_

John K. Taber

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

feu...@netcom.com (David Feustel) wrote:

>The NY Times article on Kocher's result left out some text at the top
>of the 2nd column on the front page. Is the complete article available
>on the net?

If it was the same NY Times I read, the paragraphs were jumbled. It's all there,
just the end of the paragraph is located at the end of the column on the 1st page.

--
John K. Taber
===========================================================================
The "work ethic" was originally not a prescription for happiness, but a
diagnosis of a neurosis. --Richard Todd in _Worth_

Steve Smith

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

p...@netcom.com (Paul C. Kocher) wrote:

>I've just released details of an attack many of you will find
>interesting since quite a few existing cryptography products and

OK, now we've got major publications blasting it out to the general
public that all these 'secure' systems really aren't and that any
grade schooler will be able to crack anything on the net and even card
transactions can't be trusted because they might use a type of public
key system. It doesn't matter if the threat is real or not or whether
it can even be practically implemented - the seeds of doubt are being
sown in the minds of the millions who don't know any better and are
watching in horror as the world goes digital. They already feel
threatened because they don't understand it. So... along comes Uncle
Sam who says, "Fear not for I have a security scheme that will protect
you." Now we have these millions of folks, most of whom are middle
aged or elderly (there are no computer illiterates less than 12 years
old) and who happen to comprise the majority of regular voters, who
now have a solution to their fears being handed to them on a silver
platter. We're going to get a government sponsored crypto system
shoved down our throats yet. We're properly self-obligated to report
system weaknesses amongst ourselves and to the world, and that process
is going to be used against us.

One mans opinion.

Steve

Daniel G. Pouzzner

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

Actually, it was my assumption from the start that any practical
application of the timing attack to real-world cryptosystems would
need to allow for the same sorts of complications Kerberos introduces.
The only situation that makes the attack easy (nay, bordering on the
banal) is when a blackbox is targeted with chosen plaintext. Though
the NSA has very likely understood the timing attack for many years,
most if not all blackboxes which were not designed by them need to be
reworked lest they be open to wholesale, convenient compromise.

Admitting that practical application of the timing attack is not easy,
I would expect to use sophisticated stochastic modeling, capable of
"seeing" through the noise of packet RT's. Moreover, there is nothing
forcing the attacker to "fly blind:" an attack engine could be
designed that would send adjacent packets to the kdc, one an initial
ticket request, and one an ICMP ping packet. By subsequently sending a
stream of ping packets such that it is reasonable to expect the first
one to reach the kdc just before the earliest expected kdc response
time, the precise compute time can be winnowed down. MIT kdc's are
single-threaded and the hosts on which they run are often
one-trick-ponies (especially in high-security environments), which
makes it even easier to extract the compute time.

I'm not at all suggesting that such an attack is easy, but it just
might be feasible. Conceivable feasibility with current technology is
the threshhold at which I incorporate defenses into a design.

-douzzer

Padgett 0sirius

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

In article <30CE45...@halcyon.com> Jim Walters <try...@halcyon.com> writes:
>I guess my thought is is that to get the precision necessary in the timing
>information an intrusive technique with far more obvious security
>implications is necessary. Is this true?

That is exactly what I have been trying to say.

David Sternlight

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

Reading your post through an outrage filter, I find nothing to complain
about except your comment that "We're going to get a government sponsored
crypto system shoved down our throats yet." That is inaccurate. If your
fears are borne out, what is accurate is that the public will flock to a
"voluntary" system offered by the government, for the reasons you mention.
(By the way I don't agree, but that's another thread.) It will then become
a de facto standard because your fellow citizens, rightly or wrongly, want
it.

That isn't "shoving down your throat". It's clever marketing. When Mad Ave
does it we marvel at their skill. Why should the government be any less
smart? You want dunces in Washington? (No comments about what we actually
have, please. I know all the clever remarks about the finest Congress
money can buy.)

Recall MY context here--the government officials who are offering what
they say are highly secure voluntary systems with escrow sincerely believe
that they are doing the citizenry a favor while protecting that same
citizenry against bad guys' use of a similar system if it were unescrowed.

David

Bill Stewart

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

pad...@goat.orl.mmc.com (Padgett 0sirius) wrote:

>In article <30CE45...@halcyon.com> Jim Walters <try...@halcyon.com> writes:
>>I guess my thought is is that to get the precision necessary in the timing
>>information an intrusive technique with far more obvious security
>>implications is necessary. Is this true?

>That is exactly what I have been trying to say.

You don't have to be particularly intrusive to get timing information
from a smartcard plugged into your smartcard reader, such as a digital
wallet or Nationalized ID Card. Multiple precision arithmetic is slow,
especially for things like exponentiation on 1000-bit numbers, and
smartcards are already slow. If you've got a smartcard plugged
into your own machine, it's possible to get a certain amount of timing
information from it during transactions across a network; you can even
get a certain amount of timing from machines like the blazingly fast
386/20 I'm typing this on.

Of course, being intrusive helps :-)

Apparently the Digicash people have known about this kind of attack for
a while and have tried to design their software and electronic wallets
to use constant calculation times.
Bill Stewart, Freelance Information Architect, stew...@ix.netcom.com

Bill Stewart

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

I've come up with two potential defenses against the timing attack on
Diffie-Hellman; they're adaptable to RSA as well.

The timing attack on Diffie-Hellman depends on assumptions
about what multiplications are being made, and in what order.
But you don't need to do them in order.

============== 1 =============================
The standard approach to calculating Y**x mod m is to calculate
Y[1] = Y, Y[2] = Y**2, Y[3] = Y**4, .... Y[logx] = Y**(logx),
and while you're doing this keep a running total r[i], where
r[i] = (bit[x,i]) ? (r[i-1]*Y[i]) : r[i-1]
all arithmetic modulo m (and all indices possibly off by one :-)

This may be a bit memory-intensive for a smartcard, but there's
no need to calculate these partial products in order; precompute
the Y[i], pick a random permutation of 1..(logx), and compute
the partial products in that order. This still leaks the number
of 0 and 1 bits in x, but it doesn't say what they are.
You probably still should multiply r[i-1]*Y[i] whether you're
going to need it or not; I don't think the method hides enough
information otherwise, but that needs more analysis.

Cost - mostly administrative, plus the memory, since keeping track
of permutations of small integers is cheap relative to bignum
multiplication and modulo calculations. You also need a random
number generator of some sort; LFSRs seem to be an easy way to
do permutations on the fly, so seed them with something decent.

How effective is it? I'm not sure - I'd need to do a lot more analysis
than I've done so far, and the long version of Paul's paper would
help :-) But at first glance it looks like it makes it much harder;
no two calculations are in the same order, so feeding the system
related Y = g**y mod m each time doesn't tell you much.

As a further annoyance to the listener, split the permutation
at random into two or three pieces, compute their products separately,
and then multiply those partial products together. (Don't try this at
home without analyzing whether it may leak more information than it
conceals...) At very minimum, take two numbers you've got lying
around and multiply them every once in a while :-)

=================== 2 ========================

Follow-on defense for low-memory smartcards: This is a bit ugly
and I'm not sure how much it protects your information, but it's some
help for systems that can't store 1024 partial products 1024 bits long,
which smartcards generally can't be expected to do :-)

Pick k random values K1, K2, .. Kk, where k is some medium-sized number;

probably about 10 though maybe more would be better.
Calculate Y[i] = Y**2**i, i=1...log x, as before, but instead of
calculating
r[i] = r[i-1] or r[i-1]*Y[i], i=1...logx,
calculate separate subproducts for i={1...K1}, {K1+1...K2}, ...
{Kk...logx}, and then multiply those subproducts together.
The easy way to do this is keep second running product P, and whenever
you reach Kj, set P = P*r[i], and set r[i]=1 for the next round of
(Kj)+1...K(j+1). You still need to calculate r[i-1]*Y[i] whether you're
using it or not.

For added obnoxiousness, at the cost of about 50% more calculation, you
could calculate Yinv = Y**-1 mod m,
and calculate r[i] and Y**i for i = 1...Kj,
calculate through Y[logx] ignoring the r[]s,
and then calculate r[i] and Y[logx] * Yinv**[logx-i] for i=logx....Kj+1.

===============

Paul C. Kocher

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

Vernon Schryver <v...@calcite.rhyolite.com> wrote:
>Given the attack seems to involve detecting 17 microsecond and smaller
>differences in the duration of computations, wouldn't it be sufficient
>to do your cyphering on machines to which opponents do not have on-line
>direct? Detecting 17 microsecond differences over a network sounds hard.
>
>Yes, I've heard of NTP, DTS, TSP, and other network time protocols.
>NTP is pushing for microsecond accuracy, but that does not seem relevant.

A very important aspect of the attack is that you identify small
differences by averaging over many samples. The number of samples
required increases as the square of (noise S.D / signal size),
which would bSe approx 3450 samples in this case. (This gives about a
1 S.D. signal, which should be enough for error-correcting attacks.)

According to ping from Best to Netcom (the former hosts cryptography.com,
and the latter I use for e-mail), I got a S.D. of under 1ms -- much
of which may have been due to timing errors anyway since the clock
for ping has just 1ms accuracy and I was running from an overloaded
machine (load>12). These are major ISPs with good connectivity,
but are located in different cities (Mountain View and San Jose).
An attacker with an account at the same ISP as the one hosting the
target computer ought to get even better results.

It's too early to say for sure whether an attack over the Internet
will work, but everything I have seen suggests it ought to be
quite possible between well-connected sites.

Vernon Schryver

unread,

Dec 13, 1995, 3:00:00 AM12/13/95

to

In article <pckDJI...@netcom.com> p...@netcom.com (Paul C. Kocher) writes:

>A very important aspect of the attack is that you identify small
>differences by averaging over many samples. The number of samples
>required increases as the square of (noise S.D / signal size),
>which would bSe approx 3450 samples in this case. (This gives about a
>1 S.D. signal, which should be enough for error-correcting attacks.)
>
>According to ping from Best to Netcom (the former hosts cryptography.com,
>and the latter I use for e-mail), I got a S.D. of under 1ms -- much
>of which may have been due to timing errors anyway since the clock
>for ping has just 1ms accuracy and I was running from an overloaded
>machine (load>12). These are major ISPs with good connectivity,
>but are located in different cities (Mountain View and San Jose).
>An attacker with an account at the same ISP as the one hosting the
>target computer ought to get even better results.
>
>It's too early to say for sure whether an attack over the Internet
>will work, but everything I have seen suggests it ought to be
>quite possible between well-connected sites.

I've been playing with network time protocols since about 1987, when
I wrote something I called `timeslave` to synchronize the corporate
network to decwrl's "satellite clock". It used a 9600 bit/sec
UDP/IP/Cypress link that often suffered assymetric traffic induced
delays of 3 seconds. (Try running `/usr/etc/timeslave -H ???.netcom.com
-mdddd`, if you can get a root shell on 206.86.0.11 or convince best.com
to make `timeslave` SUID=0. It will blabber all kinds of clock delta
stuff.)

Consider that game as it is played today over LANs and the Internet.
You have two clocks based on oscillators, both oscillating at frequencies
that do not change by more than 1 part in 10**7 in a couple of minutes.
The object is to discover the current difference in the frequencies
of the oscillators and get the clocks to read the same time. You use
many samples spread over days, averaged in as many ways as you can
think of. The current commerical state of the art over local area
networs is 10 to 50 millisecond (e.g. stupid TSP or `timed` in 4.*BSD
and IRIX running for days to tune the value of "timetrim"). With very
good oscillators (i.e. "atomic clocks") and care and competance (e.g.
NTP and Dave Mills), you can readily get below 1 millisecond, but even
Dave Mills does not get 1 microsecond. This clock game is far easier.
The signal, the clock difference, is far simpler than your signal,
clock watchers can use the very powerful effects of measuring the
clock differences over spans of millions of milliseconds, and computer
clocks are designed to be consistent despite CPU caches, schedulers,
interrupts, other job, etc.

Your attack is incredibly neat. We all envy you for thinking of it.
It sounds powerful for attacking a directly connected smart cards or
processes in the same computer. It might be effective over a fast LAN.
However, I bet that it would be cheaper and more reliable to carry a
sack of money, machine guns, or rubber hoses from Best to Netcom than
to use this timing attack over BARRNet. For what most of us care about,
email, electronic commerce, or WWW surfing, it seems irrelevant.

Vernon Schryver v...@rhyolite.com

Henry Baker

unread,

Dec 14, 1995, 3:00:00 AM12/14/95

to

In article <4ank9m$7...@cloner2.ix.netcom.com>, stew...@ix.netcom.com
(Bill Stewart) wrote:

> I've come up with two potential defenses against the timing attack on
> Diffie-Hellman; they're adaptable to RSA as well.
>
> The timing attack on Diffie-Hellman depends on assumptions
> about what multiplications are being made, and in what order.
> But you don't need to do them in order.

Isn't this more-or-less equivalent to Rivest's 'blinding' suggestion?

--
www/ftp directory:
ftp://ftp.netcom.com/pub/hb/hbaker/home.html

Henry Baker

unread,

Dec 14, 1995, 3:00:00 AM12/14/95

to

In article <pckDJI...@netcom.com>, p...@netcom.com (Paul C. Kocher) wrote:

> Vernon Schryver <v...@calcite.rhyolite.com> wrote:
> >Given the attack seems to involve detecting 17 microsecond and smaller
> >differences in the duration of computations, wouldn't it be sufficient
> >to do your cyphering on machines to which opponents do not have on-line
> >direct? Detecting 17 microsecond differences over a network sounds hard.
> >
> >Yes, I've heard of NTP, DTS, TSP, and other network time protocols.
> >NTP is pushing for microsecond accuracy, but that does not seem relevant.
>

> A very important aspect of the attack is that you identify small
> differences by averaging over many samples. The number of samples
> required increases as the square of (noise S.D / signal size),
> which would bSe approx 3450 samples in this case. (This gives about a
> 1 S.D. signal, which should be enough for error-correcting attacks.)

As someone else has already pointed out, protocols involving time-stamps
may give you this high-quality information for free!

Donald T. Davis

unread,

Dec 14, 1995, 3:00:00 AM12/14/95

to

Daniel G. Pouzzner writes:
>
>...any practical application of the timing attack to real-world cryptosystems

> would need to allow for the same sorts of complications Kerberos introduces.

>...I would expect to use sophisticated stochastic modeling, capable of

> "seeing" through the noise of packet RT's.

since mr. pouzzner's response to my post, i've also received mail
from paul kocher on the idea of timing attacks on kerberos. mr.
kocher argued essentially as does mr. pouzzner, that the difference
between attacking kerberos and attacking des is just "more data."
in this note, i calculate below just how much more data, and how
much more time, a networked attack on a kdc needs. the punchline:
by my calculation, timing attack on a kerberos server should take
at least 70 trillion times as long as mr. kocher's benchtop attack.
no, that's not a typo: almost 10^14 times as long.

to put this in perspective: 10^9 seconds is 32 years.

to say the least, i am not persuaded that the kdc's response-times
can usefully reflect its encryption-times. several things besides
net-latency contribute variability to the kdc's response-time, and
not all of them will average out gracefully. the most important of
these is disk-latency, whose effect i describe below. another
problem is that the kdc encrypts and decrypts with three different
keys for every request. overall, though, encryptions take up very
little of the kdc's execution-time, and still less of its response-
time. i intend to prepare (for publication) a more thorough analysis
of kerberos' defenses against timing attack, but here's a summary of
why i think networked timing won't work. the summary is intricate
and long, for which i apologize.

the only kerberos request that offers known plaintext for all kdc
encryptions is the basic login request, in which the kdc prepares
a random session-key, and encrypts it twice: once in its own key,
and once in the user's password. in a login request, the kdc must:

* look up the user's password-key in the key database,
* decrypt it with the password-database's master key,
* prepare a new login session-key,
* encrypt it twice: once with the pw-key, and
* again with the kdc's own secret-key.

there are three encryptions here, with two keys unknown to the
attacker, and one (his own pw) that he does know (actually, the
session-key rng uses des, too, but this is an inessential use).
for the ticket encryptions, the attacker knows the plaintext.
for the database-decryption, he knows only the decrypted result,
his pw-key. note that the timing-attack is a known-plaintext
attack, so the attacker will be able to study only his own requests,
whose session-keys he knows. this greatly limits the number of
login-encryptions he can gather for his statistical analysis,
to 300 per day (the kdc caches requests and its responses for
5 minutes; repetitions of a request get the cached response. a
user's requests can miss the cache 24 * 60 / 5 = 288 times a day).

mr. kocher says that as long as the request-to-response round-trip
timings have a std-dev of less than 1 millisec, then he expects
his analysis will be able to detect the encryption-variability
signal, despite the round-trips' variability. he also believes that
he can reduce the round-trip variance, by discarding as "outliers"
all but the fastest round-trips. this so reduces the amount of data
the attacker must gather, that it more than compensates for the
slower data-collection.

this approach will not suffice, thoough, because the kdc accesses
the disk for every request. thus, a highly variable component of
the kdc's response-time is the disk's rotational latency: the wait
for the desired block to arrive at the r/w head. this delay is a
uniformly-distributed variable, with values between 0 msec and the
disk's rotational period (17 msec for a 3600 rpm motor). the standard
deviation of this distribution is 5 msec. thus, casually gathering
kerberos' response times cannot meet mr. kocher's upper limit of
1 msec for the timings' std-dev.

the only way to reduce this variability would be to discard all but
the very shortest response-times, which result when the disk head
doesn't have to move, and when the request arrives at the kdc just
before the attacker's password arrives at the head. suppose that
we can control the head's prior position by requesting a ticket for
a service whose its secret-key is on the same track as the attacker's
password (assuming there is one -- server-keys are the minority of a
kdb's contents). once the head is positioned, we request new login
tickets. then, recall that we only have 300 requests per day from which
to choose. of these only 1/5, or 60 responses per day, will jointly
have a std-dev of 1 msec. but, even with smaller variance, the timing
analysis requires many timed encryptions. we can estimate how many
more we'll need, with 1 msec std-dev.

fast des implementations on fast machines can encrypt a des-block
in 20 usec. let's suppose, then, that key-dependent variation in des
encryption time is as big as 1 usec, or 5%. then a std-dev in network-
timing of 1 msec makes the attack require (1000/1)^2, or 1 million,
times as many response-times as mr. kocher's non-networked attack on des.
further, usable network timings come much more slowly than cpu timings,
greatly slowing the attack. specifically, instead of getting 50 timings
per millisecond, we can get 60 per day. so, at a rough cut, it seems that
networked attack takes a million times as many samples, but each takes
70 million times as long to collect (1400 sec, instead of 20 usec),
as in the benchtop attack. that's a factor of 7 * 10^13, or a lot of
pentiums.

in conclusion, i find that if we ignore server load, net latency,
and other sources of response-delay variability, an attacker might
be able to gather 60 usable response-timings per day, from which he
must build up a large enough sample to apply mr. kocher's timing
analysis. in that analysis, the attacker must deduce two unknown
keys simultaneously, because every kdc response uses the kdc's two
internal keys. at a rough estimate, networked collection slows the
attack by fourteen orders of magnitude.

it's implausible to claim that an analysis of precise encryption-
timings extends to a networked attack on imprecise response-times.
i think the burden of proof falls to mr. kocher. we were convinced
by his having cracked diffie-hellman. a similar, concrete demonstration
would be persuasive in the present case, too.
-don davis, boston

Hal Murray

unread,

Dec 14, 1995, 3:00:00 AM12/14/95

to

In article <padgett.10...@goat.orl.mmc.com>, pad...@goat.orl.mmc.com (Padgett 0sirius) writes:
> While I *think* I understand what is going on (the significance of a 17 usec
> delta in a process that has a 188 usec standard deviation of the cycle time
> eludes me though), it seems to me that it would require direct connection to
> the CPU, preferably with a logic analyzer, possibly at a directly connected
> terminal and with knowlege of exactly when the key is being processed.
> Certainly not over a packet switched network.

Be careful. I wouldn't want to try to measure anything that accurately
over a normal internet link going through several routers, but this is
cryptography. You have to cover all the bases.

It is easy to measure round trip times to the microsecond, or even
fraction of a microsecond. Alphas have an on chip register that counts
raw CPU clocks. (So do other systems). A 100 MHz machine is slow these
days, but that gives you 10 nanosecond resolution.

Response times on lightly loaded networks and systems can be
very consistent.

So if the network is lightly loaded and the system is lightly loaded,
an attacker might be able to get a lot more information than you would like.

Markus Kuhn

unread,

Dec 14, 1995, 3:00:00 AM12/14/95

to

stew...@ix.netcom.com (Bill Stewart) writes:

>I've come up with two potential defenses against the timing attack on
>Diffie-Hellman; they're adaptable to RSA as well.

[description deleted]

Why must this be so complicated? All smart card microcontrollers
(typically Intel 8051 or Motorola 6805 based CPU cores) which I have
seen so far have an on-chip timer/counter. If your encryption routine
requires e.g. between 22 and 42 ms depending on the keys and data
used, then start the timer before entering the cryptographical code
with the timing leak and just wait after the encryption routine until
the timer tells you that 50 ms have passed. This will cost you less
than 10 lines of assembler code and around 10 minutes of the time of
an experienced microcontroller programmer.

If you work in C on Unix, then use an itimer instead of the hardware
timer.

Adam L. Beberg

unread,

Dec 14, 1995, 3:00:00 AM12/14/95

to

Seems all of our overoptimization has come back to haunt us...
The solution is simple, don't optimize. (or do something else for a while)

For hardware crypto, design the card to produce results in a fixed number
of clock cycles. (This also happens to make syncronous designs easier,
but that's for another group)

For software it is a bit trickier, but the same idea applies,
tho certain OS's are going to fight it. Have the OS hold the result/process
until a fixed delay has occured after the start of the encryption.
In an OS that had encryption in mind during design, the operation can also
be atomized so that intermediate results are never available.
If the attacker can bypass this delay, then they can just read
your key out of memory because the system is already compromised.
Also there is almost zero _system_ performance loss, as the CPU can go
about its business working on that solitare game while the delay passes.

While Paul C. Kocher's results are nice, and have added to the public
paranoia, the workaround is not that complex iff you're willing to wait
a couple milliseconds. (I believe the average lifespan is up to 2398377600
seconds now, so who will miss it)

Security is paramount, delay is mearly an annoyance.
Performance should never be the be-all-end-all.

--
Adam L. Beberg * beb...@mithral.com * (312)-808-6894
Mithral Communications & Design Inc * http://www.mithral.com/
))) In Stereo Where Available ((( * http://mithral.iit.edu:8080/~beberg/

Larry

unread,

Dec 15, 1995, 3:00:00 AM12/15/95

to

beberg@indy (Adam L. Beberg) writes:

>For hardware crypto, design the card to produce results in a fixed number
>of clock cycles. (This also happens to make syncronous designs easier,
>but that's for another group)

For the hardware people: Would it be possible to measure power
differences to detect when a smart card is working away on a crypto
algorithm and when it's just NOP-ing to pass cycles? If so, one could
still perform a timing attack even though the result was returned in a

fixed number of clock cycles.

--L

Adam Shostack

unread,

Dec 15, 1995, 3:00:00 AM12/15/95

to

Larry (l...@panix.com), (in article <4ar456$e...@panix3.panix.com>) wrote:
>p...@netcom.com (Paul C. Kocher) writes:

>>It's too early to say for sure whether an attack over the Internet
>>will work, but everything I have seen suggests it ought to be
>>quite possible between well-connected sites.

>Would it be possible to thwart this by introducing random delays at
>the router/hub/network-interface level so as to force the required
>number of samples into an infeasible range? (Assuming that other
>network traffic isn't already doing this for you.) If this is
>possible, would it be practical to implement without an unacceptable
>degradation of network performance?

Often the router is run by a network group, and the web server
by another. If the web group does convince the network people to
change the routers, then 6 months down the line, at the next router
upgrade, the change will be forgotten, and optimised out.
Alternately, someone will break into a machine close enough to the web
server to get past this layer of obscurity.

Trusting other people to fix your problems, and keep them
fixed, is a long term loss. The problem is in the crypto, the
solution should be in the same place.

Adam

--
"It is seldom that liberty of any kind is lost all at once."
-Hume

Elvis Chen

unread,

Dec 15, 1995, 3:00:00 AM12/15/95

to

Paul C. Kocher (p...@netcom.com) wrote:
: I've just released details of an attack many of you will find
: interesting since quite a few existing cryptography products and

: systems are potentially at risk. The general idea of the attack is
: that secret keys can be found by measuring the amount of time used to

: to process messages. The paper describes attacks against RSA, fixed-

: exponent Diffie-Hellman, and DSS, and the techniques can work with
: many other systems as well.

greetings:

It is not intuitive to me that such timing-attack will be
effective against some private key cryptosystems such as DES or CAST.
It is my understanding that all operations in DES/CAST are done
at bit-level (XOR). Even key-scheduling. Unless it takes variable
amount of time for XOR gate to produce answers between different inputs,
it is not clear to me how timing-attack can be applied to DES/CAST.

Perhaps timing-attack can be applied to PROTOCOLS of DES/CAST, but
not to DES/CAST itself?

thanx for any clearification,

Elvis

Larry Kilgallen

unread,

Dec 16, 1995, 3:00:00 AM12/16/95

to

In article <4artn7$s...@panix3.panix.com>, l...@panix.com (Larry) writes:

> For the hardware people: Would it be possible to measure power
> differences to detect when a smart card is working away on a crypto
> algorithm and when it's just NOP-ing to pass cycles? If so, one could
> still perform a timing attack even though the result was returned in a
> fixed number of clock cycles.

To be truly _smart_ a card should then choose Multiply or Divide for its
NOP instructions.

Larry Kilgallen

David Sternlight

unread,

Dec 16, 1995, 3:00:00 AM12/16/95

to

In article <4atpti$q...@info.evansville.net>, bco...@investorweb.com (Bob
Costa) wrote:

>How about if somebosy puts a WAIT(Random(x)) inside the decryption
>program? Seems like a simple fix, and I will give up a couple of
>seconds of speed for security... No complicated network solutions are
>needed. Or did I miss something???

At first I thought that too. Further reflection suggests that with the
same data in and out, simple filters can remove a random component.
Actually there's some randomness in there you'd have to remove anyway, so
introducing some more won't make much difference.

The thing that would defeat the attack would be for the crypto software to
time the decryption or encryption and add a pad delay to bring the elapsed
time to a fixed value before returning the results, thus frustrating the
cracking software.

David

Bob Costa

unread,

Dec 16, 1995, 3:00:00 AM12/16/95

to

l...@panix.com (Larry) wrote:

>p...@netcom.com (Paul C. Kocher) writes:

>>It's too early to say for sure whether an attack over the Internet
>>will work, but everything I have seen suggests it ought to be
>>quite possible between well-connected sites.

>Would it be possible to thwart this by introducing random delays at
>the router/hub/network-interface level so as to force the required
>number of samples into an infeasible range? (Assuming that other
>network traffic isn't already doing this for you.) If this is
>possible, would it be practical to implement without an unacceptable
>degradation of network performance?

>What about something at the protocol layer, where packets that contain
>vulnerable cyphertext are transmitted to the network in coarse-grained
>quantal units?

>While these are both very similar to suggested CPU workarounds, might
>these network solutions be easier to implement than modifying the
>cryptosystems themselves?

How about if somebosy puts a WAIT(Random(x)) inside the decryption
program? Seems like a simple fix, and I will give up a couple of
seconds of speed for security... No complicated network solutions are
needed. Or did I miss something???

Follow me to InvestorWEB at http://www.investorweb.com/ for the most complete directory of public companies and investment information on the World Wide Web. Authors needed for educational articles on public stock and private equity investing!! Reach me by mail at bco...@investorweb.com.
*****************************************************************
PGPKey fingerprint= 9B 42 8B 1B 83 F0 DD 09 EF 5C A2 46 3A 15 F4 76

Igor Chudov

unread,

Dec 16, 1995, 3:00:00 AM12/16/95

to

David Sternlight (da...@sternlight.com) wrote:
* At first I thought that too. Further reflection suggests that with the
* same data in and out, simple filters can remove a random component.
* Actually there's some randomness in there you'd have to remove anyway, so
* introducing some more won't make much difference.
*
* The thing that would defeat the attack would be for the crypto software to
* time the decryption or encryption and add a pad delay to bring the elapsed
* time to a fixed value before returning the results, thus frustrating the
* cracking software.

Maybe (to allow for very different processing times) the time should be
padded to the minimal value of T0 or (k ** M), where k is a fixed constant
(2 or 3), and M is an integer, whichever is greater?

T0 should be chosen so that most times padding to T0 should suffice.

--
- Igor. (My opinions only) http://www.algebra.com/~ichudov/index.html
For public PGP key, finger me or send email with Subject "send pgp key"

You know you have achieved perfection in design, not when you have nothing
more to add, but when you have nothing more to take away.
- Antoine de Saint Exupery.

Vernon Schryver

unread,

Dec 16, 1995, 3:00:00 AM12/16/95

to

In article <4ar456$e...@panix3.panix.com> l...@panix.com (Larry) writes:
>p...@netcom.com (Paul C. Kocher) writes:
>
>>It's too early to say for sure whether an attack over the Internet
>>will work, but everything I have seen suggests it ought to be
>>quite possible between well-connected sites.
>
>Would it be possible to thwart this by introducing random delays at
>the router/hub/network-interface level so as to force the required
>number of samples into an infeasible range? (Assuming that other
>network traffic isn't already doing this for you.) If this is
>possible, would it be practical to implement without an unacceptable
>degradation of network performance?
>
>What about something at the protocol layer, where packets that contain
>vulnerable cyphertext are transmitted to the network in coarse-grained
>quantal units?
>
>While these are both very similar to suggested CPU workarounds, might
>these network solutions be easier to implement than modifying the
>cryptosystems themselves?

No, introducing random delays would not, in theory, thwart this attack.
Random delays would only make it more difficult. Routers, bridges,
queues in the sender, receiver, and forwarders, other network traffic,
and even traffic on other, unrelated networks that occupy the attention
of routers, bridges, and hosts (e.g. routing protocol traffic) add
plenty of random delays to any real network. Network time protocols
are quite successful and based on the idea that you can filter random
delays to measure time differences, although to orders of magnitude
less precision than this attack requires.

I believe that the random delays in any substantial network are already
so great that this attack is a complete waste of time if it is based
on direct measurements. The precision needed would require a ridiculous
number of samples, except perhaps when the network consists of a single
segment of an otherwise idle LAN.

As someone else has pointed out, if the victim helpfully includes
timestamps with the encrypted data, the attack might become practical
on a real network. I think the victim would need to include timestamps
that imply both the starting and ending of the computation. Otherwise
you would be back to the probably intractible difficulty of deducing
network delays to microsecond precision.

Vernon Schryver v...@rhyolite.com

eic...@cygnus.com

unread,

Dec 16, 1995, 3:00:00 AM12/16/95

to

| It is my understanding that all operations in DES/CAST are done
| at bit-level (XOR). Even key-scheduling. Unless it takes variable

Rotation is also commonly used. It turns out, though, that most
(possibly all) modern 32-bit architectures have (1) constant time
rotates (2) slow branches [due to pipelining.] The latter matters when
you only have a shift and not a rotate - one suggested implementation
of rotate-left(1) is "check the high bit; shift-left(1); if the high
bit was set, set the low bit" but this involves a test and branch, and
doesn't generalize to multiple-bit shifts. A more common
implementation would be "rotate-left(A,N): B=shift-right(A,wordsize-N);
A=shift-left(A,N); A= A .OR. B" which is constant time. (This also
maps closely to a reasonable expression of rotate in terms of the C <<
and >> bitshift operators.)

The particular architectures I've investigated include SPARC, PPC601,
and x86 (x>=3 have a constant time shift and rotate, but for x<=2,
the time is a function of the number of bits, *not* the data being
shifted.) I believe HPPA and others to be similar but don't have
reference books handy.

There is still a potential vulnerability in the use of large tables
for DES internal state (a traditional optimization) due to the cache
access times, but that's both harder to identify (as it depends on
vastly more factors) and provides fewer bits of leakage anyhow. More
usefully, newer DES implementations tend to use *smaller* tables to
get the additional speed from having them fit entirely within the
processor cache, eliminating the attack altogether.

As for use of DES within protocols -- as Don Davis has pointed out,
Kerberos in particular caches key schedules for the Kerberos Master
Key and the Ticket Granting Service Key, eliminating attacks on the
key scheduling pass.

The attacks on "expensive math" cryptography are still very important,
and undiminished by these results -- but most bit-fiddling algorithms,
at least on "modern" CPU's tend to be resistant. Even small cpu's like
the 80186 or 68HC11 have rotate operators that don't depend on the
data being rotated, which is sufficient for constant-cycle-count DES
implementation at least.

_Mark_ <eic...@cygnus.com>
Cygnus Support
Cygnus Network Security <network-...@cygnus.com>
http://www.cygnus.com/data/cns/

William Unruh

unread,

Dec 16, 1995, 3:00:00 AM12/16/95

to

In <david-16129...@lax-ca23-01.ix.netcom.com> da...@sternlight.com (David Sternlight) writes:

>In article <4atpti$q...@info.evansville.net>, bco...@investorweb.com (Bob
>Costa) wrote:

>>How about if somebosy puts a WAIT(Random(x)) inside the decryption
>>program? Seems like a simple fix, and I will give up a couple of

>At first I thought that too. Further reflection suggests that with the

>same data in and out, simple filters can remove a random component.

>Actually there's some randomness in there you'd have to remove anyway, so

>introducing some more won't make much difference.

Sorry, but I do not understand what you mean. All the data you have is
the total time. putting in the same input could tell you the
characteristics of the noise, but is does not help you in removing the
noise from the next instance. You can average the lots of signals, but
that helps only as the square root of the tries ( which is uselss if say
you put in a signal with a 1 sec std deviation- you would need 10^8
tries to average out that noise to 100microsec needed) So yes, putting
in such a ramdom wait would help, as long as you made sure that the wait
was not quantised in units larger than the encryption time. (eg, if the
wait was quantised in 1 sec intervals, while the decryption time was
1ms, teh wait would be easy to subtract out)
--
Bill Unruh
un...@physics.ubc.ca

Tom Perry

unread,

Dec 17, 1995, 3:00:00 AM12/17/95

to

Larry (l...@panix.com) wrote:
: beberg@indy (Adam L. Beberg) writes:

: >For hardware crypto, design the card to produce results in a fixed number
: >of clock cycles. (This also happens to make syncronous designs easier,

: >but that's for another group)

: For the hardware people: Would it be possible to measure power

: differences to detect when a smart card is working away on a crypto
: algorithm and when it's just NOP-ing to pass cycles? If so, one could
: still perform a timing attack even though the result was returned in a
: fixed number of clock cycles.

As a hardware person, I'll respond with that most useful of all
answers - It depends... :-)

If the hardware and software have been designed to resist this sort of
attack, then the answer is "No." Ideally, the smart card should present a
constant power load to the terminal. A shunt regulator on the smart card
together with appropriate filtering could be used to achieve this objective.

On the other hand, if the smart card was designed using commonly
accepted design practices and did not consider this possible attack,
it is quite possible to extract significant information from its
time-varying load characteristics. This is particularly true if you are
able to control the clock going to the smart card.

A person having an intimate knowledge of the hardware design and
software algorithms being executed __may be able__ to trace the
execution of the algorithm, determining which branches were taken and,
in some cases, the overall nature of the data being transferred; i.e.,
mostly ones, mostly zeroes. This is particularly true of processor
designs which are highly sequential in nature, as opposed to having a lot
of internal parallelism. In cases of "large" numbers such as those
commonly used in encryption, since they require multiple bus cycles to
transfer, it may be possible to characterize the nature of the data in
each part of the number; again, mostly ones, mostly zeroes.

Combined with the techniques described in Paul Kocher's paper, this could
provide a very formidable attack. I'm not saying that this would be
easy to do, only that it seems feasible to mount such an attack with a
modest budget -- thousands, or tens of thousands of dollars, rather than
millions. This should make a very interesting research project for
someone who hopefully has good ethics and sound moral character :-).

Regards,
Tom Perry

Frank O'Dwyer

unread,

Dec 18, 1995, 3:00:00 AM12/18/95

to

d...@cam.ov.com (Donald T. Davis) wrote:
>this approach will not suffice, thoough, because the kdc accesses
>the disk for every request. thus, a highly variable component of
>the kdc's response-time is the disk's rotational latency: the wait
>for the desired block to arrive at the r/w head.

This does not follow, unless the kdc accesses the raw disk. If
the kdc accesses a _filesystem_, then most OSes will operate
some form of disk caching. This is also true of some hardware.

Cheers,
Frank O'Dwyer

Greg Limes

unread,

Dec 19, 1995, 3:00:00 AM12/19/95

to

-----BEGIN PGP SIGNED MESSAGE-----

_____________________________ signed message starts here

In article <4aq7mg$7...@condor.acc.iit.edu>,

beberg@indy (Adam L. Beberg) writes:

| For software it is a bit trickier, but the same idea applies,
| tho certain OS's are going to fight it.

This is not an OS issue.

Write the software to do precisely the same operations, then route the
results appropriately; with adequate care, not only are the operations
constant, but the cache effects are constant as well.

Code that currently says

if (condition)
result = (large expression);

can be rewritten (taking adequate care that condition evaluates in a
time-constant fashion to "0" or "1") to be

result[condition] = (large expression);

Of course, the fact that result[1] is the data actually used
later in the task is immeterial. If you care about the cache
dirtiness of result[0] and result[1], just add

condition ^= 1;
result[condition] = result[condition];

(hey -- don't forget to leave the optimizer off! :-)

Alternately,

tmp = (large expression);
result[0] = (result[0] & cmask) | (tmp &~ cmask);
result[1] = (result[1] & ~cmask) | (tmp & cmask);

(with cmask calculated as 0 or ~0 appropriately).

I'm not [yet] a regular reader of this group, I tend to dip into it as
time allows, but this sort of thing falls out of what I do for a
living (altho it is a bit of a stretch -- many CPUs go faster when
they don't branch, which leads naturally to constant-time-execution
algorithms). Please forgive me if the above was obnoxiously obvious,
but it seems to have been overlooked as a solution at least in the
articles I have on hand.

[but I agree -- care *must* be taken to close this covert channel if
you think that someone can monitor it.]
____________________________________________ sigs follow

|| Greg "0x1873DB65 128B3043AA888EF7 DD5097D284FD5A5C" Limes
|| GCS/E d- C+$ UI+++++$ E+ W++ N+ PS+ PGP++ b+++ DI++ e++ y++

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2
Comment: Processed by Mailcrypt 3.3, an Emacs/PGP interface

iQCVAwUBMNcSS/kjmVkYc9tlAQFY/wP9H8gt6VPf90PqSVfKakeD3gbkxXs07Ytr
5UpXtKQ2bGjU5yidZBl0NKuFv1gwJszq3wwfNmXqIxSpyJbTiIKFmyIIUmr/zCo9
HjnugZEA/XpF1dqQjejAiJ2u8dGbJmPwjBWxn6XSdqaSg8tltjuovkCMJjZxghjV
rTlyITHT0DI=
=TgGn
-----END PGP SIGNATURE-----

WHMurray

unread,

Dec 19, 1995, 3:00:00 AM12/19/95

to

In article <4atpti$q...@info.evansville.net>, bco...@investorweb.com (Bob

Costa) writes:

>How about if somebosy puts a WAIT(Random(x)) inside the decryption
>program? Seems like a simple fix, and I will give up a couple of

>seconds of speed for security... No complicated network solutions are
>needed. Or did I miss something???
>
>

Perhaps. You have certainly raised the work factor. However, it has
been suggested that over a sufficiently large number of tries it will
average out, leaving the background timing visible. Ron Rivest has
suggested what he calls "blinding," in which one ensures that the timing
seen is constant and not a function of the key.

Nonetheless, while it is an interesting attack and will obviously have to
be resisted in future implementations, it was never of interest in
classical cryptography because timing would never have been visible to an
attacker. In modern systems it need not be. This is simply another case
in which theoretical cryptanalysis contributes more to security than to
practical attacks. I do not see any security people becoming alarmed to a
level that would justify a front page article in the New York Times.

This is orders of magnitude less serious than the fundamental internet
vulnerability pointed out by the Berkeley students and that was dismissed
out of hand.