Security advisory: timing attack in HMAC verification

1,392 views
Skip to first unread message

Nate Lawson

unread,
May 26, 2009, 1:35:56 PM5/26/09
to Keyczar Discuss
There is a security flaw in the current version of the Google Keyczar
library. The impact is that an attacker could forge signatures for
data that was "signed" with the SHA-1 HMAC algorithm (the default
algorithm).

Firstly, I'm really glad to see more high-level libraries being
developed so that programmers don't have to work directly with
algorithms. Keyczar is definitely a step in the right direction.
Thanks to all the people who developed it.

The problem is that the HMAC verify function (python src/keyczar/
keys.py, Java src/org/keyczar/HmacKey.java) leaks timing information
based on how long a verify operation takes to fail. The function is
defined as follows for the HMAC mode:

Python:
def Verify(self, msg, sig_bytes):
return self.Sign(msg) == sig_bytes

Java:
return Arrays.equals(hmac.doFinal(), sigBytes);

Since the return value is a SHA-1 hash string, the operation devolves
to a byte-by-byte compare against sig_bytes. In both python and Java,
this is a classic sequence comparison that terminates early once an
incorrect match is found. This allows an attacker to iteratively try
various HMAC values and see how long it takes the server to respond.
The longer it takes, the more characters he has correct.

It may be non-intuitive, but the symmetric nature of MACs means the
correct MAC value for an arbitrary message is a secret on-par with key
material. If the attacker knows the correct MAC for a message of his
choosing, he can then send that value to forge authentication of the
message to the server.

I've implemented a simple test server using the python version of
Keyczar. It verifies an HMAC and sends back "yes" or "no" if the value
is correct. I then wrote a client in C that connects to the server and
tries various values for the HMAC. It tries each value multiple times
and records a set of TSC differences for each. These can be fed to a
program like ministat to decide when a significant difference has been
confirmed (based on mean and stddev).
http://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+8-current&format=html

I can confirm that localhost tests have a discernible difference,
based on whether the first byte is correct. I have not optimized the
attack to work over a LAN or the Internet yet. However, this does not
mean remote attacks are infeasible. Where jitter and other noise is
present in the samples, an attacker just needs to collect more data to
average it out. Remote timing attacks on SSL have been demonstrated
where the timing difference was only a few native multiplies.
http://crypto.stanford.edu/~dabo/abstracts/ssl-timing.html

I recommend changing the verify function to use a timing-independent
compare, such as the following. A similar change should be made to the
Java library.

###
correctMac = self.Sign(msg)
if len(correctMac) != len(sig_bytes):
return False
result = 0
for x, y in zip(correctMac, sig_bytes):
result |= ord(x) ^ ord(y)
return result == 0
###

This function is data-independent, except for revealing the total
length of the correctMac string. Since this is not considered
important to security, it is acceptable. Of course, this might not be
true for another use of this same code, so it cannot be blindly used
in other applications. Crypto flaws are subtle.

Please let me know if you have any questions. I hope this feedback can
help improve Keyczar.

Thanks,
Nate

--
Nate Lawson
Root Labs :: www.rootlabs.com
Solving embedded security, kernel and crypto challenges

Steve Weis

unread,
May 26, 2009, 1:50:23 PM5/26/09
to Keyczar Discuss
Thanks to Nate for this find. I'll be updating the Python and Java
implementations with a fix soon.
>  http://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0...

Steve Weis

unread,
May 28, 2009, 10:22:22 PM5/28/09
to Keyczar Discuss
I've checked in the fixes for this issue and updated the downloads on
the project page.

Thanks again to Nate for this find.

Nate Lawson

unread,
May 29, 2009, 3:10:22 PM5/29/09
to Keyczar Discuss
On May 28, 7:22 pm, Steve Weis <stevew...@gmail.com> wrote:
> I've checked in the fixes for this issue and updated the downloads on
> the project page.
>
> Thanks again to Nate for this find.

Thanks, Steve. I reviewed the diffs and your changes look good. Thanks
for fixing it so quickly.

Tim Yardley

unread,
Jun 1, 2009, 4:46:49 PM6/1/09
to Keyczar Discuss
All;

I agree that the current fix is definitely a step in the right
direction. However, it concerns me that the length is now leaked.
Nate points this out in his original post and I think it was glossed
over in that it doesn't matter for this context. It could matter in
an HMAC context if truncation was being used, as the verification
scheme would then disclose what the minimum truncated value was.

Would it not be better to do a similar procedure, but instead always
compare for constant time based on the size of the input? I remember
seeing some discussion on python-3000 about making zip raise an error
if the two arrays are of different size, but at the moment at least
that is not the case. Even so, another method other than zip could be
used. The proposed code would eliminate that issue I think as well.
As such, might the code be better suited being changed to something
like the following:

===
import itertools

correctMac = self.Sign(msg)
result = 0
if len(correctMac) >= len(sig_bytes):
short = sig_bytes
long = correctMac
else:
short = correctMac
long = sig_bytes
for x, y in zip(long, itertools.islice(itertools.cycle(short), 0,
len(sig_bytes))):
result |= ord(x) ^ ord(y)
if len(correctMac) != len(sig_bytes):
return False
else:
return result == 0
===

This has the benefit (and drawback) of always evaluating the MAC but
doesn't expose the length of the correct MAC. It also constantly
evaluates to the number of bytes given for the input, rather than the
output of the sign. And always returns false if their original length
doesn't match, which prevents false positives. Is there a drawback I
am not seeing?

The above code could be done more efficiently, so feel free to modify.

/tmy

Steve Weis

unread,
Jun 3, 2009, 3:07:48 AM6/3/09
to keyczar...@googlegroups.com
Hi Tim. I agree that currently if the attacker controls the length of one of the inputs in the comparison, then they could discover the length of the other input. In this case, they could learn how long "correctMac" is.

Just to speculate about other potential timing attacks in the suggested code, I think an attacker could also learn len(correctMac) if:
- there is any timing difference between the two branches of the first conditional
- itertools.cycle() had periodic timing based on the length of its input, e.g. if outputting the first element in a cycle were slower than iterating to any other element

I don't think either of these sound feasible, but don't know enough about what's going on under the hood in Python to say for sure.

Thomas Dixon

unread,
Jun 4, 2009, 1:00:19 PM6/4/09
to Keyczar Discuss
How about...

###
import operator

correct = self.Sign(msg)
result = (reduce(operator.or_, [ord(x)^ord(y) for x,y in zip
(sig_bytes, correct.ljust(len(sig_bytes), '0'))]) == 0)
###

Thomas Dixon

unread,
Jun 5, 2009, 6:31:40 PM6/5/09
to Keyczar Discuss
I decided to flesh it out a little more to be essentially just a
refactoring of Tim's code. The previous version also should have had
the variables in the list comprehension (correct, sig_bytes) inverted.
In this one I use 'supplied' as opposed to 'sig_bytes' to be a little
more descriptive. Apologies if I'm continuing coding a solution to
which there is no problem.

###
from operator import or_

correct = self.Sign(msg)

if len(correct) >= len(supplied):
long, short = correct, supplied
else:
long, short = supplied, correct

result = (reduce(or_, [ord(x)^ord(y) for x,y in zip(long, short.ljust
(len(long), '0'))]) == 0 and len(short) == len(long))
###

Nate Lawson

unread,
Jun 6, 2009, 11:06:36 PM6/6/09
to Keyczar Discuss
Tim, while I made sure to point out the issue that the MAC length is
leaked, I don't think it's important to make the code more complicated
to try to hide this case. I don't see how a truncated MAC is any
different than a regular one. If you've truncated it so much that an
attacker can perform a brute force attack, then the MAC is insecure
and hiding its length is only security through obscurity. In fact,
Keyczar should not allow truncated MACs below some security threshold
(80 bits?)

Also, as Steve points out, your modified version (and Thomas Dixon's
from June 5) has a smaller but still present timing leak. The short-
circuited evaluation of the "and len != ..." case is still a
difference. I don't think it's worth the effort to try to protect
against this attack, given that it doesn't seem to give the attacker
any advantage.

Tim Yardley

unread,
Jun 12, 2009, 5:05:40 PM6/12/09
to Keyczar Discuss
Nate;

I humbly disagree about truncation and length exposure. There are a
number of NIST documents that use, or recommend the use, of truncation
in certain circumstances. Also, the security of the HMAC and it's
truncation isn't quite cut and dry. You can read more about NISTs
recommendations in that realm by looking at Special Publication
800-107 (http://csrc.nist.gov/publications/nistpubs/800-107/NIST-
SP-800-107.pdf).

The notion of security through obscurity in relation to truncation is
a viable argument for the moment, but that doesn't mean that it will
hold water. At the moment, there are many algorithms that are not
broken that may one day be broken. Truncation is one of those still
relatively unexplored arenas. A key component in breaking some of
these crypto mechanisms in the future, may end up being the length of
the valid MAC. Why take the chance and leak that information when
there is no need to?

Steve pointed out that there could be timing differences related to
the conditional checks, and sure.. that's possible. You can't do much
to factor those away as branch prediction, compiler differences,
optimization levels, etc will play a factor. Regardless, the
statistical significance of those timing differences should be near
non-existent in comparison to the rest of the operations. One could
argue that they would therefore be such a tight constraint that the
attacker would not be able to use them to their advantage.

Further, if you wanted to *try* to fact them out, make everything
check the conditional and its corollary rather than allow for an else
branch. That eliminates branch prediction.

I haven't tested the validity of Thomas's refactor, but a rough glance
looks okay. I think the slight extra complexity is well worth a
potential risk down the road and I'd urge the maintainers of Keyczar
to consider it as a replacement solution, with some comments
justifying the reason for the slight extra complexity.

/tmy

Sébastien Martini

unread,
Jun 21, 2009, 3:59:26 PM6/21/09
to Keyczar Discuss
Hi,
>  http://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0...
The C++ implementation also had this vulnerability this commit fixes
it http://code.google.com/p/keyczar/source/detail?r=422 the tarball
has also been updated.

Reply all
Reply to author
Forward
0 new messages