Branch Prediction - Thy New Enemy?

Tom St Denis

unread,

Aug 24, 2006, 8:52:07 AM8/24/06

to

A paper about using branch prediction as a side channel vector of
attack.

Whoop!

http://eprint.iacr.org/2006/288

Tom

xmath

unread,

Aug 24, 2006, 5:04:55 PM8/24/06

to

Tom St Denis wrote:
> A paper about using branch prediction as a side channel vector of
> attack.
>
> Whoop!

Heh, am I glad I've just (as of version 200608242056) removed the last
bits of data-dependent branching from my ecdh and sign/verify code.
Only generating a key-pair for signing remains vulnerable, but it's
awkward to avoid for 1/k mod q (patches are welcome of course :-)

- xmath

Kristian Gjųsteen

unread,

Aug 25, 2006, 3:12:11 AM8/25/06

to

xmath <xmath...@gmail.com> wrote:
>Only generating a key-pair for signing remains vulnerable, but it's
>awkward to avoid for 1/k mod q (patches are welcome of course :-)

You can always to k^(q-2) mod q, of course, which is fairly cheap since
q is small, but there are probably better methods...

--
Kristian Gjųsteen

xmath

unread,

Aug 25, 2006, 4:43:02 AM8/25/06

to

Kristian Gjøsteen wrote:
> You can always to k^(q-2) mod q, of course, which is fairly cheap since
> q is small

Yea, I've been considering using that. I already do for p, but because
p has a nice form the (hardcoded) square-multiply chain I got from
djb's curve25519 is quite short and simple too. I suppose for q I
could do the top 128 bits with a square-loop (it's just 2^252) and then
do the bottom half (a random-looking value between 2^214 and 2^125)
with a generic square-multiply loop.

> but there are probably better methods...

I'd love to hear about them. Haven't found any that have timing
independent of k.

- xmath

David Wagner

unread,

Aug 25, 2006, 5:14:16 PM8/25/06

to

xmath wrote:
>Only generating a key-pair for signing remains vulnerable, but it's
>awkward to avoid for 1/k mod q (patches are welcome of course :-)

Have you considered techniques like those described in the paper below to
make that part of your code into straight-line code with no data-dependent
branches (or, at least, none that depend upon any secrets)? There are
simple tricks known for eliminating such branches, though they typically
introduce a non-trivial performance overhead.

David Molnar, Matt Piotrowski, David Schultz, and David Wagner.
"The Program Counter Security Model: Automatic Detection and Removal
of Control-Flow Side Channel Attacks", ICISC 2005.
http://www.cs.berkeley.edu/~daw/papers/pcmodel-long.pdf

If you care to post just the minimal segment of the code that you're
worried about, I might be able to suggest some ways to eliminate
data-dependent branches.

xmath

unread,

Aug 25, 2006, 8:54:41 PM8/25/06

to

David Wagner wrote:
> xmath wrote:
> >Only generating a key-pair for signing remains vulnerable, but it's
> >awkward to avoid for 1/k mod q (patches are welcome of course :-)
>
> Have you considered techniques like those described in the paper below to
> make that part of your code into straight-line code with no data-dependent
> branches (or, at least, none that depend upon any secrets)?

Yes, I know the basic principle of branch-free code. The mod-p math and
most of the mod-q math already has fixed timing. (here p is the field
prime, and q the EC group order)

Only 1/k mod q remained because I still used egcd for division which
has rather intrinsic data-dependent timing.

Because this only affected keygen for signature keys, and no other
operation, it didn't seem high-priority to me to fix this. It's still
nagging me though, so I'm working on proper replacements for the mod-q
math (the current version is inefficient anyway), and will use a simple
k^(q-2) for invert, since I know of no better way.

- xmath

David Wagner

unread,

Aug 25, 2006, 9:31:25 PM8/25/06

to

xmath wrote:
>Only 1/k mod q remained because I still used egcd for division which
>has rather intrinsic data-dependent timing.

Out of curiousity, why is it intrinsic? Our paper described some
general techniques for transforming code to have data-independent
timing. If this is an example where those techniques fail, I would
be interested to learn -- but I don't see any reason why they would
fail to apply. Did you take a look at the paper?

It seems to me -- without having given this any thought at all -- that
it should be pretty straightforward to apply our techniques to the
extended Euclidean algorithm. egcd is basically just a loop, whose
maximum number of iterations is known in advance (and is not dependent
on any secrets), and whose body I would expect can be rendered to have
data-independent timing. But quite possibly I'm missing something and
very shortly will be eating my words. :-)

>Because this only affected keygen for signature keys, and no other
>operation, it didn't seem high-priority to me to fix this.

Ok.

>will use a simple k^(q-2) for invert, since I know of no better way.

Raising to the (q-2)-th power may be more efficient than the general
techniques described in our paper; I don't know.

Mike Scott

unread,

Aug 26, 2006, 5:55:37 AM8/26/06

to

The same idea seems to be mentioned here

http://www.iacr.org/conferences/crypto2006/rumpsched.html

Mike

xmath

unread,

Aug 26, 2006, 6:57:33 AM8/26/06

to

David Wagner wrote:
> Out of curiousity, why is it intrinsic? Our paper described some
> general techniques for transforming code to have data-independent
> timing. If this is an example where those techniques fail, I would
> be interested to learn -- but I don't see any reason why they would
> fail to apply. Did you take a look at the paper?

I browsed through it, didn't read it cover to cover.

> It seems to me -- without having given this any thought at all -- that
> it should be pretty straightforward to apply our techniques to the
> extended Euclidean algorithm. egcd is basically just a loop, whose
> maximum number of iterations is known in advance (and is not dependent
> on any secrets), and whose body I would expect can be rendered to have
> data-independent timing. But quite possibly I'm missing something and
> very shortly will be eating my words. :-)

The code is near the top, divmod and egcd32:
http://cds.xs4all.nl:8081/ecdh/curve25519_i64/curve25519_i64.c

The current issues are: divmod needs to know the highest non-zero byte
of the divisor, and currently simply requires the size of the divisor
to be specified such that the highest byte is non-zero. It also has
timing dependent on this. Egcd is variable in number of iterations.

Now, finding the highest non-zero byte certainly can be done
branch-free. Making the main loop have a fixed number of inner
iterations would be trickier, but can probably be done by appropriate
masking of bytes that have been read out of bounds (I already do this
for r[-1] and d[-1]). As you said, egcd has a known maximum number of
iterations, however if you want to keep running past this you certainly
have to be careful with how to deal with division by zero, which even
with the above modifications, divmod would choke on.

Altogether, I suppose it is possible, but it would get a bit messy.

> >will use a simple k^(q-2) for invert, since I know of no better way.
>
> Raising to the (q-2)-th power may be more efficient than the general
> techniques described in our paper; I don't know.

I'm wouldn't be surprised if it is. At least it'll be much simpler :-)

BTW, this is also perhaps a good moment to mention something I find out
when looking through some of gcc's output: on PowerPC, at least the
gcc I got here produces branching code-sequences for x > 0 and x <= 0
for signed x, when using the result numerically. Same for the
equivalent x >= 1 and x < 1. It seems it produces branch-free
sequences for all other comparisons of word-size arguments, signed or
unsigned, constant or variable.

gcc version 4.0.0 (Apple Computer, Inc. build 5026)

- xmath

Tom St Denis

unread,

Aug 26, 2006, 8:27:06 AM8/26/06

to

xmath wrote:
> BTW, this is also perhaps a good moment to mention something I find out
> when looking through some of gcc's output: on PowerPC, at least the
> gcc I got here produces branching code-sequences for x > 0 and x <= 0
> for signed x, when using the result numerically. Same for the
> equivalent x >= 1 and x < 1. It seems it produces branch-free
> sequences for all other comparisons of word-size arguments, signed or
> unsigned, constant or variable.
>
> gcc version 4.0.0 (Apple Computer, Inc. build 5026)

Part of my job involves hacking large code like DB2 and I've seen GCC
4.1.0 use the CMOV* series of opcodes [whereas ICC v9 doesn't hahahaha]
to avoid branches.

Like I've always said, GCC is smart, it knows things.

:-)

Tom

David Wagner

unread,

Aug 26, 2006, 10:51:09 PM8/26/06

to

xmath wrote:
>Altogether, I suppose it is possible, but it would get a bit messy.

Ok. Yup, that sounds right to me: possible, but thoroughly messy.

>BTW, this is also perhaps a good moment to mention something I find out
>when looking through some of gcc's output: on PowerPC, at least the
>gcc I got here produces branching code-sequences for x > 0 and x <= 0
>for signed x, when using the result numerically. Same for the
>equivalent x >= 1 and x < 1. It seems it produces branch-free
>sequences for all other comparisons of word-size arguments, signed or
>unsigned, constant or variable.

Neat, I didn't know that! Thanks.