AES register saving tricks

Wei Dai

Apr 6, 2009, 8:02:58 AM
The straightforward way of computing the AES round function requires 8
32-bit registers, but 32-bit x86 provides only 7 usable general-purpose
ones, since ESP is normally tied up as the stack pointer. I noticed a neat
trick that Brian Gladman used in his AES x86 assembly code to avoid spilling
a register, and I have an improvement on it of my own.

The basic problem is that, for each of the four 32-bit registers holding the
current AES state, we have to extract four bytes, send them to
8-bit-to-32-bit S-boxes, then MOV or XOR the S-box outputs into four
different registers holding the next round's state. It seems that once
we're done with the first register of the current state, we'd still need
3 registers for the rest of the current state, 4 for the next state, and
1 more as a scratch register for byte extraction, so 8 appears necessary.
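
To make the data flow concrete, here is a minimal C sketch of one round
written both ways: "gather" (each output word pulls one byte from each input
word) and "scatter" (each input word is consumed in turn and its four lookups
are XORed into four different output words, as described above). The table
contents are placeholders rather than the real AES tables, and all function
names are made up for illustration. The indexing assumes the usual
table-based formulation with byte 0 as the least significant byte of each
column word.

```c
#include <stdint.h>

/* Placeholder 8-bit-to-32-bit tables. These are NOT the real AES tables;
   any tables exercise the same register data flow. */
static uint32_t T[4][256];

static void init_placeholder_tables(void) {
    for (int k = 0; k < 4; k++)
        for (int b = 0; b < 256; b++)
            T[k][b] = (uint32_t)(b + 1) * 0x9E3779B1u
                    + (uint32_t)k * 0x01010101u;
}

/* Byte k of word w, with byte 0 the least significant. */
static uint32_t byte_of(uint32_t w, int k) {
    return (w >> (8 * k)) & 0xFFu;
}

/* Gather form: output word j reads byte k of input word (j + k) mod 4. */
static void round_gather(const uint32_t in[4], const uint32_t rk[4],
                         uint32_t out[4]) {
    for (int j = 0; j < 4; j++)
        out[j] = rk[j] ^ T[0][byte_of(in[j], 0)]
                       ^ T[1][byte_of(in[(j + 1) & 3], 1)]
                       ^ T[2][byte_of(in[(j + 2) & 3], 2)]
                       ^ T[3][byte_of(in[(j + 3) & 3], 3)];
}

/* Scatter form: input word i is consumed completely, and its byte k
   lands in output word (i - k) mod 4. All four outputs are live while
   the inputs are still being consumed, which is where the register
   pressure comes from. */
static void round_scatter(const uint32_t in[4], const uint32_t rk[4],
                          uint32_t out[4]) {
    for (int j = 0; j < 4; j++)
        out[j] = rk[j];
    for (int i = 0; i < 4; i++)
        for (int k = 0; k < 4; k++)
            out[(i + 4 - k) & 3] ^= T[k][byte_of(in[i], k)];
}
```

Both functions compute the same result; the scatter form is the one that
maps onto the register-allocation problem, because after the first input
word is finished there are 3 inputs, 4 outputs and 1 scratch value live.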

Gladman's trick is to process parts of two registers representing the
current state, then combine the remaining parts into one register with a
rotate, mask and OR. This saves a register because the outputs of the first
4 S-box lookups are now stored into only 3 output registers (with one XOR
being done) instead of 4 registers. My improvement is to combine the parts
with a single 8-bit register move, like "mov al, cl", instead of 3
operations, thus saving 2 instructions per round.
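
A rough C model of the two merge steps may help. The byte layouts here are
assumptions chosen only to illustrate the idea; they are not taken from
either implementation, and the function names are hypothetical. In the first
function, both registers are assumed to have their two unprocessed bytes in
the upper half, and the lower half of c is assumed to be zero already. In the
second, a is assumed to have only its low byte consumed, and c to have only
its low byte left.

```c
#include <stdint.h>

/* Rotate left by n bits, for 0 < n < 32. */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* Three-operation merge: rotate, mask, OR.
   Assumed layout: a = [live, live, stale, stale],
                   c = [live, live, 0, 0] (most significant byte first). */
static uint32_t merge_rotate_mask_or(uint32_t a, uint32_t c) {
    c = rotl32(c, 16);     /* rotate: live bytes of c move to the low half */
    a &= 0xFFFF0000u;      /* mask: clear the already-consumed bytes of a  */
    return a | c;          /* OR: one register now holds all 4 live bytes  */
}

/* Single-operation merge, modelling an 8-bit register move such as
   "mov al, cl". Assumed layout: only the low byte of a is consumed, and
   only the low byte of c is still live. The C expression needs a mask
   and an OR, but on x86 the partial-register move does this in one
   instruction. */
static uint32_t merge_byte_move(uint32_t a, uint32_t c) {
    return (a & 0xFFFFFF00u) | (c & 0xFFu);
}
```

In both cases the second register is free afterwards, which is what lets the
round fit in the available registers; the byte-move variant simply gets there
with one instruction instead of three.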

(If anyone looks at my code, it actually uses an MMX register as well,
because for -fPIC compatibility one general-purpose register has to be
used to point to the S-boxes.)

Another register-saving trick I used is to copy the round keys to the stack
and then use the ESP register as the loop counter. This avoids having to
fully unroll the loops, without incurring additional memory accesses or
costing another register for the loop counter.
