GCM with 2KB/key tables

Skip to first unread message

Wei Dai

Apr 2, 2009, 5:27:24 PM4/2/09
to crypto-op...@googlegroups.com, Jeffrey Walton
This note describes a technique for computing GCM using 2KB of memory for
each key, which I haven't seen before in the literature or in published
code. The source code is available in C++ and SSE2 inline assembly, at
line 253. Speed seems to be within 1 cpb of the "simple, 4-bit tables
method" that uses 8KB of memory for each key, and within a factor of 2
(counting just the GMAC part of GCM) of the "simple, 8-bit tables method"
that uses 64KB of memory for each key.

In the "simple, 4-bit tables method", the following algorithm is used (see
page 11 of http://www.cryptobarn.com/papers/gcm-spec.pdf for notation) to
compute X * H, with x_i being a 4-bit piece of X, and x_i * H * P^4i being
done with a table lookup using x_i as the index. It needs 32 tables since
there are 32 possibilities for P^4i.

Z = 0
for i = 0 to 31
Z = Z + x_i * H * P^4i
end for
return Z

My algorithm is this, with x_i the same as above.

Z_0 = Z_1 = Z_2 = Z_3 = 0
for i = 0 to 3
for j = 0 to 3
Z_j = Z_j + x_(32i+8j) * H * P^32i
Z_j = Z_j + x_(32i+8j+4) * H * P^(32i+4)
end for
end for
return ((((Z_3 * P^8) + Z_2) * P^8) + Z_1) * P^8 + Z_0

There are four P^32i and four P^(32i+4), so 8 tables are needed per key. The
last step is done using a shared table of 512 bytes, similar to Shoup's

Reply all
Reply to author
0 new messages