D. J. Bernsteinunread,
Dec 16, 2018, 7:16:20 AM12/16/18
to pqc-...@list.nist.gov, pqc-co...@nist.gov
There's a big ongoing project to optimize constant-time variants of
Euclid's algorithm. This is a cross-cutting project with applications to
quite a few post-quantum submissions (e.g., optimizing constant-time
half-gcd computation inside Goppa/BCH decoding) and to other pre-quantum
and post-quantum primitives (e.g., optimizing constant-time inversion
for Curve25519 and CSIDH). People working on various aspects of this, in
alphabetical order: Daniel J. Bernstein, Ming-Shing Chen, Wen-Ding Li,
Yi-Jen Lin, Hsuan-Chu Liu, Chia-Chi Lu, and Bo-Yin Yang.
Some of the optimizations have now been implemented in the context of
modular inversions inside sntrup4591761 key generation:
* The previous key-generation software took 6 million cycles on a
Haswell CPU core, the usual optimization target.
* The new software takes just 980000 cycles on the same CPU core.
The new software has been contributed to the latest SUPERCOP release for
public verification of speed. This justifies the claim in the NTRU Prime
submission that most of the key-generation cycles can be eliminated.
(The submission also points out other reasons to not worry about the
performance of Streamlined NTRU Prime key generation: an application
generating a sequence of many keys can replace each inversion with three
multiplications; IND-CCA2 allows each key to be reused many times;
forward secrecy does not require constant generation of new keys. See
also Google's new NTRU-HRSS experiment.)
I expect further speedups in Streamlined NTRU Prime and in various other
submissions relying on Euclid-type algorithms. For example, it should be
very easy to adapt the inversion-mod-3 part of the new software to save
60000 cycles in NTRU-HRSS key generation.