There are many submitted signature systems and encryption systems ready
to drop into SUPERCOP for benchmarking---most obviously, all 77 systems
included in libpqcrypto; some KEMs (e.g., mceliece8192128) are already
included in SUPERCOP. This raises the question of the best schedule for
* posting the KEM result pages,
* including all the libpqcrypto systems in SUPERCOP, and
* similarly getting other submissions ready for benchmarking. (See
https://libpqcrypto.org/changes.html to get an idea of what this
means---the "Namespacing" part is mainly for production use, but
"Randomness", "Compilation", etc. are important for benchmarking; a
small sketch of the "Randomness" point follows this list.)
So far we've been holding back on all of this, for the following
reasons:
* Putting early emphasis on CPU time runs a serious risk of
distracting the community from things that are more important.
Obviously security is job #1. Also, for most of the submissions,
cost in typical applications will be dominated by key size,
ciphertext size, etc., not by CPU time.
* The flood of new primitives at the end of 2017 made it temporarily
difficult for the community to allocate enough human resources for
serious CPU-time optimization---even if we focus on just Haswell.
Early measurements of CPU time will often be heavily influenced by
this, and this is arguably unfair to teams that rely on outside
assistance for optimization.
* Regarding other submissions: Almost all of the 2330 implementations
benchmarked in SUPERCOP were submitted by the implementors to
SUPERCOP, with a few exceptions for wrappers around commonly
available libraries. My understanding is that the NIST submission
rules give us permission to modify and redistribute code for
benchmarking, but, realistically, the implementors are in the best
position to check that they're getting the expected results and to
be in control of any necessary updates.
* NIST has stated that "performance considerations will not play a
major role in the early portion of the evaluation process."
However, there are also several arguments for already collecting data
during the first round:
* There are already a bunch of reports, tables, and graphs floating
around that claim to be performance data but that actually consist
primarily of random noise---e.g., timing reference code optimized
for readability while ignoring much faster implementations, or
timing very slow RNGs (see
https://blog.cr.yp.to/20170719-pqbench.html,
https://blog.cr.yp.to/20170723-random.html, and Saarinen's recent
email; a small timing sketch after this list illustrates the RNG
issue).
* Sometimes users really do care about CPU time, and they shouldn't
be hearing that post-quantum crypto is less practical than it
actually is. See my email dated 3 Mar 2018 01:35:50 -0000.
* Many teams have already completed serious optimizations, and should
have the opportunity to point to public results from an established
neutral benchmarking system as certifying the resulting speeds.
* As far as I can tell, NIST hasn't stated that CPU time will be
_ignored_ during the first round. There are at least two reasons to
believe the opposite: first, NIST has already posted some time
information; second, NIST has specifically asked "What constitutes
unacceptable key sizes or performance?"
* At some point the community is going to have to make a transition
from "Teams haven't had enough time for optimization" to "Systems
that don't have fast Haswell code are presumed to be incapable of
running fast on Haswell." The end of the first round will be at
least a year after submission, and it's certainly not clear that
this _isn't_ enough time for Haswell optimization of everything.
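Here's the timing sketch mentioned above. It's my own illustration of
why RNG choice matters, not SUPERCOP's measurement code: if
randombytes() is a slow system RNG, the numbers below mostly measure
the RNG rather than the KEM's key generation. The header and size
macros follow the usual SUPERCOP naming and are assumptions here.

   /* Rough x86 cycle-count sketch around crypto_kem_keypair. With a slow
      randombytes() the dominant cost is the RNG, not the KEM itself. */
   #include <stdio.h>
   #include <stdint.h>
   #include "crypto_kem.h"   /* assumed SUPERCOP-generated header */

   static uint64_t cycles(void)
   {
     uint32_t lo, hi;
     __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
     return ((uint64_t)hi << 32) | lo;
   }

   int main(void)
   {
     unsigned char pk[crypto_kem_PUBLICKEYBYTES];
     unsigned char sk[crypto_kem_SECRETKEYBYTES];
     for (int i = 0; i < 16; ++i) {
       uint64_t t0 = cycles();
       crypto_kem_keypair(pk, sk);
       /* SUPERCOP reports quartiles over many runs; this prints raw samples */
       printf("keypair cycles: %llu\n", (unsigned long long)(cycles() - t0));
     }
     return 0;
   }
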
It seems that there are also some first-round benchmarking efforts in
liboqs and pqcbench, but I'm not sure what their policy rationales are.
Here's a strawman proposal for the way forward:
* In July we'll start a big benchmarking run that includes the
libpqcrypto systems and any other working software that the
implementors actively submit to SUPERCOP by the end of June.
(Regarding "working": see
https://bench.cr.yp.to/tips.html; a minimal sanity check in that
spirit is sketched after this list.)
* After that we'll post results as usual, including KEM results. Of
course the results on CPU cores smaller than Haswell won't mean
much, since Haswell has been the main optimization focus so far.
* Around September we'll start adding other submissions, with the
goal of having all submissions benchmarked by 30 November.
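As promised above, here's a minimal sanity check in the spirit of the
tips page---my own sketch, not SUPERCOP's own test code: an
implementation whose decapsulated session key doesn't match the
encapsulated one isn't "working" and will fail benchmarking outright.

   /* Minimal KEM roundtrip check: keypair, encapsulate, decapsulate, and
      compare session keys. Header and size macros follow SUPERCOP naming. */
   #include <stdio.h>
   #include <string.h>
   #include "crypto_kem.h"   /* assumed SUPERCOP-generated header */

   int main(void)
   {
     unsigned char pk[crypto_kem_PUBLICKEYBYTES];
     unsigned char sk[crypto_kem_SECRETKEYBYTES];
     unsigned char c[crypto_kem_CIPHERTEXTBYTES];
     unsigned char k1[crypto_kem_BYTES];
     unsigned char k2[crypto_kem_BYTES];

     if (crypto_kem_keypair(pk, sk) != 0) { puts("keypair failed"); return 1; }
     if (crypto_kem_enc(c, k1, pk) != 0) { puts("enc failed"); return 1; }
     if (crypto_kem_dec(k2, c, sk) != 0) { puts("dec failed"); return 1; }
     if (memcmp(k1, k2, crypto_kem_BYTES) != 0) {
       puts("session keys differ");
       return 1;
     }
     puts("ok");
     return 0;
   }
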
Feedback and suggestions are welcome, especially regarding how the above
issues (or any other issues missing from the list) should be weighed.
---Dan