First SUPERCOP results for NISTPQC


D. J. Bernstein

Oct 5, 2018, 11:32:50 AM10/5/18
to pqc-...@list.nist.gov
Publicly verifiable SUPERCOP benchmarks are now available for many of
the NIST submissions. The benchmarks show that these submissions are
faster---in some cases _much_ faster---than previous benchmark tables
indicate. For example, here are median cycle counts for the measured
KEMs targeting IND-CCA2 security on "titan0" (Intel Haswell):

* Key generation: 44056 cycles for ntrulpr4591761, 55148 cycles for
kyber512, 61940 cycles for babybear, etc.

For comparison, NIST reported (also on Haswell, I believe) 9579562
cycles for ntrulpr4591761, 440519 cycles for kyber512, and 66197
cycles for babybear. I've also tried the current pqcbench, which
reports 11297244, 160387, 70805; and liboqs, which reports 149392
for kyber512.

* Encapsulation: 34944 cycles for ntruhrss701, 45208 cycles for
sntrup4591761, 79736 cycles for ntrulpr4591761, etc.

For comparison, NIST reported 3738930, 22761101, 45677602. pqcbench
reports 1270827, 11185643, 22241114.

* Decapsulation: 64712 cycles for ntruhrss701, 79536 cycles for
kyber512, 94532 cycles for sntrup4591761, etc.

For comparison, NIST reported 10952767, 783765, 59551582. pqcbench
reports 3707602, 271818, 30596880. liboqs reports 250322 for
kyber512.

Full tables of supercop-20180818 results on this machine appear here:

https://bench.cr.yp.to/results-kem.html#amd64-titan0
https://bench.cr.yp.to/results-sign.html#amd64-titan0

Click on the machine name (blue "titan0") to see a list showing which
implementations and compiler options were selected for each primitive.
(For verification, anyone can then do a fast SUPERCOP run with that
compiler on another Haswell.) Click on a primitive name in that list to
see slowdowns, warnings, and errors for each implementation and each
compiler option.

There are also supercop-20180818 results for some other machines (many
contributed by Romain Dolbeau), but for the moment I recommend focusing
on the titan0 results because (1) Haswell is the single most common
optimization target and (2) titan0 has a typical selection of compilers
(whereas genji460 is also a Haswell but uses icc, typically producing
worse results).

Some important caveats:

* Security is job #1. One aspect of security is avoiding timing
attacks, but many implementations don't run in constant time. The
slowdown for constant-time code depends on the primitive.

* For most of the submissions, cost in typical applications is
dominated by key size, ciphertext size, etc., not by CPU time.

* The community has done serious Haswell optimization of _some_
primitives but certainly not all---never mind all the other CPU
microarchitectures of interest. Subsequent speedups will vary from
one primitive to another.

* There may be cases where existing software achieves speeds not
reflected in the current reports. Submitters should look for any
unexpected slowdowns and send updated software to the ebats mailing
list. If the software is already there but doesn't work, please
check the online error reports to see what went wrong.

* Only about 40% of the unbroken submissions are covered so far. I've
started tackling the unholy mess of further software. Warning to
potential users: NISTPQC, despite being an important and timely
project, has produced the largest regression _ever_ in the quality
of cryptographic software. This will not be easy to fix.

* Some submissions are moving targets, with new parameters and larger
redesigns already happening during the first round. Obviously the
stable submissions are higher priorities for benchmarking.

I'd suggest followup discussions (1) on the ebats mailing list for
detailed work on getting more primitives and more implementations
benchmarked, and (2) on pqc-forum for the big-picture interactions
between benchmarking and the competition.

I'm currently planning to start another big benchmarking run around the
beginning of November. Given code-quality problems, I don't think I'll
get through all of the remaining submissions by then, and the selection
is going to depend on my assessment of what's most productive. Teams
working on integrating their own code into SUPERCOP (for the procedure
see https://bench.cr.yp.to/tips.html) should let me know, or speak up on
the ebats mailing list.

---Dan

D. J. Bernstein

Nov 13, 2018, 10:55:40 AM11/13/18
to pqc-...@list.nist.gov
A month ago I wrote:
> Given code-quality problems, I don't think I'll get through all of the
> remaining submissions by then, and the selection is going to depend on
> my assessment of what's most productive.

I've just released supercop-20181113, which includes 315 implementations
of 159 primitives (84 kem, 16 encrypt, 59 sign) from the following 37
submissions (see below for discussion of what's missing here):

BIG QUAKE, Classic McEliece, CRYSTALS-DILITHIUM, CRYSTALS-KYBER,
DAGS, FrodoKEM, GeMSS, Gravity-SPHINCS, Gui, HILA5, KINDI, LAC, LAKE,
LEDAkem, LIMA, LOCKER, LOTUS, LUOV, McNie, Mersenne-756839, MQDSS,
NewHope, NTRUEncrypt, NTRU-HRSS-KEM, NTRU Prime, NTS-KEM,
Odd Manhattan, Picnic, pqRSA, qTESLA, Rainbow, Ramstake, SABER, SIKE,
SPHINCS+, Three Bears, Titanium.

Regarding more recent _primitives_, NIST has stated that "the selection
of candidates for the second round will primarily be based on the
original submissions", so I've focused on the original primitives. When
I've changed implementations to fit them into SUPERCOP, I've tried to
avoid making any software changes that risk creating new deviations from
the specifications. (However, it's possible that I've made mistakes
here, and in general there's very little assurance that any of the
implementations match the specifications. Perhaps NIST should request
Sage implementations for the second round.)

Regarding faster _software_ for the original primitives, NIST has stated
that "we'd really like to see more platform-optimized implementations".
Let me repeat an important caveat that I wrote before:

The community has done serious Haswell optimization of _some_
primitives but certainly not all---never mind all the other CPU
microarchitectures of interest. Subsequent speedups will vary from
one primitive to another.

Even when the work _has_ been done, there's no guarantee that I've seen
the latest implementations, although I did put some effort into looking
around online for faster software published post-submission. It's up to
submission teams to check that they end up with the expected speeds, and
to contribute implementations to SUPERCOP (producing the same SUPERCOP
checksums!) demonstrating any claims of better speeds.

I also tried integrating the following submissions but ran into problems
that I think need to be handled by the submission teams:

* BIKE: Inconsistent checksums---the outputs depend on something
other than randombytes() and the official inputs.
* DualModeMS: Ran into the SUPERCOP per-benchmark time limit,
even with the fastest implementation (sse2).
* HiMQ-3: SUPERCOP says "crypto_sign_keypair writes after output".
* LEDApkc: crypto_encrypt() allows only very short message lengths.
* LIMA: crypto_encrypt() allows only very short message lengths.
* NTRUEncrypt: the 1024 parameter set still doesn't seem to work
even after the previously announced fixes.

LIMA has a separate KEM that I integrated, and I integrated the 443 and
743 parameter sets for NTRUEncrypt, but the other 4 submissions aren't
included at all.
Regarding PKEs limited to very short message lengths, see Section 2.1.2
of https://eprint.iacr.org/2001/112.pdf, where Shoup explains the
importance of handling longer messages:

Some currently available public-key encryption schemes, like
RSA-OAEP, only allow for very short messages, and leave it to
application engineers to design their own “hybrid” scheme to encrypt
long messages (i.e., by encrypting a session key and then using
symmetric-key cryptography to encrypt the payload).

However, it seems unrealistic to expect application engineers to
correctly design such a secure hybrid scheme.

Of course many primitives use short fixed-length PKEs internally, but
these PKEs are normally packaged and benchmarked as KEMs. SUPERCOP's
crypto_encrypt() is a higher-level API.

I skipped 1 additional submission (EMBLEM and R.EMBLEM) because the code
organization was sufficiently out of whack with the NIST requirements to
make me think that I should study the documentation. I don't think I'll
need to wait for the submission team here.

I skipped the following 7 submissions because my current understanding
(please correct me if I'm wrong!) is that the security level of the
submitted parameters against known attacks is not high enough for
benchmarks of those parameters to be useful:

CFPKM
DRS
Giophantus
Guess Again
Lepton
pqsigRM
RaCoSS

SUPERCOP doesn't have a policy against benchmarking low-security
primitives (e.g., it benchmarks MD5), but of course security is a factor
to consider in deciding how to allocate software effort.

Finally, I didn't even bother looking at the following 14 submissions
where the design teams claim patents on the submissions: Compact LWE,
Ding Key Exchange, DME, FALCON, HQC, KCL (OKCN/AKCN/CNKE), Lizard,
Ouroboros-R, pqNTRUSign, QC-MDPC KEM, RLCE, Round2, RQC, WalnutDSA. Some
comments on this:

* The primary goal of benchmarking is to predict the performance that
users will see. Patented cryptographic primitives generally have
far fewer users than unpatented cryptographic primitives, and are
correspondingly much less important as benchmarking targets.

* SUPERCOP doesn't have a policy against accepting code from patent
holders, but none of these teams have sent submissions to SUPERCOP.

* Any particular submission is unlikely to be standardized, so a
statement releasing or partially releasing a patent in case of
standardization is unlikely to have any effect.

For similar reasons, patented cryptographic primitives are less
important than unpatented cryptographic primitives as targets of
security analysis. This fact reduces the amount of security analysis
that the patented primitives receive, so security problems in the
patented primitives are more likely to persist without being detected.

I'm not saying that these are the only submissions covered by patents.
I've been working on an analysis of what various patents cover, but this
isn't ready yet.

---Dan