Alperin-Sheriff, Jacob (Fed) writes:
> Just for reference, of the 65 schemes that have not officially
> withdrawn yet, 38 submissions did not contain additional
> implementations using assembly/vector instructions (i.e. that are
> likely faster).
I don't see enough information to reproduce the object code that NIST
benchmarked, but it seems quite likely that _all_ of the submissions
used vectorization and/or hand-optimized assembly and/or hand-optimized
intrinsics in this object code.
How is this possible, given NIST's requirement for the source code to be
portable ANSI C without such instructions? Common answers include
* vectorization generated by compilers and
* hand-written assembly and/or intrinsics hidden inside preexisting
libraries that were exempted from the requirement.
For example, NIST allowed submissions to use the GMP library. Checking
for gmp.h suggests that the library is used in 8 of the 38 submissions
you mentioned: Compact LWE, Mersenne-756839, Odd Manhattan, pqRSA (which
NIST counts as two submissions), Ramstake, RankSign, and RQC. GMP, in
turn, uses platform-specific optimizations, which presumably affected
the numbers in NIST's benchmarks. Whether or not the statement about
"additional implementations" is correct, it's missing the point.
Is NIST going to replace GMP with an unoptimized big-integer library? Or
is NIST going to continue giving these submissions a speed boost through
GMP's use of platform-specific optimizations, while prohibiting such
optimizations in the subroutines used in most submissions? Of course the
marketing department wants to say that NIST's measurements are fair and
balanced, but looking at real examples such as GMP shows that these
measurements were collected under incoherent ground rules.
The situation becomes even worse when one looks at vectorization added
by gcc. Spot checks suggest that _every_ submission ends up using _some_
vector instructions---but at a small and apparently random fraction of
the locations where vectorization is possible. For lucky submissions,
this fraction includes critical inner loops, producing a big speed
boost, although there's generally also much more room for improvement.
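Here is a toy illustration (my own example; the exact behavior depends
on the gcc version and flags): with -O3, gcc typically vectorizes the
first loop, whose iterations are independent, and leaves the second loop
scalar because of the loop-carried dependence. Whether a submission's
critical inner loops look more like the first or the second is much of
what decides whether it gets the boost.

   #include <stdint.h>

   void add_vectors(int32_t *r, const int32_t *a, const int32_t *b, int n)
   {
     /* independent iterations: gcc -O3 will usually vectorize this */
     for (int i = 0; i < n; ++i)
       r[i] = a[i] + b[i];
   }

   int32_t chained(int32_t *a, int n)
   {
     /* each iteration depends on the previous one: usually left scalar */
     int32_t s = 1;
     for (int i = 0; i < n; ++i) {
       s = (s * 31 + a[i]) % 1000003;
       a[i] = s;
     }
     return s;
   }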
Of course it's possible to study gcc in detail and figure out structure
in what's going on---and search for modifications to C code that make
gcc produce better results without compromising some theoretical notion
of portability. I used to do this sort of thing many years ago but then
switched to code-generation tools that achieve better performance with
much less programming time: intrinsics, assembly, etc.
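For comparison, here is what the first loop above looks like when
written directly with AVX2 intrinsics (a sketch assuming n is a multiple
of 8; compile with -mavx2). This is the kind of code excluded by the
language restrictions, even though it is closer to what a
performance-conscious implementor actually ships on this platform:

   #include <immintrin.h>
   #include <stdint.h>

   void add_vectors_avx2(int32_t *r, const int32_t *a, const int32_t *b, int n)
   {
     for (int i = 0; i < n; i += 8) {
       __m256i va = _mm256_loadu_si256((const __m256i *) (a + i));
       __m256i vb = _mm256_loadu_si256((const __m256i *) (b + i));
       _mm256_storeu_si256((__m256i *) (r + i), _mm256_add_epi32(va, vb));
     }
   }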
The big picture is that CPUs are evolving farther and farther away from
the naive machine model expressed by the textbook-level subset of C, and
compilers are having more and more trouble closing this gap. This is why
it's less and less common for real-world performance-critical code to be
written in this subset. Real-world portability is achieved not by
banning instructions, but by having enough code alternatives to cover
all target processors.
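A typical shape for this, sketched here with invented names purely for
illustration, is a thin dispatch layer in which the plain C version
serves as the fallback rather than as the ceiling:

   /* hypothetical dispatch; names are illustrative, not from any submission */
   #if defined(__AVX2__)
   #  define crypto_op_impl crypto_op_avx2     /* intrinsics/assembly path */
   #elif defined(__ARM_NEON)
   #  define crypto_op_impl crypto_op_neon
   #else
   #  define crypto_op_impl crypto_op_portable /* plain C fallback */
   #endif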
> apples-to-apples comparison
What we care about is the time that the crypto needs. What NIST has
actually measured is something that's generally _much_ larger:
      time that the crypto needs
    + time lost to deficiencies in this unrealistic language subset
    + time lost to deficiencies in compilers for this language
    + time lost depending on how implementations are written (which could
      be blamed on the implementor, but this presumes that the artificial
      setup is something implementors should be spending time on).
As I said before, this is primarily random noise, and has very little
predictive power regarding the timings that users will end up with.
> mandatory optimized implementations
NIST's use of the word "optimized" here is contradicted by NIST's
restrictions on the language---restrictions that make a huge difference
in performance on the reference platform. In response to objections from
various people, NIST stated the following on pqc-forum:
   When we evaluate performance, we’ll be interested in the performance
   of the best-available implementations on a particular platform.
Reporting the performance of the best implementations available gives
teams a healthy incentive to produce software showing the performance
that the crypto is really capable of. This requires platform-specific
optimizations, and the majority of submissions already provided _some_
platform-specific optimizations in "additional implementations" or via
preexisting libraries.
Unfortunately, what NIST actually measured is something that's generally
much slower, namely performance of implementations under artificial
restrictions. This replaces the healthy incentive with an unhealthy
incentive to produce faster implementations under these restrictions.
These are generally much slower than unrestricted implementations
(unless all the critical code is in libraries exempted from the
restrictions); these aren't what users who care about performance will
use; but these still consume software-engineering resources and
verification resources.
Mike Hamburg writes:
> The unclear security estimates are mostly a problem with lattice
> systems. And as you so convincingly argued last week, it’s probably
> best if these at least go through the standardization process before
> we start rolling those out.
You're attributing to me statements that I never made. Pointing out
security risks---and pointing out false statements regarding those
risks---is very different from endorsing any particular strategy for
handling those risks. Furthermore, in the message you're replying to, I
mentioned applications that are "agile enough to deploy before
standardization", and I objected to misinformation delaying deployment
of post-quantum crypto in those applications.
For the record, I was happy to see Google rolling out a cryptosystem in
the NTRU family, at least _trying_ to protect user data against future
quantum computers. It seems entirely possible that
* quantum computation will eventually be cheap enough for wide-scale
attacks by various countries that have been recording data but
* this crypto is unbreakable even by such adversaries,
and in this scenario there's a tremendous societal benefit to what
Google started doing.
Recognizing _risks_ of breaks in the crypto, and taking sensible
risk-management steps, isn't the same as saying that there _are_ breaks.
It also isn't the same as saying that Google should have delayed. On the
contrary: the benefits were potentially massive, and Google could afford
the engineering costs. Rumors indicate that Google was then threatened
with a patent lawsuit, but that's a different issue.
---Dan