Many more machines have supercop-20191221 measurements now, thanks
especially to a ton of student time getting computers purchased and
installed and getting SUPERCOP to run in some tricky environments.

The obvious caveat for _users_ looking at these results is that most
NISTPQC submissions have optimized code for only one large CPU at this
point, namely Haswell. On other CPUs, the speeds are often vastly slower
than what's possible, with slowdowns that vary from one submission to
another. For example, there are a billion Cortex-A7 devices (sold, I
mean; SUPERCOP has 1) that often run much faster with NEON-vectorized
code. Compilers cannot be assumed to do a good job of vectorization.

The data sets should be useful for _implementors_ planning optimization
work. Submission teams that want to go beyond NIST's highlighted CPUs
(Haswell and Cortex-M4) should be able to report, e.g., the Cortex-A7
speeds achieved---while refraining from comparing these to unoptimized
Cortex-A7 speeds of other submissions! (To avoid any accusations of bias
in supporting this option, I'll avoid advertising speed results on
non-NIST-highlighted CPUs for submissions I'm involved in.) Hopefully
any code updates submitted by 31 March will be reflected in SUPERCOP
results by NIST's mid-April target date for round-2 input, although a
Raspberry Pi 2 might take longer than a Haswell. Of course, there will
also be followup SUPERCOP runs for code sent in later.

To support further directions of code improvement, there are now also
much more complete tables of code sizes (which a _few_ submissions have
been working on) and extensive new tables of namespace violations, along
with the original tables of compiler error messages, checksum failures,
etc. Implementors can start from
https://bench.cr.yp.to/primitives-kem.html
https://bench.cr.yp.to/primitives-sign.html
and follow links to primitive-specific pages such as
https://bench.cr.yp.to/impl-kem/mceliece6960119.html
which show (1) cross-platform graphs of implementation speeds and (2)
links to tables for specific machines such as
https://bench.cr.yp.to/web-impl/amd64-hiphop-crypto_kem-mceliece6960119.html
showing the tables of speeds, code sizes, compilation errors, etc.

The rest of this message is about namespace violations. As an example,
https://bench.cr.yp.to/web-impl/amd64-hiphop-crypto_sign-pqrsa15.html
shows that the "ref" code---which should have had all names within
crypto_sign_pqrsa15_ref---defines "crypto_sign_pqrsa15_gmp_export"
rather than "crypto_sign_pqrsa15_ref_gmp_export". This might seem minor,
but if some "avx" code had made the same mistake then both
implementations would have defined the same symbol, and typical systems
for creating "fat binaries" (selecting an implementation at run time, as
in "raid6: using ssse3x2 recovery algorithm" from the Linux kernel) would
have failed to link the "ref" and "avx" code together. There are ways
to make this sort of linking work, but a disciplined use of namespaces
is much more portable.
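
To illustrate the linking point, here is a minimal C sketch of run-time selection between two implementations whose global names sit in separate namespaces. Everything in it is hypothetical and for illustration only: the primitive name "example", the "sum" functions, and the selector are not SUPERCOP code, the "avx" function is a plain-C stand-in rather than vectorized code, and CPU detection is replaced by a flag passed in by the caller.

```c
#include <stddef.h>

/* Hypothetical primitive "example"; none of this is real SUPERCOP code. */

/* Portable implementation: its global name is inside ..._ref_. */
unsigned long crypto_sign_example_ref_sum(const unsigned char *m, size_t len)
{
    unsigned long s = 0;
    size_t i;
    for (i = 0; i < len; i++)
        s += m[i];
    return s;
}

/* Stand-in for an optimized implementation (plain C, not vectorized):
   same result, but its global name is inside ..._avx_. */
unsigned long crypto_sign_example_avx_sum(const unsigned char *m, size_t len)
{
    unsigned long s = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4)
        s += m[i] + m[i + 1] + m[i + 2] + m[i + 3];
    for (; i < len; i++)
        s += m[i];
    return s;
}

typedef unsigned long (*example_sum_fn)(const unsigned char *, size_t);

/* Run-time selection. The caller supplies cpu_has_avx; real CPU
   detection is outside the scope of this sketch. */
example_sum_fn crypto_sign_example_select(int cpu_has_avx)
{
    return cpu_has_avx ? crypto_sign_example_avx_sum
                       : crypto_sign_example_ref_sum;
}
```

Both implementations can live in one binary only because their external names differ. If each had instead defined a global called crypto_sign_example_sum, combining the two object files would normally fail with a duplicate-symbol error.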

Most NISTPQC submission code has much more serious namespace violations:
e.g., defining a function called "verify" creates a huge risk of
collisions with other code. For benchmarking it's particularly important
to eliminate namespace violations in primitives used as subroutines: for
example, crypto_hash() namespace violations by the Keccak team in some
AVX-512 implementations led to compilation failures in fast ThreeBears
code, and there's nothing the ThreeBears code could reasonably have done
to protect itself from this.

Namespace violations in top-level primitives don't cause problems for
the benchmarks, but there are many reasons that it's good to factor out
subroutines (easier testing, easier auditing, easier optimization, often
much less time spent on benchmarking), and in any case it would be good
to fix code issues that will need to be fixed for real-world deployment.

For code readability it's good to keep short names such as "verify" in
the code, but use "#define verify ..." in *.h files to put the names
into the right namespace. Including "crypto_sign.h" automatically does
this for the crypto_sign_keypair(), crypto_sign(), crypto_sign_open()
API functions, and similarly "crypto_kem.h" handles its API functions.
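
As a rough sketch of the "#define" approach for a non-API helper, the following C fragment keeps the short name "verify" in the source while the compiled symbol lands in an implementation-specific namespace. The primitive name "example" and the NS() helper macro are assumptions made for illustration; in a real submission the two #define lines would sit in a *.h file included by every source file of that implementation.

```c
#include <stddef.h>

/* Would normally live in a shared header of the implementation.
   "example" and NS() are hypothetical names for this sketch. */
#define NS(name) crypto_sign_example_ref_##name
#define verify NS(verify)

/* The source still says "verify"; after preprocessing the function is
   named crypto_sign_example_ref_verify.
   Returns 0 if the two buffers are equal and -1 otherwise. */
int verify(const unsigned char *x, const unsigned char *y, size_t len)
{
    unsigned int diff = 0;
    size_t i;
    for (i = 0; i < len; i++)
        diff |= (unsigned int)(x[i] ^ y[i]);
    /* diff is at most 255, so (diff - 1) >> 8 has its low bit set
       exactly when diff == 0. */
    return (int)(1 & ((diff - 1) >> 8)) - 1;
}
```

Callers that include the same header can keep writing verify(...), and other code linked into the same program is free to define its own "verify" without colliding.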

---Dan