I'm sure that NIST has already been thinking about compliance testing of Post-Quantum Cryptography, perhaps more than anyone. But I think that this is something where implementers, vendors, and especially cryptographic design teams can do more to help.
In the semiconductor industry, people repeat the mantra of "design for test" a lot. PQC algorithms tend to be quite a bit more complex than the older algorithms they replace, so perhaps the community can present arguments on how to get good test coverage for the correctness of implementations, and also for testing implementation security.
After standardization, PQC algorithms can be used in "certified cryptographic modules," which are tested by third-party security testing labs against FIPS 140-3 or some Common Criteria profile. Much of commercial cryptography works like this: a lack of that FIPS certificate will prevent government sales, and those standards are also handy for specifying requirements and acceptance criteria in contracts. However, those third-party labs will typically test only the specific things that they are asked to test, on paper, so these considerations are essential.
I have a talk scheduled today (Thursday) at ICMC '21 in the post-quantum track (right after Dustin Moody): 16:30 (ET), "PQC Modules: Requirement Specifications, Integration, and Testing (Q23b)". https://icmconference.org/
Here are the slides: https://mjos.fi/doc/20210902-icmc-q23b-saarinen-pqc-reqspec.pdf
The talk is mostly about how testing is done currently, not about the best possible way it could be done in the future.
On this topic I do have a miscellaneous wish-list for NIST and the community:
1. Testing coverage (and KATs)
Test cases for failures: This may seem awfully mundane to academic cryptographers but is essential to implementers. It is obvious that malformed and mismatched KEM ciphertexts and signatures need to be tested. I may use random bit flips for that, and mismatched public and secret keys, but perhaps there is more that ought to be done; a minimal sketch of such a negative test follows.
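For illustration, here is a minimal sketch of this kind of test in Python. The kem_keygen / kem_encaps / kem_decaps functions are hypothetical stand-ins for whatever byte-oriented KEM API is under test:

    import os

    def flip_random_bit(data):
        # Return a copy of `data` with one uniformly random bit flipped.
        i = int.from_bytes(os.urandom(4), "big") % (len(data) * 8)
        buf = bytearray(data)
        buf[i // 8] ^= 1 << (i % 8)
        return bytes(buf)

    def test_malformed_ciphertext(trials=100):
        pk, sk = kem_keygen()                        # hypothetical KEM API
        for _ in range(trials):
            ct, ss = kem_encaps(pk)
            ss2 = kem_decaps(sk, flip_random_bit(ct))
            # The corrupted ciphertext must not yield the good shared secret;
            # with implicit rejection the output should be pseudorandom instead.
            assert ss2 != ss, "malformed ciphertext was accepted"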
Technical lemmas and formal models for internal components: Sometimes internal traces are not particularly useful, as internal representations can vary (lazy reduction, masked representations, etc.). However, formal models for internal components are extremely useful. Some candidate specifications have "technical lemmas" that can be converted into formal assertions, and more would be great. I have also found errors in some of them, so it seems that these technical lemmas have not always been formally derived.
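As a toy example of what such an assertion can look like (a norm bound on a signature response vector, in the style of the Dilithium specification's lemmas; the constants here are illustrative):

    GAMMA1, BETA = 2**17, 196   # illustrative values, not from any one spec

    def assert_response_norm(z):
        # Lemma-style assertion: an honestly generated response vector z
        # must satisfy max|z_i| < GAMMA1 - BETA, regardless of how z was
        # represented internally (lazy reduction, masking, etc.).
        assert max(abs(c) for c in z) < GAMMA1 - BETA, "technical lemma violated"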
Fully deterministic functions with a seed argument: For automated testing (ACVTS style), the primitives could be specified as fully deterministic functions, which take a random "seed" as an extra API argument. This way one doesn't need to test with a dummy RBG. The spec can just state how much "full entropy" (SP 800-90C term) the algorithm needs and use an XOF expander internally. Many candidates do this already. The common use of SHAKE for this purpose was originally done for performance, but it has many advantages beyond that.
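A minimal sketch of such a seed-driven API, with hypothetical names (keygen_from_coins stands for the deterministic core of whatever scheme is under test):

    import hashlib

    def kem_keygen_derand(seed):
        # The spec states the required "full entropy" (here: 32 bytes) and
        # expands the seed internally with an XOF.
        assert len(seed) == 32
        coins = hashlib.shake_256(seed).digest(128)   # illustrative length
        return keygen_from_coins(coins)               # hypothetical core

An ACVTS-style KAT then fixes the seed and compares the resulting (pk, sk) byte-for-byte; no dummy RBG hook is needed.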
Additional test vectors: Detached signatures (i.e., signatures without the message). Furthermore, since hash-and-sign is clearly no longer the preferred paradigm, a definition of how to use other hash functions with the signature schemes would be great too. The already-ratified SP 800-208 hash-based signatures do randomized hashing in two different ways (XMSS vs. LMS/HSS), and it now looks like there may be even more variants coming. Perhaps there could be a uniform way of doing this; one conceivable shape is sketched below.
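Purely as a hedged illustration of what such a uniform construction could look like (a per-signature randomizer in the spirit of SP 800-208; sign_detached is a hypothetical core signer):

    import hashlib, os

    def prehash_sign(sk, msg, hash_name="sha3_256"):
        rnd = os.urandom(32)                         # per-signature randomizer
        digest = hashlib.new(hash_name, rnd + msg).digest()
        return rnd, sign_detached(sk, digest)        # hypothetical core signer

    # The verifier recomputes Hash(rnd || msg) and checks the detached signature.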
2. Let cryptographers specify serialization
We have seen how modern elliptic-curve systems moved from ASN.1 point encodings to carefully considered octet-to-number transformations that have become essential parts of the algorithm definitions themselves.
Most submission teams have reasonably efficient encodings in their specifications. I hope that bit-level specifications for ciphertexts, detached signatures, public keys, and private keys will be included in the upcoming standards, together with rules for input validation; see the toy example below.
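As a toy example of the level of detail this implies, here is a sketch of a 12-bit coefficient packing with canonical-range validation (q = 3329, n = 256, in the style of the Kyber submission's byte encoding; treat it as illustration, not as any standard's final format):

    Q, N = 3329, 256

    def encode_poly(coeffs):
        assert len(coeffs) == N and all(0 <= c < Q for c in coeffs)
        out = bytearray()
        for i in range(0, N, 2):
            # Pack two 12-bit coefficients into three bytes.
            a, b = coeffs[i], coeffs[i + 1]
            out += bytes((a & 0xFF, (a >> 8) | ((b & 0x0F) << 4), b >> 4))
        return bytes(out)

    def decode_poly(data):
        assert len(data) == 3 * N // 2
        coeffs = []
        for i in range(0, len(data), 3):
            coeffs.append(data[i] | ((data[i + 1] & 0x0F) << 8))
            coeffs.append((data[i + 1] >> 4) | (data[i + 2] << 4))
        # Input validation: reject non-canonical (out-of-range) encodings.
        if any(c >= Q for c in coeffs):
            raise ValueError("coefficient out of range")
        return coeffs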
3. Try to make sure that FIPS 140-3 non-invasive testing will catch side-channel attacks
In the crypto module world, side-channel countermeasures are called "non-invasive attack mitigations." TVLA (in the ISO 17825 variant) has emerged as a way to do basic side-channel testing (Timing, DPA, Emissions) of PQC modules. It is not perfect but it is suitable for third-party testing labs to use, as there are reasonably clear fail/pass criteria and only a limited amount of creativity required when applying the test.
My impression is that ISO 17825 testing will probably become mandatory in FIPS 140-3 in the near future (i.e., a reference to that standard will go into an SP 800-140F update).
What to test: Design teams and the community can comment on which TVLA tests need to be run (e.g., non-specific random key vs. static key, malformed ciphertext vs. good ciphertext, etc.). As noted elsewhere, these must capture at least decapsulation failures. The core statistic itself is simple, as sketched below.
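For reference, the core statistic is just Welch's t-test between a fixed-input and a random-input trace set at each sample point; the |t| > 4.5 threshold below is the one commonly used in TVLA / ISO 17825-style testing (numpy assumed):

    import numpy as np

    def tvla_t(fixed, rnd):
        # fixed, rnd: (n_traces, n_samples) arrays of measurements.
        mf, mr = fixed.mean(axis=0), rnd.mean(axis=0)
        vf, vr = fixed.var(axis=0, ddof=1), rnd.var(axis=0, ddof=1)
        return (mf - mr) / np.sqrt(vf / len(fixed) + vr / len(rnd))

    def leaks(fixed, rnd, threshold=4.5):
        # Fail/pass criterion: any |t| above the threshold flags leakage.
        return bool(np.any(np.abs(tvla_t(fixed, rnd)) > threshold))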
Do we need to improve on ISO 17825? When looking at a given side-channel key-recovery attack, one should estimate how likely it is to be applicable to a module that has passed TVLA testing at some level. This will determine the practical impact of those attacks in a near-future market where many modules will generally try to be at least ISO 17825 compliant, and whether entirely new tests need to be introduced.