ML-KEM negative, intermediate, and edge case test vectors

Filippo Valsorda

unread,

Dec 17, 2023, 11:21:59 AM12/17/23

to pqc-...@list.nist.gov

Hello all,

I have published a comprehensive set of ML-KEM test vectors at https://c2sp.org/CCTV/ML-KEM.

The dataset includes:

Negative test vectors for invalid encapsulation keys, testing every value in the range [q, 2¹²) and every position in the key.
"Unlucky" vectors that require an unusually large number of XOF reads. (Specifically, 575 bytes, which would ordinarily happen with probability 2⁻³⁸.)
Vectors that fail if strcmp() is used in ML-KEM.Decaps.
Accumulated vectors (derived from the reference pq-crystals implementation) for testing randomly reachable edge cases without checking in large amounts of data, including an extended run of one million tests.
Intermediate values for testing and debugging each intermediate step and partial algorithm. (Previously announced).
References to other test vectors.

Please reuse these test vectors (they are made available under the terms of the CC0 1.0) and send feedback here, as GitHub issues, or privately via email. In particular let me know if you encounter any issues, if you successfully apply the test suite to your implementation, or if you have any idea for other edge or negative cases to test.

The implementation being upstreamed into the Go standard library currently passes all the tests above.

Cheers,

Filippo

P.S. Some implementers might also find interesting "Enough Polynomials and Linear Algebra to Implement Kyber".

Sophie Schmieg

unread,

Dec 18, 2023, 7:42:39 PM12/18/23

to Filippo Valsorda, pqc-...@list.nist.gov

hi all,

Here is my set of test vectors, which I'm currently in the process of adding to Wycheproof (which unfortunately ended up taking a while), which contain some additional edge cases. Specifically, I have, for round 3 and for the NIST Draft standard, as well as the discussed potential modification of the draft standard that does "silently reduce" instead of failing on unreduced vectors:

The vectors of the round 3 submission package
Vectors where public or private keys are not reduced mod q
Vectors where the various parts of Kyber are too short or too long
Edge cases where the secret and/or the error are zero
Vectors where the ciphertext is random bytes
Bit flips in ciphertext
message all zero/all 0xff
Values of rho where SHAKE expands more than usual (same as Filipo's except that I brute forced with a few more cores, and got it to read up to 591 bytes. At this point I would like to thank our detection and response team for not killing my job(s) hashing vast amounts of random seeds and looking for zeroes in the output)
Values of rho where the matrix has relatively large values (maximizing the sum of all entries)
Values of rho where the matrix contains an unusual amount of zeroes in NTT form (I found a seed with 3 zeroes mod prime factor of (3329), and a number of seeds with 2 zeroes)
Values of rho for which the matrix fails to be invertible mod (3329), which is otherwise a property that a random matrix is expected to have with high probability.

As mentioned, the test vectors should be available in Wycheproof sometime soon, with the generator script being published as well (except for the brute forcing part, since that used internal infrastructure). I tested the vectors against round3 implementations, but not against independent implementations of the draft standard, so if the test vectors fail there, the problem might be my generator script. You might want to wait for the Wycheproof release which should have a few more tests run against the vectors before merging them anywhere. Also note that the round 3 specification is underdefined when it comes to the behavior on non-reduced public/private keys, and my test vectors reject on those cases, following BoringSSL's implementation.

--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/b48899fb-f169-4f31-891f-296af7855945%40app.fastmail.com.

--

Sophie Schmieg | Information Security Engineer | ISE Crypto | ssch...@google.com

vectors.tar.gz

Filippo Valsorda

unread,

Dec 28, 2023, 10:43:14 AM12/28/23

to Sophie Schmieg, pqc-...@list.nist.gov

Hi Sophie,

Thank you for these! I can confirm that the {encaps,decaps}768draft vectors pass against my implementation of the draft standard.

https://github.com/FiloSottile/mlkem768/commit/9e9fcc27798302c1b046f20d954079ac9322d8ed

Looking forward to their inclusion in Wycheproof.

Cheers,

Filippo

Attachments:
vectors.tar.gz

D. J. Bernstein

unread,

Jan 2, 2024, 9:45:14 AMJan 2

to pqc-...@list.nist.gov

'Sophie Schmieg' via pqc-forum writes:
> - Values of rho where SHAKE expands more than usual (same as Filipo's

> except that I brute forced with a few more cores, and got it to read up to
> 591 bytes. At this point I would like to thank our detection and response
> team for not killing my job(s) hashing vast amounts of random seeds and
> looking for zeroes in the output)

Precomputing a test from, say, 2^64 hash calls would provide reasonable
assurance if the goal were to test an application calling Kyber keygen
2^32 times.

However, this type of test doesn't provide adequate protection against
the following failure mode:

* An implementor is implementing Kyber enc.

* The implementor uses a fixed-size buffer for the full XOF output
(because then the XOF call is just one line of code with a typical
SHAKE library, which is easier than having an outer loop around an
incremental API, even assuming the library provides such an API).

* The buffer size is chosen as the minimum that passes the available
tests (because that's how code is often written in the real world).

* The attacker does more hashing to find a public key where the code
reads past the end of the buffer.

* Secret application data happens to be right after the buffer (this
depends on memory layout but can certainly happen).

The resulting ciphertexts then expose the secret application data. The
attack can be repeated to filter out noise. There will be some omissions
in the leaked secrets since 12-bit fields >=3329 are skipped, but the
attacker should be able to reconstruct passwords, cipher keys, etc. I
think "KyberBleed" is a good name for this.

Structurally, Kyber enc has the unusual property of doing XOF rejection
sampling _based on attacker input_. A test based on 2^64 hash calls
doesn't protect against an attacker doing more calls.

One strategy for a safer test, not completely black-box but still easy
to automate, is to replace the Keccak library with a library returning
artificial results that push rejection sampling up to the maximum
conceivable number of loops. However, this doesn't work if Keccak is
built into the implementation being tested, as in the Kyber reference
code. One workaround is to use the techniques of

https://eprint.iacr.org/2023/1861

to automatically identify the Keccak calls inside the machine code, and
then artificially overwrite the outputs.

---D. J. Bernstein

P.S. Maybe it's time for some errata regarding claims of how easy Kyber
is to implement safely? Lulling people into complacency is dangerous.

signature.asc

Sam Jaques

unread,

Jan 3, 2024, 12:02:10 PMJan 3

to pqc-forum, D. J. Bernstein

Hi all,

The NSA suggested:

> O2a) Keep SHAKE as the XOF. One can select a very large d

> to call for SHAKE and if more than d bits of random are needed,

> have the algorithm return fallure. An appendix can then note

> that in practice for the sake of efficiency fewer bits can be

> called and the internal state saved so more can be squeezed

> out as needed. Alternatively, have verbiage to the effect that

> if you need more than d+32k bits, call SHAKE (m, d+32(k+1))

> and discard the first d+32*k bits.

Would that reliably prevent the "KyberBleed" attack?

Best,
Sam

D. J. Bernstein

unread,

Jan 3, 2024, 2:33:49 PMJan 3

to pqc-forum

Sam Jaques writes:
> The NSA suggested:
[ ... ]

> Would that reliably prevent the "KyberBleed" attack?

It's a documentation suggestion, so it can't stop KyberBleed in the case
where the programmer never even bothers reading the documentation. The
more powerful tests that I outlined, if implemented and used, would be
able to stop KyberBleed in that case.

In the more subtle case where the programmer is paying _some_ attention
to the documentation, I would think that NSA's suggestions _encourage_
KyberBleed. The first suggestion is to "select a very large d to call
for SHAKE"; if this suggestion is copied into the ML-KEM standard, then
programmers who see it will see that it's a simple SHAKE call---exactly
the second step in the KyberBleed scenario---even if they don't notice
this possibility on their own. The remaining suggestions require extra
lines of code and don't help pass tests, so I'd think those suggestions
are more likely to be ignored.

The critical question is then how programmers will choose the buffer
size d. Bumping d gradually up until tests pass is easy to imagine, and
is the third step in the KyberBleed scenario.

---D. J. Bernstein

signature.asc

Francisco

unread,

Feb 2, 2024, 4:15:44 AMFeb 2

to D. J. Bernstein, pqc-forum

Hello all,

In order to validate Sophie's set of test vectors, particularly, keygen{512,
768, 1024}draft, I had to modify the entropy values. It looks like these test
vectors were generated by an implementation that swaps steps 1 and 2 of
Algorithm 15 ML-KEM.KeyGen() of FIPS 203. I confirmed this by looking at the
serialization of the private key, which is suffixed by the `z` value and
equates to the last 32 bytes of "entropy".

In practice, if you want to validate these test vectors and avoid changing your
implementation, you can do so by moving the last 32 bytes of "entropy" to the
beginning.

With this change, my implementation passes all the keygen test vectors.
I haven't yet tested the rest (encaps/decaps).

Perhaps Sophie could confirm this, and update these test vectors before
inclusion on Wycheproof?

Best regards,
--Francisco Vial-Prado

Sophie Schmieg

unread,

Feb 2, 2024, 1:51:23 PMFeb 2

to Francisco, D. J. Bernstein, pqc-forum

You are right, nice catch. I called the CPA keygen before getting the randomness for the FO rejection. It seems like the reference implementation does the same thing (which explains why the official test vectors passed, and why Fillipo did not notice it). I will either swap the order for Wycheproof, or maybe just make two explicitly different entropy fields.

--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.

To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/fg_2tkiKva9A0wZ9U7XM2cUF3nVqdxLvgWgQ9oTCc2KiRg_YTNDMNftrJ_zuhXfmTG7rZNuFNlX6Z5GcEKzPGh0AeKaGfwgMGWf8Xrjkyv4%3D%40vialprado.com.

Filippo Valsorda

unread,

Feb 2, 2024, 2:42:36 PMFeb 2

to Sophie Schmieg, Francisco, pqc-...@list.nist.gov

2024-02-02 19:51 GMT+01:00 'Sophie Schmieg' via pqc-forum <pqc-...@list.nist.gov>:

You are right, nice catch. I called the CPA keygen before getting the randomness for the FO rejection. It seems like the reference implementation does the same thing (which explains why the official test vectors passed, and why Fillipo did not notice it). I will either swap the order for Wycheproof, or maybe just make two explicitly different entropy fields.

Indeed, the reference implementation test_vectors.c also draws d before z, which is reflected (and documented) in my accumulated vectors. https://c2sp.org/CCTV/ML-KEM#accumulated-pq-crystals-vectors

At this point, I think it would be clearer to separate them explicitly.

By the way, I really appreciated how the X-Wing specification approached this, by explicitly specifying a "derandomized version". I wish all specifications did this. https://www.ietf.org/archive/id/draft-connolly-cfrg-xwing-kem-00.html#section-4-2.1.3

Sophie Schmieg

unread,

Feb 2, 2024, 2:46:39 PMFeb 2

to Francisco, D. J. Bernstein, pqc-forum

I only brute forced rho to about 2^44 SHAKE calls or so, it is definitely true that an attacker could brute force substantially more, although the returns evaporate rapidly, as this is an exponential problem.

The main mistake I was trying to guard against here is implementations of SHAKE that do not support multiple squeezes of the Keccak state, not so much to guard against buffer overflows, although that is of course another possible problem that could occur.

The naive way of implementing Kyber is to read 3 bytes of the XOF stream, and attempt to produce up to 2 integers [1], with no buffer exceeding the Keccak state ever being allocated. The more optimized version of this code would read out the entire Keccak io buffer, to avoid unnecessary copies [2].

Since the algorithm is rejection sampling, any implementor would have to put a while loop with an indeterminate end condition, making it unlikely that someone would instantiate a buffer just large enough to pass the tests, as at that point we are not talking about a subtle flaw anymore, but a rather obvious one. This is particularly true if the developer doesn't run the full test vector suite when initially developing their implementation, and only runs across a larger test vector later, at which point they hopefully realize that the number of bytes read from the Keccak stream is unbounded.

Mocking out the keccak implementation is definitely a way an implementer could test this, but it would force a certain structure on the code that could lead to other vulnerabilities. In particular, an implementation with a mockable XOF could, if the mock is accessible outside the test environment, get an attacker to change the hash function dynamically. For that reason, I would prefer not encouraging such an API, and rather accept the risk of a buffer overflow here, at least in my experience the bad API is more likely to cause damage in this particular case.

In fact, I am somewhat conflicted about the keygen and encaps test vectors for this very reason: They force mocking out the entropy, which, if the mock ends up in the API, can lead to catastrophic misuse. Developers like setting arguments they do not understand the purpose of to a constant, and quite often to zero. That would lead to a ciphertext that looks reasonable (and even differs between public keys), but is trivially broken.

You can see such an API difference in BoringSSL as well: The HRSS implementation has externally supplied entropy in its API [3], while the Kyber implementation does not [4]. Acceptable in this case, since nobody should be using HRSS without knowing exactly what they are doing, but definitely a concern for a more widely used API.

[1] As an example, the original BoringSSL code before any optimization did this: https://boringssl-review.googlesource.com/c/boringssl/+/57845/1, line 225 in kyber.c

[2] As an example, the current BoringSSL code does this: https://github.com/google/boringssl/blob/master/crypto/kyber/kyber.c#L301

[3] https://github.com/google/boringssl/blob/master/include/openssl/hrss.h

[4] https://github.com/google/boringssl/blob/master/include/openssl/kyber.h

Roderick Chapman

unread,

Feb 3, 2024, 6:47:43 AMFeb 3

to pqc-...@list.nist.gov

On 02/02/2024 19:46, 'Sophie Schmieg' via pqc-forum wrote:

> The naive way of implementing Kyber is to read 3 bytes of the XOF
> stream, and attempt to produce up to 2 integers [1], with no buffer
> exceeding the Keccak state ever being allocated.

...which, in a naive fashion, is exactly what my code does.

Other than obvious inefficiency, is there anything wrong with that?

- Rod

Reply all

Reply to author

Forward