ML-KEM is not MAL-BIND-K-CT

3,579 views
Skip to first unread message

Sophie Schmieg

unread,
Mar 26, 2024, 2:00:31 PM3/26/24
to pqc-forum, Peter Schwabe, cre...@cispa.de
hi all,

In the excellently titled paper "Keeping up with the KEMs" [1], Cremers et al introduced various security notions that go beyond IND-CCA, in particular around binding various aspects of the key encapsulation to various others and formally proved whether or not they were met in the case of ML-KEMs and others. The paper did not answer the question in the case of ML-KEM for the strongest introduced attacker model positively or negatively. It turns out that ML-KEM in fact does not meet this property, as the example below shows.

Given the strength of the attacker model, I don't currently know whether this attack matters, but for the very least it is something to keep in mind for future protocol designers. There are several ways that should mitigate this attack, such as changing the serialization format of ML-KEM to not contain the hash of the public key in the private key, verifying the hash of the public key on deserialization of the private key (making it superfluous), or storing the private key as the seed used to generate it.

In order to show that a KEM is not MAL-BIND-K-CT, one has to construct a private key sk, a public key pk, encapsulation entropy m, and a decapsulation ciphertext c such that:
Encaps(pk, m).K = Decaps(sk, c)
Encaps(pk, m).c != c
In other words, both encapsulator and decapsulator are using attacker supplied key material and agree on the shared secret, but disagree on the ciphertext that was used to create it.
An example for ML-KEM-768 with such values is attached (all values encoded in hex).
The attack works by first creating two key pairs honestly, getting (sk1, pk1), (sk2, pk2), then replacing sk1.h with H(pk2). Take m to be random, and compute c as Encaps'(pk1, m, H(pk2)), where Encaps' replaces the hash of the public key in the line 2 of algorithm 16 in the FIPS 203 draft standard with the last argument. Setting sk = sk1, pk = pk2 completes the attack.

As I said, I'm not sure if this property is relevant in any real world protocol, given that attacker supplied private keys at the very least should hopefully be rare, but there are certainly similar binding problems that have been exploitable before, in particular the invisible salamander problem in AES-GCM, see [2], [3].
mal-bind-k-ct-ml-kem.txt

Sophie Schmieg

unread,
Apr 6, 2024, 11:48:51 AM4/6/24
to pqc-forum, Peter Schwabe, cre...@cispa.de
I've written up this construction in a short paper [1], and included a MAL-BIND-K-PK counterexample as well. I also proved that both misbinding properties are only occurring due to the way private keys are serialized and fixed by using a single seed to generate the private key instead.
In both cases, the proof only depends on the properties of the FO transform itself, and should be generalizable to other KEMs depending on the exact flavor of FO used.

[1] https://eprint.iacr.org/2024/523

Deirdre Connolly

unread,
Apr 6, 2024, 2:56:16 PM4/6/24
to Sophie Schmieg, pqc-forum, Peter Schwabe, cre...@cispa.de
Thank you very much Sophie for this analysis!

> I also proved that both misbinding properties are only occurring due to the way private keys are serialized and fixed by using a single seed to generate the private key instead.

I'm very interested in changing the ML-KEM key serializations to be seed-based as described, as having an MAL-BIND-K-PK and MAL-BIND-K-CT ML-KEM will make integrating it into higher protocols maximally easy and safe.


--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/CAEEbLAbhQmOqPGUyd7qts_1F_UkMWpY0BDQtQfWMZj1QysyLmA%40mail.gmail.com.

Filippo Valsorda

unread,
Apr 6, 2024, 4:03:20 PM4/6/24
to Deirdre Connolly, Sophie Schmieg, pqc-forum, Peter Schwabe, cre...@cispa.de
2024-04-06 20:55 GMT+02:00 Deirdre Connolly <durumcr...@gmail.com>:
Thank you very much Sophie for this analysis!

> I also proved that both misbinding properties are only occurring due to the way private keys are serialized and fixed by using a single seed to generate the private key instead.

I'm very interested in changing the ML-KEM key serializations to be seed-based as described, as having an MAL-BIND-K-PK and MAL-BIND-K-CT ML-KEM will make integrating it into higher protocols maximally easy and safe.

Agreed, I hope we'll standardize on encoding ML-KEM private keys as seeds. If not in the FIPS standard, in the practical PKCS#8 representation.

Keys have implicitly two representations: one on the wire, one in memory. The former should be optimized for interoperability, to minimize the margin of error and reduce required validation. The latter will make tradeoffs to enable faster operations, and doesn't even need a fixed bytes format.

The cost of going from one to the other is not very relevant for private keys: I can't think of applications that need to load private keys and do one or a few decapsulations with it as fast as possible. Either keys are ephemeral, in which case they can go straight into the in-memory representation, or they are reused a lot, in which case the deserialization cost is amortized.

The current ML-KEM private key format is a weird hybrid: it's not as compact and safe as a seed, but doesn't include the full matrix that would make decapsulations faster. I'm not sure when we'd use that format.

D. J. Bernstein

unread,
Apr 7, 2024, 7:53:18 AM4/7/24
to pqc-...@list.nist.gov
Filippo Valsorda writes:
> I hope we'll standardize on encoding ML-KEM private keys as seeds.
[ ... ]
> Keys have implicitly two representations: one on the wire, one in
> memory.

I'm confused. Can someone please say what exactly the security concerns
are here that are

* triggered by Kyber's private-key format, and
* resolved by compressing the private key to a seed, but
* not triggered by Kyber's format for private keys _in RAM_?

Do the claimed security advantages of a seed depend on whether data is
encrypted before it reaches the DRAM modules? Or on whether swap is
encrypted, for data swapped to disk? If a private key is encrypted and
stored on a SAN, then it's definitely on the wire (even for notions of
"wire" that don't include buses); does it then need to be a seed?

If the advertisement of a seed as a private-key format is that this
format meets this "MAL-BIND-K-CT" property, then why exactly does
decompressing the seed for decapsulation not violate the property?

Also, do redundancies in other stored data (the encoding of credit-card
numbers, for example) trigger the same security issue, whatever exactly
the issue is?

---D. J. Bernstein

P.S. I'm also deeply skeptical of the idea that this decompression cost
is a problem, even if it's decompression on every decap, but I've seen
objections to smaller expenditures. Decisions regarding tradeoffs
between performance and security should be backed by clear statements of
what the security goals are and what the costs are.
signature.asc

Bas Westerbaan

unread,
Apr 10, 2024, 10:21:35 AM4/10/24
to Filippo Valsorda, Deirdre Connolly, Sophie Schmieg, pqc-forum, Peter Schwabe, cre...@cispa.de
I support using a seed as the primary portable private key format.

Primarily it's simpler, smaller, and saves us from thinking about private key validation. 

To make things concrete, we could go for the changes to FIPS 203 at the end of the e-mail [1].

To put things into perspective, on my Ice Lake laptop, keygen, encapsulation and decapsulation for ML-KEM-768-ipd currently run in 18,000, 19,000 and 21,000 cycles respectively.

Using a seed as private key makes decapsulation directly using this seed more expensive: 31,000 cycles in total.

Expanding the seed is essentially the same as keygen: 18,000 cycles.

Decapsulation using fully expanded private key is cheaper: 13,000 cycles.

In practice TLS stacks already have this split: they keep A around after generating client keyshare, and use that for faster decapsulation when they receive the server keyshare.

Best,

 Bas


[1]

1. Rename K-PKE.KeyGen to K-PKE.ExpandPrivate; remove lines 1, 2, 20, and 21; add rho and sigma as arguments; and return (\hat{A}, \hat{t}, \hat{s}).

2. Add a new function ML-KEM.UnpackPrivate that takes a 32-byte dk as argument, and actas as follows:

(rho, sigma, z) = J(dk)
\hat{A}, \hat{t}, \hat{s} = K-PKE.ExpandPrivate(rho, sigma)
return (\hat{A}, \hat{t}, \hat{s}, rho, z)

Note that the other place J is used includes the ciphertext as input, which makes them input-length separated.

Also note that the call to J here has a capacity of 104 bytes, making it in essence a "SHA3-832".

3. Change ML-KEM.KeyGen to:

dk <$- B^32
(\hat{A}, \hat{t}, \hat{s}, rho, z) = ML-KEM.UnpackPrivate(dk)
ek = ByteEncode_12(\hat{t}) || rho
return (ek, dk, (\hat{A}, \hat{t}, \hat{s}, rho, z))

4. Change K-PKE.Encrypt to accept \hat{A}, and \hat{t} directly instead of ek_PKE, removing lines 2–8.

5. Change K-PKE.Decrypt to accept \hat{s} as argument directly instead of dk_PKE.

6. Change ML-KEM.Decaps to accept (\hat{A}, \hat{t}, \hat{s}, rho, z) instead of dk, removing lines 1–4. Pass \hat{s} into K-PKE.Decrypt  instead of dk_PKE. Pass \hat{A} and \hat{t} into K-PKE.Encrypt instead of ek_PKE.

7. Change ML-KEM.Encaps to include the lines 2–8 removed from K-PKE.Encrypt before the call to K-PKE.Encrypt, to which \hat{A} and \hat{t} are passed instead of ek.

Bas Westerbaan

unread,
Apr 10, 2024, 10:23:25 AM4/10/24
to Filippo Valsorda, Deirdre Connolly, Sophie Schmieg, pqc-forum, Peter Schwabe, cre...@cispa.de
On Wed, Apr 10, 2024 at 4:21 PM Bas Westerbaan <b...@cloudflare.com> wrote:
I support using a seed as the primary portable private key format.

Primarily it's simpler, smaller, and saves us from thinking about private key validation. 

To make things concrete, we could go for the changes to FIPS 203 at the end of the e-mail [1].

To put things into perspective, on my Ice Lake laptop, keygen, encapsulation and decapsulation for ML-KEM-768-ipd currently run in 18,000, 19,000 and 21,000 cycles respectively.

Using a seed as private key makes decapsulation directly using this seed more expensive: 31,000 cycles in total.

Expanding the seed is essentially the same as keygen: 18,000 cycles.

Decapsulation using fully expanded private key is cheaper: 13,000 cycles.

In practice TLS stacks already have this split: they keep A around after generating client keyshare, and use that for faster decapsulation when they receive the server keyshare.

Best,

 Bas


[1]

1. Rename K-PKE.KeyGen to K-PKE.ExpandPrivate; remove lines 1, 2, 20, and 21; add rho and sigma as arguments; and return (\hat{A}, \hat{t}, \hat{s}).

2. Add a new function ML-KEM.UnpackPrivate that takes a 32-byte dk as argument, and actas as follows:

(rho, sigma, z) = J(dk)
\hat{A}, \hat{t}, \hat{s} = K-PKE.ExpandPrivate(rho, sigma)
return (\hat{A}, \hat{t}, \hat{s}, rho, z)

Note that the other place J is used includes the ciphertext as input, which makes them input-length separated.

Also note that the call to J here has a capacity of 104 bytes, making it in essence a "SHA3-832".

Oops, that would be "SHA3-416".

Filippo Valsorda

unread,
Apr 10, 2024, 11:08:08 AM4/10/24
to Bas Westerbaan, Deirdre Connolly, Sophie Schmieg, pqc-forum, Peter Schwabe, cre...@cispa.de
2024-04-10 16:21 GMT+02:00 Bas Westerbaan <b...@cloudflare.com>:
I support using a seed as the primary portable private key format.

Primarily it's simpler, smaller, and saves us from thinking about private key validation. 

To make things concrete, we could go for the changes to FIPS 203 at the end of the e-mail [1].

I think we could make less invasive changes and get mostly the same result, by defining the portable private key format as (d || z) without changing any of the derivation steps, which is what some implementations and X-Wing are already calling a "seed". https://www.ietf.org/archive/id/draft-connolly-cfrg-xwing-kem-02.html#section-4-2.1.4.1.1

We also don't need FIPS 203 to actually spell out how to use more efficient in-memory representations. Those can just be implementation optimizations already allowed by the specification, since equivalent processes are allowed. In fact, we just need to add the decompression steps at the top of ML-KEM.Decaps.

If we were at an earlier stage, I would agree with the more extensive changes, but at this point in order to minimize the last-minute document changes and avoid interoperability issues, I think adopting (d || z) would be easier and safer.

(It also has the advantage that should NIST decide not to make the change in FIPS 203, we could agree to use that seed format in PKCS#8 and other interchange formats, making it the de-facto private key format. I still hope the change could make it into FIPS 203.)

Sophie Schmieg

unread,
Apr 10, 2024, 11:44:52 AM4/10/24
to Filippo Valsorda, Bas Westerbaan, Deirdre Connolly, pqc-forum, Peter Schwabe, cre...@cispa.de
With d || z you unfortunately only deal with the MAL-BIND-K-CT attack, but not the MAL-BIND-K-PK attack. So you would need to specify it as a 48 byte value o that is used to seed AES-CTR-DRBG to obtain d and z, if you want to stay fully within NIST. Not a big problem, just a bit obnoxious.
Reply all
Reply to author
Forward
Message has been deleted
Message has been deleted
0 new messages