Updates for FIPS 203


Moody, Dustin (Fed)

Apr 19, 2024, 2:05:59 PM
to pqc-forum

NIST is pleased to have received over 90 pages of public comments from 42 commenters on draft FIPS 203: Module-Lattice-based Key-Encapsulation Mechanism (ML-KEM).  The comments are posted at https://csrc.nist.gov/pubs/fips/203/ipd

In response to these comments, and to comments posted on the pqc-forum, NIST has been incorporating small changes to the draft to improve clarity, testing and validation capability, implementation guidance, and consistency with other NIST documents.  The changes, and non-changes, were summarized in the talk “FIPS 203 Update” by Quynh Dang at the 5th NIST PQC Standardization Conference.  This post summarizes the changes NIST plans to make to the draft FIPS 203.

Clarity

  1. The indices of matrix A in K-PKE.Encrypt() in the draft FIPS 203 and Kyber.CPAPKE.Enc() in the Kyber specification v3.02 were swapped.  NIST will revert the indexing to match that of Kyber v3.02.
  2. Several editorial changes were made to improve clarity and readability.
  3. We plan to work with byte strings instead of bit strings wherever possible.

Consistency with other NIST documents

  1. Some notation and language have been modified to add more consistency with draft FIPS 204 (ML-DSA).
  2. Existing standards did not allow the use of SHAKE as a stream, so NIST will specify an XOF API and rewrite the SampleNTT algorithm in draft FIPS 203 to use this API.  More details can be found in a follow-up post on the API for XOFs.
  3. SP 800-56C currently only applies to shared secrets generated using SP 800-56A and SP 800-56B.  NIST will now allow the key derivation methods in SP 800-56C to be applied to shared secrets generated as specified in FIPS 203.  NIST cautions that if your goal is an IND-CCA2-secure combined KEM, great care is needed.  If you combine the key derivation with another key establishment or key exchange procedure, then you need to assess the security of the combined procedure.  More details regarding KEMs will be provided in the forthcoming SP 800-227.

Testing and validation capability, implementation guidance

  1. KeyGen and Encaps currently generate random seeds internally, which complicates CAVP testing.  NIST plans to specify a “lower-level” derandomized API to enable CAVP testing with well-defined KATs and to allow the storing of keys as seeds.  The “top-level” API will remain the same (randomized KeyGen and Encaps).  More details can be found in a follow-up post on derandomization.
  2. We have introduced the use of “shall not” regarding the use of floating-point arithmetic.
  3. We plan to explicitly allow implementers to limit the number of iterations in while loops which appear in the SampleNTT algorithm. We intend to give a minimum allowable iteration limit, such that deviation from the behavior of the while loop is cryptographically rare for a correct implementation.

 

There is one additional change that we are considering but have not yet decided on.  During Decaps, the implicit rejection key \bar{K} is set to J(z||c).  The NXP PQC team's public comment notes that side-channel protection of \bar{K} will be easier if we swap the order of inputs to J and instead compute J(c||z).  Pierre-Augustin BERTHET’s public comment recommends computing J(z||H(c)).  Should we make a change at all?  If so, which change is preferable?
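For concreteness, the three candidate derivations can be sketched in Python (per the draft, J is SHAKE256 with 32 bytes of output and H is SHA3-256; the z and c below are placeholders, not real key material):

```python
import hashlib

def J(data: bytes) -> bytes:
    # J in FIPS 203 is SHAKE256 with 32 bytes of output
    return hashlib.shake_256(data).digest(32)

def H(data: bytes) -> bytes:
    # H in FIPS 203 is SHA3-256
    return hashlib.sha3_256(data).digest()

z = b"\x00" * 32    # placeholder implicit-rejection secret
c = b"\x01" * 1088  # placeholder ciphertext (ML-KEM-768 length)

kbar_draft   = J(z + c)     # current draft: J(z || c)
kbar_swapped = J(c + z)     # NXP suggestion: J(c || z)
kbar_hashed  = J(z + H(c))  # Berthet suggestion: J(z || H(c))
```

All three are 32-byte keys; the question raised here is only whether the reordering affects security or side-channel protection.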

Please let us know if you have any feedback or questions about the planned changes to FIPS 203.


Angela Robinson

NIST PQC

D. J. Bernstein

Apr 22, 2024, 3:07:26 PM
to pqc-forum
'Moody, Dustin (Fed)' via pqc-forum writes:
> We plan to explicitly allow implementers to limit the number of
> iterations in while loops which appear in the SampleNTT algorithm.

This complication sounds dangerous, even assuming a limit carefully
computed as something that should never be reached.

If the attacker finds a seed that triggers a bug or fault that causes
the limit to be reached, then evidently the buffer hasn't been filled,
so enc will leak what was in the buffer before---maybe secrets. Looping
reduces this from a violation of confidentiality to at most a
denial-of-service attack.

It _might_ help to have language saying that implementations should
abort if they hit the limit, but people won't know how to test any of
this, and will often release data even after hitting the limit, in much
the same way that software often fails to notice invalid signatures.

Furthermore, if an implementation does abort, what's the surrounding
application supposed to do? Maybe tries enc again, so it's back to the
loop, at least for a bug or a stable fault; but maybe gets confused,
triggering its own release of secret data.

Interfaces that normally succeed but _can_ fail are generally a problem
for tests. Sometimes there are good reasons to introduce these failures,
but here I don't see what the reason is.

---D. J. Bernstein

Samuel Lee

Apr 23, 2024, 2:03:30 PM
to pqc-forum, D. J. Bernstein, pqc-...@list.nist.gov
An important piece of feedback on:

NIST plans to specify a “lower-level” derandomized API to enable CAVP testing with well-defined KATs and to allow the storing of keys as seeds.  The “top-level” API will remain the same (randomized KeyGen and Encaps).  More details can be found in a follow-up post on derandomization.

I think this is a great development both for testing and for allowing a compact representation of ML-KEM private keys, as one can store the seed rather than the decapsulation key blob.

However, it also introduces the potential for a key-confusion attack, exposing a route to additional cryptanalysis of public information.

With the current definition of K-PKE.KeyGen in FIPS 203, a 32-byte private seed value d, which is assumed to be random, is expanded into a private key.


The problem is that for a given value of d, when used to generate an ML-KEM 512, ML-KEM 768, or ML-KEM 1024 key, a lot of the internal details of the generated keys are shared.
Specifically, the 2x2 A matrix for ML-KEM 512 is a submatrix of the 3x3 A matrix for ML-KEM 768 which is a submatrix of the 4x4 matrix for ML-KEM 1024.
Similarly, the 4-element secret s for ML-KEM 1024 is the concatenation of the 3-element secret s and the first element in secret e for ML-KEM 768.

If an application can be induced to expand a private seed value to different ML-KEM flavors, with the resulting public encapsulation keys exposed, an attacker with access only to the public information could infer a lot more about the underlying secrets.

This seems plausible in a bad implementation of ML-KEM in a TPM (or some other key isolation service), where the key is persisted in storage as a seed, and the metadata only indicates it is a generic ML-KEM key.
In this scenario, an attacker could plausibly request the encapsulation key for a given key handle against several ML-KEM flavors. The attacker can easily detect a vulnerable implementation as the public ρ will be the same for the different encapsulation keys.
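The shared-ρ observation follows directly from the draft's seed expansion, and can be sketched as follows (G is SHA3-512 per the draft; the seed below is a placeholder):

```python
import hashlib

def expand(d: bytes):
    # Draft K-PKE.KeyGen: (rho, sigma) = G(d), with G = SHA3-512.
    # Note that nothing about the parameter set (k = 2, 3, or 4) enters the hash.
    g = hashlib.sha3_512(d).digest()
    return g[:32], g[32:]

d = b"\x2a" * 32          # placeholder seed
rho_512, _ = expand(d)    # what an ML-KEM-512 key would use
rho_1024, _ = expand(d)   # what an ML-KEM-1024 key would use
assert rho_512 == rho_1024  # identical public rho: trivially detectable
```

Since ρ is part of the encapsulation key, an attacker comparing the ρ values of encapsulation keys obtained for different flavors detects a vulnerable implementation immediately.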


I strongly think there should be some encoding of the key parameters in the initial SHA3-512 of d in K-PKE.KeyGen() to avoid these tight mathematical relationships between keys generated from the same seed, reducing the scope for this kind of attack.

Concretely, line 2 becomes:
(ρ, σ) <- G(id || d), where id is some byte constant unique to each ML-KEM flavor (or some equivalent construction).

The cost is close to 0, and it makes ML-KEM key generation less brittle to misuse.
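A sketch of the proposed fix (the id bytes here are purely illustrative; the draft defines no such constants):

```python
import hashlib

# Hypothetical per-flavor domain-separation bytes (illustrative only)
FLAVOR_ID = {"ML-KEM-512": b"\x02", "ML-KEM-768": b"\x03", "ML-KEM-1024": b"\x04"}

def expand_seed(d: bytes, flavor: str):
    # Proposed K-PKE.KeyGen line 2: (rho, sigma) = G(id || d), with G = SHA3-512
    g = hashlib.sha3_512(FLAVOR_ID[flavor] + d).digest()
    return g[:32], g[32:]

d = b"\x2a" * 32  # placeholder seed
rho_768, _ = expand_seed(d, "ML-KEM-768")
rho_1024, _ = expand_seed(d, "ML-KEM-1024")
assert rho_768 != rho_1024  # the same seed no longer yields a shared rho
```

One extra byte into an existing hash call, so the performance cost is essentially zero, as noted above.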


A similar comment applies to the other suggestions about using a private seed as a decapsulation key (i.e., the official comment on FIPS 203 ipd: “seed as decapsulation key”).

Best,
Sam

Bobby McGee

Apr 23, 2024, 2:58:31 PM
to pqc-forum, Samuel Lee, D. J. Bernstein, pqc-...@list.nist.gov
RE:  "limit the number of iterations in while loops" for FIPS 203, 204.  As Prof. Bernstein says, it's hard to see what this accomplishes, and it's not clear what should happen if such a limit is reached.  If NIST really wanted to remove variable timing, the "obvious" solution is to get rid of rejection sampling and introduce bias into the PRNG.  I'm not advocating this (it would be a huge change whose consequences would need to be thoroughly explored), but there seems to be a dichotomy: either you have rejection sampling and wait, or you don't.  I'm not sure sticking timeouts into the cryptography is a good solution, especially if those limits are left to implementers.  If NIST really wants to do this, it should be done uniformly, with an exit procedure outlined.  In any case, this is a pretty big change in my opinion.

Scott Fluhrer (sfluhrer)

Apr 23, 2024, 3:10:18 PM
to Bobby McGee, pqc-forum, Samuel Lee, D. J. Bernstein, pqc-...@list.nist.gov

The only use case I can see is if you are in a ‘hard real time’ environment, where the spec says “you MUST respond to X in Y msec” (and failing to generate a Kyber public key can be an acceptable response).

 

That said, it isn’t great.  As Dan said, this is a quite rare failure mode, and rare failure modes are just as bad in real-time designs as they are in security (and for largely the same reason: you have something that is quite difficult to test in practice…)

 


D. J. Bernstein

Apr 23, 2024, 5:12:00 PM
to pqc-...@list.nist.gov
'Scott Fluhrer (sfluhrer)' via pqc-forum writes:
> The only use case I can see is if you are in a ‘hard real time’
> environment, where the spec says “you MUST respond to X in Y msec”
> (and failing to generate a Kyber public key can lead to an acceptable
> response).

That would normally be enforced by a separate high-level monitor
watching the whole sequence of computations and, if necessary, giving a
"computations didn't terminate in time" response. I don't know why
anyone would want to randomly sprinkle failures into individual steps.

---D. J. Bernstein

Brent Kimberley

Apr 23, 2024, 5:15:36 PM
to D. J. Bernstein, pqc-...@list.nist.gov
Fault injection is all the rage these days...



Brent Kimberley

Apr 23, 2024, 5:19:07 PM
to D. J. Bernstein, pqc-...@list.nist.gov, Brent Kimberley
Unfortunately, it treads very close to questions like "what is time?"  ;)


Bas Westerbaan

Apr 24, 2024, 4:59:41 AM
to Moody, Dustin (Fed), pqc-forum

There is one additional change that we are considering but have not yet decided on.  During Decaps, the implicit rejection key \bar{K} is set to J(z||c).  The NXP PQC team's public comment notes that side-channel protection of \bar{K} will be easier if we swap the order of inputs to J and instead compute J(c||z).  Pierre-Augustin BERTHET’s public comment recommends computing J(z||H(c)).  Should we make a change at all?  If so, which change is preferable?


Has anyone analysed whether this affects security? [1] If not, we should not make the change.

Best,

 Bas

[1] It's not immediately obvious that these variants preserve security. With a collision on the state of J, we can get c, c' with J(c || z) = J(c' || z) despite z being secret. That's not the case with J(z || c). J(z || H(c)) moves the issue to collisions for H.

 

Brent Kimberley

Apr 24, 2024, 7:42:41 AM
to D. J. Bernstein, pqc-...@list.nist.gov

Here is one first-order classical approach to protection:

The physical device under supervision has a capability C/t [units per second]:

  • The supervision period is DT [seconds]
  • The high-water alarm setpoint is Ch/t [units/sec]
  • The low-water alarm setpoint is Cl/t [units/sec]
 

The monitor traverses the arrow of time:

 

Possible issues:

  • Monitor failure (Implement multiple monitors)
  • The statistics channel could be compromised.  (Perform data triangulation)
  • Edge failure (erasure coding: implement FEC)
  • Non classical physics (Send time pulses over statistics channel)
  • The list goes on.

David A. Cooper

Apr 24, 2024, 3:52:23 PM
to pqc-forum
Perhaps it would be helpful if we explain the reason we are considering this.

We agree that implementations should not limit the number of iterations in the while loops, and that algorithms such as SampleNTT in ML-KEM should continue to run to completion. However, we have been informed that it is common for implementations to limit the number of iterations in loops such as these, even when the specification does not allow for it. While we could try to test whether an implementation is doing this, as has been mentioned previously, it is unlikely that this could be caught through known answer testing, because the work required to find a seed that would cause the loop to exceed a given limit grows exponentially with the size of the limit.

If it is the case that some implementations are likely to limit the number of iterations whether the standard allows it or not, then we thought it would be better to guide such implementers towards setting a limit that is known to be sufficiently high that exceeding it would be very rare (e.g., less than 1 in 2^128) rather than leaving them to choose a limit on their own.
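As a rough illustration of how such a limit might be derived (a sketch only, assuming each 12-bit SampleNTT candidate is accepted independently with probability q/4096 = 3329/4096, and that each 168-byte SHAKE128 block yields 112 candidates):

```python
import math

Q = 3329
P = Q / 4096        # acceptance probability of one 12-bit candidate
NEEDED = 256        # coefficients per polynomial
LOG2_P, LOG2_1MP = math.log2(P), math.log2(1 - P)

def log2_fail(n: int) -> float:
    """log2 of Pr[fewer than NEEDED of n candidates are accepted]."""
    # log-sum-exp over the binomial lower tail, computed in base 2
    logs = [math.log2(math.comb(n, k)) + k * LOG2_P + (n - k) * LOG2_1MP
            for k in range(NEEDED)]
    m = max(logs)
    return m + math.log2(sum(2.0 ** (x - m) for x in logs))

# Smallest candidate budget whose failure probability is below 2^-128
n = NEEDED
while log2_fail(n) >= -128:
    n += 1
blocks = math.ceil(n / 112)   # 168-byte SHAKE128 blocks (112 candidates each)
print(n, blocks, log2_fail(n))
```

This is a per-polynomial bound; an actual specified limit would also need to account for all k^2 entries of the matrix A (e.g., via a union bound).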

We continue to welcome feedback on this and the other issues we raised, and will carefully consider that feedback as we make the final edits to the standards.

Blumenthal, Uri - 0553 - MITLL

Apr 24, 2024, 4:10:35 PM
to David A. Cooper, pqc-forum
Let me understand: you assume that (at least some) vendors would deliberately violate the standards by putting in loop iteration limits, but those same vendors would abide by the standard requiring those limits to be high enough to remain secure?

TNX
 
Regards,
Uri


Bas Westerbaan

Apr 24, 2024, 4:14:32 PM
to David A. Cooper, pqc-forum
If it is the case that some implementations are likely to limit the number of iterations whether the standard allows it or not, then we thought it would be better to guide such implementers towards setting a limit that is known to be sufficiently high that exceeding it would be very rare (e.g., less than 1 in 2^128) rather than leaving them to choose a limit on their own.

This makes sense. 

D. J. Bernstein

Apr 24, 2024, 9:33:07 PM
to pqc-...@list.nist.gov
Just now there was a news article

https://arstechnica.com/cars/2024/04/linux-is-now-an-option-for-safety-minded-software-defined-vehicle-developers/

quoting Elektrobit's Moritz Neukirchner saying the following:

When you look at how safety is typically being done, look at
communication---you don't safety-certify the communication specs or
Ethernet stack, but you do a checker library on top, and you have a
hardware anchor for checking down below, and you insure it end to end
but take everything in between out of the certification path.

The architecture here is important. A safety property such as responding
in a particular amount of time is handled by a global monitor enforcing
that property, not by an ad-hoc mishmash of failure mechanisms inserted
into the individual computation layers.

'David A. Cooper' via pqc-forum writes:
> However, we have been informed that it is common for implementations
> to limit the number of iterations in loops such as these, even if the
> specification does not allow for it.

This raises transparency questions: Who made this claim? Was there a
definition of "common", a definition of "such as these", evidence for
the claim, an argument that these limits are a good thing, and an
evaluation of whether the argument holds up to scrutiny?

So far the simplest explanation for what's going on here is that some
commentator is following dangerous system-engineering practices and
asking NIST to loosen some standards to support those practices.

> While we could try to test for whether an implementation is doing
> this, as has been mentioned previously, it is unlikely that this could
> be caught through known answer testing.

Yes, more advanced (non-black-box) tests would be useful. I outlined one
approach in email dated 2 Jan 2024 15:44:55 +0100, with a reference to
https://eprint.iacr.org/2023/1861. The context there was the KyberBleed
potential failure mode, which comes from incorrectly limiting a buffer
size rather than limiting a loop length, but tests would work similarly
for both cases; the point is to identify and replace the hash outputs.

> If it is the case that some implementations are likely to limit the number
> of iterations whether the standard allows it or not, then we thought it
> would be better to guide such implementers towards setting a limit that is
> known to be sufficiently high that exceeding it would be very rare (e.g.,
> less than 1 in 2^128) rather than leaving them to choose a limit on their
> own.

Hmmm. Right now there are third-party test vectors that are tantamount
to documenting a limit that's too small. I would recommend picking that
limit and issuing a warning (maybe as an SP) along the following lines:

(1) If an implementation were to limit the loop to this number of
iterations then it would crash into that limit for every X
trillion keys, keys that attackers can easily afford to generate
and that will legitimately show up on occasion.

(2) Such an implementation would not be conformant. The standard
requires looping until the full matrix is generated, not limiting
this loop to a prespecified number of iterations.

I don't see how telling people a should-never-be-reached limit (again,
even assuming it's correctly computed) is improving the situation; on
the contrary, it's a complication that can easily convert faults and
bugs from denial-of-service attacks into confidentiality failures. As an
analogy, https://cr.yp.to/papers.html#smartfacts extracted RSA secret
keys from certified smart cards because of faults that _should_ never
have happened, but that did, in fact, happen.

---D. J. Bernstein

Simon Hoerder

Apr 25, 2024, 4:12:33 AM
to pqc-...@list.nist.gov
Hi,

At a conference I heard someone claim that they implemented constant-time
Kyber by always sampling the same number of bytes from SHAKE and raising
an error if those were insufficient for rejection sampling. I don't
recall who it was or which conference, but it touched on a couple of
known issues:

1) We always talk about "constant time" when we mean "timing independent
of secret data". Customers have learned to look for constant time, and
crypto vendors have to spend extra effort in customer service to educate
customers about "timing independent of secret data". It can be done, but
extra effort is rarely appreciated.

2) Counting clock cycles to verify "constant time" is relatively easy.
I'm under the impression that a lot of CI/CD environments do that.
Verifying "timing independent of secret data" can be done via careful
TVLA-style testing, code review, or formal verification. TVLA-style
testing is not supported by many CI/CD environments. Code review
regularly fails to identify timing-related leakages from HW features, as
the various architectural side-channel attacks have shown. And I would
love to hear that there is a formal verification tool that can verify
HW-SW co-designs, but I'm not aware of any. Pure HW implementations are
going to be rare, and formal verification tools that make assumptions
about the underlying HW without verifying them will once again miss a
lot of timing leakages, for the same reasons code reviews tend to miss
them.

3) ML-KEM already introduces failure possibilities via implicit
rejection. I believe implicit rejection is going to cause more head
scratching among developers who need to integrate ML-KEM into a system
than an implementation that raises a time-out error. Time-out errors are
a known quantity for integrators; implicit rejection is not. I wouldn't
be surprised at all if a fair number of ML-KEM implementations start
being naughty and turn implicit rejection into explicit rejection by
changing the return code. Certification schemes will have to deal with
that as well.
As a side note: the FIPS 203 draft states in section 6.3 that:
> In either case, decapsulation outputs the resulting shared secret key K′.
This is wrong when c != c'. The encapsulation side does not have the c
received by the decapsulation side and does not know z, so when c != c',
K' cannot be a _shared_ secret.

I share the opinion that crypto algorithms should avoid anticipating
unrelated problems, such as time limits imposed by a safety context, but
all PQC algorithms introduce new problems into very complex deployment
scenarios once you start looking at critical infrastructure or any other
area requiring high security. People will struggle with that, and I
expect we'll see a range of weird/naughty workarounds until everyone has
failed enough to establish best practice for every integration scenario.

Personally, I think NIST should publish FIPS 203 without artificial
limits and take the time to study those limits a bit more. If required,
NIST can still publish an SP 800 that allows FIPS 203 with limits,
supported by a well-thought-out justification and integration guidance.

Best,
Simon

Nicky Mouha

May 3, 2024, 11:51:31 AM
to pqc-forum
I'd like to chime in to explain the reasoning behind the plan to explicitly allow implementers to limit the number of iterations in while loops.

Implementers would be allowed, but not required, to limit the number of loop iterations, so unbounded loops would still be an option. As David Cooper explained, NIST would provide a minimum allowable iteration limit (which would be reached with negligible probability, e.g., less than 1 in 2^128) rather than letting everyone choose their own limits.

Reaching the limit would be a fatal error from which it is impossible to safely recover. (There are already other cases where cryptographic implementations encounter fatal errors from which it is impossible to safely recover, for example, when a system has a permanent failure to provide good enough entropy.)

Note that the NXP PQC team had already commented about the unbounded loop issue here: https://csrc.nist.gov/files/pubs/fips/204/ipd/docs/fips-204-initial-public-comments-2023.pdf

For critical systems, bounded loops are a common requirement. For example, NASA/JPL's coding standards require that all loops must have fixed bounds (Power of Ten Rule 2): https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Developing_Safety-Critical_Code
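In that style, a bounded-read SampleNTT can be sketched as follows (a sketch only; MAX_BLOCKS is a hypothetical constant, since NIST has not yet fixed the minimum allowable limit, and reaching the bound is treated as a fatal error rather than returning partial output):

```python
import hashlib

Q = 3329
MAX_BLOCKS = 6  # hypothetical bound on 168-byte SHAKE128 squeeze blocks

class SamplingLimitExceeded(Exception):
    """Fatal error: never return a partially filled polynomial."""

def sample_ntt(seed: bytes) -> list[int]:
    # seed = rho concatenated with the two matrix-index bytes; rejection-sample
    # 256 coefficients mod q from SHAKE128(seed), reading a fixed-size stream.
    stream = hashlib.shake_128(seed).digest(168 * MAX_BLOCKS)
    coeffs: list[int] = []
    for off in range(0, len(stream), 3):
        b0, b1, b2 = stream[off:off + 3]
        d1 = b0 + 256 * (b1 % 16)      # first 12-bit candidate
        d2 = (b1 // 16) + 16 * b2      # second 12-bit candidate
        for d in (d1, d2):
            if d < Q and len(coeffs) < 256:
                coeffs.append(d)
        if len(coeffs) == 256:
            return coeffs
    raise SamplingLimitExceeded  # rare if MAX_BLOCKS is chosen high enough
```

Whether aborting here is safer than looping indefinitely is, of course, exactly the point in dispute earlier in this thread.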

Following this simple coding rule would have avoided the CVEs that I found in Apple's CoreCrypto (CVE-2019-8741) and in the SHA-3 implementation by its designers (CVE-2022-37454). (For certain inputs, this SHA-3 implementation does not enter an infinite loop but allows for arbitrary code execution!) Just recently, another infinite loop vulnerability was found in BouncyCastle (CVE-2024-30172).

The limitations of black-box testing have come up in this discussion; indeed, checking some of the proposed requirements ("shall not" for floating-point arithmetic, and a sufficiently high or unbounded number of iterations for certain while loops) may necessitate a simple source-code review.

For anyone experiencing the limitations of black-box testing and interested in more advanced approaches for cryptographic implementations, we're organizing the NIST Workshop on Formal Methods within Certification Programs (FMCP 2024) on July 23-25. Submissions are still open! https://www.nist.gov/news-events/events/nist-workshop-formal-methods-within-certification-programs-fmcp-2024

Regards,
Nicky


D. J. Bernstein

unread,
May 3, 2024, 2:21:29 PM5/3/24
to pqc-...@list.nist.gov
'Nicky Mouha' via pqc-forum writes:
> Reaching the limit would be a fatal error from which it is impossible to
> safely recover. (There are already other cases where cryptographic
> implementations encounter fatal errors from which it is impossible to
> safely recover, for example, when a system has a permanent failure to
> provide good enough entropy.)

Whether or not recovery is _impossible_, these are definitely bad
situations. The question is how to handle those situations in a
cryptographic implementation.

The problem with saying "We don't have valid data within the expected
number of loops, let's just return whatever garbage we have" is that,
even assuming the expected number is correctly calculated to avoid
failures under non-faulty operation, this termination frequently means
leaking secret data, accepting forged data, etc.
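[To make the contrast concrete, here is a hypothetical bounded variant of ML-KEM's SampleNTT rejection loop in Python. The function name, the 1024-byte XOF budget, and the error handling are illustrative assumptions, not taken from FIPS 203; only the candidate extraction follows the draft's SampleNTT.]

```python
import hashlib

Q = 3329  # ML-KEM modulus (FIPS 203)

def sample_ntt_bounded(seed: bytes, max_bytes: int = 1024):
    """Hypothetical bounded rejection-sampling loop in the style of SampleNTT.

    Illustrates the failure-handling question: when the byte budget runs out
    before 256 coefficients are found, fail loudly rather than return a
    partially filled (garbage) array.
    """
    stream = hashlib.shake_128(seed).digest(max_bytes)  # fixed XOF budget
    coeffs, i = [], 0
    while len(coeffs) < 256 and i + 3 <= len(stream):
        b0, b1, b2 = stream[i], stream[i + 1], stream[i + 2]
        d1 = b0 + 256 * (b1 % 16)       # two 12-bit candidates per 3 bytes
        d2 = (b1 // 16) + 16 * b2
        if d1 < Q:
            coeffs.append(d1)
        if d2 < Q and len(coeffs) < 256:
            coeffs.append(d2)
        i += 3
    if len(coeffs) < 256:
        # The dangerous option is `return coeffs` (garbage). Fail loudly instead.
        raise RuntimeError("rejection-sampling budget exhausted")
    return coeffs

poly = sample_ntt_bounded(b"\x00" * 34)
print(len(poly))  # 256: a 1024-byte budget yields 682 candidates, far more than needed
```

With 682 candidates each accepted with probability 3329/4096, exhausting the budget is astronomically unlikely; the interesting question, as the paragraphs above and below argue, is what happens on the path where it does occur.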

Saying "We'll return a failure mode for the caller to catch" is ignoring
the reality that callers are already overloaded with failure cases, as
illustrated by

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-33531
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-32962
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-32883

and many other recent CVEs.

Meanwhile it's already known how to actually ensure real-time responses,
as in the recent 2024 Elektrobit quote that I cited earlier: set up a
monitor dedicated to ensuring this safety property, and carefully review
that monitor, rather than trying to shift blame to an endless list of
sometimes-malfunctioning components.

> NASA/JPL's coding standards require that all loops must have fixed bounds
> (Power of Ten Rule 2):
> https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Developing_Safety-Critical_Code

Whatever Wikipedia might seem to suggest about that 2006 proposal, what
JPL actually ended up issuing in 2009 as an "Institutional Coding
Standard for the C Programming Language"---

https://web.archive.org/web/20111015064908/https://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf

---explicitly allowed waivers given "substantial supporting evidence".
See page 6.

More importantly, when people started actually _studying_ the impact of
coding standards, they found again and again that those standards were
poor predictors of software faults. See, e.g.,

https://web-backend.simula.no/sites/default/files/publications/Simula.SE.331.pdf

regarding the MISRA "standard" that JPL's "standard" was based on, and

https://dl.acm.org/doi/pdf/10.1145/2884781.2884812

regarding the broader issue of software-engineering beliefs often being
simultaneously (1) strongly held and (2) out of whack with the evidence.

> Following this simple coding rule would have avoided the CVEs that I found
> in Apple's CoreCrypto (CVE-2019-8741) and in the SHA-3 implementation by
> its designers (CVE-2022-37454).

No, it wouldn't. The simplest way to follow the rule is to mechanically
put a limit of 123456789123456789ULL iterations on each loop. Then the
"compliance" tools can proudly say that they've certified that the code
always exits; but this certainly doesn't stop any security holes.

---D. J. Bernstein

Nicky Mouha

unread,
May 3, 2024, 5:09:41 PM5/3/24
to pqc-forum
Hi Dan,

I think we're mainly in agreement here, so I just want to clarify some things for the record.

I never suggested returning "whatever garbage" when a fatal error occurs; that would definitely be very dangerous! My understanding of a fatal error involves entering some BUG/abort/panic function that does not return. That's why I made the comparison with a system that has a permanent failure to provide good enough entropy. This is not a failure mode for the caller to catch! The best way to deal with a fatal error clearly depends on the situation. For example, there may be cases where it is best to log an error and enter an infinite loop, as suggested in the Rustonomicon for implementing a Rust panic handler (https://doc.rust-lang.org/nomicon/panic-handler.html). Whether that is better than silently entering an unbounded loop due to an apparent fault or bug would depend on the scenario.

I also never suggested that the proposal for loop iteration bounds is related to real-time responses; I agree that there are better ways to deal with those. The concern is that an implementer may face two conflicting requirements: a requirement to follow the specification (even if it has unbounded loops), versus a requirement that all loops have fixed bounds. One resolution would be the written waiver request in the JPL coding standard that you refer to (consistent with the proposal to still allow unbounded loops for those who prefer them); another would be for the NIST standard to specify a minimum iteration limit, to be reached only with negligible probability (e.g., less than 1 in 2^128). Which option is preferable depends on the situation. But we definitely want to rule out that an implementer faced with this dilemma chooses any of the bad options mentioned in this thread, such as picking iteration limits on their own, or allowing incorrect calculations or other garbage to be returned!
I don't think it's good for the standard to remain silent about this. Instead, it should allow bounded loops only when a strong set of requirements is met (with "shall" statements to ensure that these requirements are checked during CAVP/CMVP conformance testing). Implementers may then either implement bounded loops under these additional requirements, or decide that the requirements are difficult to meet (and that unbounded loops are better)!

I agree that compliance tools can't stop security holes, especially if implementers feel incentivized to alter code to deviate from the specification just to satisfy the tools. This is precisely the reason to use certain coding rules as a pragmatic step toward more rigorous methods, rather than treating compliance with coding rules as a goal in itself.

Regards,
Nicky
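[The 2^-128 threshold mentioned above can be checked exactly for a concrete cap. The sketch below assumes a hypothetical budget of 600 twelve-bit candidates for SampleNTT's 256 coefficients (the number 600 is an illustrative choice, not from any draft) and computes the exact binomial tail probability that the bounded loop fails.]

```python
import math

q = 3329     # ML-KEM modulus (FIPS 203)
need = 256   # coefficients SampleNTT must produce
n = 600      # hypothetical cap: 12-bit candidates examined before giving up
# Each 12-bit candidate is accepted (i.e., is < q) with probability q / 4096.

# Exact failure probability: P[Binomial(n, q/4096) < 256]
#   = sum_{k < 256} C(n, k) * q^k * (4096 - q)^(n - k) / 4096^n
num = sum(math.comb(n, k) * q**k * (4096 - q)**(n - k) for k in range(need))
den = 4096**n

# Below 2^-128 iff num * 2^128 < den (exact integer comparison, no rounding).
print(num * 2**128 < den)  # True
print(math.log2(den) - math.log2(num))  # bits of security margin, comfortably above 128
```

So even a modest cap of 600 candidates fails only with probability far below 2^-128, which is the kind of calculation an implementer (or a standard) would need to justify a chosen bound.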