Comments on FIPS 203/204


Stern, Morgan B

Oct 30, 2023, 2:16:40 PM
to pqc-...@list.nist.gov
We would like to call the community's attention to a few of the comments we are submitting as official comments on both FIPS 203 and FIPS 204. These comments concern hash usage in ML-DSA and ML-KEM and relate to the readability of the standards, consistency with other standards, and ease of implementation and production in hardware and software. They can be summarized as:

S1) For clarity, FIPS 204 should adopt the notational convention of FIPS 203 with regard to defining cryptographic functions early in terms of hash primitives.

S2) SHAKE as used in FIPS 203 is somewhat inconsistent with how it is defined in FIPS 202; this should be changed or clarified. Additionally, note that Hash_DRBG as defined in 800-90A fits the usage of "XOF" in FIPS 203 very well.

S3) In order to prevent the need for implementing several distinct hash-related primitives, ideally only one hash primitive (rather than e.g. both SHAKE256 and SHA3-512) would be invoked in a particular parameter set of ML-KEM/ML-DSA. Note that if a SHA2 primitive were chosen for this in FIPS 203/204 rather than SHAKE or SHA3, it would accelerate uptake and deployment of ML-KEM/ML-DSA in many use cases and not reduce deployment in any case we are aware of. We are not suggesting the standard give a user a choice in which hash to use - it should still be fixed at least per security level for interoperability.

Point S3) above can be seen as the general public comments NSA made at the Fourth PQC Standardization Conference being applied to the specifics of the drafts of FIPS 203/204.

Morgan Stern
NSA Cybersecurity

Official comments being submitted on FIPS 203/204 drafts concerning hashes:

O1) In the ML-KEM specification, the cryptographic functions are defined in an early section (4.1 Cryptographic Functions) which makes the different instantiations of hashes easier to keep track of than in ML-DSA. We strongly suggest ML-DSA adopt a similar section and the language be consistent between the two. As such, the remainder of these comments apply to both ML-DSA and ML-KEM.

O2) In ML-KEM section 4.1, the function XOF is defined as SHAKE, and the comments suggest that one can generate bits as needed by "squeezing" the sponge function to output more bits. While mathematically this is true, as written it is not quite consistent with FIPS 202, which defines the SHA3 and SHAKE functions. In section 6.2 of FIPS 202, SHAKE is explicitly defined as taking a digest length d in addition to a message m, so FIPS implementations cannot necessarily be used in the way described in FIPS 203. Appendix A.2 of FIPS 202 makes it clear that SHAKE(m,d) and SHAKE(m,d+e) share the same first d bits of output, which is the property being used in ML-KEM.
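The prefix property from Appendix A.2 of FIPS 202 can be checked with Python's hashlib, whose shake_128(...).digest(n) takes an output length in bytes rather than bits. This is a minimal sketch; the helper name shake128 and the sample message are illustrative only.

```python
import hashlib

def shake128(m: bytes, d: int) -> bytes:
    """SHAKE128(M, d) with d in bits, as in FIPS 202.

    hashlib's digest() takes a byte count, so this sketch only accepts
    byte-aligned d.
    """
    if d % 8 != 0:
        raise ValueError("this sketch only handles d divisible by 8")
    return hashlib.shake_128(m).digest(d // 8)

m = b"example input"
d, e = 256, 1024
# The first d bits of SHAKE128(M, d + e) are exactly SHAKE128(M, d).
print(shake128(m, d + e)[: d // 8] == shake128(m, d))  # True
```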

We offer two options to fix this:

O2a) Keep SHAKE as the XOF. One can select a very large d when calling SHAKE and, if more than d random bits are needed, have the algorithm return failure. An appendix can then note that, in practice, for the sake of efficiency fewer bits can be requested and the internal state saved so that more can be squeezed out as needed. Alternatively, include language to the effect that if you need more than d+32k bits, call SHAKE(m, d+32(k+1)) and discard the first d+32k bits.
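The "call SHAKE again with a larger d and discard the prefix" alternative can be sketched as a reader that only ever uses a one-shot SHAKE128(M, d) call. The class name RecomputingShakeReader is hypothetical, and it works in bytes rather than the 32-bit increments mentioned above.

```python
import hashlib

class RecomputingShakeReader:
    """Hypothetical streaming reader built on a one-shot SHAKE128(M, d).

    To return bytes [offset, offset + n), it computes
    SHAKE128(M, 8 * (offset + n)) and discards the first `offset` bytes.
    """

    def __init__(self, m: bytes):
        self._m = m
        self._offset = 0  # bytes already handed out

    def read(self, n: int) -> bytes:
        full = hashlib.shake_128(self._m).digest(self._offset + n)
        chunk = full[self._offset:]
        self._offset += n
        return chunk

seed = b"rho || i || j"  # placeholder input
reader = RecomputingShakeReader(seed)
chunks = [reader.read(168) for _ in range(3)]
print(b"".join(chunks) == hashlib.shake_128(seed).digest(3 * 168))  # True
```

Each read recomputes everything from the start, so total work grows quadratically with the amount of output; that is why saving the sponge state, as the appendix note above suggests, matters in practice.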

O2b) If one wants to keep the functionality of generating a large number of bits up front, and then more as you need them, that is much closer to the description of the SP 800-90A standards for DRBG. Given that ML-KEM relies heavily elsewhere on secure hashes, we would suggest defining Hash_DRBG as the XOF used in ML-KEM/ML-DSA.

O3) It can be impractical to call multiple hash functions when one is taking advantage of previously-validated FIPS modules, because modules frequently implement only a subset of the FIPS they are tested against. Hence, we suggest that only one hash primitive be used. If SHAKE continues to be chosen for the XOF, then it should be used for the hash as well. If Hash_DRBG is used, then the hash primitive chosen for Hash_DRBG should be the same hash primitive used to construct all of the PRFs and hashes in the standard.

Further, we suggest that the hash function being invoked in the above discussion be SHA2 with a size appropriate to the security level. The reasons to select SHA2 are manifold:

O3a) SHA2 family algorithms are ubiquitous in the federal space and SHA-384/SHA-512 are the algorithms used for NSS as specified in CNSA. Because SHA2 is used so often in low-level parts of the computing trust infrastructure, from today's commercial and government signing infrastructure, to HMAC, to Hash_DRBG, nearly any FIPS-compliant device in use by the government that utilizes hardware acceleration for cryptography will need to have SHA2 present for the next decade. For example, Federal PKIs (including the DoD PKI) have transitioned to SHA2 and so FIPS-validated SHA2 modules are on all PKI-enabled devices in the federal government as well as any device being purchased in the medium term. SHA3/SHAKE, on the other hand, has very limited uptake in this space.

O3b) Due to this ubiquity, SHA2 will generally not require additional hardware when upgrading a system to ML-KEM/ML-DSA, while the use of SHA3 may. On many near-term systems, such as smart cards used in Federal PKI, SHA3 and SHAKE require more hardware space than SHA2 because they are add-ons.

O3c) If the module is implemented in software, SHA2 is substantially faster than SHA3 or SHAKE.

O3) Additional context:

We would like to provide some context for how dominant SHA2 is over SHA3 when it comes to market penetration in the government sector, so we consider as a proxy the number of cryptographic hardware products that have active NIST Cryptographic Module Validation Program (CMVP) validation credentials in the CMVP public database. This hardware represents the acceleration available to federal agencies as they try to adopt FIPS 203 and 204 over the next several years.

In a search performed on 10/13/23, there were 429 hardware modules implementing FIPS 180-4 (SHA2), while 34 implemented FIPS 202 (SHA3 or SHAKE). This order-of-magnitude difference is not solely the result of the longer life of FIPS 180-4: if we restrict to hardware modules validated in the last calendar year, 10 have FIPS 202 while 102 contain FIPS 180-4. Further, of those 10, only one of the hardware modules containing FIPS 202 actually had a FIPS-validated SHAKE. This dominance also extends to digital signatures. A similar search of the Cryptographic Algorithm Validation Program (CAVP) database shows that in 2023 there are 284 CAVP-validated implementations of FIPS 186 signature schemes using SHA2, 2 using SHAKE, and none using SHA3.

The ubiquity of SHA2 applies to internet protocols as well. For example, most TLS, IPsec, and SSH implementations will make use of SHA2 to generate session keys. Further, these protocols often use SHA2 as part of the integrity function when employing a non-AEAD cipher. For either use case, the supporting IETF RFCs do not define SHA3 as an option; the use of SHA3 (or SHAKE) is primarily limited to signatures, which, as seen above, are typically not FIPS-compliant.

While several products will likely be able to transition to ML-KEM regardless of the hash used (such as software defined services on general purpose devices), constrained or embedded form factors as well as high performance gear may delay their transition until hardware acceleration is available. That acceleration is presently ubiquitous for SHA2, in limited availability for SHA3, and nearly unavailable in SHAKE.

To summarize, in order to ease transition for large commercial enterprises, reduce Size/Weight/Power requirements, and make it more likely that the whole of government can comply to the greatest extent possible with the requirements in National Security Memo 10 we suggest that both ML-KEM and ML-DSA adopt SHA2. For the largest parameter sets (and potentially for all) this would be:

XOF(p,i,j) := sequential bits of SHA512_DRBG(p||i||j) as instantiated by Hash_DRBG_Generate_algorithm.
H(s) := truncated (SHA-512(s), 256)
J(s) := truncated (SHA-512(s), 256)
G(c) := SHA-512(c)

Domain separation, if desired, can be achieved by prepending a domain-specific value, e.g. G(c) = SHA-512('G'||c).
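A small sketch of the definitions above makes the domain-separation point concrete: as written, H and J are the same function and both are a prefix of G, while prepending a tag separates them. The one-byte ASCII tags b"H", b"J", b"G" and the helper name tagged are assumptions for illustration; fixed-length tags keep the encoding unambiguous.

```python
import hashlib

# Definitions as proposed (no domain separation).
def H(s: bytes) -> bytes:
    return hashlib.sha512(s).digest()[:32]   # truncate to 256 bits

def J(s: bytes) -> bytes:
    return hashlib.sha512(s).digest()[:32]   # identical to H

def G(c: bytes) -> bytes:
    return hashlib.sha512(c).digest()

# Hypothetical variant: prepend a fixed one-byte tag per function.
def tagged(tag: bytes, x: bytes) -> bytes:
    return hashlib.sha512(tag + x).digest()

def H_sep(s: bytes) -> bytes:
    return tagged(b"H", s)[:32]

def J_sep(s: bytes) -> bytes:
    return tagged(b"J", s)[:32]

def G_sep(c: bytes) -> bytes:
    return tagged(b"G", c)

x = b"shared input"
print(H(x) == J(x), H(x) == G(x)[:32])   # True True
print(H_sep(x) == J_sep(x))              # False
```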

We want to be clear that this comment is proposing a change to section 4.1, and not an addition to it. Under no circumstance should there be multiple versions of ML-KEM of a particular security level that differ only by the choice of XOF. These would not interoperate with each other and would only confuse customers as they try to comply with mandates to transition to a particular version. While several different approaches can be used for e.g. key generation in RSA or ECDH, ML-KEM is unique in that a change in key generation breaks interoperability with other implementations due to the FO transform.

Mike Hamburg

Oct 30, 2023, 9:16:43 PM
to Stern, Morgan B, pqc-...@list.nist.gov
Hello all,

I have a few thoughts on this, mostly in the direction of “please can we not”.  Also, I’m not completely sure of it, but the particular instantiation you suggested might, uh, actually contain a weakness.

I do agree that it is important to make the usage of the XOF clear, and for certification the ML-KEM XOF usage should be made compatible with FIPS 202, if it is not already.  I also agree that the mixed usage of SHA-3 and SHAKE is a wart on ML-KEM.  However, it is in my opinion essentially harmless, because these functions are both thin wrappers around Keccak, and an implementation of SHAKE gets SHA-3 almost for free.


I do not agree that SHA-512 is more suitable for usage in ML-KEM and ML-DSA than SHAKE.  Indeed, SHAKE has a more appropriate interface, comparable or better performance, and is easier to make side-channel resistant.  Perhaps most importantly, the community has been working on it, which gives us a hope of getting the security right.


Interface and standards compliance: Hashgen (from HASH_DRBG in SP800-90A revA), like SHAKE, can in principle generate a long stream of bits.  But per the spec it takes an output length up front, so it doesn’t improve on SHAKE’s interface.  Hashgen could be used if you had an implementation of it separately from HASH_DRBG, but it’s not specified for this purpose, and indeed it is not suitable.  In particular, if the input to Hashgen (and thus to Hash_DRBG_Generate_algorithm) were really (p || i || j), then (if I’m reading right) a disaster would occur: Hashgen increments its state, and if I understand correctly it’s big-endian, so the next state would be (p || i || j+1), colliding with another call to hashgen and likely wrecking the security of the entire KEM.
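The collision described above can be reproduced with a simplified Hashgen: hash the data, increment it as a big-endian integer modulo 2^(8 × length), and repeat. This sketch assumes, as in the hypothetical, that the raw p||i||j is fed to Hashgen directly (a real Hash_DRBG would first derive a 111-byte V via Hash_df), and it encodes i and j as single bytes.

```python
import hashlib

def hashgen(data: bytes, n_bytes: int) -> bytes:
    """Simplified SHA-512 Hashgen: hash data, data + 1, data + 2, ...
    with big-endian increment modulo 2**(8 * len(data))."""
    width = len(data)
    value = int.from_bytes(data, "big")
    out = b""
    while len(out) < n_bytes:
        out += hashlib.sha512(value.to_bytes(width, "big")).digest()
        value = (value + 1) % (1 << (8 * width))
    return out[:n_bytes]

rho = bytes(32)  # hypothetical 32-byte seed

def xof_input(i: int, j: int) -> bytes:
    return rho + bytes([i, j])  # assumed encoding: one byte each

a = hashgen(xof_input(0, 0), 128)  # two SHA-512 output blocks
b = hashgen(xof_input(0, 1), 64)   # one SHA-512 output block
# Incrementing rho||0||0 gives rho||0||1, so the streams overlap.
print(a[64:] == b)  # True
```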

As I understand it, the full HASH_DRBG “shall not” be used in place of a XOF, because while it is internally deterministic, it is only designed and specified to take its seed from an entropy source. Per SP800-90A revA, section 8.6.5, it must be used with an approved entropy source, and per section 8.6.9: “The seed that is used to initialize one instantiation of a DRBG shall not be intentionally used to reseed the same instantiation or used as the seed for another DRBG instantiation”.  That is, using a DRBG instead of a XOF within the Fujisaki-Okamoto transform is (as I understand it) forbidden, because FO requires both the sender and receiver to instantiate the DRBG with the same seed.  I might be misunderstanding the specification here, but even if so, a DRBG doesn’t seem cleaner, since we don’t need the rollback resistance, reseed functionality, etc. that a DRBG provides.

Furthermore, if we were to use the full HASH_DRBG instead of hashgen, we would need to fix an output granularity as well, since the sequence of bits depends on how many you read at a time (unlike with SHAKE).
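To illustrate the granularity issue, here is a sketch of only the generate step of Hash_DRBG with SHA-512, following my reading of SP 800-90A (Hashgen on V, then V = V + Hash(0x03||V) + C + reseed_counter mod 2^888). Instantiation, reseeding, the reseed-interval check and additional_input are omitted, and V and C are hypothetical values supplied directly.

```python
import hashlib

SEEDLEN_BITS = 888                 # seedlen for SHA-512 in SP 800-90A
SEEDLEN_BYTES = SEEDLEN_BITS // 8  # 111 bytes
MOD = 1 << SEEDLEN_BITS

def to_bytes(v: int) -> bytes:
    return v.to_bytes(SEEDLEN_BYTES, "big")

def hashgen(v: int, n_bytes: int) -> bytes:
    """Hash V, V+1, V+2, ... (mod 2**seedlen) and concatenate."""
    data, out = v, b""
    while len(out) < n_bytes:
        out += hashlib.sha512(to_bytes(data)).digest()
        data = (data + 1) % MOD
    return out[:n_bytes]

class HashDRBGGenerateSketch:
    """Generate step only; V and C are supplied directly."""

    def __init__(self, v: int, c: int):
        self.v, self.c, self.reseed_counter = v % MOD, c % MOD, 1

    def generate(self, n_bytes: int) -> bytes:
        out = hashgen(self.v, n_bytes)
        h = int.from_bytes(hashlib.sha512(b"\x03" + to_bytes(self.v)).digest(), "big")
        self.v = (self.v + h + self.c + self.reseed_counter) % MOD
        self.reseed_counter += 1
        return out

V0, C0 = 12345, 67890  # hypothetical state, not derived from a real seed

one_call = HashDRBGGenerateSketch(V0, C0).generate(128)
drbg = HashDRBGGenerateSketch(V0, C0)
two_calls = drbg.generate(64) + drbg.generate(64)

print(one_call[:64] == two_calls[:64])  # True: first block is Hash(V)
print(one_call[64:] == two_calls[64:])  # False: V was updated between calls
```

The same 128 bytes requested in one call or in two calls diverge after the first block, so a DRBG-based XOF would also have to pin down the read size.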


Software performance: I looked at some performance data in SUPERCOP [1].  SHA-512 is slightly faster at hashing long inputs than both SHAKE128 and SHAKE256 (except on a few processors such as Apple M1, where the SHAKEs are faster).  But while its input block size is almost the same as SHAKE’s, the SHA-512 output block size is less than half that of SHAKE.  So while producing several blocks of output, SHA-512 will not necessarily be faster, though the details will surely depend on vectorization: SHA-512 could be vectorized within a Hashgen call, but SHAKE would need to be vectorized across several XOF calls.
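The block-size argument can be made concrete by counting core-function calls needed to produce N output bytes. This sketch assumes a short XOF seed (absorbed in one block) and, for SHA-512, Hashgen over a 111-byte V. It counts calls only; a Keccak-f[1600] permutation and a SHA-512 compression do not cost the same number of cycles.

```python
import hashlib
import math

# Keccak-f[1600]: rate = 1600 - capacity bits, capacity = 2 * security level.
SHAKE128_RATE = (1600 - 2 * 128) // 8   # 168 bytes
SHAKE256_RATE = (1600 - 2 * 256) // 8   # 136 bytes

SHA512_IN = hashlib.sha512().block_size    # 128 bytes per compression
SHA512_OUT = hashlib.sha512().digest_size  # 64 bytes per compression

# A 111-byte V plus 1 padding byte (0x80) plus a 16-byte length field
# fills exactly one 128-byte block: one compression per Hashgen step.
assert 111 + 1 + 16 == SHA512_IN

def shake_permutations(n_out: int, rate: int) -> int:
    # The permutation after absorbing yields the first `rate` bytes;
    # each further `rate` bytes needs one more permutation.
    return math.ceil(n_out / rate)

def sha512_hashgen_compressions(n_out: int) -> int:
    return math.ceil(n_out / SHA512_OUT)

n = 1536
print(sha512_hashgen_compressions(n))        # 24
print(shake_permutations(n, SHAKE128_RATE))  # 10
print(shake_permutations(n, SHAKE256_RATE))  # 12
```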

SHA-256 would be even slower on 64-bit machines than SHA-512, though on modern processors it is often hardware-accelerated and so would be faster.


Hardware performance: SHAKE and SHA-512 are about the same size in hardware [2], but SHAKE256 is three to four times as fast, since it has 24 rounds instead of 80.  SHA3-512 is also used by ML-KEM, and it has a smaller input size than SHA-512, but this doesn’t matter since it’s only ever called with 512-bit inputs.

One could build an ML-KEM or ML-DSA module with SHA-256, which is smaller than SHAKE in hardware, but of course it would be locked out of the stronger parameters.


Side-channel resistance: ML-KEM and ML-DSA are tricky to protect against physical side channels, and this problem is made trickier by their tight integration with the hash function.  Fortunately, Keccak is amenable to masking and threshold implementations due to its low AND-depth.  While side-channel protection is still not easy, this feature makes it more achievable.  SHA-512 is not as amenable to physical side-channel protection — while it is possible to provide side-channel protection for SHA-512 as well, this has a greater impact on area, performance and complexity.


Finally, I don’t think that the cryptographic community would appreciate such a change.  This is partly because it is rather too late for this kind of bikeshedding, since the community has put significant work into analyzing and implementing Kyber and Dilithium with Keccak.  Significant amounts of work would have to be redone with a change to SHA-512 and some sort of HASH_DRBG hack.  Also, NSA is, to put it mildly, not a globally trusted organization.  People will not trust that the changes don’t introduce a hidden weakness, even if the possible not-so-hidden weakness above is removed.

Regards,
— Mike Hamburg (speaking for myself)

[1] Daniel J. Bernstein and Tanja Lange (editors).  Measurements of SHA-3 finalists, indexed by machine.  https://bench.cr.yp.to/results-sha3.html 

[2] Nannipieri, Pietro & Bertolucci, Matteo & Baldanzi, Luca & Crocetti, Luca & di Matteo, Stefano & Falaschi, Francesco & Fanucci, Luca & Saponara, Sergio. (2020). SHA2 and SHA-3 Accelerator Design in a 7nm Technology Within the European Processor Initiative. Microprocessors and Microsystems. 87. 10.1016/j.micpro.2020.103444.

Blumenthal, Uri - 0553 - MITLL

Oct 30, 2023, 10:46:39 PM
to pqc-...@list.nist.gov
Interesting points...

First, one caveat: I'm a software person, and don't know enough of hardware design to even be dangerous there. So, my comments are limited to software implementations, which should cover a significant part of the expected implementations and deployments of ML-KEM and ML-DSA.

The "SHA-2 vs. SHA-3" controversy has been around for a long time. I personally think it's been handled badly. What was the point of creating a new standard without stating when and where it's going to be required? But I digress.

In software, SHA-3 is considerably slower than SHA-2. Plus, in software it's normal to expect reuse of the code - so if there's a FIPS-certified SHA-2 implementation, it would save both space and extra certification time if all the components requiring crypto hash-function could access that one module. With SHAKE, the performance difference is not that drastic - but SHAKE still lags behind. Plus, "big" CPUs don't have SHA-3 acceleration yet, and it is unclear when the "smaller" ones would get it, if at all.

From what I see, we barely managed to move the "big" PKI to SHA-2 from SHA-1 (mostly, because there are still enough SHA-1-based certs floating around). IMHO, the likelihood of this behemoth moving to SHA-3 in the foreseeable future is between zero and nil. Thus, I expect that any crypto-related "thing" (application, protocol, etc.) will have to include SHA-2, especially if it uses PKI in any form or shape. And not because CNSA requires it, but because there's no practical alternative. So, basing KEM and DS on SHA-3 instead of SHA-2 primitives would mean implementing and certifying two hash constructs instead of one.

I, for one, would prefer to see fewer "varieties" in standards. If possible, just one hash function (whatever that one may be). One symmetric cipher. Hopefully one KEM. I'd like to say "and one DS", but I know this isn't likely to fly (hash-based signatures for software updates, dynamic lattice-based for signing certs and such, maybe something else for something else).

Thanks!
--
V/R,
Uri

There are two ways to design a system. One is to make it so simple there are obviously no deficiencies.
The other is to make it so complex there are no obvious deficiencies.
- C. A. R. Hoare


Markku-Juhani O. Saarinen

Oct 30, 2023, 11:30:39 PM
to pqc-forum, Mike Hamburg, pqc-...@list.nist.gov, Stern, Morgan B
Hi,

I'd like to +1 on pretty much everything Mike said. In summary, NSA's proposals would slow down the adoption of ML-DSA and ML-KEM by years, or possibly completely prohibit it, because:

1. The proposed algorithmic changes (e.g. to use SHA512_DRBG as an XOF) would make the algorithms slower by a very significant factor.
2. There are cryptanalytic security issues with the proposal to abandon domain separation between different hash/XOF functions.
3. The resulting algorithms would be much more difficult to secure against side-channel attacks.

As it seems clear that NSA's proposal would make ML-KEM and ML-DSA both less efficient and less secure than the original Kyber and Dilithium, the private sector (and organizations such as IETF) would probably be reluctant to accept them and would prefer to standardize and use Kyber and Dilithium instead. This, in turn, would make it more difficult for the Federal government to source implementations of ML-KEM and ML-DSA.

The designers of Kyber and Dilithium chose SHA3/SHAKE as it was considered more secure than SHA2 in 2017, and it still is (by a large margin). The submission documents clearly stated that the "90s" variants were there essentially as placeholders until SHA3 is more widely supported in hardware (we're getting there). Because of this, we have spent the last five years analyzing Kyber and Dilithium with SHAKE and zero years analyzing them with SHA2.

Opening up each of these issues:

1. The following was proposed while claiming that it would make implementations "faster in software".


"XOF(p,i,j) := sequential bits of SHA512_DRBG(p||i||j) as instantiated by Hash_DRBG_Generate_algorithm."

If you try this out, you will notice this makes implementations substantially slower. First of all, the "mode of operation" of SHA512_DRBG is not very efficient. SHA2 was never designed to output data efficiently, while SHAKE was. The output block size of SHAKE is the same as the input block size, but with SHA2 it's half of it.

Furthermore, the proposal contains a dangerous confusion between an RBG and an XOF, which are fundamentally different things. If I design a DRBG module (a random number generator), I put measures in place to prevent it from being used in an entirely deterministic mode. There is certainly no API available for this purpose, and there should not be. Furthermore, using an internal function like "Hash_DRBG_Generate_algorithm" for something other than its intended purpose has received no security analysis, and Mike is already observing security issues in it (as am I). The proposal and its "rationale" make no technical or standardization sense, which will raise suspicions.

As for performance, SHA3 already works very well in vector architectures (often reading input data faster than SHA2, as Mike pointed out), and we are working to increase the software performance further in RISC-V (I work in that Task Group). I'm sure ARM, Apple, Intel, etc, are working on the same. From a technical viewpoint, the Keccak permutation may be large in its gate count, but the critical path of each round is very short, and each 24-round permutation can process a lot of data -- 1344 bits (SHAKE-128)  or 1088 bits (SHAKE-256). This enormous "block size" and low cycle count means that the maximum software speeds attainable by SHA3 are much higher than SHA2 in near-future microprocessors.


2. NSA also proposed the following, which replaces domain-separated FIPS 202 hashes with SHA2-512.


"H(s) := truncated (SHA-512(s), 256)
J(s) := truncated (SHA-512(s), 256)
G(c) := SHA-512(c)"

Recall that almost exactly the same processing occurs when computing SHA3-256(x) and SHAKE256(x) truncated to 256 bits; there is only a difference in a single padding byte. In Kyber and Dilithium, this is done for "domain separation" and is considered good cryptographic engineering practice. It basically prevents attackers from lifting a hash (or a preimage) of hash F to be used for some other purpose G. Generally speaking, if a protocol description contains multiple different hash functions, they must also be implemented as different hash functions; this is assumed by the security proofs. The designers of Kyber and Dilithium didn't just arbitrarily use different letters in different places.
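The single-padding-byte difference can be observed with hashlib: SHA3-256 and SHAKE256 use the same Keccak-f[1600] permutation with the same 136-byte rate, but different domain-separation suffix bits before the pad10*1 padding, so their 256-bit outputs are unrelated. The sample input is arbitrary.

```python
import hashlib

x = b"message"
sha3 = hashlib.sha3_256(x).digest()
shake = hashlib.shake_256(x).digest(32)  # SHAKE256 truncated to 256 bits

# Same permutation and capacity (512 bits); only the domain bits in the
# padding differ (0x06 for SHA-3, 0x1F for SHAKE), yet the outputs differ.
print(len(sha3) == len(shake) == 32)  # True
print(sha3 == shake)                  # False
```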


3. As someone who has worked on commercial secure hardware implementations of Kyber and Dilithium, I can confirm that SHA-2 would make the implementations much more difficult to secure against side-channel attacks. We need side-channel security, especially if Dilithium is to replace ECDSA and RSA in smart cards and platform security applications. I personally let out a big sigh of relief when the "90s" options were eliminated by NIST.

Because of its role in TLS 1.3, many of us have had to implement SHA2-based HMAC and HKDF in hardware in a secure fashion, and it is substantially harder than achieving the same side-channel security levels with Keccak. One of the biggest reasons for this is the mixing of additive and Boolean operations in the SHA-2 round function, which makes the application of masking-type techniques much harder.
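A toy illustration of why mixing additive and Boolean operations complicates masking: with two Boolean shares (x = s0 XOR s1), XOR can be applied share by share, but addition modulo 2^64 cannot. This only shows the linear case; Keccak's chi step still needs a masked AND gadget, and the helper names here are hypothetical.

```python
import secrets

MASK64 = (1 << 64) - 1

def share(x: int) -> tuple[int, int]:
    """Split x into two Boolean shares with x == s0 ^ s1."""
    m = secrets.randbits(64)
    return x ^ m, m

def unshare(s: tuple[int, int]) -> int:
    return s[0] ^ s[1]

def xor_shares(a, b):
    # XOR is linear over GF(2), so it can be computed share by share.
    return a[0] ^ b[0], a[1] ^ b[1]

def naive_add_shares(a, b):
    # Addition mod 2**64 is not GF(2)-linear: share-wise addition does
    # not, in general, produce a Boolean sharing of x + y.
    return (a[0] + b[0]) & MASK64, (a[1] + b[1]) & MASK64

x, y = 0x0123456789ABCDEF, 0x0FEDCBA987654321
print(unshare(xor_shares(share(x), share(y))) == x ^ y)  # True

# Fixed shares as a counterexample: x = 1 (mask 1), y = 1 (mask 0).
a = (1 ^ 1, 1)
b = (1 ^ 0, 0)
print(unshare(naive_add_shares(a, b)), 1 + 1)  # 0 2
```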


Best Regards,
- markku

Dr. Markku-Juhani O. Saarinen <mj...@iki.fi>

Simon Hoerder

Oct 31, 2023, 5:23:57 AM
to pqc-forum
Hi,

I’m not convinced by the proposed changes. Mike and Markku already brought good arguments against them. +1 to them.

I’d like to add that a lot of chip manufacturers are already hard at work prototyping ML-KEM / ML-DSA support to ensure that they can meet the ambitious migration timelines set by CNSA v2. Changing how the XOFs and Hash functions in ML-KEM / ML-DSA work will impact those efforts as they’re tightly integrated into the design of ML-KEM / ML-DSA. For every hour of development work, hardware developers need to spend around 4 hours of verification work. Redoing internal interfaces in an existing design is not a fun activity; it should only happen for good reasons.

Wouldn’t it be easier to amend SP800-90A with the required SHAKE interfaces? XOF_DRBG (with a random seed from the TRNG) and XOF_DOMAIN_SEP (please find a better name) for usage within ML-KEM and ML-DSA?

Best,
Simon

John Mattsson

Oct 31, 2023, 6:09:29 AM
to pqc-forum

Hi,

 

First, let me say that I think it is very good that NSA posts their suggestions on the list for public discussion. Otherwise, +1 to the comments from Mike and Markku. The suggestion seems problematic from a performance, domain separation, side-channel, and interface perspective. I think the suggestion would significantly delay adoption. To summarize, I don't think NIST should change to SHA-2.

 

- As already pointed out by Mike and Markku I think the claim that "If the module is implemented in software, SHA2 is substantially faster than SHA3 or SHAKE." is incorrect. If more speed is wished for, I think TurboSHAKE is the way to go. But at this point in the standardization, I think changes would delay deployment and I don't want any delay, I want ML-KEM and ML-DSA standards and implementations as soon as possible.

 

- We definitely need side-channel security and more appropriate interfaces. I think the decision to use SHA-2, HMAC, and HKDF in TLS 1.3 was wrong and will haunt us in the future. Hash functions should be designed to provide indifferentiability from a random oracle. Looking at Section 10 of Draft FIPS 205 our initial thought internally was "do we really need to continue to support SHA-2" and "really hope we can get rid of SHA-2 soon". Doing anything secure with SHA-2 is complex. SHA-2 is not robust.

 

- NIST is in the process of updating FIPS 202. I think NIST should update FIPS 202 to discuss that the API does not have to be SHAKE(M,d) and can be “running” in both M and d. FIPS 202 states that "In other words, different procedures that produce the correct output for every input are permitted". I think the same thing should be said about APIs.
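Python's hashlib shows what a "running" SHAKE interface looks like in M: update() absorbs the message incrementally and gives the same output as a one-shot call. In d, its SHAKE objects (to my knowledge) offer no incremental squeeze, but digest(n) can be called again with a larger n and returns the same prefix. The message chunks are placeholders.

```python
import hashlib

parts = [b"first chunk of M, ", b"second chunk, ", b"last chunk"]

running = hashlib.shake_128()
for p in parts:
    running.update(p)  # absorb M incrementally

one_shot = hashlib.shake_128(b"".join(parts))

print(running.digest(64) == one_shot.digest(64))       # True
# A later, longer request starts with the same 64 bytes.
print(running.digest(200)[:64] == running.digest(64))  # True
```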

 

Cheers,

John Preuß Mattsson

 

Blumenthal, Uri - 0553 - MITLL

Oct 31, 2023, 9:52:43 AM
to John Mattsson, pqc-forum

> First, let me say that I think it is very good that NSA posts their suggestions on the list for public discussion. Otherwise, +1 to the comments from Mike and Markku. The suggestion seems problematic from a performance, domain separation, side-channel, and interface perspective. I think the suggestion would significantly delay adoption. To summarize, I don't think the NIST should change to SHA-2.

 

I’m not sure Domain Separation has to be a problem with SHA-2 based XOF. And I don’t think Hash_DRBG would/should be the way to employ SHA-2.

 

> - As already pointed out by Mike and Markku I think the claim that "If the module is implemented in software, SHA2 is substantially faster than SHA3 or SHAKE." is incorrect.

 

We benchmarked SHA-2, SHA-3, and SHAKE. SHA-2 is absolutely faster than SHA-3, and a little faster than SHAKE. That’s a fact – if you disagree, run your own benchmarks.

 

 

> If more speed is wished for, I think TurboSHAKE is the way to go.

 

AFAIK, TurboSHAKE is not a standard at this point, and it’s unclear if/when it would be. Nor is it SHAKE (nor SHA-3).

 

We aren’t comparing performance with TurboSHAKE, are we?

 

Leaving aside other points that I don’t want to debate, at least now.

 

TNX

Mike Hamburg

Oct 31, 2023, 10:37:57 AM
to u...@ll.mit.edu, pqc-forum

On Oct 31, 2023, at 14:52, Blumenthal, Uri - 0553 - MITLL <u...@ll.mit.edu> wrote:

 

> > - As already pointed out by Mike and Markku I think the claim that "If the module is implemented in software, SHA2 is substantially faster than SHA3 or SHAKE." is incorrect.

 

> We benchmarked SHA-2, SHA-3, and SHAKE. SHA-2 is absolutely faster than SHA-3, and a little faster than SHAKE. That’s a fact – if you disagree, run your own benchmarks.


Hi Uri,

Could you give some more information about this?  In particular, which SHA-2 and SHA-3 and SHAKE instances did you benchmark, on what hardware, with what usage pattern, and what were the results?  Did you measure speed per input byte, or per output byte, or as latency for short blocks?

My impression from the SUPERCOP data was that the relative speed depends quite a bit on these variables. Also the XOF use case in ML-KEM and ML-DSA may be somewhat different from other applications of hashing because it requires many bytes of output.

Thanks,
— Mike

John Mattsson

Oct 31, 2023, 11:15:08 AM
to Blumenthal, Uri - 0553 - MITLL, pqc-forum

Hi Uri,

I am sure that the domain separation problems could be fixed, but discussing and fixing such a basic security issue when the final standards are months away does not feel optimal.

Regarding performance, there are quite a lot of public benchmarks for x86 and ARM. I regret quoting NSA as their statement is both too general and not very relevant.
I think the important comparison here is SHAKE128 and SHA-512 (correct me if I am wrong). SHAKE128 is slightly slower than SHA-512 on x86, but often slightly faster on ARM. I have not seen any performance figures for RISC-V. When looking at future standards I think ARM is likely more important than x86. ARM seems to be the future of both cloud and laptops. RISC-V is already important for embedded devices and will soon likely be important in cell phones. I don’t think it is correct to state that SHA-512 is substantially faster than SHAKE128 in software. Also, the important thing is not the hash functions themselves but the performance of Kyber and Dilithium. I think NSA has a lot of implementation and performance testing to do before they can claim that the suggested change makes Kyber and Dilithium faster in software. Right now, I tend to trust Markku’s analysis that the change would make Kyber and Dilithium slower in software. In hardware SHAKE is much more efficient than SHA-2.

 

Cheers,
John Preuß Mattsson

 

Blumenthal, Uri - 0553 - MITLL

Oct 31, 2023, 11:27:51 AM
to John Mattsson, pqc-forum
My numbers are from x86, most - via benchmarks included in Crypto++ package (which I co-maintain).

Do we want to use SHAKE128? In that case I think you’re right. If you recall, I said “SHA-512 is substantially faster than SHA-3, and a little faster than SHAKE”. 
BTW, do we want SHAKE128, or SHAKE256?

Not sure about RISC-V, never played with it. As a former Intel Research Labs employee, I do expect and hope that Intel platforms remain relevant.

Hash computations do take significant part of Kyber performance, but I think I see your point. 

Regards,
Uri


D. J. Bernstein

Nov 17, 2023, 7:45:10 AM
to pqc-...@list.nist.gov
Blumenthal, Uri - 0553 - MITLL writes:
> My numbers are from x86, most - via benchmarks included in Crypto++
> package (which I co-maintain).

There are many different microarchitectures for x86 CPUs, but let me
hypothesize that we're talking about Skylake or equivalent. SUPERCOP
reports in

https://bench.cr.yp.to/web-impl/amd64-samba-crypto_hash.html

that 1536-byte sha512 on Skylake takes 8891 cycles with code I wrote,
10517 cycles with openssl (version 3.0.2 on that box), and 14065 cycles
with cryptopp (8.5.0). These are median measurements; quartiles are
close (see https://bench.cr.yp.to/results-stream.html).

Implementations also vary for shake128: 12296 cycles with oncore64bits
vs. 14726 cycles with openssl. The next version of SUPERCOP will try
shake128 using cryptopp; I see 12385 cycles. I'm aware of better code,
10666 cycles.

The basic point Mike is making about these speeds is that, if you want
to predict the cost of generating N bytes of output, you should look at
hashing N bytes of input for SHAKE128, but you should look at hashing 2N
bytes of input for SHA-512, since each SHA-512 compression call takes
128 bytes of new input and produces just 64 bytes of output.

This turns a moderate SHA-512 win for hashing speed into a moderate
SHA-512 loss for output-generation speed, although this might be changed
by vectorization. There are other CPUs where SHAKE128 hashing speed is
closer to SHA-512 hashing speed to begin with, and then SHA-512 would
have even more trouble catching up in output-generation speed.
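
To make the arithmetic above concrete, here is a back-of-the-envelope Python sketch that counts compression/permutation calls. It assumes SHA-512 output is produced in a counter mode where each call on a short seed || counter input yields 64 bytes, and that SHAKE128 absorbs or squeezes 168 bytes per Keccak-f[1600] call. The per-call cycle costs are simply backed out of the Skylake medians quoted above (8891 cycles for sha512, 12296 for shake128 with oncore64bits), so the results are rough estimates, not measurements.

```python
import math

SHA512_BLOCK = 128   # new input bytes per SHA-512 compression call
SHA512_OUT = 64      # output bytes per SHA-512 call
SHAKE128_RATE = 168  # bytes absorbed or squeezed per Keccak-f[1600] call


def sha512_calls_to_hash(n: int) -> int:
    # n message bytes + 0x80 byte + 16-byte length field, in 128-byte blocks.
    return math.ceil((n + 1 + 16) / SHA512_BLOCK)


def shake128_calls_to_hash(n: int) -> int:
    # n message bytes + at least one byte of SHAKE padding, in 168-byte blocks.
    return math.ceil((n + 1) / SHAKE128_RATE)


def sha512_calls_to_generate(n: int) -> int:
    # Assumed counter mode: one SHA-512(seed || counter) call per 64 output
    # bytes, with seed || counter short enough to fit in one padded block.
    return math.ceil(n / SHA512_OUT)


def shake128_calls_to_generate(n: int) -> int:
    # A short seed is absorbed with one permutation, which already yields
    # the first 168 output bytes; each further permutation yields 168 more.
    return math.ceil(n / SHAKE128_RATE)


N = 1536
# Rough per-call costs backed out of the Skylake medians quoted above
# (this ignores fixed per-message overhead).
sha512_per_call = 8891 / sha512_calls_to_hash(N)       # about 684 cycles
shake128_per_call = 12296 / shake128_calls_to_hash(N)  # about 1230 cycles

print("SHA-512 , generate 1536 bytes:", round(sha512_per_call * sha512_calls_to_generate(N)))
print("SHAKE128, generate 1536 bytes:", round(shake128_per_call * shake128_calls_to_generate(N)))
```

Under these assumptions it prints about 16414 cycles for SHA-512 versus 12296 for SHAKE128: hashing 1536 bytes takes 13 SHA-512 calls against 10 Keccak calls, but generating 1536 bytes takes 24 SHA-512 calls against 10 Keccak calls, which is the win-turns-into-loss effect described above.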

To put the above numbers in perspective, SUPERCOP says generating 1536
bytes takes 1885 cycles on the same machine with chacha20, 1191 cycles
with aes128ctr, and 905 cycles with chacha8:

https://bench.cr.yp.to/web-impl/amd64-samba-crypto_stream.html

AES has issues on CPUs without AES hardware, but ChaCha doesn't, and
ChaCha is a very simple function that would be easy to specify as part
of a KEM spec without reference to other specs.
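
As an illustration of how little needs to be specified, here is a Python sketch of the ChaCha block function. It assumes the IETF layout from RFC 8439 (32-bit block counter, 96-bit nonce); the original ChaCha design uses a 64-bit counter and a 64-bit nonce, which does not change the per-block work. The `rounds` parameter (20 for ChaCha20, 8 for ChaCha8) and the helper names are this sketch's own choices, and the code is neither constant-time nor tuned for speed.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    # 32-bit left rotation.
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    # ChaCha quarter-round: only 32-bit addition, rotation and XOR (ARX).
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_block(key: bytes, counter: int, nonce: bytes, rounds: int = 20) -> bytes:
    # RFC 8439 state: 4 constant words, 8 key words, 1 counter word, 3 nonce words.
    assert len(key) == 32 and len(nonce) == 12 and rounds % 2 == 0
    init = (list(struct.unpack("<4I", b"expand 32-byte k"))
            + list(struct.unpack("<8I", key))
            + [counter & MASK32]
            + list(struct.unpack("<3I", nonce)))
    s = init[:]
    for _ in range(rounds // 2):
        # Column round.
        quarter_round(s, 0, 4, 8, 12)
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        # Diagonal round.
        quarter_round(s, 0, 5, 10, 15)
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    # Add the input state back in and serialize the 16 words little-endian.
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(s, init)))


def chacha_keystream(key: bytes, nonce: bytes, n: int, rounds: int = 20) -> bytes:
    # Each 64-byte block depends only on (key, nonce, counter), so blocks
    # can be computed independently and in parallel.
    count = (n + 63) // 64
    return b"".join(chacha_block(key, i, nonce, rounds) for i in range(count))[:n]
```

Starting the counter at zero in `chacha_keystream` is an arbitrary choice here; RFC 8439's ChaCha20-Poly1305 construction, for example, reserves block 0 for deriving the Poly1305 key.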

To be clear, I continue to recommend terminating all consideration of
proposed Kyber changes after the final Kyber submission from 2020, and
in particular I recommend going back to the exact usage of SHAKE from
that submission even though that's obviously suboptimal in speed. The
cycle counts are almost unnoticeable next to communication costs: see

https://ntruprime.cr.yp.to/latticerisks-20211031.pdf#subsection.1.6.6

for quantification, or see the Kyber documentation saying "because the
basic operations comprising Kyber are extremely fast, we can afford to
pay a time penalty"---language that was already in the documentation
before various speedups to the Kyber software. More importantly, job #1
is security, and a basic procedural requirement of stability reduces
security risks; see my email dated 8 Nov 2023 14:58:29 +0100.

---D. J. Bernstein

Bobby McGee

Nov 17, 2023, 11:17:05 AM
to pqc-forum, D. J. Bernstein, pqc-...@list.nist.gov
SHAKE is cool and all, but one downside is that it isn't parallelizable, whereas something like a cipher or hash in counter mode is. If someone wants fast hardware, waiting around for a PRNG whose output could be computed immediately is a waste of time. Generally, I'm of the opinion that anything that could be done in parallel should have that option available.
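
To illustrate the counter-mode point, here is a hypothetical SHA-512 counter-mode generator in Python. The seed || 4-byte big-endian counter encoding is an assumption made for this sketch, not a construction proposed in this thread. Every 64-byte block depends only on the seed and its index, so blocks can be computed in any order or concurrently; in CPython the thread pool demonstrates that independence and may not deliver a real speed-up for such small inputs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha512_ctr_block(seed: bytes, i: int) -> bytes:
    # Hypothetical encoding: block i = SHA-512(seed || i as 4 big-endian bytes).
    return hashlib.sha512(seed + i.to_bytes(4, "big")).digest()


def sha512_ctr(seed: bytes, n: int, workers: int = 4) -> bytes:
    count = (n + 63) // 64
    # Blocks are independent; map() still returns them in counter order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda i: sha512_ctr_block(seed, i), range(count))
        return b"".join(blocks)[:n]
```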

There's also the annoying issue mentioned, namely that the use of SHAKE as a stream generator is not specified in FIPS 202.  Everyone "knows" how to use it as such, but it's somewhat silly that all the draft standards have to have some paragraph describing the situation instead of having it firmly in place in FIPS 202.
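
FIPS 202 defines SHAKE128(M, d) as a function with a fixed output length d; the "stream" usage relies on the fact that a shorter output is a prefix of a longer one. The Python sketch below wraps `hashlib.shake_128` in a hypothetical `ShakeStream` class to show that reading 10 bytes and then 300 more yields exactly the first 310 bytes of SHAKE128(M, 310). Because hashlib offers no incremental squeeze, it recomputes the output on every read, so it illustrates the semantics only.

```python
import hashlib


class ShakeStream:
    """Illustrative stream view of SHAKE128 built only on hashlib.

    Each read recomputes SHAKE128(seed, pos + n) and returns the new
    bytes. This is quadratic in the total output length, which is fine
    for showing the prefix property but not for real use.
    """

    def __init__(self, seed: bytes):
        self._seed = seed
        self._pos = 0

    def read(self, n: int) -> bytes:
        out = hashlib.shake_128(self._seed).digest(self._pos + n)[self._pos:]
        self._pos += n
        return out
```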

Markku is definitely right about Keccak being easier to mask (simple AND non-linearities as opposed to the 32- or 64-bit modular adders in SHA-2), and SHAKE definitely produces lots of secret information (e.g. s, r, e, e_1, e_2, etc. in Kyber), so side-channel protections for the PRNG are a major concern.
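
A Python sketch of why Keccak is comparatively easy to mask: the chi step is the only nonlinear part of Keccak-f[1600], and its nonlinearity is a single AND per output bit. A first-order Boolean-masked chi therefore only needs a masked AND (ISW-style, shown here with two shares), while XOR and NOT act share-wise. SHA-2's modular additions propagate carries across the whole word and have no comparably simple share-wise form. The function names are this sketch's own, and Python gives no constant-time or side-channel guarantees, so this shows the algebra only.

```python
import secrets

W = 64                 # Keccak-f[1600] lane width in bits
MASK = (1 << W) - 1


def chi_row(row: list) -> list:
    # Keccak chi on one row of five lanes: out[x] = a[x] ^ (~a[x+1] & a[x+2]).
    return [row[x] ^ (~row[(x + 1) % 5] & MASK & row[(x + 2) % 5]) for x in range(5)]


def masked_and(a: tuple, b: tuple) -> tuple:
    # First-order ISW-style AND: returns shares of (a0 ^ a1) & (b0 ^ b1).
    (a0, a1), (b0, b1) = a, b
    r = secrets.randbits(W)  # fresh randomness for every AND
    z0 = (a0 & b0) ^ r
    z1 = (a1 & b1) ^ ((r ^ (a0 & b1)) ^ (a1 & b0))
    return z0, z1


def masked_chi_row(shares: list) -> list:
    # shares[x] = (s0, s1) with lane x = s0 ^ s1. NOT is applied to one share
    # only, XOR share-wise; only the AND needs the masked gadget above.
    out = []
    for x in range(5):
        b0, b1 = shares[(x + 1) % 5]
        t0, t1 = masked_and((~b0 & MASK, b1), shares[(x + 2) % 5])
        a0, a1 = shares[x]
        out.append((a0 ^ t0, a1 ^ t1))
    return out
```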

Bas Westerbaan

Nov 17, 2023, 11:24:30 AM
to Bobby McGee, pqc-forum, D. J. Bernstein
On Fri, Nov 17, 2023 at 5:17 PM Bobby McGee <janewayki...@gmail.com> wrote:
SHAKE is cool and all, but one downside is that it isn't parallelizable, whereas something like a cipher or hash in counter mode is.

Nothing prevents one from using SHAKE in a counter-like mode. And indeed, Kyber does just that for generation of A, so that those SHAKEs can be computed in parallel. 
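
A Python sketch of that counter-like use of SHAKE128: each entry of the k×k matrix is sampled from its own SHAKE128 stream, seeded with rho followed by two index bytes, using Kyber-style rejection sampling of 12-bit values below q = 3329. The index byte order and the helper names are assumptions, and the output has not been checked against official test vectors; the point is only that the k² streams are independent and can run in parallel.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

Q = 3329  # Kyber modulus
N = 256   # coefficients per polynomial


def sample_poly(rho: bytes, i: int, j: int) -> list:
    # Independent SHAKE128 stream per entry, domain-separated by two index
    # bytes (the order j, i used here is an assumption; check the spec).
    seed = rho + bytes([j, i])
    length = 3 * 168  # start with three SHAKE128 output blocks
    while True:
        buf = hashlib.shake_128(seed).digest(length)
        coeffs = []
        for p in range(0, len(buf) - 2, 3):
            # Split 3 bytes into two 12-bit candidates; keep those below Q.
            d1 = buf[p] | ((buf[p + 1] & 0x0F) << 8)
            d2 = (buf[p + 1] >> 4) | (buf[p + 2] << 4)
            for d in (d1, d2):
                if d < Q and len(coeffs) < N:
                    coeffs.append(d)
            if len(coeffs) == N:
                return coeffs
        # Rarely needed: a longer SHAKE output extends the shorter one as a
        # prefix, so resampling from scratch matches streaming one more block.
        length += 168


def gen_matrix(rho: bytes, k: int, workers: int = 4) -> list:
    # Every entry depends only on (rho, i, j), so the k*k SHAKE128 calls can
    # run concurrently; map() keeps row-major order.
    idx = [(i, j) for i in range(k) for j in range(k)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(lambda ij: sample_poly(rho, *ij), idx))
    return [flat[r * k:(r + 1) * k] for r in range(k)]
```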

Moody, Dustin (Fed)

Nov 17, 2023, 3:53:35 PM
to pqc-forum, Stern, Morgan B

The public comment period for the draft FIPS is still open until November 22nd.  After that date, NIST will begin evaluating all the public feedback received, and decide what (if any) changes should be made in response to the comments.  

 

We did want to offer a short response to part of Morgan's 10/30/2023 post (https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/SPTpYEP7vRg/m/NK_ko8YtAQAJ), which started this thread. His point S3) mentioned possibly considering a SHA2 primitive for use in FIPS 203 and FIPS 204. Of course, we will wait and carefully weigh all the comments received by the deadline, but NIST currently has no plans to consider using a SHA2 primitive for FIPS 203 or FIPS 204 instead of the SHAKE/SHA3 primitives already in the draft FIPS. We feel that would be a significant change from the 3rd-round submissions, and want to minimize changes introduced.


Dustin Moody

NIST PQC



From: 'Stern, Morgan B' via pqc-forum <pqc-...@list.nist.gov>
Sent: Monday, October 30, 2023 2:16 PM
To: pqc-forum <pqc-...@list.nist.gov>
Subject: [pqc-forum] Comments on FIPS 203/204
 

Simon Hoerder

Nov 17, 2023, 3:57:41 PM
to pqc-forum
Hi Dustin, all,

that is good news. It will definitely help to keep the development of
post-quantum crypto products on track.

Best,
Simon


D. J. Bernstein

Nov 19, 2023, 12:54:57 PM
to pqc-...@list.nist.gov
NIST writes:
> We feel that would be a significant change from the 3rd round
> submissions, and want to minimize changes introduced.

NIST also previously wrote that one proposed change "would obviously be
occurring after the third round, which means it may not receive as much
public scrutiny and analysis".

Certainly moving targets are bad for security review. It's safest to
simply prohibit changes after the final submissions---this minimizes the
number of changes introduced---but in any case I'd expect NIST to impose
a high bar on the level of justification required for any such changes.

Why, then, do the 203/204 drafts have changes from the 3rd-round
versions of Kyber and Dilithium? Does NIST think there was a problem
with those versions? Did I miss a clear statement somewhere of NIST's
rationale for making these cryptosystem changes?

Draft 203 claims that hashing RNG output is unnecessary. This isn't
explaining why NIST thinks that removing the RNG hash is _desirable_.
The other changes seem to have even less explanation: there was a NIST
message dated 28 Apr 2023 05:50:13 -0700 saying NIST planned to change
the FO transform, but the message didn't say _why_ NIST was doing this.

To be clear, what I'm asking for here is transparency regarding NIST's
rationale for NIST's draft cryptosystem changes. I'm _not_ asking for
non-NIST restatements of non-NIST comments regarding potential changes.
Also, when I say "changes", I'm not referring to parameter restrictions
(e.g., requiring the session-key length to be specifically 256 bits).

---D. J. Bernstein

Loganaden Velvindron

Nov 19, 2023, 1:27:26 PM
to Stern, Morgan B, pqc-...@list.nist.gov
Hi,

We (cyberstorm.mu) have been following the development of Kyber. We haven't seen much interest in SHA-512 instead of Keccak in the open-source reference implementation of Kyber.

Is there a reason for this proposed change?


