Pure vs. pre-hash signing for ML-DSA and SLH-DSA

David A. Cooper

Jan 9, 2024, 2:03:08 PM
to pqc-forum

Hello all,

Section 7.1 of Draft FIPS 204 and Section 9.4 of Draft FIPS 205 note that the message that is input to the signing function may be the hash of the content that is to be protected rather than the content itself. During the public comment period, NIST received comments noting that this could lead to an existential forgery, as an attacker could trick a verifier into believing that the hash of the content was the content itself. Comments received on the issue suggested a few ways to address this:

  • Specify in FIPS 204 and FIPS 205 both "pure" and "pre-hash" versions of the signature schemes with domain separation specified as part of the schemes.
  • Specify that the input to the signing function must be the hash of the content.
  • Do not change FIPS 204 or FIPS 205, and leave it up to applications to address the issue (e.g., by only using "pure" or "pre-hash" signatures, but not both, or by including domain separation information as part of the input to the signing function).

After some initial discussions, NIST is currently leaning towards specifying separate "pure" and "pre-hash" versions of the signature algorithms by adding in a mandatory domain separator, as was done in EdDSA, but placing the domain separator immediately before the message. So, if sign_sk(M) is slh_sign(M, sk) for SLH-DSA or ML-DSA.Sign(sk, M) for ML-DSA, then domain separation would be created by always signing M as sign_sk(domain separator || PH(M)), where PH() is either a hash function or the identity function.

One option for how this might work would be (using the syntax of RFC 8032):

  • "pure" signing:     sign_sk(octet(0) || octet(OLEN(C)) || C || M)
  • "pre-hash" signing: sign_sk(octet(1) || octet(OLEN(C)) || C || OID of hash function H || H(M))

where C is a context string with a length between 0 and 255 octets.

As this construction simply changes the input M to the signing function, it could be applied to any signature scheme (FN-DSA, ML-DSA, SLH-DSA, selected on-ramp signature scheme) where it is determined that it may be useful to define both "pure" and "pre-hash" versions.
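
As an illustration only, the two encodings above can be sketched in Python. This is a minimal sketch under stated assumptions, not the text of any FIPS: the function names `encode_pure` and `encode_prehash` are hypothetical, SHA-256 stands in for H, and the OID is assumed to be carried in its DER encoding (the proposal above only says "OID of hash function H").

```python
import hashlib

# Assumption: the OID is carried as its DER encoding. This is the DER
# encoding of the SHA-256 OID 2.16.840.1.101.3.4.2.1.
SHA256_OID_DER = bytes.fromhex("0609608648016503040201")


def _prefix(mode: int, context: bytes) -> bytes:
    """Build octet(mode) || octet(OLEN(C)) || C."""
    if len(context) > 255:
        raise ValueError("context string must be 0 to 255 octets long")
    return bytes([mode, len(context)]) + context


def encode_pure(message: bytes, context: bytes = b"") -> bytes:
    """Input to sign_sk for "pure" signing: 0 || OLEN(C) || C || M."""
    return _prefix(0, context) + message


def encode_prehash(message: bytes, context: bytes = b"") -> bytes:
    """Input to sign_sk for "pre-hash" signing: 1 || OLEN(C) || C || OID || H(M)."""
    digest = hashlib.sha256(message).digest()
    return _prefix(1, context) + SHA256_OID_DER + digest
```

Because the first octet differs, the output of `encode_pure` can never equal the output of `encode_prehash`, whatever the messages and context strings are. The encoded bytes would then be passed unchanged to sign_sk.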

Any comments or suggestions that people may have on this proposed approach would be appreciated.

Thank you,

David Cooper
NIST PQC

Jim Goodman

Jan 9, 2024, 2:53:45 PM
to David A. Cooper, pqc-forum

Hello David,

Question for you:

If a given signature scheme is determined to only support one of the two modes (not sure if “pre-hash” would be allowed on its own…), then is it expected that the interface would still expect the corresponding formatting w/domain separation to ensure a consistent interface for all signature schemes?

You mention using this format when defining both versions, but it wasn’t clear to me if it is also used when there is only a single version too. I suspect that is the case to ensure consistency but just wanted to get a clear understanding. Please let me know.

Thanks!

Take care.

Jim

--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion on the web visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/e8aad006-afbf-4c5b-9d49-75613f15387c%40nist.gov.

David A. Cooper

Jan 12, 2024, 12:29:42 PM
to Jim Goodman, pqc-forum
Hello Jim,

Just speaking for myself, I believe that the answer would be no. So, for example, if the consensus was that both "pure" and "pre-hash" versions should be specified for SLH-DSA but not for ML-DSA, then the suggested formatting for domain separation would only apply to SLH-DSA.

Of course, if people want, we could apply it to all future signature schemes, even those that don't support both versions. However, there would still be signature schemes that do not use this formatting (LMS, XMSS, EdDSA, ECDSA, RSA), so there still wouldn't be consistency across all approved signature schemes.

David

Scott Fluhrer (sfluhrer)

Jan 12, 2024, 12:44:11 PM
to David A. Cooper, Jim Goodman, pqc-forum

Actually, it’s better stated that the consistency wouldn’t be mandated.

 

If an implementation wanted to, it could implement this prepend on any of the listed signature schemes, and it would still be NIST approved (of course, interoperability is another issue…).

If we were to implement a composite scheme with pre-hash, such as RSA + Dilithium, we would quite plausibly implement this prepend for both signatures (and we might implement it externally to the RSA signer, while the Dilithium implementation might implement it internally).

Bas Westerbaan

Jan 15, 2024, 8:15:04 AM
to David A. Cooper, pqc-forum
Hi everyone,

When I read sections 7.1 and 9.4, I read them as helpful reminders in case the protocol where, say, ML-DSA is used signs a hash of the message instead of the message itself. From the discussions I see that these sections are instead read to mean:

func CONFUSED-ML-DSA.Verify(pk, m, sigma):
   return ML-DSA.Verify(pk, m, sigma) or ML-DSA.Verify(pk, H(m), sigma)

The proposal now can be understood to polish this eyesore to:

func CONFUSED-ML-DSA.Verify(pk, m, sigma):
   return ML-DSA.VerifyPure(pk, m, sigma) or ML-DSA.VerifyPrehash(pk, H, H(m), sigma)

Surely this is not the actual proposal, but I'm afraid it will be used like that. Even if it's not, I am still very uncomfortable with having ML-DSA.VerifyPure and ML-DSA.VerifyPrehash. I do not think it is helpful to focus on that (for the curious [1]), but it's better to understand why this is on the table now.
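
To make the concern concrete, here is a small Python sketch of the first CONFUSED verifier. It is a toy under loud assumptions: `toy_sign` and `toy_verify` are hypothetical stand-ins built from HMAC-SHA-256 with a shared key, so they are not ML-DSA and not a public-key signature at all; they only mimic the "sign bytes, verify bytes" interface. SHA-256 stands in for H.

```python
import hashlib
import hmac

# Stand-in key. A real scheme would use a private/public key pair.
_DEMO_KEY = b"demo-only key"


def toy_sign(message: bytes) -> bytes:
    """Hypothetical stand-in for ML-DSA.Sign (HMAC, NOT a real signature)."""
    return hmac.new(_DEMO_KEY, message, hashlib.sha256).digest()


def toy_verify(message: bytes, sigma: bytes) -> bool:
    """Hypothetical stand-in for ML-DSA.Verify."""
    return hmac.compare_digest(toy_sign(message), sigma)


def confused_verify(m: bytes, sigma: bytes) -> bool:
    """Accept sigma if it covers either m itself or H(m)."""
    return toy_verify(m, sigma) or toy_verify(hashlib.sha256(m).digest(), sigma)


document = b"transfer 10 units to Alice"
digest = hashlib.sha256(document).digest()

# The signer only ever signs the hash of the document.
sigma = toy_sign(digest)

print(confused_verify(document, sigma))  # accepted for the document
print(confused_verify(digest, sigma))    # also accepted for the digest as content
```

The signer never intended to sign the 32-byte digest as content, yet the verifier accepts it. With a domain-separating prefix in the signed input, the two verification paths would check different byte strings.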

I haven't followed these matters very closely, so I hope I have the facts correct. I understand that the root cause is that SLH-DSA doesn't allow streaming signing: to create a signature on a message M, we need to compute two hashes that contain M, where the second hash depends on the first hash — in essence: H( H(A || M) || B || M). Thus in an application where the message is streamed, it has to be kept in memory, which isn't always possible. An alternative is to prehash: sign H(M) instead of M.
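
The streaming point can be sketched as follows. This is an illustrative Python sketch only: `two_pass_hash` mirrors the "in essence" shape H(H(A || M) || B || M) from the paragraph above, with A and B as opaque placeholder inputs and SHA-256 standing in for H; it is not the actual SLH-DSA message hashing.

```python
import hashlib
from typing import Iterable


def two_pass_hash(a: bytes, b: bytes, message: bytes) -> bytes:
    """H(H(A || M) || B || M): M is consumed twice, and the second
    pass cannot start until the first one has finished."""
    inner = hashlib.sha256(a + message).digest()
    return hashlib.sha256(inner + b + message).digest()


def streaming_prehash(chunks: Iterable[bytes]) -> bytes:
    """H(M) computed chunk by chunk; each chunk can be discarded
    as soon as it has been absorbed."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.digest()


# A one-shot stream is enough for the pre-hash...
digest = streaming_prehash(iter([b"part one, ", b"part two"]))

# ...while the two-pass shape needs the whole message available again.
tag = two_pass_hash(b"A", b"B", b"part one, part two")
```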

However, this goes against the reason SLH-DSA hashes twice to begin with: it does so to avoid requiring a hash that is resistant to collision-style attacks. Signing H(M) instead of M negates that.

Note, by the way, that this all doesn't apply to ML-DSA: ML-DSA only hashes M once and thus allows streaming, because ML-DSA requires collision resistance from the hash for Fiat–Shamir to begin with.

Suppose we go forward with this, and we see deployment of SLH-DSA-Pure and SLH-DSA-Prehash. How do we explain to users what to choose? What happens if we find a collision attack on the hash? Do we expect users to know if they use pure or prehash SLH-DSA? If the proposal is actually CONFUSED-SLH-DSA (or similar), then all SLH-DSA verifiers have to upgrade to remove the prehash one. It's going to be very messy, a lot of work, and in the end we didn't gain anything substantial.

So, what are our other options?

We can change SLH-DSA to allow streaming messages. I'd say that is an unpalatable weakening of its security, as SLH-DSA is typically chosen by those that want to trade performance for conservative security. (Recall that ML-DSA does allow streaming!)

We can leave the stream signing issue to a higher layer. In many protocols, it's very common to sign a hash as the actual message. For problematic use cases, one could define SPHINS (note missing C) as SLH-DSA where H(M) is signed, accepting the reliance on collision resistance (hence the missing C). I actually think it should be possible to do better: retain collision resistance and have streaming signing. Any such scheme would need some time for review, so it's too late to include in SLH-DSA itself. 

I lean to the latter.

In any case, sections 7.1 and 9.4 should be reworded to make clear what is the case. My preference would be to remove them altogether so as not to suggest the CONFUSED interpretations.

Best,

 Bas

[1]

Adding two variants for each signature scheme increases the burden on implementers and users. For the implementer, there are more test vectors to check, more code to write and review, and more APIs to document. Then the user has to choose which one to pick: they'll ask ChatGPT and Stack Overflow. Probably the consensus will be that for most users the pure one is best. Libraries will then probably opt not to support the prehashed one. Although it's not a fair comparison, it's notable that cSHAKE and Ed25519{Ctx,Ph} have very little, if any, real-world usage.

Although ostensibly the reason for the variants is being able to stream messages, when viewed on their own the use that comes to mind is domain separation for keypairs that are used in different contexts. Ideally one doesn't use keypairs in different contexts, but where one does, having one bit of information (are we signing a hash?) is inadequate. Forced to use the same keypair in multiple contexts, it's better to have a proper context string. Here, I would not suggest a separate ML-DSA-Ctx (for reasons mentioned before), but rather to add a context string to ML-DSA that defaults to the empty string. Although at this point, it's probably too late.

Falko Strenzke

Jan 17, 2024, 1:57:04 AM
to Bas Westerbaan, David A. Cooper, pqc-forum

Hi Bas,

See my comments inline.

On 15.01.24 at 14:14, 'Bas Westerbaan' via pqc-forum wrote:
[...]

> However, this goes against the reason SLH-DSA hashes twice to begin with. It's to prevent SLH-DSA to require a hash that is resistant against collision style attacks. Signing H(M) instead of M negates that.

Of course, that is the stated intention behind the pre-hash variant. It becomes necessary where the protocol only supports hash-and-sign or where streaming the potentially huge message to the signature-generating device is not conceivable.

> Note, by the way, that this all doesn't apply to ML-DSA: ML-DSA only hashes M once and thus allows streaming, because ML-DSA requires collision resistance from the hash for Fiat–Shamir to begin with.
>
> Suppose we go forward with this, and we see deployment of SLH-DSA-Pure and SLH-DSA-Prehash. How do we explain to users what to choose? What happens if we find a collision attack on the hash? Do we expect users to know if they use pure or prehash SLH-DSA? If the proposal is actually CONFUSED-SLH-DSA (or similar), then all SLH-DSA verifiers have to upgrade to remove the prehash one. It's going to be very messy, a lot of work, and in the end we didn't gain anything substantial.
>
> So, what are our other options?
>
> We can change SLH-DSA to allow streaming messages. I'd say that is an unpalatable weakening of its security, as SLH-DSA is typically chosen by those that want to trade performance for conservative security. (Recall that ML-DSA does allow streaming!)

I don't think it is as simple as saying that a pre-hash variant of SLH-DSA is no longer conservative in terms of its security assumptions. I would say that relying on the collision resistance of a hash function still appears to most people as a much milder security assumption than any of the "asymmetric" problems underlying typical public key signatures. At least that argument was always used for the earlier versions of the hash-based schemes, before they were developed into the modern flavours that no longer require collision resistance.

> We can leave the stream signing issue to a higher layer. In many protocols, it's very common to sign a hash as the actual message. For problematic use cases, one could define SPHINS (note missing C) as SLH-DSA where H(M) is signed, accepting the reliance on collision resistance (hence the missing C). I actually think it should be possible to do better: retain collision resistance and have streaming signing. Any such scheme would need some time for review, so it's too late to include in SLH-DSA itself.

I don't see how proposing a new scheme "SPHINS" introduces less complexity than parametrizing SLH-DSA.

Surely, it is in principle possible to leave the disambiguation of the two variants to the protocol layer. But that requires protocol changes in protocols that don't already include meta information (namely the signature algorithm) in the message digest computation. This is the case for instance for CMS/X.509.

> I lean to the latter.

If you mean by that modifying protocols so that they support the "full" SLH-DSA variant, I am with you. But that has to be done on a per-protocol basis. What NIST is trying to do here with the parametrization is to provide a toolset that allows the disambiguation of the two variants without any protocol changes.
- Falko

MTG AG
Dr. Falko Strenzke
Executive System Architect


Kampanakis, Panos

Jan 17, 2024, 9:58:17 AM
to Falko Strenzke, Bas Westerbaan, David A. Cooper, pqc-forum

+1 on Bas’ arguments. The uses of a prehash PQ Sig will be very limited like with prehashEdDSA. In the end implementers will need to support both and never use the prehashPQSig.

Hi Falko,

 

> for CMS/X.509.

 

CMS and X.509 do not need prehash signatures or changes.
- RFC 8410 did not use prehashEdDSA, and draft-ietf-lamps-dilithium-certificates does not need prehashDilithium either.

- For CMS, RFC 8419 addresses the message digest by saying:

    [...] In most situations, the CMS SignedData includes signed attributes, including the message digest of the content. Since HashEdDSA offers no benefit when signed attributes are present, only PureEdDSA is used with the CMS.

 

Other than the HSM streaming PKCS#11 case, please state specific use cases or protocols that would benefit from a prehash PQ Sig.

COSTA Graham

Jan 17, 2024, 10:11:04 AM
to Kampanakis, Panos, Falko Strenzke, Bas Westerbaan, David A. Cooper, pqc-forum

One example of where pre-hash is common is use cases such as qualified signature and seal creation in the EU under eIDAS.  As mentioned earlier in the thread, an end user or an independent system outside the system hosting the signing key may hash the data to be signed locally and then submit the hash to a remote HSM for signature.

 

Security of the hash between creation and signature creation would be provided by other protocols and mechanisms.

 

I agree, however, that this kind of use case accounts for a minority of the signatures produced, but it does exist and is needed for practical reasons.

Harvey, Joseph

Jan 19, 2024, 10:47:44 AM
to graham...@thalesgroup.com, kpa...@amazon.com, falko.s...@mtg.de, b...@cloudflare.com, david....@nist.gov, pqc-...@list.nist.gov

The proposed construction and the discussion so far have focused primarily on the objective of separating “pure” and “pre-hash” signatures.  While the construction appears sufficient for that purpose, there are a few additional aspects that may also be worth considering.  These points build on Verisign’s comments [1] on the draft FIPS 205.

 

1. Randomized hashing / target collision resistance

 

The security of pre-hashing in the proposed construction relies on conventional collision resistance:  an adversary who finds two messages M_1, M_2 such that H(M_1) = H(M_2), and gets the signer to sign one of them, also obtains a valid (forged) signature on the other.  SLH-DSA, in contrast,  includes a randomizer as an input to its message hashing operations.  As a result, SLH-DSA’s security is based instead on a target collision resistance-like property.  (Bas Westerbaan made a similar observation earlier in this thread.)

 

Conventional collision resistance has been a standard assumption of hash functions and signature schemes for a long time, including RSA and ECDSA in particular. SLH-DSA has arguably “raised the bar.” Indeed, by moving to a target-collision-resistance-like property, SLH-DSA has reduced the impact of future advances in collision attacks on the underlying hash function, as unlikely as such advances may be.  For this reason, it would be useful to specify a pre-hashing mode that follows the same trend.

 

Note that including a randomizer in the context string C doesn’t itself achieve randomized pre-hashing. C is an input to the underlying signature scheme, but not necessarily the pre-hashing operation.  Rather, the pre-hashing operation must be defined specifically to include a randomizer as an input, whether it is conveyed as part of C or elsewhere.
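
The difference between the two properties can be illustrated with a deliberately weakened hash. In this Python sketch, which is only a toy, `weak_hash` truncates SHA-256 to 16 bits so that collisions are cheap to find, and `randomized_prehash` is a hypothetical construction H(R' || M), not anything specified in the FIPS drafts. The randomizer is assumed to be chosen after the attacker has committed to the messages.

```python
import hashlib
import os


def weak_hash(data: bytes) -> bytes:
    """SHA-256 truncated to 16 bits: a stand-in for a hash whose
    collision resistance has failed."""
    return hashlib.sha256(data).digest()[:2]


def find_collision() -> tuple[bytes, bytes]:
    """Birthday search for two distinct messages with equal weak_hash."""
    seen: dict[bytes, bytes] = {}
    counter = 0
    while True:
        message = b"message-%d" % counter
        h = weak_hash(message)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1


def randomized_prehash(randomizer: bytes, message: bytes) -> bytes:
    """Hypothetical randomized pre-hash: H(R' || M)."""
    return weak_hash(randomizer + message)


m1, m2 = find_collision()
# Plain pre-hash: a signature on weak_hash(m1) is also valid for m2.
plain_collides = weak_hash(m1) == weak_hash(m2)

# Randomized pre-hash: the precomputed pair only collides by chance
# (about 1 in 2**16 per fresh randomizer with this toy hash).
trials = 200
still_colliding = sum(
    randomized_prehash(r, m1) == randomized_prehash(r, m2)
    for r in (os.urandom(16) for _ in range(trials))
)
```

The benefit depends entirely on the attacker being unable to choose or predict the randomizer before the messages are fixed; a randomizer known in advance can simply be included in the collision search.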

 

2. Multi-user security

 

Both ML-DSA and SLH-DSA include the signer’s public key (or its hash) as an input to their message hashing operations.  Along similar lines to the first remark, it would be useful to specify a pre-hashing mode that includes the signer’s public key (or its hash) as an input as well.  Doing so would again raise the bar, in this case to multi-user security, as an adversary would likely have to take into account the actual public keys of interest in an attack. In the current construction, a collision would potentially be usable against any public key, including future ones.

 

Including the public key or similar information in the context string C again won’t automatically achieve the security objective. The pre-hashing operation must be defined so that this additional information is a specific input.
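
A pre-hash that takes the signer's public key as an explicit input might look like the following Python sketch. Everything here is hypothetical: the name `key_bound_prehash`, the choice of SHA-256, and the layout H(H(pk) || M) are illustrative assumptions rather than a construction from the thread or from the FIPS drafts.

```python
import hashlib


def key_bound_prehash(public_key: bytes, message: bytes) -> bytes:
    """Hypothetical pre-hash H(H(pk) || M).

    The fixed-length key hash comes first, so the digest depends on
    which signer the pre-hash was computed for.
    """
    key_hash = hashlib.sha256(public_key).digest()
    return hashlib.sha256(key_hash + message).digest()


# Placeholder byte strings standing in for two signers' encoded public keys.
pk_alice = b"alice public key bytes"
pk_bob = b"bob public key bytes"

message = b"same message for both signers"
digest_alice = key_bound_prehash(pk_alice, message)
digest_bob = key_bound_prehash(pk_bob, message)
```

The same message yields different digests for different keys, so a colliding pair found for one key does not automatically carry over to another.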

 

3. Cryptographic separation from the underlying signature scheme

 

Finally, the security analysis of a pre-hashing construction may depend on how the hash function is used during pre-hashing relative to its use in the underlying signature scheme. SLH-DSA, while being based on a single hash function, takes great care to ensure that its various uses of that hash function involve distinct inputs.  Prepending the input to the hash function with an identifier and a context string during pre-hashing, as is done in the proposed construction, ensures that the inputs for the pure and pre-hashing modes are distinct, but it wouldn't directly ensure separation from other uses of the hash function within SLH-DSA.

 

It would be useful to have a more formal analysis of the security properties of the pre-hashing construction, particularly if it is updated to include additional inputs such as suggested in the previous two remarks.

 

Best Regards,

 

Joe Harvey

 

[1] J. Harvey et al. Comments on FIPS 205 (Draft): Stateless Hash-Based Signature Standard. Nov. 21, 2023.
https://googlegroups.com/a/list.nist.gov/group/pqc-forum/attach/ee505eb48f48/verisign-comments-fips-205-2023-11-21.pdf?part=0.1


David A. Cooper

Jan 23, 2024, 12:17:55 PM
to Harvey, Joseph, pqc-...@list.nist.gov, graham...@thalesgroup.com, kpa...@amazon.com, falko.s...@mtg.de, b...@cloudflare.com
Hello Joe,

Speaking for myself, I have some comments on the points below.

For point 1, it is true that using a randomized hash may avoid the need to rely on the collision resistance property of the hash function. However, that would depend on the implementation. In the case of SLH-DSA, the random value R is created by the signer (private key holder) in a way that cannot be predicted by someone who does not know the private key. (Section 10 notes that SLH-DSA does not protect against the private key holder finding collisions in Hmsg.) In the case of pre-hashing, performing a randomized pre-hash would only protect against collision attacks if the attacker could not choose the randomizer R' or know what R' would be before submitting the message M to be signed. As pre-hashing will commonly be performed in a different cryptographic module than the one that performs the signing, one would need to understand the specific use case and implementation in order to determine what value randomized pre-hashing provided.

For point 2, since the underlying signature schemes for ML-DSA and SLH-DSA would not be changed, the public key would still be included in the hash value over which the digital signature is computed. Thus, the properties that including the public key provides (see for example https://eprint.iacr.org/2020/1525.pdf) would still be there. While this would mean that an attacker who could compute collisions for the hash function used to compute the pre-hash could compute a single collision that would work against multiple signers, this is not a significant advantage. Including a prefix to the message to be hashed provides limited protection against collisions if the attacker can determine the value of the prefix before submitting the message to be signed (see for example https://www.win.tue.nl/hashclash/TargetCollidingCertificates/TargetCollidingCertificatesAnnouncementv1.1.pdf).

For point 3, I do not believe one needs to ensure that the message M being pre-hashed is distinct from the hash inputs used internal to SLH-DSA. Distinct prefixes are needed within SLH-DSA since the length of the outputs of the hashes are the same as the security level. So at level 1, the hash outputs are 128 bits and one is trying to achieve 128 bits of security. Without distinct prefixes, if there were 2^m hashes, then an attacker could find a preimage with 2^{128-m} work. Since m is relatively large, this would be a significant advantage. In the case of pre-hashing, the output of the pre-hash is at least 256 bits for the 128 bit security level, so even if 2^64 messages were created, a multi-target preimage attack would require at least 2^192 work.
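
The arithmetic in this paragraph can be checked with a few lines of Python. This is a back-of-the-envelope sketch assuming the generic estimate that a multi-target preimage search against 2^m targets of an n-bit hash costs about 2^(n-m) hash evaluations; the function name is made up for illustration, and m = 40 is an arbitrary example value.

```python
def multi_target_work_bits(output_bits: int, log2_targets: int) -> int:
    """log2 of the generic multi-target preimage cost, 2^(n - m)."""
    return output_bits - log2_targets


# Internal hashes at level 1: 128-bit outputs and 2^m targets.
# m = 40 is an arbitrary example, not a figure from FIPS 205.
internal = multi_target_work_bits(128, 40)

# Pre-hash: 256-bit outputs and 2^64 signed messages.
prehash = multi_target_work_bits(256, 64)

print(f"internal hash without distinct prefixes: 2^{internal}")
print(f"pre-hash with 2^64 targets: 2^{prehash}")
```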

If, for some application, there was some overlap between the legitimate inputs to the pre-hash and the inputs to the internal hash functions, I don't think this would help the attacker. While it is true that the attacker might guess a preimage and test it against both one of the internal hashes and one or more of the inputs to the pre-hash, the odds of success against the pre-hash (at most 2^-192) would be negligible compared to the odds of success against the internal hash (2^-128), and so the overall security level would not be reduced.

Of course, if one chose to perform a randomized pre-hashing that only produced 128 bits of output, then there would be a risk of a multitarget attack. It is not clear, though, whether a lack of distinction of the inputs to the pre-hash from those used internally to SLH-DSA would make a significant difference in the risk.

Thanks,

David

Harvey, Joseph

Jan 25, 2024, 11:20:05 AM
to david....@nist.gov, pqc-...@list.nist.gov, graham...@thalesgroup.com, kpa...@amazon.com, falko.s...@mtg.de, b...@cloudflare.com

Hi David,

 

Thanks for the careful review and detailed responses to the comments I provided.  After reviewing with the research team at Verisign, we would like to offer a few additional observations.

 

  1. You make a good point that the cryptographic module considerations for the implementation of the pre-hashing step would likely be different from those for the underlying signature scheme.  This is one of the reasons that we think it would be helpful to have guidance on how to randomize during pre-hashing.

 

Our interest in having an optional specification for randomized pre-hashing comes from the perspective of overall resilience.  If the collision resistance property of the hash function holds up over the long term, then it shouldn’t matter whether a randomizer is included in the pre-hashing step (assuming the hash output is full size, e.g., 256 bits).  But if the collision resistance property doesn’t hold up for some reason, then the inclusion of the randomizer may well make the difference between an attack succeeding or failing.  Applications could choose to include a randomizer in their own way as a precaution (e.g., as certification authorities have done by adding entropy to serial numbers), but it would be better if there were a more “standardized” way to do so.

 

  2. The “BUFFing signature schemes” paper provides good coverage of additional security properties against attacks both by the signer and by other parties.  To protect against these attacks, the constructions in the paper do include the public key (or its hash) in the input to the hash functions within the signature schemes, as ML-DSA and SLH-DSA also do.  However, the paper doesn’t appear to address the case where pre-hashing is involved.  It also assumes that the collision-resistance property holds up.  If the collision-resistance property doesn’t hold up (and randomization isn’t used), then collisions obtained by an external adversary could potentially be used against any signer who does pre-hashing, because the public key would only be involved in hashing after the pre-hashing operation is complete.

 

  3. Our concern here is more a technical one than an actual attack.  If the hash function is also used for pre-hashing, then the security analysis would have to consider the possibility that the same query value could be presented both as an input to a hash function during the pre-hashing step, and as an input during an internal operation within the signature scheme.  This shouldn’t have a significant practical impact on security but might make it harder to treat the pre-hashing step and the signature scheme as separate building blocks during the security analysis.  This is the reason we recommended specifying the pre-hashing operation with cryptographic separation built in.

 

Thanks,

 

Joe
