FIPS 206 Status Update


John Mattsson

Oct 13, 2025, 11:05:12 AM
to pqc-forum

Hi,

At the NIST 6th PQC Standardization Conference, Ray Perlner
from NIST presented “FIPS 206 Status Update” and asked for comments:


 

https://csrc.nist.gov/csrc/media/presentations/2025/fips-206-fn-dsa-(falcon)/images-media/fips_206-perlner_2.1.pdf

We discussed this internally and have the following comments:

- "Separate pure/prehash versions" 

Does this include both External-μ and HashFN-DSA? We would like to see External-μ, and would prefer not to have HashFN-DSA, but do not object (FN-DSA mandates a SHAKE implementation anyway).
https://keymaterial.net/2024/11/05/hashml-dsa-considered-harmful/ 

- "Only allows randomized signing" 

Is this purely randomized, as in the randomized ECDSA used in PS3 software signing, or hedged, as in ML-DSA? We hope it is hedged.

- "Includes BUFF transform"

That seems good. Has NIST also incorporated the suggestions in [1]? Our understanding is that these are needed to prove EUF-CMA security. We really like that, of the five last signatures NIST has standardized, EdDSA and ML-DSA have SUF-CMA proofs, and LMS, XMSS, and SLH-DSA are widely believed to be SUF-CMA. If possible, we would prefer that NIST not standardize anything that is not believed to be SUF-CMA. We hope ECDSA was the last standardized signature with trivial attacks on SUF-CMA security.
[1] https://eprint.iacr.org/2024/1769.pdf

- "Should we add support for key/message-recovery mode?"

Hard to answer without more details, such as how many bytes could be saved and what the security properties would be. We found the Elliptic Curve Qu-Vanstone implicit certificate scheme (ECQV) interesting and would likely have investigated using it for constrained IoT if it were not for the PQC migration. Looking at Falcon, the public key seems to be bigger than the signature, so we assume key recovery would either come with significant changes to the signature format or recover only part of the key.

We would love to see more detailed information on this proposal. If FN-DSA signatures could recover several hundred bytes of information, that would be very interesting and would ease the pain of PQC migration for systems with constrained radios.

- "Should we add support for a fixed-point version of signing?"

We would say yes. Reading "Do Not Disturb a Sleeping Falcon" does not inspire trust in floating-point arithmetic at all. Falcon in a certified cryptographic module is likely fine, but any use of Falcon with an external floating-point implementation seems worrisome. It also makes us wonder whether FN-DSA should drop floating-point support entirely...

Cheers,
John

 

John Mattsson

Oct 14, 2025, 8:34:05 AM
to pqc-forum

One additional comment: as several European governments are recommending security Category 3 or above, it would have been nice to have an FN-DSA-768. Products wanting to follow these European recommendations will need to use FN-DSA-1024, which is not so small anymore.

 

Algorithm     Public Key (B)   Signature (B)   Total (B)
--------------------------------------------------------
Falcon-512           897              666          1563
Falcon-1024         1793             1280          3073
ML-DSA-44           1312             2420          3732
ML-DSA-65           1952             3309          5261
--------------------------------------------------------

 

Cheers,

John

Phillip Gajland

Oct 14, 2025, 10:20:24 AM
to pqc-forum, John Mattsson, phillip...@rub.de
Hi John,

Regarding the SUF-CMA security of Falcon, in our original analysis, we were unable to prove SUF-CMA security under the standard ring SIS assumption because the norm in the SIS problem was too large.

However, in our new work (which we plan to update on ePrint in the coming days), we now prove the SUF-CMA security under a different assumption. Specifically, the UF-CMA security follows from the one-wayness of the preimage-sampleable trapdoor function, while SUF-CMA security is now based on the second-preimage resistance of this function. Initially, we had attempted to prove SUF-CMA security under the collision resistance.

The assumption we now rely on is a second preimage version of a multi-target inhomogeneous (ring) SIS problem. In this problem, an adversary is given t ISIS targets along with corresponding short preimages. The adversary must then output a second (different) short preimage for one of the targets. Interestingly, we show that this assumption is in fact equivalent to the SUF-CMA security. Specifically, an attack on this assumption would directly imply an attack on the SUF-CMA security of Falcon.

Assuming that this problem is as hard as the standard SIS problem, we show that Falcon-512 provides 113 bits of (S)UF-CMA security, while Falcon-1024 provides 256 bits. By reducing the number of allowed signing queries from 2^64 to 2^58, we can increase the bit security of Falcon-512 to 119 bits.

Regarding the changes introduced in our original analysis, we believe NIST has decided to include them in the upcoming standard. Specifically, the salt will be sampled outside the repeat loop (instead of inside, as in the original specification), and the public key will also be hashed.

Falcon is less flexible than Kyber in terms of its parameter sets because the ring dimension needs to be a power of two. Therefore, Falcon-768 is not a valid parameter set, as 768 is not a power of two. (In response to: https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/1HXzjlMUU6Y/m/KNSPqC02BgAJ.)

Best regards,

Phillip

Quynh Dang

Oct 14, 2025, 1:28:32 PM
to Phillip Gajland, pqc-forum, John Mattsson, phillip...@rub.de
Hi Phillip,

Wonderful work! Thank you.

On Tue, Oct 14, 2025 at 10:20 AM Phillip Gajland <gaph...@gmail.com> wrote:
Assuming that this problem is as hard as the standard SIS problem, we show that Falcon-512 provides 113 bits of (S)UF-CMA security, while Falcon-1024 provides 256 bits. By reducing the number of allowed signing queries from 2^64 to 2^58, we can increase the bit security of Falcon-512 to 119 bits.

What are the bits of security of Falcon-512 when the numbers of sigs per key are 2^50, 2^53 and 2^55 ?

Regards,
Quynh. 
 

Phillip Gajland

Oct 14, 2025, 4:31:58 PM
to pqc-forum, Quynh Dang, pqc-forum, John Mattsson, phillip...@rub.de, Phillip Gajland
Hi Quynh,

Thanks for your interest.

Unfortunately, further reducing the number of signing queries does not improve the bit security.

For the UF-CMA security of Falcon-512, assuming the starting bit security of the multi-target ISIS instance is 120, 7 bits are lost due to Rényi divergence arguments. These arguments are particularly sensitive to the number of signing queries. Reducing the number of signing queries from 2^64 to 2^58 already brings the loss in each of the two required Rényi arguments down to half a bit, for a total loss of approximately one bit, so further reductions gain almost nothing. The SUF-CMA security bound is dominated by the UF-CMA term, so the bit security of Falcon-512 remains effectively the same.

Regards,

Phillip

John Mattsson

Oct 15, 2025, 1:09:10 AM
to Phillip Gajland, pqc-forum, Quynh Dang, pqc-forum, phillip...@rub.de, Phillip Gajland

Hi Phillip,

> we show that Falcon-512 provides 113 bits of (S)UF-CMA security, while Falcon-1024 provides 256 bits.

That is great news. Looking forward to reading your paper. Thanks for sharing!

Cheers,

John

John Mattsson

Oct 15, 2025, 8:50:01 AM
to pqc-forum

Hi,

Regarding key and message recovery, I looked at the following references:

[1] https://falcon-sign.info/falcon.pdf
[2] https://csrc.nist.gov/CSRC/media/Presentations/Falcon/images-media/Falcon-April2018.pdf

In [1] (2020), it is stated about message recovery: "It makes the signature twice longer, but allows to entirely recover a message which size is slightly less than half the size of the original signature". This seems to indicate no actual reduction in byte size. Is this interpretation correct?

In [2] (2018), it is specified that a message of up to n * log q bits can be recovered. I assume this is log2, so ≈ 869 bytes for Falcon-512 and ≈ 1739 bytes for Falcon-1024. The Falcon-1024 parameters from [2] are:

+-------+-----------+------------------+--------------+
| Mode  | Classical | Message-recovery | Key-recovery |
+-------+-----------+------------------+--------------+
| |pk|  |      1793 |             1793 |           40 |
| |sig| |      1233 |             2466 |         2466 |
+-------+-----------+------------------+--------------+

 

Assuming [2] is correct, my understanding is that the number of bytes “saved” with message recovery is ≈ min(|m| - 1233, 506) for Falcon-512 and min(|m| - 1233, 506) for Falcon-1024. To avoid increasing overhead for small messages, message recovery should only be used for messages larger than 666/1233 bytes. For messages larger than ≈ 869/1739 bytes, partial message recovery would be necessary. And as part of the message is often required to select the correct public key, partial recovery may be needed in such use cases as well.

 

For Falcon-1024, key recovery would save 520 bytes when both the public key and signature are transmitted together. For Falcon-512, the savings would be ≈ (897 - 666 - 40) = 191 bytes. In constrained applications where size is critical, the public key can often be cached, but saving 191 or 520 bytes per certificate could still be significant in protocols using certificate chains by value.
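This key-recovery arithmetic can be made concrete with a small sketch (the helper name kr_saving is mine, not from any specification; it assumes, per the numbers in [2], that key-recovery mode transmits a 40-byte key hash plus a signature roughly twice the classical size):

```python
def kr_saving(pk_len: int, sig_len: int, hash_len: int = 40) -> int:
    # Classical: transmit the public key and the signature.
    classical_total = pk_len + sig_len
    # Key-recovery mode (per [2]): transmit a short key hash and a
    # signature roughly twice the classical size.
    kr_total = hash_len + 2 * sig_len
    return classical_total - kr_total

print(kr_saving(897, 666))    # Falcon-512: 191 bytes saved
print(kr_saving(1793, 1233))  # Falcon-1024: 520 bytes saved
```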

 

I don't know how message or key recovery affects security proofs. It is also unfortunate that the term "key recovery" is used both for an attack and for a mode.

Cheers,

John

 

John Mattsson

Oct 15, 2025, 8:54:34 AM
to pqc-forum

OLD: ≈ min(|m| - 1233, 506) for Falcon-512

NEW: ≈ min(|m| - 666, 203) for Falcon-512

 

Vadim Lyubashevsky

Oct 15, 2025, 10:12:19 AM
to John Mattsson, pqc-forum
Hi John, all,

On Wed, Oct 15, 2025 at 2:50 PM 'John Mattsson' via pqc-forum <pqc-...@list.nist.gov> wrote:


In [1] (2020) it is stated about message recovery: "It makes the signature twice longer, but allows to entirely recover a message which size is slightly less than half the size of the original signature". This seems to indicate no actual reduction in byte size. Is this interpretation correct?


Sorry, the statement in the Falcon document doesn't make much sense to me right now.  It should say something like, "it makes the signature a little less than twice longer, but allows to entirely recover a message whose size is slightly less than the public key". Working out some numbers, when counting the signature and the message, one could save up to around 210 bytes when using Falcon-512 and 460 bytes when using Falcon-1024.  For key recovery, 32 bytes more can be saved when sending the longer signature instead of the public key.  

Best,
Vadim

  


 

Vadim Lyubashevsky

Oct 16, 2025, 4:03:25 AM
to John Mattsson, pqc-forum

Hi,

A small clarification to what I said:

 


Sorry, the statement in the Falcon document doesn't make much sense to me right now.  It should say something like, "it makes the signature a little less than twice longer, but allows to entirely recover a message whose size is slightly less than the public key".

When I wrote less than the "public key", I meant the optimal compressed size that the public key could be, which is n*log q bits. So the savings using message recovery for Falcon-512 is around (512*log2 12289)/8 - (666-40) - 32 ≈ 210 bytes.
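This arithmetic can be checked directly (a sketch only; the constants are the Falcon-512 parameters and byte counts quoted in this thread):

```python
import math

n, q = 512, 12289                 # Falcon-512 ring degree and modulus

# Optimal compressed public-key size: n * log2(q) bits.
pk_bytes = n * math.log2(q) / 8   # ≈ 869 bytes

# Savings with message recovery: (512*log2(12289))/8 - (666 - 40) - 32.
savings = pk_bytes - (666 - 40) - 32
print(round(savings))             # ≈ 211 bytes, i.e. "around 210"
```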

Best,
Vadim

John Mattsson

Oct 16, 2025, 4:27:51 AM
to Vadim Lyubashevsky, pqc-forum

Hi,

 

- It seems to me that the "key recovery" mode as specified in Section 3.12 of the Falcon specification can be implemented behind a normal KeyGen(), Sign(), Verify() API, with no changes to the application, and that the code needed on top of the normal variant would be quite small.

 

+----------------+----------------+---------------+-----------+
| Variant        | Public Key (B) | Signature (B) | Total (B) |
+----------------+----------------+---------------+-----------+
| FN-DSA-512     |            897 |           666 |      1563 |
| FN-DSA-512-KR  |             32 |          1292 |      1324 |
| FN-DSA-1024    |           1793 |          1280 |      3073 |
| FN-DSA-1024-KR |             64 |          2520 |      2584 |
+----------------+----------------+---------------+-----------+

 

For use cases that always send public keys and signatures together (such as current certificate chains in TLS and IPsec) this alternative KR version lowers overhead (I don't know how it affects security and performance).

 

- Actually recovering the key h would require a new Recover() API. And to benefit you would have to use the key with normal FN-DSA-512. Would it be ok to use the same key H(h)/h with both FN-DSA-512-KR and FN-DSA-512?

 

- Vadim Lyubashevsky wrote:

>Working out some numbers, when counting the signature and the message, one could save up to around 210 bytes when using Falcon-512 and 460 bytes when using Falcon-1024.

 

The message recovery mode seems significantly more complex in practice: you would need several new APIs and significant changes to the application, and unless message recovery is supported by other algorithms, you lose crypto agility. If you have variable-length messages, you would need to support no message recovery, full message recovery, and partial message recovery. Would it be OK to use the same key h with and without message recovery?

 

Cheers,

John

Falko Strenzke

Oct 16, 2025, 8:56:05 AM
to John Mattsson, pqc-forum
On 13 Oct 2025 at 17:04, 'John Mattsson' via pqc-forum wrote:

- "Separate pure/prehash versions" 

Does this include both External-μ and HashFN-DSA? We would like to see External-μ (and would prefer to not have HashFN-DSA, but do not object. (FN-DSA mandates a SHAKE implementation anyway).

The way Falcon is defined requires a pre-hash variant for memory-efficient signature verification in all protocols where the message precedes the signature. This is because the first step of Falcon's signature verification is

c ← HashToPoint(r∥m, q, n)

which means the message can only be consumed after the signature (containing the salt r) has been seen. In pure mode, large messages will have to be completely buffered in probably all existing protocols. In LAMPS, at least for SLH-DSA, which exhibits the same bottleneck, this was seen as a reason to specify the pre-hash variants as well.

External-µ does not solve this problem in all cases. It only means that the full message may not have to be sent to the HSM. Memory exhaustion on the platform receiving the message is not addressed.
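The pure-mode bottleneck can be illustrated with a small sketch (illustrative only, not the FIPS 206 construction; function names are hypothetical): in pure mode the hash input begins with the salt r from the signature, while a pre-hash style design can absorb the message in a streaming fashion before the signature arrives.

```python
import hashlib

# Pure mode: c = HashToPoint(r || m, q, n). The hash input starts with the
# salt r, which arrives with the signature, so a verifier that sees the
# message first must buffer all of it.
def hash_to_point_input_pure(r: bytes, m: bytes) -> bytes:
    return r + m

# Pre-hash style (illustrative): the message is absorbed chunk by chunk as
# it arrives; only a short digest is later combined with the salt.
def prehash_then_point(r: bytes, m_chunks) -> bytes:
    h = hashlib.shake_256()
    for chunk in m_chunks:
        h.update(chunk)
    return r + h.digest(64)
```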

Falko

--

MTG AG
Dr. Falko Strenzke

Phone: +49 6151 8000 24
E-Mail: falko.s...@mtg.de
Web: mtg.de


MTG AG - Dolivostr. 11 - 64293 Darmstadt, Germany
Commercial register: HRB 8901
Register Court: Amtsgericht Darmstadt
Management Board: Jürgen Ruf (CEO), Tamer Kemeröz
Chairman of the Supervisory Board: Dr. Thomas Milde


Vadim Lyubashevsky

Oct 22, 2025, 4:00:54 AM
to John Mattsson, pqc-forum
Yes, you can use normal Falcon public keys in key recovery and message recovery modes.  You can also choose to sometimes use them in these modes and sometimes not.  

I think that the key recovery mode is also useful because it is the Falcon mode that gives the shortest signature + public key combination.  Also, in scenarios where one only wants to store a short public key (e.g. Bitcoin), the generic transformation for any signature scheme is to define the new pk',sig' as pk'=H(pk), sig'=(pk,sig).  In the case of Falcon, this generic construction is sub-optimal and one should define the signature as in the key-recovery mode.  
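The generic transformation mentioned above can be sketched as follows (hypothetical helper names; the hash choice and the verify callback are placeholders, not part of any standard):

```python
import hashlib

# Short public key: pk' = H(pk).
def shorten_pk(pk: bytes) -> bytes:
    return hashlib.sha3_256(pk).digest()

# Wrapped signature: sig' = (pk, sig).
def wrap_sig(pk: bytes, sig: bytes) -> bytes:
    return pk + sig

# Verify against pk': recover pk from sig', check its hash, then verify
# with the underlying scheme's verify callback.
def verify_wrapped(pk_short, msg, wrapped, pk_len, verify) -> bool:
    pk, sig = wrapped[:pk_len], wrapped[pk_len:]
    return shorten_pk(pk) == pk_short and verify(pk, msg, sig)
```

For Falcon, as noted, this generic wrapper is sub-optimal and the key-recovery mode achieves the same goal with smaller totals.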

Indeed, the message-recovery mode would require more care to define (i.e. making sure that the recovered and non-recovered parts of the message are hashed properly), but it is also arguably an even more universally applicable mode.

I believe that these are simple enough wrappers around Falcon and should be standardized along with it. The bigger danger is that people will want to use these modes anyway, which, in the absence of a standard, may result in a lot of errors.

Best,
Vadim

Perlner, Ray A. (Fed)

Nov 21, 2025, 11:09:58 AM
to Vadim Lyubashevsky, John Mattsson, pqc-forum

Hi all,

 

Thanks for the discussion. We would like to offer the following questions and updates.

 

  1. We welcome feedback on John Mattsson’s suggestion that hedged signing be preferred over plain randomized signing. Based on the input received, we will update the FIPS 206 Initial Public Draft accordingly.
  2. While we do intend to seriously consider other changes (e.g. allowing one or both of key-recovery and message-recovery mode, or defining a fixed-point version of the FN-DSA signing algorithm), we don’t expect these to be reflected in the initial public draft. Rather, we will wait until after the public comment period before deciding whether to do any of them. In particular, it’s worth noting that a fixed-point version of FN-DSA would need additional input from the Falcon team and community feedback, as analysis will need to be done regarding the required precision of fixed-point arithmetic for signatures to have the proper distribution, and we’ll need a reference implementation of the fixed-point version of FN-DSA to test against. If we do allow fixed-point implementations, we’d put domain separation in the private pseudorandom sampling to prevent fixed-point and floating-point implementations of FN-DSA signing from using the same randomization values.
  3. In response to John’s other questions, we can confirm that External-μ, HashFN-DSA, and the changes suggested by https://eprint.iacr.org/2024/1769 will be included in the FIPS 206 IPD.
  4. Thanks Phillip Gajland for your updated analysis on SUF-CMA security for Falcon. And yes, the salt will be sampled outside the repeat loop (instead of inside, as in the original specification). We also plan to update the explanatory text in our draft to reflect your improved SUF-CMA analysis.

 

Ray Perlner (On behalf of the NIST FIPS 206 authors)

Kris Kwiatkowski

Nov 21, 2025, 11:36:57 AM
to pqc-...@list.nist.gov

Dear Ray,

These changes look promising. Could you provide an expected release date for the IPD version of FIPS 206? Do you expect it to be published this year?

Kind regards,
Kris

Thomas Pornin

Nov 23, 2025, 3:16:06 PM
to pqc-forum, Perlner, Ray A. (Fed)
Hello,

for "hedging", let me suggest the following method:
For a source rng_seed obtained from an approved RNG (40 bytes), a private key (f, g, etc.), and a message representative mu, compute the derived seed (40 bytes) as:

derived_seed = SHAKE256(SHAKE256(f || g)[40] || mu || rng_seed)[40]

where SHAKE256(f || g)[40] is the 40-byte SHAKE256 hash of the concatenation of the f and g polynomials _as encoded in the private key_ (i.e. starting at the second byte of the private key). The reasons for this exact method are:

- Using an intermediate hash of f and g means that implementations with an API that separates key decoding (into some in-memory structure) from actual signing only have to keep that hash value around instead of the complete encoded private key (i.e. this saves some RAM).
- Using the encoded f and g (with 5 or 6 bits per coefficient) instead of the usual in-memory representation (one byte per coefficient) reduces the cost of that SHAKE256.
- There is no need to hash the F part; hashing f and g is enough to capture all the entropy contained in the private key (again, not hashing F reduces the cost of the hashing).
- Using the message representative mu keeps the total derivation cost low (mu has a fixed size, even if the source message is large).
- Each hash value is 40 bytes, since seeds are 40 bytes throughout.
- This method is what I implemented yesterday in my own code, and it would be convenient to me, personally, if it matched exactly what gets standardized.
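For reference, a minimal sketch of this derivation (the encoded f and g are abstracted as an opaque byte string, and the function name is mine, not from any specification):

```python
import hashlib

def derived_seed(fg_encoded: bytes, mu: bytes, rng_seed: bytes) -> bytes:
    # Inner hash: 40-byte SHAKE256 digest of the encoded f and g polynomials
    # (an implementation can cache this instead of the full private key).
    key_hash = hashlib.shake_256(fg_encoded).digest(40)
    # Outer hash binds the key hash, the message representative mu, and the
    # 40 bytes obtained from the approved RNG.
    return hashlib.shake_256(key_hash + mu + rng_seed).digest(40)
```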

Thomas

Thomas Pornin

Nov 23, 2025, 4:00:31 PM
to pqc-forum, Phillip Gajland, John Mattsson
Regarding degree 768: it is actually possible to define a Falcon variant with that degree. We played with that option before the submission to round 1; you can work with the cyclotomic polynomial X^768 - X^384 + 1. FFT and NTT still work, with one of the steps being modified (at some point you have to divide the degree by 3 instead of by 2).  However, there are some extra complications, namely because the FFT is no longer an orthogonal transform when working with this modulus; in rough terms, applying the FFT transforms a European football ("soccer") into an American football (~ a rugby ball), and for achieving the expected scheme security, the private key polynomials (f,g) must have the correct L-2 norm in the FFT domain, which means that you have to sample them in that domain too (so that brings back floating-point to the keygen).
From the implementation experiments: supporting degree 768 makes the code twice as big (most functions must be duplicated to account for the multiple-of-3 degree), and the performance is somewhat disappointing (keygen and signing speed and RAM usage are much closer to those of Falcon-1024 than to those of Falcon-512). The general opinion was "it's not worth it", which is why we did not keep it in the original submission.

Thomas

si...@hoerder.net

Nov 24, 2025, 3:09:58 AM
to Ray A. Perlner, Vadim Lyubashevsky, John Mattsson, pqc-forum
Hi,

I would prefer that all options that might be included in the final standard are spelled out in the IPD and can be reviewed. I understand that this is additional work and may delay the IPD, but having the options reviewed, with a clear understanding of what is under debate, will reduce the chance of mistakes and misuse.

Regarding the fixed-point version: I would very much appreciate that, but as a separate standard. For HW designs, it’s not just an option; it’s a fundamental difference. I would rather have the clarity that an implementation claiming “FIPS 206 compliance” does not have to add caveats for something as fundamental as supporting floating point, fixed point, or both. If both options are specified within the same standard, I foresee decades of confusing marketing claims, misunderstandings between HW vendors and software engineers, and all-round dissatisfaction.

Best,
Simon 


Scott Fluhrer (sfluhrer)

Nov 24, 2025, 9:32:40 AM
to si...@hoerder.net, Ray A. Perlner, Vadim Lyubashevsky, John Mattsson, pqc-forum
I would agree.

I would advocate that the FIPS 206 standard not include any nontrivial changes that were not anticipated by the IPD (it's ok that the IPD mention options not making the final; there shouldn't be anything added without public review).

As for the announced intention to provide two different ways to do key generation (fixed point and floating point), I would disagree: pick one and make it mandatory for all FIPS 206 implementations. That would simplify things (and if it were deterministic, which it would be for fixed point, it would permit the 'seed' version of the private key).



John Mattsson

Nov 25, 2025, 5:55:50 AM
to Scott Fluhrer (sfluhrer), si...@hoerder.net, Ray A. Perlner, Vadim Lyubashevsky, pqc-forum
Thanks for the update, Ray — much appreciated.

- My understanding is that FN‑DSA is highly dependent on high-quality randomness. It is great that NIST made ML‑DSA hedged by default, and I hope you take a similar approach with FN‑DSA. I see no need for purely random signing. While key generation is already purely random, it occurs only once, and can be more carefully controlled to ensure high-security randomness.

Beyond accidental misuse (e.g., the PS3 software signing incident) or malfunctioning HRNGs, there is a significant risk of backdoored HRNGs. Alarmingly, some QRNG vendors claim that their products are unbreakable and that their output can be used directly for cryptography without a CSPRNG. This is exactly the kind of statement one would expect from a hardware vendor secretly influenced by a SIGINT organization.

If NIST is not already doing so, I strongly believe you should recommend the use of multiple independent entropy sources in all cryptographic applications to mitigate both accidental and intentional weaknesses. It is impossible to verify the security of a black-box HRNG.

- Thanks Thomas for the background regarding degree 768. It seems like you made the right choice not to include it. (BTW, I recently stumbled upon your work on double-odd Jacobi quartics, which was very interesting to read.)

- I agree with Simon and Scott that FIPS 206 should preferably not include major changes that were not in the IPD. I still think it would be beneficial to have multiple draft versions (not necessarily IPDs with formal comment periods) to gather more early feedback. That said, I think NIST typically produces excellent standards that provide high security and carefully balance different very difficult trade-offs.

Cheers,
John Preuß Mattsson

Phillip Gajland

Nov 25, 2025, 6:04:47 AM
to pqc-forum, Perlner, Ray A. (Fed), pqc-forum, Vadim Lyubashevsky, John Mattsson
Hi Ray,

Thanks for this update. I assume you meant "the salt will be sampled *inside* the repeat loop (instead of *outside*, as in the original specification)."

Regards,

Phillip

Hugo Vincent

Nov 25, 2025, 11:42:59 AM
to pqc-forum, Phillip Gajland, Perlner, Ray A. (Fed), pqc-forum, Vadim Lyubashevsky, John Mattsson, Thomas Speier
Hi Ray & all,

I would like to make a comment in favour of defining a fixed-point version of signing.

The Arm architecture provides a feature called data-independent timing (DIT) [1], which can be used by cryptographic software to control hardware timing behaviour, and is intended to be used to prevent (secret) data leakage through timing side channels. Informally, when enabled, the hardware ensures that the execution time of sequences made up from a defined list of instructions will exhibit execution time that is not a function of operand values. I can only speak for Arm but note that other popular architectures have a similar feature.

Currently, the list of instructions for DIT does not include floating point arithmetic, and therefore software using such instructions cannot be guaranteed* to be devoid of timing side channels across any valid Arm hardware. We have been considering the possibility of adding floating point arithmetic instructions to the DIT list, but (thus far at least) FALCON is the only workload which requires it. In addition, hardware designers are strongly averse to having to support it, both due to the complexity it introduces in a part of the hardware critical for performance, and because of the lack of other use-cases that benefit from it. As such (and at this point in time), we do not intend to amend the DIT specification in the Arm architecture to include FP arithmetic.

Consequently, we currently recommend that any FALCON/FN-DSA software targeting general Arm hardware that wants to perform signing and cares about timing side channels not use hardware floating-point instructions (and instead use, e.g., emulated FP implemented to avoid timing side channels, or use a specific hardware platform known not to leak).

We would ideally like to see the IPD drop floating point altogether in favour of a fixed-point version (to reduce optionality), but recognize this may not be realistic given the timing; failing that, we'd love to see a fixed-point version later (even if not in the IPD). While a fixed-point version hasn't been presented yet (nor data on how it performs), I personally believe many implementers would choose it over the floating-point version, given the general discomfort with floating point (as per earlier in this thread). (I believe any conceivable fixed-point version can be implemented safely and performantly using the current set of DIT instructions, but if that is somehow not the case, we'll look to extend the list.)

Thanks,
Hugo Vincent
Arm

* While some Arm systems may not exhibit observable data-dependent timing for FP arithmetic (especially when taking care to preclude NaNs/etc), that is not guaranteed architecturally, and therefore future Arm hardware may leak even when running the same binary.

Thomas Prest

Nov 28, 2025, 5:10:58 AM
to pqc-...@list.nist.gov

Hi Ray, Hugo, and all,

I’d like to chime in on the floating-point (FPA) vs fixed-point (FxPA) discussion. My view is that the FN-DSA specification should be agnostic to the underlying numeric representation (FPA, FxPA, or otherwise). I understand providing guidelines on the minimal precision requires, but mandating a specific representation seems counterproductive. Here are the two main reasons:
  • Compliance clarity. The IEEE 754 standard is not enforced uniformly across architectures (e.g., differences in rounding modes, fused operations, extended precision, etc.). This fuzziness makes it impossible (IMO) to say when an implementation ceases to be compliant.
  • Implementation freedom/portability. Agnosticism about the underlying representation allows implementers to choose the representation best suited to their platform and threat model. For example, the paper https://eprint.iacr.org/2025/1991 uses triple-word FPA for 72 bits of total precision. There is also ongoing work on FxPA approaches. Mandating a specific representation may chill these implementation efforts.
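For readers unfamiliar with multi-word FPA: the basic building block is an error-free transformation such as Knuth's TwoSum, which represents an exact sum as an unevaluated pair of doubles. A small Python sketch of the double-word idea (illustrative only, not the cited paper's triple-word code):

```python
from fractions import Fraction

def two_sum(a: float, b: float):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and
    s + e == a + b exactly (assuming round-to-nearest, no overflow)."""
    s = a + b
    a_virtual = s - b
    b_virtual = s - a_virtual
    err = (a - a_virtual) + (b - b_virtual)
    return s, err

# The pair (s, e) carries more precision than a single double:
a, b = 1.0, 2.0 ** -60
s, e = two_sum(a, b)
assert s == 1.0 and e == 2.0 ** -60            # the bit lost in s survives in e
assert Fraction(s) + Fraction(e) == Fraction(a) + Fraction(b)
```

Chaining such transformations is how double-word (and, with more terms, triple-word) FPA reaches precisions well beyond the 53 bits of a single binary64 value.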

On KATs differing between FPA and FxPA

I don't understand why FxPA and FPA should require different KATs (is it a matter of avoiding having two distinct pre-images for the same target?). In practice, I think we should be able to pass both KATs with the same implementation (pure FxPA, pure FPA, or even something completely different) by adjusting only the randomness input/derivation, so this FxPA vs FPA distinction would be artificial. I'm not 100% sure, though, since I haven't seen the draft that will be released.
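For context, here is a small Python sketch (my own illustration, not taken from any draft) of why the two representations generically disagree: binary64 and a Q0.48 fixed-point truncation of the same non-dyadic intermediate value yield different numbers, so a KAT that pins exact outputs necessarily pins one representation's rounding behaviour.

```python
from fractions import Fraction

exact = Fraction(2, 3)                 # a non-dyadic intermediate value

# binary64: round-to-nearest at 53 bits of precision
fpa = Fraction(2 / 3)

# Q0.48 fixed point: truncate at 48 fractional bits
SCALE = 1 << 48
fxpa = Fraction((2 * SCALE) // 3, SCALE)

# Both approximate 2/3, but they are not the same number...
assert fpa != fxpa
# ...and the binary64 rounding is the tighter of the two here.
assert abs(exact - fpa) < abs(exact - fxpa)
```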

Happy to elaborate or provide more concrete suggestions once the IPD text is available.

Best regards,
Thomas Prest

D. J. Bernstein

Nov 30, 2025, 9:22:33 AM
to pqc-forum
Hugo Vincent writes:
> We have been considering the possibility of adding floating point
> arithmetic instructions to the DIT list, but (thus far at least) FALCON is
> the only workload which requires it.

Given the ready availability of software emulators for floating-point
arithmetic, obviously nothing requires floating-point hardware. For an
earlier version of Falcon, https://eprint.iacr.org/2019/893 presented
complete software using integer arithmetic.

On the other hand, some people might be bothered by that paper reporting
(e.g.) 40 million cycles for signing with integer instructions, compared
to <3 million cycles for signing with floating-point instructions.

Furthermore, floating-point instructions have already been shown to
provide excellent performance for central bignum operations that appear
in a wide range of cryptosystems (RSA, ECC, etc.), relying on the CPU
investment in fast floating-point arithmetic for a vastly wider range of
general applications. Carefully optimizing a cryptographic library for
many CPUs will end up with floating-point software for some CPUs,
_unless_ that's prohibited for security reasons.

For portable code, floating-point operations certainly incur a risk of
being run in environments where those operations take variable time, but
meanwhile compilers have been creating timing variations in basic
integer operations (see https://cr.yp.to/papers.html#cryptoint) for many
more environments.

It's understandable for hardware designers to be hesitant to commit to
constant-time handling of some floating-point features: denormals, for
example. I haven't checked whether denormals can appear in Falcon. They
definitely can't appear in the typical ways to use floating point for
bignum arithmetic.

I'm not suggesting that ARM should make any architectural guarantees of
constant-time behavior for floating-point instructions unless it's
serious about sticking to those guarantees. From a security perspective,
it's disturbing to see Section 2.3 of https://eprint.iacr.org/2025/759
tracing how ARM removed DIT guarantees from various instructions between
2021 and 2024. We need the guarantees nailed down.

---D. J. Bernstein

Thomas Pornin

Nov 30, 2025, 2:14:01 PM
to pqc-forum, D. J. Bernstein
The code got a bit better: for emulated floating point it's down to about 9.8 million cycles for Falcon-512 and 21.3 million for Falcon-1024 (on x86, Skylake-class; for the current C code, see https://github.com/pornin/c-fn-dsa). It's certainly not fast by, say, elliptic-curve standards, but that still means more than 200 sig/s (for n = 512, using a single core), so it's probably fast enough for many use cases. Small microcontrollers are more of a bother, but I got Falcon-512 down to 17 million cycles on an Arm Cortex-M4, again a bit on the slow side but not intolerably so (that's about 5 or 6 times slower than the best ML-DSA implementation on the M4, I think?).

Falcon/FN-DSA has an advantage here, which is that infinities, NaNs, and denormals never occur (see https://eprint.iacr.org/2024/321). This helps a lot with the performance of emulated floating point. It also means that these operations are constant-time on a number of CPUs, even if in general the opcodes are not constant-time: on many CPUs, the core operation of a floating-point mul or add is a pipelined constant-time circuit, and only the special cases (infinities, denormals, NaNs...) trigger a slow path (typically a pipeline stall and a switch to a microcoded implementation) (see https://doi.org/10.1145/3243734.3243766). Of course, no hardware vendor provides any formal guarantee here.

Thomas

Perlner, Ray A. (Fed)

Dec 3, 2025, 2:44:16 PM
to Thomas Pornin, pqc-forum, D. J. Bernstein

 

Hi all,

 

Thanks for your feedback. In response:

 

  1. Kris Kwiatkowski (Nov. 21) asks: “Could you provide an expected release date for the IPD version of FIPS-206?” Sadly, no. The draft is still in clearance within NIST and the Dept. of Commerce. As we said before, we’re ready to send out a draft within a few days once it’s approved, but we don’t really know when that will be.

 

  2. Philip Gajland (Nov. 25) asks: “I assume you meant ‘the salt will be sampled *inside* the repeat loop (instead of *outside*, as in the original specification).’” Yes. That is correct.

 

  3. Thomas Pornin suggests a format for hedged signing (Nov. 23): supplementing the true randomness sampled during signing with pseudorandomness seeded from a hash of f and g. This way of doing hedged signing looks OK to us, although we could imagine other ways (e.g., instead of using f and g, using pseudorandomness seeded with some extra bytes squeezed from SHAKE during key generation, like ML-DSA does). Does anyone else on the forum have a preference regarding the format for hedged signing?

 

  4. Thomas Prest (Nov. 28) asks: “I don't understand why FxPA and FPA should require different KATs (is it a matter of avoiding having two distinct pre-images for the same target?)”. This is required for validation. NIST (specifically the CAVP) needs to be able to certify that the numerical format, as used in Gaussian sampling, provides enough precision to avoid private-key leakage from signing, and to verify that all pseudorandom sampling procedures are cryptographically strong. For reasons of cost-effectiveness, the CAVP's procedure for doing this involves only black-box testing, not extensive code review, analysis, etc. With these limitations, we don’t see how we can get the guarantees we want through black-box testing alone unless the signing procedure is specified to exactly match KAT values (for a given randomness input, message, and private key). Since fixed point and floating point introduce different rounding errors, which may result in different signatures for the same inputs, we would need different KAT values for these two cases, and for any other numerical representation we choose to allow. As it stands, for signing, we’re only allowing floating point / emulated floating point, because this is the format for which we have a concrete implementation and which has been analyzed by the community.

 

(Note this is different from the case of key generation: since the exact distribution of private keys is far less sensitive than the exact distribution of signatures, we believe we can allow more flexibility to implementers and still get the guarantees we want from black-box testing. Basically, we can enumerate all possible values of f, g that could come out of the specified rejection-sampling procedure, and then check that (f, g, F, G) and h form a well-formed key pair.)


  5. Simon Hoerder (Nov. 28) and several others advocated that we avoid including anything in the final version of the FIPS that is not included in the IPD. While we agree that it’s good to get community feedback on anything that might be included in the final version of the FIPS, we can’t promise to be strict about enforcing this principle. For any significant changes we make between the IPD and the final version, we will do our best to get public feedback through forum posts, public talks, etc., but we don’t want to deviate from our usual procedure (where the IPD represents our best guess at what the final FIPS will look like, but is subject to change in response to public feedback), or to do things that might delay the final publication even further than it already has been.
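Both hedged-signing variants discussed above (Pornin's hash-of-(f, g) seed from Nov. 23, and the ML-DSA-style keygen-time seed) reduce to the same generic pattern: mix a secret key-bound seed with fresh randomness and the message representative. A hedged Python sketch of that pattern (the domain-separation byte, field order, and output length here are my illustrative assumptions, not the draft's format):

```python
import hashlib
import os

def hedged_randomness(key_seed: bytes, mu: bytes, n: int = 56) -> bytes:
    """Derive per-signature sampling randomness as
    SHAKE256(domain || key_seed || fresh || mu). key_seed could be a hash of
    (f, g), or extra bytes squeezed from SHAKE at keygen as in ML-DSA;
    the exact layout here is illustrative only."""
    fresh = os.urandom(32)                 # true randomness, when available
    shake = hashlib.shake_256()
    shake.update(b"\x01" + key_seed + fresh + mu)
    return shake.digest(n)

# The point of hedging: if `fresh` is broken or absent, the output is still
# unpredictable to outsiders because key_seed is secret; if the seed path is
# faulted, good fresh randomness still covers it.
```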

 

Best,

Ray Perlner

 

John Mattsson

Dec 4, 2025, 8:40:31 AM
to Perlner, Ray A. (Fed), Thomas Pornin, pqc-forum, D. J. Bernstein
Perlner, Ray A. (Fed) wrote:
>While we agree that it’s good to get community feedback on anything that might be included in the final version of the FIPS, we can’t promise to be this strict about enforcing this principle. For any significant changes we make between the IPD and the final version, we will do our best to get public feedback, through forum posts, public talks etc.,

I agree. The IPD should not constrain the final version. Although it would be preferable if significant changes were not needed at that stage, what ultimately matters is the final specification. If changes provide a meaningful improvement, they should be made.

Cheers,
John Prueß Mattsson


Al Martin

Dec 4, 2025, 1:36:02 PM
to pqc-forum, Thomas Prest
Re:
> Compliance clarity. The IEEE 754 standard is not enforced uniformly across architectures (e.g., differences in rounding modes, fused operations, extended precision, etc.). This fuzziness makes it impossible (IMO) to say when an implementation ceases to be compliant.

As far as I know, current processors must support some minimum of the standard to be considered compliant with IEEE 754. The original standard was ratified in 1985, and I would be surprised if any floating-point unit being manufactured today does not comply. As long as you only use mandated functionality, you should be OK. Regarding fused ops: this is not a mandated operation; don't use it, and don't allow your compiler to use it. Regarding extended precision: this is unique to x87 floating point, which should only be an issue if you are porting old code. Some ISAs implement non-standard behaviors like flushing subnormals to zero, but if your application does not have or create subnormals, that isn't an issue.
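To make the fused-ops caveat concrete: an unfused multiply-add rounds the product before the addition, and that intermediate rounding can change the answer relative to a fused (single-rounding) result. A small Python sketch, with operands chosen so the intermediate rounding visibly fires (Python itself never fuses, so the unfused path is what you get):

```python
from fractions import Fraction

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# Unfused: a*b = 1 - 2**-60 exactly, which rounds to 1.0 in binary64,
# so the subsequent addition collapses to zero.
unfused = a * b + c
assert unfused == 0.0

# A fused multiply-add keeps the full product and would return -2**-60,
# matching the exact rational result.
fused_exact = Fraction(a) * Fraction(b) + Fraction(c)
assert fused_exact == -Fraction(1, 2 ** 60)
```

This is exactly the kind of cross-platform divergence that bit-exact KATs would flag, hence the advice to keep compilers from emitting FMA behind your back.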

Regarding data-independent timing with floating-point subnormals: hardware can be implemented with the same latency whether the inputs are normal or subnormal. An example is Berkeley HardFloat (bsg-external/HardFloat).

Al Martin