Query on progress of FN-DSA.


niux_d...@icloud.com

Aug 21, 2025, 10:30:24 AM
to pqc-forum
Hi NIST team.


I'm wondering where we're at with the standardization of Falcon as FN-DSA.
I've gone to the CSRC website several times, and judging by the topics of
recent publications, it looks like the current administration has other
areas of interest higher on the priority list (supply chain security, etc.; understandable).

I recall reading in a version of the Falcon submission document that its
signatures are randomized (the team considered making them deterministic
to ensure the security of the Gaussian samples, but favored randomization
in the 1st round submission). I know this may be bad for designing
deterministic and hedged variants, so I wonder if there has been communication
between NIST and the submission team on this and other areas.

Also, I recall that at some point NIST expressed the intention of adopting the
HAWK Gaussian sampler for FN-DSA, and I realize this requires careful specification
of procedures, as well as documenting recommended implementation strategies.

So all in all, where are we at with FN-DSA?

Thank you for your attention.
DannyNiu/NJF.

Tommaso Gagliardoni

Aug 25, 2025, 5:57:24 AM
to pqc-forum, niux_d...@icloud.com
Subscribing for interest, thanks.

Moody, Dustin (Fed)

Aug 28, 2025, 9:49:14 AM
to pqc-forum, niux_d...@icloud.com

Danny,


The NIST PQC team has been working for some time on writing the draft standard for FN-DSA (Falcon), which is slated to be FIPS 206.  We have been in close communication with the Falcon submission team throughout the process.  We have also posted occasionally on the pqc-forum requesting feedback on various topics related to the standard.  See for example:

https://groups.google.com/u/1/a/list.nist.gov/g/pqc-forum/c/Dpr3tnTlKy0/m/WVlp2lNnBAAJ

https://groups.google.com/u/1/a/list.nist.gov/g/pqc-forum/c/Zhwh95D0KII/m/TY50qQo-AgAJ

 Regarding your specific questions, our current draft includes tweaks to the Gaussian sampler in Keygen made in consultation with the Falcon team, including an allowance for fixed point arithmetic – it’s not identical to either the original Falcon submission or to HAWK, though. Also, like the Falcon submission document, our draft only allows for randomized signing, not deterministic. 

The process has taken some time, as FN-DSA has a very complex implementation, and the PQC team has also been busy with other parts of the PQC project.  The draft standard is essentially completed on our end, and we have submitted it up the chain for approval for publication.  


Dustin Moody

NIST PQC



From: niux_dannyniu via pqc-forum <pqc-...@list.nist.gov>
Sent: Thursday, August 21, 2025 9:32 AM
To: pqc-forum <pqc-...@list.nist.gov>
Subject: [pqc-forum] Query on progress of FN-DSA.
 
--
You received this message because you are subscribed to the Google Groups "pqc-forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
To view this discussion visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/7A15659C-2E97-4A22-8B33-3A9FBF2B3A40%40icloud.com.

John Mattsson

Sep 4, 2025, 1:36:12 AM
to Moody, Dustin (Fed), pqc-forum, niux_d...@icloud.com

Hi Dustin,

 

Thanks for the status update. I hope that the non–fixed-point arithmetic parts of the FN-DSA draft specification do not require IEEE 754. Besides being paywalled, IEEE 754 is no longer state-of-the-art, as it was in 1985. The Posit Working Group Standard for Posit Arithmetic [1] is clearly superior, and there is ongoing discussion and development around Posit-enabled RISC-V cores. More broadly, I do not believe that a NIST standard should be tied to any specific format for representing real numbers in a computer.

 

[1] Standard for Posit Arithmetic (2022)

https://posithub.org/docs/posit_standard-2.pdf

 

Cheers,

John

 

John Mattsson

Sep 4, 2025, 3:37:31 AM
to Moody, Dustin (Fed), pqc-forum

Hi again,

 

For FN-DSA (FIPS 206), HQC-KEM (FIPS 207), and future algorithms, I think NIST should allow more time between the initial public draft (IPD) and the final FIPS publication, and consider releasing multiple public drafts. Having only a single draft and publishing the final versions after just a year for FIPS 203–205 felt somewhat rushed. Several major issues, such as private key formats and external hashing, warranted more thorough discussion. In retrospect, if August 2024 was the target deadline, I think NIST should have moved to the IPD phase earlier and allocated less time to the earlier phases. For many industry stakeholders, providing meaningful feedback before the IPD phase is not realistic.

 

Cheers,

John

 

Markku-Juhani O. Saarinen

Sep 4, 2025, 6:47:29 AM
to John Mattsson, Moody, Dustin (Fed), pqc-forum, niux_d...@icloud.com
On Thu, Sep 4, 2025 at 8:36 AM 'John Mattsson' via pqc-forum <pqc-...@list.nist.gov> wrote:

Hi Dustin,

 

Thanks for the status update. I hope that the non–fixed-point arithmetic parts of the FN-DSA draft specification do not require IEEE 754. Besides being paywalled, IEEE 754 is no longer state-of-the-art, as it was in 1985. The Posit Working Group Standard for Posit Arithmetic [1] is clearly superior, and there is ongoing discussion and development around Posit-enabled RISC-V cores. More broadly, I do not believe that a NIST standard should be tied to any specific format for representing real numbers in a computer.


Hi,

A comment in a personal capacity (I'm the RISC-V Cryptography SIG Chair, but this is not a statement from that group, although FN-DSA has occasionally been discussed):

Regular RISC-V floating point formats (meaning standard F and D extensions and the standard vector floating point operations) are always IEEE 754. There may be special AI-related formats in matrix extensions, and individual vendors have created proprietary custom extensions. But proprietary extensions are only supported by those vendors, and there is a limited expectation of interoperability for such code.

RISC-V does not have Posit on the ratification roadmap, as it has not been requested by the major industry vendors. The reasons have less to do with technical merits (with my hardware architect hat on, I can testify to the pain points of IEEE 754) than with the functionality and performance of software. There is a vast body of C code out there written with (often implicit) assumptions that floating-point types ("float", "double") conform to IEEE 754. Think of the entire Linux & Android userland. This forces processor makers to invest a substantial amount of hardware real estate in making IEEE 754 efficient. I have not seen plans for Posit support on ARM and x86 ISA roadmaps either (due to similar considerations), so it is unlikely that widespread support will be coming any time soon. People can of course software-emulate it if they like, or build it into special units that do not run general software.

The main problem RISC-V Crypto has with FN-DSA using IEEE 754 is the usual one -- that it is difficult for us to guarantee security against timing attacks if floats are used for secret variables. The DIEL (Data-Independent Execution Latency -- "constant time") extensions Zkt and Zvkt presently exclude all floating-point operations, mainly for performance and architectural reasons. Other ISAs don't generally guarantee this either. Hence, special constant-time floating-point emulation has often been used to implement Falcon even on FP-equipped systems, which is of course slow.
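For readers unfamiliar with the technique: constant-time floating-point emulation is built from branchless integer idioms such as mask-and-select. A minimal sketch of that idiom (my own illustration; Python is used only to show the logic, since Python itself makes no timing guarantees):

```python
# Branchless mask-and-select: a basic building block of constant-time
# floating-point emulation. Python integers are arbitrary precision, so we
# mask to 64 bits explicitly; a C version would simply use uint64_t.
M64 = (1 << 64) - 1

def ct_select(cond_bit: int, a: int, b: int) -> int:
    """Return a if cond_bit == 1, else b, with no data-dependent branch."""
    mask = (-cond_bit) & M64        # all-ones if cond_bit == 1, else zero
    return (a & mask) | (b & ~mask & M64)
```

The same shape (compute both candidates, combine with a mask derived from the condition) is how emulated float add/sub/mul avoids secret-dependent branches.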

Highly hypothetical: If FIPS 206 requires floating point DIEL for it to be secure and performant at the same time, we could (potentially!) consider a new crypto extension that grants it for a tiny subset of scalar and vector floating point arithmetic (basically just add/sub/mul). And even in that case, probably only for well-formed floating point inputs.

If floating point *must* be software-emulated to implement FN-DSA, one could consider alternative formats to IEEE 754. These could even be specially crafted for FN-DSA. However, much of the performance of Falcon stemmed from its use of IEEE 754, so this would be awkward. I have not investigated how hard/easy it is to implement Posit Arithmetic constant time, or to mask it (against emission based side channels). As noted, if emulated, one could develop a special-purpose representation that is efficient to implement in constant time and also to mask. Given the timelines, I doubt that this will be done for FIPS 206.

Cheers,
-markku


D. J. Bernstein

Sep 4, 2025, 8:21:30 AM
to pqc-...@list.nist.gov
'John Mattsson' via pqc-forum writes:
> Besides being paywalled, IEEE 754

Paywalling in general is a big problem---slowing down science, for
example, and making security problems less likely to be caught---but for
all practical purposes this particular standard, IEEE 754, is available
for free. Many people have posted copies that are immediately found by
search engines. Maybe takedown demands from IEEE could get enough of
those offline to prompt some people to say "Oh, I can't find it, I guess
I'll pay IEEE $110 for it", but would that make IEEE more money than
they would lose from the backlash?

Anyway, https://eprint.iacr.org/2019/893 says that its integer software
does everything that the 2019 version of Falcon asks a floating-point
unit to do. It should be straightforward to use what that software does
as a spec, without even mentioning IEEE 754. (Plus whatever updates are
necessary since apparently Falcon is still a moving target, but I would
guess that whatever Falcon ends up being will similarly allow integer
implementations.)

> is no longer state-of-the-art, as it was in 1985

Intel, ARM, AMD, etc. document their floating-point instructions as
complying with IEEE 754.

> The Posit Working Group Standard for Posit Arithmetic [1] is clearly
> superior

I've worked extensively with floating-point arithmetic. I don't find it
at all clear that a hypothetical world of pervasive Posit would be
better, rather than worse, than the real world with pervasive IEEE 754.

More importantly, to the extent that floating-point arithmetic is being
used in contexts where security or reliability matters, sticking to a
stable floating-point standard simplifies review.

---D. J. Bernstein

Blumenthal, Uri - 0553 - MITLL

Sep 4, 2025, 8:33:24 AM
to pqc-...@list.nist.gov
+1

Except that I’d rather be rid of floating-point operations altogether, to ease validation of implementations.

Regards,
Uri

Secure Resilient Systems and Technologies
MIT Lincoln Laboratory

> On Sep 4, 2025, at 08:22, D. J. Bernstein <d...@cr.yp.to> wrote:

Daniel Apon

Sep 5, 2025, 8:07:45 AM
to pqc-...@list.nist.gov
" Paywalling in general is a big problem---slowing down science, for
example, and making security problems less likely to be caught"

I completely agree.


John Mattsson

Sep 8, 2025, 8:06:42 AM
to pqc-...@list.nist.gov

Bernstein wrote:

>Many people have posted copies that are immediately found by search engines

 

Relying on pirated content is not a solution; it’s part of the problem. Such material often carries malware, and nation-state actors have a well-documented history of exploiting digital distribution channels to deliver tampered files to selected targets. A historical example is Crypto AG, where different customers reportedly received altered manuals depending on whether they were intended targets.

 

Cheers,
John

 

From: pqc-...@list.nist.gov <pqc-...@list.nist.gov> on behalf of Daniel Apon <dapon....@gmail.com>
Date: Friday, 5 September 2025 at 14:07
To: pqc-...@list.nist.gov <pqc-...@list.nist.gov>
Subject: Re: [pqc-forum] Query on progress of FN-DSA.

John Mattsson

Sep 9, 2025, 7:50:08 AM
to niux_d...@icloud.com, pqc-...@list.nist.gov

I generally agree with Danny,

 

For someone who has not compared the Posit standard to IEEE 754, here are some of the advantages of Posit:

 

- Higher accuracy for most real-world computations

- Smaller rounding errors in practice

- Much larger dynamic range

- Simpler comparisons, since bitstring ordering matches numerical ordering

- Simpler arithmetic rules (no multiple NaNs, no signed zero, no denormals)

- Favorable trade-off between accuracy, speed, gate count, and energy efficiency.

 

The simpler arithmetic and simpler comparisons are attractive for cryptographic use, if using real-number arithmetic at all. I don’t think anyone has studied side-channel aspects of Posit in cryptographic contexts.
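To illustrate the comparison point: posit bit patterns, read as two's-complement integers, are already in numerical order, while IEEE 754 needs a small transform first. A sketch of that transform (my own illustration, assuming binary64 and excluding NaNs):

```python
import struct

def f64_key(x: float) -> int:
    """Order-preserving integer key for an IEEE 754 binary64 value
    (NaNs excluded): flip all bits of negatives, flip only the sign
    bit of non-negatives (the classic sign-flip trick)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits >> 63:                    # negative: invert every bit
        return bits ^ ((1 << 64) - 1)
    return bits | (1 << 63)           # non-negative: set the sign bit
```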

 

Cheers,

John

 


Taylor R Campbell

Sep 9, 2025, 9:20:54 AM
to John Mattsson, Moody, Dustin (Fed), pqc-...@list.nist.gov, niux_d...@icloud.com
> Date: Thu, 4 Sep 2025 05:35:59 +0000
> From: "'John Mattsson' via pqc-forum" <pqc-...@list.nist.gov>
>
> I hope that the non-fixed-point arithmetic parts of the FN-DSA draft
> specification do not require IEEE 754. Besides being paywalled, IEEE
> 754 is no longer state-of-the-art, as it was in 1985.

I agree paywalls are a problem, and you're right that IEEE 754-1985 is
no longer state-of-the-art -- it has been updated by IEEE 754-2019,
with minor changes that are almost certainly irrelevant to FN-DSA.

> The Posit Working Group Standard for Posit Arithmetic [1] is clearly
> superior,

This claim is absurd on its face, for two reasons:

1. Vast swaths of the numerical analysis literature are founded on
relative error, which floating-point arithmetic guarantees bounds
on -- and which posit arithmetic does not.

So all that literature goes out the window and has to be redone.

Proponents of posits make various glib marketing claims that posit
arithmetic provides better accuracy than floating-point arithmetic
in cherry-picked examples -- which tend to rely on exploiting naive
misunderstanding of catastrophic cancellation (a property of real
number arithmetic, not of floating-point arithmetic), and can
invariably be countered by cherry-picked examples the other way:
https://marc-b-reynolds.github.io/math/2019/02/06/Posit1.html

But deceptive marketing claims are no substitute for real
literature on error analysis of algorithms.

Search for `theorem', `lemma', `proposition', `proof', or `error
analysis' in the original proposal for posits at
https://web.archive.org/web/20171104234856/http://superfri.org/superfri/article/download/137/232
and you'll turn up empty-handed.

In contrast, the motivation for IEEE 754 was to enable proving
useful theorems on a stable foundation, based on decades of
experience with real hardware and unstable foundations; see
https://history.siam.org/pdfs2/Kahan_final.pdf#page=118 for some of
the history.

2. Even if some engineering were better-served by posits, it would
still not be clear that _cryptography_ is better-served by posits.

The correctness of the algorithms in Falcon is based on theorems
bounding relative error. Perhaps you can redo that with whatever
theorems posit arithmetic has, but someone would have to do that
analysis.

Normal floating-point arithmetic -- specifically, addition and
multiplication, on normal floating-point numbers -- is relatively
easy to make constant-time, and tends to be so on real hardware
already. See, e.g.,
https://homes.cs.washington.edu/~dkohlbre/papers/subnormal_v1.pdf
for an assessment -- all of the reported variation in addition and
multiplication arose from non-normal inputs (inf/nan/subnormal).
Error bounds can be computed to guarantee that inputs and outputs
in a cryptographic algorithm always -- or with overwhelming
probability in the face of an adversary -- lie in the normal range
so this is not an issue.

(Note that even integer multiplication, and especially integer
division, may exhibit variable timings on real CPUs. See, e.g.,
https://bearssl.org/ctmul.html for an assessment of multiplication
timing variability on a variety of CPUs.)

Is posit arithmetic an improvement in this respect? I doubt any
proponents of posits have made that claim. Since the internal
structure of posits is variable-width, it seems likely that posit
implementations made without concern for cryptography will tend to
be worse at it than corresponding floating-point implementations.

> and there is ongoing discussion and development around Posit-enabled
> RISC-V cores.

The vast majority of all application CPUs today consistently implement
IEEE 754 binary floating-point arithmetic at high speed, and have for
most of the last four decades, and will continue to do so indefinitely
because so much of the world relies so heavily on it. That is what
makes IEEE 754 floating-point, rather than some other system of
approximating real functions, attractive for implementing Falcon.

There are legitimate reasons to avoid floating-point in some
circumstances, like embedded CPUs that lack an FPU -- but the prospect
of some hypothetical future experiments in CPU design is hardly such a
reason!

> More broadly, I do not believe that a NIST standard should be tied
> to any specific format for representing real numbers in a computer.

Should it avoid being tied to any specific format for representing
integers? How about any specific format for representing integers of
a fixed size, scaled by a power of two of fixed size?
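To make the relative-error guarantee concrete: each IEEE 754 binary64 operation returns the correctly rounded exact result, so the relative error is bounded by the unit roundoff u = 2^-53. A quick empirical check (my own sketch, using exact rational arithmetic as the reference):

```python
import random
from fractions import Fraction

U = Fraction(1, 2**53)                   # unit roundoff for binary64

random.seed(2025)
worst = Fraction(0)
for _ in range(1000):
    a = random.uniform(-1e6, 1e6)
    b = random.uniform(-1e6, 1e6)
    exact = Fraction(a) * Fraction(b)    # exact rational product
    if exact:
        # Fraction(a * b) is the exact value of the rounded double product.
        err = abs(Fraction(a * b) - exact) / abs(exact)
        worst = max(worst, err)
```

Posit arithmetic offers no such uniform relative-error bound, which is exactly why the floating-point error-analysis literature does not carry over.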

niux_d...@icloud.com

Sep 10, 2025, 7:55:12 AM
to John Mattsson, pqc-...@list.nist.gov
My two cents here:

Make an integer/fixed-point-arithmetic Gaussian sampler a 'baseline' choice for implementations, and specify it in sufficient detail for secure and portable implementation.

Then, specify a 'real-number' variant of the Gaussian sampler as a 'mainline/high-level' choice - yes, not floating-point - and then specify the requirements for this real-number arithmetic, such as precision and accuracy.

(baseline, mainline, high-level are worded after H.264/AVC profiles to reflect the capability of signers)

The requirements for real-number arithmetic should take into consideration existing major contenders, such as IEEE-754 basic formats, potential system-specific IEEE-754 arithmetic formats, unum/posits, GMP, and/or other *efficient* floating-point arithmetic.
Emphasis should be placed on "arithmetic" formats and their efficiency. As with ECC, we don't require point arithmetic to be done using XY coordinates; homogeneous and Jacobian coordinates are equally valid.

The real-number version should be a QoI issue, but all implementations that make this choice must meet a minimal security standard, which ought to be set out in the final publication (if this is a good idea, of course).

Thank you for your attention.
DannyNiu/NJF.
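One concrete (toy) shape for such a baseline sampler is a cumulative-distribution-table (CDT) sampler scanned in full; the table below is invented for illustration and is nothing like Falcon's actual distribution or parameters:

```python
# Toy CDT (cumulative distribution table) sampler. A real FN-DSA/Falcon
# sampler derives high-precision tables from the target Gaussian parameters.
# The scan always reads every entry, so the access pattern does not depend
# on the secret sample.
CDT = [9000, 14000, 15800, 16300, 16380, 16384]   # cumulative, out of 2**14

def sample_cdt(u: int) -> int:
    """Map uniform randomness u in [0, 2**14) to a small integer z, where
    P(z <= k) = CDT[k] / 2**14."""
    z = 0
    for t in CDT:
        z += int(u >= t)    # branchless-style accumulate over the full table
    return z
```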

niux_d...@icloud.com

Sep 10, 2025, 7:55:20 AM
to John Mattsson, pqc-...@list.nist.gov
I guess what I mean is: since we aim for crypto algorithm agility, crypto API agility, we might as well also aim for crypto implementation *technique* agility.

So the disclaimer: I'm not a proponent of unum/posit, nor of IEEE-754, nor of GMP.

dustin...@nist.gov

Sep 10, 2025, 2:02:06 PM
to pqc-forum, Daniel Apon

All,

 

NIST seeks to make our cryptographic standards as self-contained as possible.  We are happy to share that for FIPS 206, the draft standard for FN-DSA (Falcon), we have received permission from IEEE to reprint the parts of IEEE 754-2019 that are necessary to implement FN-DSA.  That is, FIPS 206 will be fully self-contained with regard to floating point, and one should be able to implement FN-DSA from the specifications provided in FIPS 206, without needing to go to IEEE 754.  We are grateful to the IEEE for their collaboration, which makes this possible.

 

Dustin



Sophie Schmieg

Sep 10, 2025, 2:50:53 PM
to Markku-Juhani O. Saarinen, John Mattsson, Moody, Dustin (Fed), pqc-forum, niux_d...@icloud.com


Highly hypothetical: If FIPS 206 requires floating point DIEL for it to be secure and performant at the same time, we could (potentially!) consider a new crypto extension that grants it for a tiny subset of scalar and vector floating point arithmetic (basically just add/sub/mul). And even in that case, probably only for well-formed floating point inputs.

Cheers,
-markku

Consider adding support for exp in particular; that is fairly tricky to get right otherwise, unless the spec specifically talks about what algorithm to use to avoid cancellation and achieve constant-time behavior with acceptable precision.
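For a sense of the shape such a specification might take: a fixed-degree polynomial evaluated by Horner's rule has a data-independent operation count. This sketch uses plain Taylor coefficients for illustration; real code would use a carefully derived minimax polynomial after argument reduction:

```python
import math

# Degree-12 truncated Taylor polynomial for exp(x) on [0, ln 2], evaluated
# with Horner's rule. Fixed degree => fixed operation count, independent of
# x. The coefficients are plain Taylor terms for illustration only.
COEFFS = [1.0 / math.factorial(k) for k in range(12, -1, -1)]

def exp_poly(x: float) -> float:
    acc = 0.0
    for c in COEFFS:        # always exactly len(COEFFS) iterations
        acc = acc * x + c
    return acc
```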

Sophie Schmieg

Sep 10, 2025, 2:59:19 PM
to Markku-Juhani O. Saarinen, John Mattsson, Moody, Dustin (Fed), pqc-forum, niux_d...@icloud.com
Oh, and div. Basically, go through the FN-DSA spec and look at all the floating-point operations required that I needed to dust off my old numerical analysis lecture notes for; I'd much rather have those implemented by a handful of competent hardware engineers than by a legion of more or less competent software engineers. (Don't ask me how to do floating-point division in constant time, though; that sounds like a research paper and not a random engineering task.)
--

Sophie Schmieg |
 Information Security Engineer | ISE Crypto | ssch...@google.com

Blumenthal, Uri - 0553 - MITLL

Sep 10, 2025, 3:04:00 PM
to Sophie Schmieg, Markku-Juhani O. Saarinen, John Mattsson, Dustin Moody, pqc-forum, niux_d...@icloud.com
Wouldn’t it still be better to ditch the Floating Point?

Regards,
Uri

Secure Resilient Systems and Technologies
MIT Lincoln Laboratory

On Sep 10, 2025, at 14:59, 'Sophie Schmieg' via pqc-forum <pqc-...@list.nist.gov> wrote:


Oh and, div. Basically go through the FN-DSA spec and look at all the floating point operations required that I needed to dust off my old numerical analysis lecture notes for, I'd much rather have those implemented by a handful of competent

Sophie Schmieg

Sep 10, 2025, 3:08:58 PM
to Blumenthal, Uri - 0553 - MITLL, Markku-Juhani O. Saarinen, John Mattsson, Dustin Moody, pqc-forum, niux_d...@icloud.com
Definitely, but at that point you are no longer talking about Falcon. I'm just pointing to the specific algorithms used in the spec that are very difficult to implement in constant time. As far as I recall, div and exp are the two gnarliest ones; I think everything else, like add/sub/mul, is not usually implemented via a converging series but can be computed directly. My current preference, at least where my opinion matters, is to not use Falcon. But if I have to (and I can't rule it out, given that it will be a FIPS-approved algorithm), I'd rather have options to do it safely, and I trust Markku a lot more than myself when it comes to hardware constant-time floating-point operations.

Blumenthal, Uri - 0553 - MITLL

Sep 10, 2025, 4:52:27 PM
to Sophie Schmieg, Markku-Juhani O. Saarinen, John Mattsson, Dustin Moody, pqc-forum, niux_d...@icloud.com


Definitely, but at that point you are no longer talking about Falcon.

 

I’d rather have a strong and validate-able algorithm/standard than try to preserve the minutiae of Falcon (which, unsurprisingly, did not make it into CNSA 2.0, probably because of implementation-validation concerns).

 

I'm just pointing to the specific, very difficult to implement in constant time, algorithms that are used in the spec. As far as I recall, div and exp are the two gnarliest ones, I think everything else like add/sub/mul is not usually implemented via a converging series, but can be computed directly.

 

Oh yes, understood totally.

 

My current preference, at least where my opinion matters, is to not use Falcon. But if I have to (and I can't rule it out, given that it will be a FIPS approved algorithm), I'd rather have options to do it safely, and I trust Markku a lot more than myself when it comes to hardware constant time floating point operations.

 

I’m with you, trusting Markku. But, IMHO, unless Falcon dumps the FP part (and I for one don’t care if it would still be “the Falcon”), it won’t be used, despite being one of the NIST/FIPS standards. People who wrote CNSA probably aren’t the only ones who noticed the validation difficulties.

 

Thanks!

Markku-Juhani O. Saarinen

Sep 11, 2025, 3:59:48 AM
to Sophie Schmieg, John Mattsson, Moody, Dustin (Fed), pqc-forum, niux_d...@icloud.com
Sophie,

The Falcon implementations that I've seen that use floating point avoid almost everything except add/sub/mul (when dealing with sensitive variables) and some integer-double conversions, as other floating point operations are considered too risky on almost any platform. Thomas Pornin may correct me if I've missed something.

The principle for selecting instructions for DIEL is to offer guarantees that "common implementations" (such as C implementations in OpenSSL and similar middleware) will run securely on all RISC-V instantiations that assert Zkt or Zvkt, without imposing much burden on the performance of RISC-V processors. This is why Zkt similarly guarantees that add/sub/mul on integers are DIEL, but division and remainder of integers are not -- those should be avoided on RISC-V just as on almost any other platform, as they are rarely implemented as 1-cycle DIEL instructions.

Recall that Zkt/Zvkt (DIEL) is not a "mode" -- if the feature is asserted, it is always on, for all compute loads. The way exp and div are implemented in floating-point units means that achieving DIEL for those would require considerable modification of the units. As the execution time would have to be the worst case for all invocations of the instruction, this would decrease performance on generic loads, which is of course the primary concern for processor designers.

Cheers,
-markku

Dr. Markku-Juhani O. Saarinen <mj...@iki.fi>

Sophie Schmieg

Sep 11, 2025, 12:35:27 PM
to Blumenthal, Uri - 0553 - MITLL, Markku-Juhani O. Saarinen, John Mattsson, Dustin Moody, pqc-forum, niux_d...@icloud.com


I’m with you, trusting Markku. But, IMHO, unless Falcon dumps the FP part (and I for one don’t care if it would still be “the Falcon”), it won’t be used, despite being one of the NIST/FIPS standards. People who wrote CNSA probably aren’t the only ones who noticed the validation difficulties.

 

As far as I can tell, this is far more difficult than it sounds. The design of Falcon intimately uses facts about complex-valued embeddings of number fields that cannot easily be replaced with integer or fixed-point arithmetic without reevaluating the entire algorithm. You are essentially choosing a different round 3 candidate at that point, with no guarantee that any of the evaluation done for Falcon still applies. Essentially, at that point, you are talking about standardizing Hawk. Which I am absolutely not opposed to, but I'm not sure whether trying to morph Falcon into Hawk can succeed, process-wise.
--

Watson Ladd

Sep 11, 2025, 2:16:36 PM
to Markku-Juhani O. Saarinen, Sophie Schmieg, John Mattsson, Moody, Dustin (Fed), pqc-forum, niux_d...@icloud.com
On Thu, Sep 11, 2025 at 12:59 AM Markku-Juhani O. Saarinen
<mjos....@gmail.com> wrote:
>
> Sophie,
>
> The Falcon implementations that I've seen that use floating point avoid almost everything except add/sub/mul (when dealing with sensitive variables) and some integer-double conversions, as other floating point operations are considered too risky on almost any platform. Thomas Pornin may correct me if I've missed something.
>
> The principle for selecting instructions for DIEL is to offer guarantees that "common implementations" (such as C implementations in OpenSSL and similar middleware) will run securely on all RISC-V instantiations that assert Zkt or Zvkt, without imposing much burden on the performance of RISC-V processors. This is why Zkt also similarly that add/sub/mul on integers is DIEL, but division and remainder of integers is not -- those should be avoided on RISC-V just like almost any other platform, as it is rarely implemented as a "1-cycle DIEL" instruction.
>
> Recall that Zkt/Zvkt (DIEL) is not a "mode" -- if the feature is asserted, it it is always on, for all compute loads. The way exp and div are implemented in floating-point units means that to achieve DIEL for those would require considerable modification on the units. As the execution time would have to be the "worst case" for all invocations of the instruction, this would decrease performance on generic loads, which is of course the primary goal for the processor designers.

I recall that Intel shipped some rather unfortunate implementations of
sin, then gave some very poor guarantees that forced the implementations
to be very expensive later. Perhaps more should be done in software,
particularly for transcendentals.
I do, however, think floating-point reciprocals and division should be
doable safely: I thought that hardware usually did a fixed number of
Newton-Raphson iterations.
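A sketch of the fixed-iteration idea, using the textbook linear initial estimate for d in [0.5, 1] (illustrative only, not any particular FPU's algorithm):

```python
def recip_newton(d: float) -> float:
    """Approximate 1/d for d in [0.5, 1] via a fixed number of
    Newton-Raphson steps x <- x*(2 - d*x). The iteration count
    never depends on d; each step roughly squares the error."""
    x = 48.0 / 17.0 - (32.0 / 17.0) * d   # classic initial estimate
    for _ in range(5):                     # always exactly 5 iterations
        x = x * (2.0 - d * x)
    return x
```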
>
> Cheers,
> -markku
>
> Dr. Markku-Juhani O. Saarinen <mj...@iki.fi>
>
>
> On Wed, Sep 10, 2025 at 9:59 PM Sophie Schmieg <ssch...@google.com> wrote:
>>
>> Oh, and div. Basically, go through the FN-DSA spec and look at all the floating-point operations I needed to dust off my old numerical analysis lecture notes for; I'd much rather have those implemented by a handful of competent hardware engineers than by a legion of more or less competent software engineers. (Don't ask me how to do floating-point division in constant time, though; that sounds like a research paper, not a routine engineering task.)
>>
>> On Wed, Sep 10, 2025 at 11:50 AM Sophie Schmieg <ssch...@google.com> wrote:
>>>>
>>>>
>>>>
>>>> Highly hypothetical: If FIPS 206 requires floating point DIEL for it to be secure and performant at the same time, we could (potentially!) consider a new crypto extension that grants it for a tiny subset of scalar and vector floating point arithmetic (basically just add/sub/mul). And even in that case, probably only for well-formed floating point inputs.
>>>>
>>>> Cheers,
>>>> -markku
>>>>
>>> Consider adding support for exp in particular; it is fairly tricky to get right otherwise, unless the spec specifically describes what algorithm to use to avoid cancellation and achieve constant-time behavior with acceptable precision.
>>
>>
> --
> You received this message because you are subscribed to the Google Groups "pqc-forum" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pqc-forum+...@list.nist.gov.
> To view this discussion visit https://groups.google.com/a/list.nist.gov/d/msgid/pqc-forum/CA%2BiU_qmZOC534dtKsp%3DH-jybyP9tD9q%2BEJ57bC_YzM5K0p1HCA%40mail.gmail.com.



--
Astra mortemque praestare gradatim

Sophie Schmieg

unread,
Sep 11, 2025, 4:16:03 PM (5 days ago) Sep 11
to Watson Ladd, Markku-Juhani O. Saarinen, John Mattsson, Moody, Dustin (Fed), pqc-forum, niux_d...@icloud.com
Looking at it in more detail (it's been a while since I implemented Falcon), you indeed do not need exp; I only used it out of laziness, so please disregard. However, you do need inverses: SamplerZ's sigma' and Babai's algorithm both require them, unless I'm missing some trick to make them go away.

Bo Lin

unread,
Sep 11, 2025, 5:05:10 PM (5 days ago) Sep 11
to Sophie Schmieg, Markku-Juhani O. Saarinen, John Mattsson, niux_d...@icloud.com, Watson Ladd, pqc-forum, Moody, Dustin (Fed)
In addition to the complexity, an FPU is costly in silicon die area. For many embedded applications, every bit and every penny counts. Requiring an FPU in a part may limit Falcon's applications. Since Falcon has been selected by NIST and does have its advantages in signature and key sizes, can key generation be validated separately? I mean, if a vendor uses offline key generation and loads key pairs onto their devices, while licensing an approved key-generation implementation somewhere else, that may make Falcon attractive, because many applications do not require on-board key generation.

Watson Ladd

unread,
Sep 11, 2025, 6:17:02 PM (5 days ago) Sep 11
to Bo Lin, Sophie Schmieg, Markku-Juhani O. Saarinen, John Mattsson, niux_d...@icloud.com, pqc-forum, Moody, Dustin (Fed)
On Thu, Sep 11, 2025 at 2:04 PM Bo Lin <boli...@gmail.com> wrote:
>
> In addition to the complexity, an FPU is costly in silicon die area. For many embedded applications, every bit and every penny counts. Requiring an FPU in a part may limit Falcon's applications. Since Falcon has been selected by NIST and does have its advantages in signature and key sizes, can key generation be validated separately? I mean, if a vendor uses offline key generation and loads key pairs onto their devices, while licensing an approved key-generation implementation somewhere else, that may make Falcon attractive, because many applications do not require on-board key generation.

I believe the floating point support is used in signature generation as well.

Blumenthal, Uri - 0553 - MITLL

unread,
Sep 11, 2025, 9:42:52 PM (5 days ago) Sep 11
to Watson Ladd, Bo Lin, Sophie Schmieg, Markku-Juhani O. Saarinen, John Mattsson, niux_d...@icloud.com, pqc-forum, Moody, Dustin (Fed)

I believe the floating point support is used in signature generation as well.

 

I think so too.

 

Which is why the only reliable way to deploy Falcon is to limit the field implementations to signature verification only, relegating key and signature generation to the few (well-analyzed) places, such as CAs. Apparently, CNSA authors did not consider such an approach feasible – a pity, because I like Falcon signature sizes a lot more than those of ML-DSA.

 

Well, maybe HAWK would be added…


Watson Ladd

unread,
Sep 11, 2025, 10:12:08 PM (5 days ago) Sep 11
to Daniel Apon, Blumenthal, Uri - 0553 - MITLL, Bo Lin, Sophie Schmieg, Markku-Juhani O. Saarinen, John Mattsson, niux_d...@icloud.com, pqc-forum, Moody, Dustin (Fed)
On Thu, Sep 11, 2025, 7:07 PM Daniel Apon <dapon....@gmail.com> wrote:
" Well, maybe HAWK would be added…  "

As mathematically awesome as HAWK is, isn't it still an order of magnitude larger than what you *really* want from a PQ-signature size? E.g. Hawk is 555 bytes at Cat 1 (even Falcon is ~666 bytes at Cat 1). Don't you want c*secparam bit-sized signatures, for c in [1, 3] or something?

SQI-Sign says hi. Hope that was the only property you wanted from it.

I checked the NIST page for the 5th PQC conference (cf. https://csrc.nist.gov/Events/2024/fifth-pqc-standardization-conference) because I remembered a talk by Matthias Kannwischer (well, one of his three talks last year), where he sort of made the case on stage that maybe we'd failed as a community in the on-ramp process -- whose purpose was /perhaps/ or /in part/ to provide very small PQ-signatures that were standardizable. Alas, the 'on-demand video' from the event isn't something I can pull up from my local browser, it seems.

---

Anyway, hope to see you at https://csrc.nist.gov/Events/2025/sixth-pqc-standardization-conference in 2 weeks with fun things to talk about as a community =)



Mike Hamburg

unread,
Sep 12, 2025, 6:00:58 AM (4 days ago) Sep 12
to Watson Bernard Ladd, Daniel Apon, Blumenthal, Uri - 0553 - MITLL, Bo Lin, Sophie Schmieg, Markku-Juhani O. Saarinen, John Mattsson, niux_d...@icloud.com, pqc-...@list.nist.gov, Moody, Dustin (Fed)
Hi all,

It’s worth noting that although FALCON and HAWK are both named after birds, and both use structured lattices, they are very different schemes otherwise and are based on different hard problems. From https://hawk-sign.info/hawk-spec.pdf:

“””
A note on naming. The name HAWK is similar to another signature scheme: FALCON,
which was selected by NIST for standardisation. Initially, beyond sharing the hash-and-
sign design of FALCON, there were no obvious similarities. However, as part of our key
generation we need to solve an NTRU equation in a similar setting to FALCON and our
HAWK-AC22 implementation reused much of the FALCON code. We therefore, despite the
different underlying hard problems, named this scheme similarly in homage.
“””

HAWK is a very cool scheme, but it will need significant additional analysis in order to have high confidence in its security, even considering the work done for FALCON.

On a parallel but off-topic note, real-life hawks are somewhat closely related to eagles, buzzards, kites, harriers and certain vultures, but not to falcons.

Regards,
— Mike

Al Martin

unread,
Sep 12, 2025, 11:12:40 AM (4 days ago) Sep 12
to pqc-forum, Watson Ladd, Sophie Schmieg, John Mattsson, Moody, Dustin (Fed), pqc-forum, niux_d...@icloud.com, Markku-Juhani O. Saarinen
(just joined this group)

Regarding "constant time" hardware implementations of divide:

Both integer and floating-point divide (and the associated remainder and square root operations) can be made constant time by (artificially) lengthening to match the longest latency cases.  This includes division-by-power-of-two, division-by-zero, and NaN source cases, which can be determined immediately.  Depending on the implementation, double-precision fdiv/fsqrt can take up to about 70 cycles to complete.  (c.f. fadd/fsub/fmul is 2 to 4 cycles).  (On the integer side, iadd/isub are 1 cycle, imul is 2-3 cycles, idiv/irem is similarly up to about 70 cycles.)  Any dependent operations must wait for the result.

Another problem with divide is that, to save on hardware, it is usually implemented as an iterative process (whether Newton-Raphson, or SRT), which has consequences to instruction issue.  The iterative process recirculates partial results, meaning that a subsequent divide must wait until the current one is done.  It may also disrupt other instructions which potentially could have been issued while the divide was underway.

In short, it is not practical in hardware to implement divide, or any other iterative function, in a "constant time" fashion, and even if they were, they should be avoided for performance reasons.

Having said that, some ISAs provide fast instructions that approximate reciprocals (about 8 bits of quotient).  These are table lookups, not iterative, so do not have the problems of a full-blown divide, and are naturally constant time (1 or 2 cycles).  If more bits of precision are needed, the results can seed a software-based N-R composed of other constant-time operations (add/sub/mul).  Log/trig and other functions could possibly be implemented in a similar fashion.  Table lookups are fairly expensive in hardware, though.

Sophie Schmieg

unread,
Sep 12, 2025, 2:21:56 PM (4 days ago) Sep 12
to Al Martin, pqc-forum, Watson Ladd, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen
One thing to note here is that those same operations have, if anything, even more pronounced performance differences in software: when implemented with a pair of integers, add is a min, a shift, and an iadd; mul is an imul and an iadd; while div is still implemented with the same Newton approximation as in hardware and, given software's ease of conditional looping, usually an early exit. This isn't really all that surprising, since the underlying algorithms are of course the same, and care little about whether they are implemented with gates or with instructions. All of the underlying integer operations have instructions in even the most basic CPUs (cmp, shift, iadd, and imul), so no direct gain is had by switching to hardware, other than potentially better use of subcycles or cache lines.

The fact that both the software and the hardware folks here are enthusiastically yelling "not it", when it comes to implementing these algorithms should probably tell us something.

Samuel Lee

unread,
Sep 12, 2025, 3:52:13 PM (4 days ago) Sep 12
to pqc-forum, Sophie Schmieg, pqc-forum, Watson Ladd, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen, Al Martin
Just to +1 what folks are already saying and add my own two cents.
To be clear I have not tried to implement FN-DSA yet.

I think reliance on floating point for implementation of FN-DSA is highly problematic, and I will absolutely not recommend its use in Microsoft if this is a requirement (either functional or for practical performance).

There is currently no architectural support for constant time floating point in widespread CPU architectures:
Intel: Data Operand Independent Timing ISA Guidance and Data Operand Independent Timing Instructions
Arm: Arm A-profile Architecture Registers - DIT, Data Independent Timing
Even the RISC-V DIEL that Markku referred to above does not currently encode any guarantees about floating point

So even if we tend to observe that normal operations on floating-point values are constant time today, relying on this in a software implementation of any cryptographic routine handling secrets is a really bad idea.
It is just a matter of time before a CPU performance engineer finds a way to improve SPECfp or some LLM training benchmark by introducing variable timing in normal operations (e.g. detecting registers with 0.0 or 1.0 as special values, which turns FMUL operations into dispatch-time zeroing idioms or register renames that skip the execution unit entirely).


A prerequisite for me considering adopting FN-DSA would be that the standard is built on the assumption of SW implementation by using integer operations (which CPU architecture does already make timing guarantees about), and ideally has clear guidance on constant-time implementation.
If FN-DSA actually sees adoption, then it might be beneficial to encode constant time guarantees about floating-point in CPUs in order to enable faster and still safe implementations.

What I have heard so far seems to be more: NIST and the authors of FN-DSA think that using floating point probably does not introduce any timing side channels on existing CPUs, so we're probably fine to encode this in a standard.


I think POSITs are interesting but have no place in a NIST standard at this point in time.

Best,
Sam

Al Martin

unread,
Sep 12, 2025, 3:52:48 PM (4 days ago) Sep 12
to pqc-forum, Sophie Schmieg, pqc-forum, Watson Ladd, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen, Al Martin
I'm not trying to pass the buck.  The point I'm trying to make is that hardware generally executes significantly faster than software.  But there are limitations which software needs to be aware of, to make the best use of that hardware.

Writing robust floating-point routines is particularly hard, having to deal with special numbers (signed zero, subnormals, infinity, NaN), rounding errors, and loss of significance.


On Friday, September 12, 2025 at 11:21:56 AM UTC-7 Sophie Schmieg wrote:

Watson Ladd

unread,
Sep 12, 2025, 4:20:06 PM (4 days ago) Sep 12
to Samuel Lee, pqc-forum, Sophie Schmieg, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen, Al Martin
On Fri, Sep 12, 2025 at 12:52 PM 'Samuel Lee' via pqc-forum
<pqc-...@list.nist.gov> wrote:
>
> Just to +1 what folks are already saying and add my own two cents.
> To be clear I have not tried to implement FN-DSA yet.
>
> I think reliance on floating point for implementation of FN-DSA is highly problematic, and I will absolutely not recommend its use in Microsoft if this is a requirement (either functional or for practical performance).
>
> There is currently no architectural support for constant time floating point in widespread CPU architectures:
> Intel: Data Operand Independent Timing ISA Guidance and Data Operand Independent Timing Instructions
> Arm: Arm A-profile Architecture Registers - DIT, Data Independent Timing
> Even the RISC-V DIEL that Markku referred to above does not currently encode any guarantees about floating point

VPMADD52LUQ and VPMADD52HUQ are on the constant-time list for Intel, and
those are most of the guts of the operations.

There's sort of a chicken and egg problem here: they could make more
guarantees, if software needed them because it adopted algorithms that
benefit, particularly for the subset of floating point Falcon needs
(no denormals!)

Thomas Pornin

unread,
Sep 12, 2025, 6:17:45 PM (4 days ago) Sep 12
to pqc-forum, Watson Ladd, pqc-forum, Sophie Schmieg, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen, Al Martin, Samuel Lee
For Falcon/FN-DSA, the required floating-point operations are add, sub, mul, div, and sqrt. You also need round, floor and trunc (i.e. conversion to 32-bit integers, with rounding to nearest, toward -inf, or toward 0), and converting a (small) integer into floating-point format, but these operations are not expensive to emulate with plain integers, especially if you assume that they don't overflow. sub is extremely easy if you have add, because you just need to flip the top bit of the second operand. div and sqrt are much less used than add and mul, so they can tolerate a much slower implementation. A constant-time div does the division bit-by-bit, and constant-time sqrt works about the same way, with a similar cost.
You might want to have a look at https://eprint.iacr.org/2025/123 in which I report on my recent code for ARM Cortex M4 (with optimized assembly, everything constant-time), especially figure 2 (page 5). div and sqrt are ten times more expensive than add and mul in that code, but together make up only 6.21% of the cost of signing, while add and mul consume more than 60% of the total cost. For actual implementation of these primitives, substantial savings come from the fact that in Falcon there are no exceptional cases (no NaNs, infinites, or denormalized values; only normalized values, and zeros; operations on exponents and conversions to/from integers never overflow).
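The "division bit-by-bit" Thomas mentions can be sketched as a branchless restoring division on integer mantissas. This is my own illustration, not code from the paper: the loop count depends only on the requested precision, and the conditional subtraction is done with a mask rather than a branch.

```c
#include <stdint.h>

/* Constant-time bit-by-bit division sketch, as used for the mantissa step
 * of a soft-float divide. Requires 0 < a < b < 2^63; returns
 * floor(a * 2^nbits / b), i.e. the top nbits bits of the binary expansion
 * of a/b. The iteration count depends only on nbits, never on the data. */
static uint64_t ct_div_bits(uint64_t a, uint64_t b, int nbits) {
    uint64_t q = 0, rem = a;
    for (int i = 0; i < nbits; i++) {
        rem <<= 1;                          /* rem stays below 2*b < 2^64 */
        uint64_t d    = rem - b;            /* wraps (top bit set) iff rem < b */
        uint64_t bit  = 1 ^ (d >> 63);      /* 1 iff rem >= b */
        uint64_t mask = (uint64_t)0 - bit;  /* all-ones iff bit == 1 */
        rem -= b & mask;                    /* subtract b only when rem >= b */
        q = (q << 1) | bit;
    }
    return q;
}
```

A constant-time sqrt works the same way, producing one result bit per iteration with a masked trial subtraction.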

In Falcon, floating-point is used only for signature generation. Signature verification and keygen can be done with only integers. There are some usages where there will be many more verifiers than signers, typically X.509 certificates (roughly speaking, only CAs sign stuff in certificates, the rest of the world only does verification); thus, the floating-point issues are not necessarily issues in some contexts.

It is conceptually feasible to compute Falcon signatures without floating-point, in two ways. One is to use a fixed-point representation instead, i.e. you approximate value x with the integer round(x*2^k) for some precision k. The problem is that the range of values is a bit large, and you'll need something like 200 or more bits per value, which means you'll use up four times as much RAM for signing, and the performance might be disappointing. The other method is to notice that the whole Fast Fourier sampling is really an FFT representation of polynomials with rational coefficients, so you can try to use fractions (and stay out of FFT), but then it's no longer Fourier sampling, and absolutely not Fast sampling. It's closer to keygen costs, albeit consuming much more RAM and being even slower.

Thomas

Taylor R Campbell

unread,
Sep 12, 2025, 8:48:37 PM (4 days ago) Sep 12
to Al Martin, pqc-...@list.nist.gov, Sophie Schmieg, Watson Ladd, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen, Al Martin
> Date: Fri, 12 Sep 2025 12:52:48 -0700 (PDT)
> From: Al Martin <nit...@gmail.com>
>
> Writing robust floating-point routines is particularly hard, having to deal
> with special numbers (signed zero, subnormals, infinity, NaN), rounding
> errors, and loss of significance.

If reasoning about rounding errors were too hard for cryptography,
then surely that should rule out all of lattice-based cryptography,
whether or not floating-point is involved!

Floating-point arithmetic -- mainly, correctly-rounded arithmetic on
rational numbers of the form s*2^e for fixed-size integers s and e --
is a tiny part of what makes numerical analysis and approximating real
functions hard.

Catastrophic cancellation (sometimes called `loss of significance'),
for example, is a property of _real number arithmetic_, not of
floating-point arithmetic: if x' and y' are approximations to x and y
(for any reason, be it measurement error or series truncation error or
rounding error), even if they are good approximations, then x' - y'
may be a bad approximation to x - y, with relative error proportional
to 1/(x - y).

This applies no matter how you represent x', y', and x' - y': integer,
fixed-point, scaled logarithm-indexed, floating-point, posit. This
applies even if x' - y' is computed exactly.

(The common misconception that subtracting nearby floating-point
numbers can give drastically wrong answers is contradicted by a
theorem of floating-point arithmetic called the Sterbenz lemma, which
is that if x'/2 <= y' <= x', then fl(x' - y') = x' - y', that is, the
floating-point subtraction of nearby inputs is _guaranteed_ exact.)
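The Sterbenz lemma is easy to check mechanically. The snippet below is an illustration I added (the mantissa values in the test are arbitrary): it builds two doubles in [1, 2) from integer mantissas, so y/2 <= x <= 2y holds automatically, and confirms that the floating-point subtraction matches the exact scaled-integer difference.

```c
#include <stdint.h>
#include <math.h>

/* Interprets mx and my (each in [2^52, 2^53)) as scaled mantissas of
 * doubles x, y in [1, 2), so the Sterbenz condition y/2 <= x <= 2y holds
 * automatically; returns 1 iff the floating-point subtraction x - y
 * equals the exact integer difference scaled back down. */
static int sterbenz_exact(uint64_t mx, uint64_t my) {
    double x = ldexp((double)mx, -52);   /* exact: mx < 2^53 */
    double y = ldexp((double)my, -52);
    /* |mx - my| < 2^53, so the conversion and scaling below are exact. */
    double exact = ldexp((double)((int64_t)mx - (int64_t)my), -52);
    return (x - y) == exact;
}
```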

Fortunately, the numerical analysis has already been done for the
rounding and approximation of the Falcon algorithms in IEEE 754
floating-point arithmetic -- including theorems of avoiding
exceptional inputs of subnormal/inf/NaN -- just like the numerical
analysis has been done for the rounding and approximation of the
ML-KEM and ML-DSA algorithms!

Taylor R Campbell

unread,
Sep 12, 2025, 10:18:26 PM (4 days ago) Sep 12
to Samuel Lee, pqc-forum, Sophie Schmieg, Watson Ladd, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen, Al Martin
> Date: Fri, 12 Sep 2025 12:52:12 -0700 (PDT)
> From: "'Samuel Lee' via pqc-forum" <pqc-...@list.nist.gov>
>
> There is currently no architectural support for constant time floating
> point in widespread CPU architectures:
> [...]
>
> So even if we tend to observe that normal operations on floating point
> values are constant time today, relying on this in a software
> implementation of any cryptographic routine handling secrets is a really
> bad idea.

Would CPU designers have bothered with DIT bits if cryptographers
hadn't asked for that?

According to Intel, the DOITM bit doesn't actually do anything in any
Intel CPUs up to 2023 (that is, all CPUs to date at the time of the
message):

https://web.archive.org/web/20240702164403/https://lore.kernel.org/all/851920c5-31c9-ddd9...@intel.com/

Presumably this is because there was very little real temptation by
Intel to put any timing variation into the instructions the bit makes
guarantees about in the first place, but cryptographers and CPU
designers agreed it would be important to forestall potential future
security issues.

> It is just a matter of time for CPU performance engineer to find a way to
> improve SPECfp or some LLM training benchmark by introducing variable
> timing in normal operations (e.g. detecting registers with 0.0 or 1.0 as
> special values which turns FMUL operations into dispatch-time zeroing
> idioms or register renames to skip use of an execution unit entirely).

Most performance improvements in this space arise from
parallelism/vectorization, not from variable sequential latency of
individual flops: if you're computing eight 64-bit multiplications in
parallel on a 512-bit vector, it doesn't help to get one of the
results a cycle faster.

But in any case, now is an excellent time to ask for guarantees
_before_ that happens, justified by FIPS 206! If it was useful for
Falcon, it will likely be useful for future algorithms too.

> A prerequisite for me considering adopting FN-DSA would be that the
> standard is built on the assumption of SW implementation by using integer
> operations (which CPU architecture does already make timing guarantees
> about), and ideally has clear guidance on constant-time implementation.

The reference implementation already demonstrates implementing Falcon
with integer operations, by compiling with -DFALCON_FPEMU. This code
(~500 lines of well-commented C) may be tricky to get right, but
that's true of many other kinds of code in cryptography, like fast
arithmetic in Z/pZ for p = 2^255 - 19 or p = 2^252 +
27742317777372353535851937790883648493, or arithmetic in
(Z/qZ)[X]/(X^256 + 1) for q = 2^23 - 2^13 + 1.
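To give a flavor of what FALCON_FPEMU-style emulation involves, here is a simplified binary64 multiply written only with integer operations. This is my own sketch, not the reference code: it assumes strictly positive, normalized inputs and an in-range result (matching the no-NaN/infinity/denormal domain discussed in this thread), and it uses the GCC/Clang `unsigned __int128` extension for the 106-bit product.

```c
#include <stdint.h>

/* Soft-float multiply sketch for strictly positive, normalized binary64
 * values whose result stays in the normal range, with round-to-nearest,
 * ties-to-even done branchlessly. Inputs and output are raw IEEE 754 bits. */
static uint64_t fpemu_mul(uint64_t a, uint64_t b) {
    uint64_t sign = (a ^ b) & (1ull << 63);
    uint64_t ma = (a & 0xFFFFFFFFFFFFFull) | (1ull << 52);  /* hidden bit */
    uint64_t mb = (b & 0xFFFFFFFFFFFFFull) | (1ull << 52);
    int64_t  e  = (int64_t)((a >> 52) & 0x7FF)
                + (int64_t)((b >> 52) & 0x7FF) - 1023;

    unsigned __int128 p = (unsigned __int128)ma * mb;  /* in [2^104, 2^106) */

    /* If the product reached 2^105, shift one bit more and bump the
     * exponent; t is a single data-derived bit, handled arithmetically. */
    uint64_t t = (uint64_t)(p >> 105) & 1;
    unsigned s = 52 + (unsigned)t;
    e += (int64_t)t;

    uint64_t m      = (uint64_t)(p >> s);               /* in [2^52, 2^53) */
    uint64_t rnd    = (uint64_t)(p >> (s - 1)) & 1;     /* round bit */
    uint64_t sticky = (p & ((((unsigned __int128)1) << (s - 1)) - 1)) != 0;

    m += rnd & (sticky | (m & 1));   /* round to nearest, ties to even */
    uint64_t carry = m >> 53;        /* rounding overflowed to 2^53 */
    m >>= carry;
    e += (int64_t)carry;

    /* Assumes e stays in the normal biased range [1, 2046]. */
    return sign | ((uint64_t)e << 52) | (m & 0xFFFFFFFFFFFFFull);
}
```

For example, the bit patterns of 1.5 and 2.0 multiply to the bit pattern of 3.0, matching hardware exactly because the rounding rule is the same.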

Signing time is a factor of ~5-10x higher with FALCON_FPEMU (ballpark
estimates based on my laptop), but whether you use hardware
floating-point or software emulation:

1. verification is competitive with small-exponent RSA/Rabin-type
algorithms at comparable security;

2. signature and keygen are costlier than Ed25519, but cheaper than
ML-DSA and dramatically cheaper than SLH-DSA at comparable
security.

That is: Falcon still plausibly fits in a useful performance regime,
even with software emulation of floating-point arithmetic --
particularly for signature verification which does not involve
floating-point arithmetic.

Bo Lin

unread,
Sep 13, 2025, 4:50:16 AM (3 days ago) Sep 13
to Watson Ladd, Sophie Schmieg, Markku-Juhani O. Saarinen, John Mattsson, niux_d...@icloud.com, pqc-forum, Moody, Dustin (Fed)
Yes, that's correct. I got that impression from the Comments to Falcon. It turns out to be a fixed-point simulation.

Bo Lin

unread,
Sep 13, 2025, 5:46:04 AM (3 days ago) Sep 13
to pqc-forum, Thomas Pornin, Watson Ladd, pqc-forum, Sophie Schmieg, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen, Al Martin, Samuel Lee
This might limit its applications, because signing is widely used by a device to identify itself to gain access to an organisation. For example, every bank card has to sign a challenge from a POS terminal (see here for a brief outline of how EMV cards handle off-line security). The same scenario arises in many embedded applications, which need fast signing, short data sizes (thanks to low-bandwidth interfaces), and low cost.

By the way, the thread here appears to repeat or continue the Comments to Falcon.

Vincent Hwang

unread,
Sep 14, 2025, 11:11:58 AM (2 days ago) Sep 14
to pqc-forum, Bo Lin, Thomas Pornin, Watson Ladd, pqc-forum, Sophie Schmieg, John Mattsson, Moody, Dustin (Fed), niux_d...@icloud.com, Markku-Juhani O. Saarinen, Al Martin, Samuel Lee
My apologies for my previous email reply.

Below is the content intended for the forum 

========

To add a bit to Thomas's response, it is certainly possible to verify the absence of NaN and \infty in the context of non-fdiv operations. It was already verified that this holds for the FFT prior to the discrete Gaussian sampling (https://eprint.iacr.org/2024/321). That work only targeted fadd and fmul, but I see no difficulty in extending it to other fops with linear bit complexity. I haven't found time to explore the feasibility of large computations with fdiv, or of extending it to the whole signature generation.

========

Vincent

