Note that there are two competing meanings for "constant-time": one is what the expression literally says, i.e. "executes in a constant amount of time"; the other, which is the more common meaning in cryptography, is "variations in timing measurements related to the scheme are computationally unlinkable to secret data".
In signature verification, we usually say that there is only public data; hence only the first meaning may apply, not the second one. You could imagine a contrived situation where the signature and the data are not public, and the data is a low-entropy string (a human-chosen password, for instance), in which case a timing side-channel in verification could be used to power a dictionary attack on the string (the attacker tries potential passwords until one is found that matches the measured timing signal). Even that would not work for Falcon, where the variability related to the input comes from the hash-to-polynomial step, in which the input is hashed along with a 40-byte random nonce -- if the signature is secret then the attacker does not know the nonce, which is enough to defeat that dictionary attack. If you have a really great imagination, you could try something where the data is secret and low-entropy, the signature is public, but the public key is not public. Under these conditions, it becomes conceivable that a timing signal in the hash-to-polynomial step could be leveraged in a timing attack. It is really a big stretch, though.
This is a freak, theoretical-only concern, and not something we should try to defend against in practice. HOWEVER, the analysis in there is relevant to the other meaning of "constant-time", i.e. "executes in a constant amount of time" -- or, more accurately, to what kind of upper bound you can get on the execution time. As explained in the comments in the code linked above, the hash-to-polynomial process in Falcon is a case of rejection sampling: for degree n, you want n coefficients in [0,12288]; for each coefficient, you get 2 bytes from the RNG (SHAKE256), and that yields the coefficient if the 2-byte value falls in [0,61444]; but if the value is in [61445,65535] then you have to drop the 2 bytes and get 2 more, and so on. Thus, there is a bit of "oversampling". How much oversampling you may need is computable. As the table shows, for n = 512, you need at least 512*2 bytes, but the risk that you need more than (512+205)*2 bytes is lower than 2^(-256), i.e. completely negligible.
(2^(-256) is negligible because the probability that all of mankind dies out because of a huge asteroid strike right during the millisecond or so that your microcontroller is verifying the signature is a lot higher, around 2^(-62), and that is a much more dire occurrence, if only because if that asteroid vaporizes mankind then it will also destroy your microcontroller and prevent it from completing the signature verification. We do not care about random events with probability lower than 2^(-62). In an adversarial context in which an attacker tries to find exactly the most annoying case, we may care about lower probabilities, because attackers may try to find a "bad case" signature that forces the verifier to spend more time; but even then, 2^(-256) is completely unreachable for dedicated attackers, and if an attacker can find a "bad case" with that low a probability then the attacker can also simply break the key and forge signatures. Hence, a 2^(-256) risk is truly negligible.)
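That tail bound can be checked numerically. The sketch below is my own back-of-envelope, not taken from the Falcon code: it computes, with exact rational arithmetic, the probability that 717 two-byte draws yield fewer than 512 accepted values (i.e. that more than (512+205)*2 bytes are needed), modeling SHAKE256 output as independent uniform 16-bit values:

```python
from fractions import Fraction
from math import comb

# Per-draw rejection probability: a 16-bit value is rejected
# if it falls in [61445, 65535], i.e. 4091 values out of 65536.
p_rej = Fraction(4091, 65536)
p_acc = 1 - p_rej

n_draws = 512 + 205            # 717 two-byte draws available
# Failure event: at least 206 rejections among the 717 draws,
# i.e. fewer than 512 accepted coefficients.
tail = sum(comb(n_draws, j) * p_rej**j * p_acc**(n_draws - j)
           for j in range(206, n_draws + 1))

# log2 of the tail probability, accurate to within about one bit
log2_tail = tail.numerator.bit_length() - tail.denominator.bit_length()
print(log2_tail)
```

The result is far below 2^(-250), consistent with the 2^(-256) figure quoted above.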
SHAKE256 outputs bytes in chunks of 136 (its "rate"); the cost of SHAKE256 is that of running the Keccak-f permutation, once per 136 output bytes. With some figures (see
https://eprint.iacr.org/2025/123): on an ARM Cortex-M4, I get a signature verification cost of about 255300 cycles (for the "original Falcon"; if the hash of the public key is included in the signed data, then there is an extra 103000-cycle overhead, which is the cost of that hash computation). The hash-to-polynomial step involves on average 8.03 invocations of Keccak-f (i.e. 8 or 9 in practice, with 9 being slightly more probable). Within the 2^(-256) risk, we are good as long as we get (512+205)*2 bytes, which translates to 12 Keccak-f invocations, i.e. 3 or 4 more than the normal, average case. Each Keccak-f costs 14348 cycles on that M4, so we are talking about an extra overhead of less than 60k cycles (which we will never actually hit in practice) over a base cost of 255k. The overall conclusion: for Falcon/FN-DSA verification, this really is NOT a case of "sometimes verification time is doubled". It is more a case of "theoretically we could get some slight overhead, but it will be at most +22% and it won't happen in practice, even if attackers try really hard to make it happen".
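As a quick sanity check on those numbers (the per-call cycle counts are from the cited paper; the arithmetic is mine):

```python
KECCAK_F_CYCLES = 14348        # one Keccak-f permutation on the Cortex-M4
BASE_VERIFY_CYCLES = 255_300   # average Falcon-512 verification cost

# Average squeezed output: 1024 useful bytes / acceptance rate (61445/65536),
# at 136 bytes per Keccak-f invocation.
avg_keccak = (1024 * 65536 / 61445) / 136
print(round(avg_keccak, 2))    # about 8.03 invocations on average

# Worst case within the 2^(-256) risk: 12 invocations, i.e. up to 4 extra.
extra_cycles = 4 * KECCAK_F_CYCLES          # 57392 cycles, under 60k
overhead = extra_cycles / BASE_VERIFY_CYCLES
print(extra_cycles, round(100 * overhead, 1))
```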
TL;DR: with Falcon/FN-DSA, signature verification gets you nice real-time bounds. Even accounting for freak, theoretical-only variations, you still get a total time which is low, as these things go (e.g. on this kind of hardware, it would still be about twice as fast as, say, a very optimized Ed25519 verification).
If we are talking about signature generation, not verification, then things are a bit different. You still get the hash-to-polynomial process, but it is only a small part of the overall signing cost. The use of floating-point operations certainly increases the cost on hardware which does not have an appropriate FPU, but (see the paper above) it is not catastrophic; on the Cortex-M4, signing time is about 5.4x the average cost of signing with ML-DSA, so it is certainly slower, but not egregiously so. In fact, if we want upper bounds, then that ratio is much lower. At n = 512, Falcon/FN-DSA needs to restart the signing process with probability about 1/230000 (this is mostly due to the need for a low enough L2 norm of the result; it is not an effect of the bounded/padded size limit on the signature). If you want "9 nines" odds of completing the signature (i.e. a risk of failure lower than 10^(-9)), then you only need to account for two iterations. On the other hand, Dilithium-II (aka ML-DSA-44) restarts with probability about 0.826 per iteration and needs on average about 5.9 iterations, but for a "9 nines" bound the number of iterations must be assumed to be up to 109, i.e. about 18 times the average case.
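The iteration counts for the "nine nines" bound can be recomputed directly from the quoted restart probabilities (a small check of mine; the failure event after k iterations is k consecutive restarts, probability restart_prob**k):

```python
def iterations_for(restart_prob: float, max_fail: float = 1e-9) -> int:
    """Smallest k such that the probability of k consecutive
    restarts, restart_prob**k, drops below max_fail."""
    k = 1
    while restart_prob ** k >= max_fail:
        k += 1
    return k

print(iterations_for(1 / 230000))   # Falcon-512: 2 iterations suffice
print(iterations_for(0.826))        # Dilithium-II: 109 iterations
```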
Again, some figures (in millions of cycles), on an ARM Cortex-M4:
- Falcon/FN-DSA (n = 512): average = 22m, worst case (at risk 10^(-9)) = 45m
- Dilithium-II (ML-DSA-44): average = 4m, worst case (at risk 10^(-9)) = 73m
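Those worst-case figures follow, roughly, from scaling the average cost by the iteration counts. This is a back-of-envelope of mine, assuming signing cost is dominated by the restartable part and scales linearly with the number of iterations:

```python
# Average costs in million cycles on the Cortex-M4 (figures above).
falcon_avg, falcon_avg_iters = 22.0, 1.0       # restart prob ~1/230000, so ~1 iteration
dilithium_avg, dilithium_avg_iters = 4.0, 5.9

# "Nine nines" iteration bounds: 2 for Falcon-512, 109 for Dilithium-II.
falcon_worst = falcon_avg / falcon_avg_iters * 2
dilithium_worst = dilithium_avg / dilithium_avg_iters * 109

# Close to the quoted 45m and 73m (fixed per-signature overheads account
# for the small differences).
print(round(falcon_worst), round(dilithium_worst))
```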
We arrive here at the somewhat counterintuitive result that on the ARM Cortex-M4, which lacks hardware support for the IEEE 754 "binary64" floating-point values that Falcon signature generation needs, Falcon/FN-DSA is actually... faster than Dilithium/ML-DSA? It is a bit artificial, because it comes from the enforcement of a real-time upper bound (within some given failure probability, here the marketing-friendly "nine nines" value).
To sum up:
- For verification, you can get really nice real-time bounds with a relatively low overhead (22%) compared with the average case (and, for performance and code simplicity, you'd prefer FN-DSA over ML-DSA, and even over pre-quantum EdDSA).
- For signature generation, if you want real-time bounds, you may end up preferring FN-DSA over ML-DSA again, because its probability of restart is much lower. On the other hand, you might still prefer ML-DSA on the grounds that it is probably easier to mask/blind against other side channels such as differential power analysis. Both algorithms are slower than you might want, in any case (e.g. Ed25519 signing will be less than 0.5m cycles on that hardware). But of course, if you sign stuff on a microcontroller, then you have a private key and you need safe storage for that, and you get a whole lot of new issues.
Personally I would expect the signature verification to be much more commonly used than signature generation on small microcontrollers. A typical case would be verifying the signature on the firmware at boot time. Or checking a server X.509 certificate in a TLS connection to a much beefier server (i.e. you'll want to look at ML-KEM cost and variability, more than signature generation cost).
Thomas