An implementation optimization for SP800-230 and different parameter sets designed around that optimzation

218 views

Skip to first unread message

Scott Fluhrer (sfluhrer)

unread,

May 6, 2026, 5:25:38 PMMay 6

to SP800-230...@nist.gov, 'Hamilton Silberg' via pqc-forum

This is about the NIST SP 800-230 initial public draft, and I'd like to point out two things:

The first is an optimization to the SP800-230 parameter sets, which is allowed by the ipd. This optimization increases the size of the private key modestly, while providing a significant speed up during signature generation time.

This implementation optimization is to store the Merkle tree nodes at level N from the top with the private key (with N being something like 6 or 8). That is, when you generate the private/public keypair, you generate the entire Merkle tree (because you have to compute the root); when you hit a node at level N, you store that node. This costs 2^N additional hash functions, so for N=8, this would be 4k-8k — of course, the exact value of N can be an implementation decision - I used N=8 in my implementation because memory was cheap there (and larger N speed things up by less than 1%)

During signature generation, you need to generate the authentication path to the leaf node you are signing with. For the levels below level N, you generate that from scratch - if the total tree height is H (so H=22 for SLH-DSA-*-128-24) and N=8, the first H-N nodes in the authentication path can be computed by computing 2^{H-N} = 16,384 WOTS public keys, and 2^{H-N}-1 hashes to compute internal Merkle tree nodes (you could reduce this slightly, because you don't actually need to compute anything on the actual path - that's a tiny savings that's probably not worth worrying about). For the authentication path at level N and above, those can be computed using the level N nodes we stored with 2^{N}-2 hashes (and again, we could drop those on the direct path).

This significantly reduces the cost of generating the Merkle tree authentication path (compared to recomputing everything from scratch), and drastically reduces the storage needed (compared to storing the entire Merkle tree). Of course, generating a signature also involves computing the FORS trees, and those can't be optimized. On my implementation (which optionally does this), this accelerates the signing time from a factor of 1.8 to 4.7, with the L1 parameter sets being improved significantly more - that's because the L1 parameter sets has smaller FORS trees (even if by a single level).

This leads me to the second thing I wanted to point out - can we create parameter sets which are tailored around this improvement? I did some searching, and here's what I came up with:

Level 1: h=28 d=1 a=21 k=6 w=2 This has sigsize=3664 (compared to 3856), verify time = 299 hashes (compared to 311), overuse at 112 level = 29.68 (compared to 27.25).

Signature time with N=8: 38M + 289M = 327M hashes, storage cost 4k

Signature time with N=9: 38M + 145M = 183M hashes, storage cost 8k

Signature time with N=10: 38M + 72M = 110M hashes, storage cost 16k

Signature time with N=11: 38M + 36M = 74M hashes, storage cost 32k

Compared to original: signature time 302M + 1158M = 1460M hashes with no storage, 302M hashes with full storage, 302M + 5M = 307M hashes with N=8 (storage cost 4k).

Level 3: h=28 d=1 a=24 k=8 w=3 This has sigsize=7104 (compared to 7752), verify time = 499 (compared to 526), overuse at 128 level = 35.98 (compared to 31.77).

Signature time with N=8: 403M + 592M = 965M hashes, storage cost 6k

Signature time with N=9: 403M + 281M = 684M hashes, storage cost 12k

Signature time with N=10: 403M + 141M = 544M hashes, storage cost 24k

Compared to original: signature time 906M + 1124M = 2030M hashes with no storage, 906M hashes with full storage, 906M + 4M = 910M hashes with N=8 (storage cost 6k)

Level 5: h=27 d=1 a=24 k=11 w=2 This has sigsize=13952 (compared to 14944), verify time = 571 (compared to 602), overuse at 192 level = 33.47 (compared to 29.98).

Signature time with N=7: 554M + 566M = 1120M hashes, storage cost 4k

Signature time with N=8: 554M + 283M = 837M hashes, storage cost 8k

Signature time with N=9: 554M + 142M = 696M hashes, storage cost 16k

Compared to original: signature time 1208M + 1132M = 2340M hashes with no storage, 1208M hashes with full storage, 1208M + 4M = 1212M hashes with N=8 (storage cost 8k)

BTW, when I cite the signing time as X+Y, the X is the number of hashes to build the FORS trees (which we can't cache) and Y is the number of hashes to build the Merkle authentication path (which we can)). And, by "overuse at xxx level", I mean to talk about the safety margin - that is, the situation where the signer doesn't respect the "no more than 2^{24} signatures" boundary, but continues; what is the log2 of the number of signatures he can generate and still retain xxx bits of security.

So, we have improved most aspects (even if only somewhat). Of course, there are costs:

The key generation is lengthy
There's more complexity in the authentication path generation. I don't personally see that as all that much (and well worth the performance gain, even for the current parameter sets), but it is there
There is a bit of storage needed. I believe that, even in a constrained (HSM or TPM) environment, the amount is reasonable, but still, it's really needed to make these parameter sets work reasonably well (compared to the original, which can function ok without it).

And, you have published the ipd - it might be too late to consider changing the parameter sets (and these might not be enough of an improvement to consider supporting a second set of parameters). Oh well...

Reply all

Reply to author

Forward

0 new messages