An implementation optimization for SP800-230 and different parameter sets designed around that optimzation

118 views
Skip to first unread message

Scott Fluhrer (sfluhrer)

unread,
May 6, 2026, 5:25:38 PM (3 days ago) May 6
to SP800-230...@nist.gov, 'Hamilton Silberg' via pqc-forum
This is about the NIST SP 800-230 initial public draft, and I'd like to point out two things:

The first is an optimization to the SP800-230 parameter sets, which is allowed by the ipd.  This optimization increases the size of the private key modestly, while providing a significant speed up during signature generation time.

This implementation optimization is to store the Merkle tree nodes at level N from the top with the private key (with N being something like 6 or 8).  That is, when you generate the private/public keypair, you generate the entire Merkle tree (because you have to compute the root); when you hit a node at level N, you store that node.  This costs 2^N additional hash functions, so for N=8, this would be 4k-8k — of course, the exact value of N can be an implementation decision - I used N=8 in my implementation because memory was cheap there (and larger N speed things up by less than 1%)

During signature generation, you need to generate the authentication path to the leaf node you are signing with.  For the levels below level N, you generate that from scratch - if the total tree height is H (so H=22 for SLH-DSA-*-128-24) and N=8, the first H-N nodes in the authentication path can be computed by computing 2^{H-N} = 16,384 WOTS public keys, and 2^{H-N}-1 hashes to compute internal Merkle tree nodes (you could reduce this slightly, because you don't actually need to compute anything on the actual path - that's a tiny savings that's probably not worth worrying about).  For the authentication path at level N and above, those can be computed using the level N nodes we stored with 2^{N}-2 hashes (and again, we could drop those on the direct path).

This significantly reduces the cost of generating the Merkle tree authentication path (compared to recomputing everything from scratch), and drastically reduces the storage needed (compared to storing the entire Merkle tree).  Of course, generating a signature also involves computing the FORS trees, and those can't be optimized.  On my implementation (which optionally does this), this accelerates the signing time from a factor of 1.8 to 4.7, with the L1 parameter sets being improved significantly more - that's because the L1 parameter sets has smaller FORS trees (even if by a single level).


This leads me to the second thing I wanted to point out - can we create parameter sets which are tailored around this improvement?  I did some searching, and here's what I came up with:

Level 1: h=28 d=1 a=21 k=6 w=2   This has sigsize=3664 (compared to 3856), verify time = 299 hashes (compared to 311), overuse at 112 level = 29.68 (compared to 27.25).
    Signature time with N=8: 38M + 289M = 327M hashes, storage cost 4k
    Signature time with N=9: 38M + 145M = 183M hashes, storage cost 8k
    Signature time with N=10: 38M + 72M = 110M hashes, storage cost 16k
    Signature time with N=11: 38M + 36M = 74M hashes, storage cost 32k
Compared to original: signature time 302M + 1158M = 1460M hashes with no storage, 302M hashes with full storage, 302M + 5M = 307M hashes with N=8 (storage cost 4k).

Level 3: h=28 d=1 a=24 k=8 w=3  This has sigsize=7104 (compared to 7752), verify time = 499 (compared to 526), overuse at 128 level = 35.98 (compared to 31.77).
    Signature time with N=8: 403M + 592M = 965M hashes, storage cost 6k
    Signature time with N=9: 403M + 281M = 684M hashes, storage cost 12k
    Signature time with N=10: 403M + 141M = 544M hashes, storage cost 24k
Compared to original: signature time 906M + 1124M = 2030M hashes with no storage, 906M hashes with full storage, 906M + 4M = 910M hashes with N=8 (storage cost 6k)

Level 5: h=27 d=1 a=24 k=11 w=2  This has sigsize=13952 (compared to 14944), verify time = 571 (compared to 602), overuse at 192 level = 33.47 (compared to 29.98).
    Signature time with N=7: 554M + 566M = 1120M hashes, storage cost 4k
    Signature time with N=8: 554M + 283M = 837M hashes, storage cost 8k
    Signature time with N=9: 554M + 142M = 696M hashes, storage cost 16k
Compared to original: signature time 1208M + 1132M = 2340M hashes with no storage, 1208M hashes with full storage, 1208M + 4M = 1212M hashes with N=8 (storage cost 8k)

BTW, when I cite the signing time as X+Y, the X is the number of hashes to build the FORS trees (which we can't cache) and Y is the number of hashes to build the Merkle authentication path (which we can)).  And, by "overuse at xxx level", I mean to talk about the safety margin - that is, the situation where the signer doesn't respect the "no more than 2^{24} signatures" boundary, but continues; what is the log2 of the number of signatures he can generate and still retain xxx bits of security.

So, we have improved most aspects (even if only somewhat).  Of course, there are costs:

  • The key generation is lengthy
  • There's more complexity in the authentication path generation.  I don't personally see that as all that much (and well worth the performance gain, even for the current parameter sets), but it is there
  • There is a bit of storage needed.  I believe that, even in a constrained (HSM or TPM) environment, the amount is reasonable, but still, it's really needed to make these parameter sets work reasonably well (compared to the original, which can function ok without it).

And, you have published the ipd - it might be too late to consider changing the parameter sets (and these might not be enough of an improvement to consider supporting a second set of parameters).  Oh well...


Reply all
Reply to author
Forward
0 new messages