Benchmarking SLH-DSA STARK Aggregation

remix7531

Apr 13, 2026, 5:27:57 AM

to bitco...@googlegroups.com

Hi all,

Ethan Heilman's "Post Quantum Signatures and Scaling Bitcoin" post [0]
proposed using STARKs to aggregate PQ signatures per block and raised
the concern that expensive proof generation could give large miners an
unfair advantage. I ran some benchmarks to put numbers on this.

Full write-up with charts:
https://remix7531.com/post/slh-dsa-stark-bench/

I built a proof-of-concept [1] that aggregates N SLH-DSA-SHA2-128s (FIPS
205) signature verifications into a single STARK proof using RISC Zero's
zkVM with its SHA-256 precompile.

Results (wall-clock proving time, succinct proofs):

N     RTX 5090      B200         CPU (Ryzen 8640U)   Proof size
1     4.1 s         4.2 s        14 min 17 s         218 KiB
8     28.9 s        19.5 s       1 h 14 min          222 KiB
64    3 min 31 s    2 min 33 s   --                  247 KiB
512   26 min 28 s   20 min 3 s   --                  454 KiB

Key findings:
- Proving scales roughly linearly with N.
- ~3.1 s/sig on RTX 5090, ~2.3 s/sig on B200.
- Proof size grows sublinearly: 218 KiB (N=1) to 454 KiB (N=512),
vs 3.8 MiB of raw signatures at N=512.
- Verification is constant at ~12-15 ms regardless of N.
- The B200 is only 1.3x faster than the RTX 5090. The workload is
compute-bound, and RISC Zero caps the segment size at PO2 = 22
(2^22 cycles per segment).
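
The per-signature figures above can be re-derived from the results table. The following is a minimal sketch, not code from the benchmark repository: the timings and proof sizes are copied from the table, and the only outside value is the 7,856-byte signature size of SLH-DSA-SHA2-128s from FIPS 205. The names `RESULTS` and `summarize` are made up for illustration.

```python
SIG_BYTES = 7856  # SLH-DSA-SHA2-128s signature size in bytes (FIPS 205)
KIB = 1024

# N -> (RTX 5090 seconds, B200 seconds, proof size in KiB), from the table.
RESULTS = {
    1: (4.1, 4.2, 218),
    8: (28.9, 19.5, 222),
    64: (3 * 60 + 31, 2 * 60 + 33, 247),
    512: (26 * 60 + 28, 20 * 60 + 3, 454),
}


def summarize(n):
    """Derive per-signature cost, GPU speedup and size ratio for one row."""
    rtx_s, b200_s, proof_kib = RESULTS[n]
    raw_kib = n * SIG_BYTES / KIB  # size of the N raw signatures
    return {
        "rtx_s_per_sig": rtx_s / n,
        "b200_s_per_sig": b200_s / n,
        "b200_speedup": rtx_s / b200_s,
        "raw_mib": raw_kib / KIB,
        "raw_over_proof": raw_kib / proof_kib,
    }


for n in RESULTS:
    s = summarize(n)
    print(
        f"N={n:>3}  RTX {s['rtx_s_per_sig']:.2f} s/sig  "
        f"B200 {s['b200_s_per_sig']:.2f} s/sig  "
        f"speedup {s['b200_speedup']:.2f}x  "
        f"raw/proof {s['raw_over_proof']:.2f}"
    )
```

At N=512 this gives about 3.10 s/sig on the RTX 5090, 2.35 s/sig on the B200, a 1.32x speedup, and 3.84 MiB of raw signatures. It also shows that at N=1 and N=8 the proof is larger than the signatures it replaces; the proof is the smaller of the two at N=64 and N=512.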

At 3.1 s/sig, proving a full block on a single RTX 5090 would take over
2 hours. That is too slow as-is, but the figure comes from a
general-purpose zkVM and should be read as an upper bound. Several
things could improve this:

1. Dedicated AIR and prover: S-two's benchmarks [2] show their prover
running SHA-256 chains up to 85x faster than RISC Zero's SHA-256
precompile on CPU. SLH-DSA verification has overhead beyond SHA-256
that is not accelerated, so the real-world speedup is unclear.

What speedup could we realistically expect from a custom AIR and
prover built specifically for SLH-DSA verification? I would love
to hear from someone with more experience building STARK provers.

2. Preprocessing: if transactions are proven as they enter the
mempool and proofs are aggregated recursively, most proving work
shifts to before the block is mined. Only a final aggregation step
remains. This needs clever batching algorithms, probably grouping
by fee level.

How much of the per-block proving cost could preprocessing
realistically eliminate?

3. Multi-GPU: STARK segment proving is embarrassingly parallel. RISC
Zero has experimental multi-GPU support. A cluster divides the
workload proportionally.
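
To get a feel for how the three levers above combine, here is a rough back-of-the-envelope model. It is a sketch under strong assumptions, not a measurement: it assumes perfectly linear scaling across GPUs, treats a faster prover as a single multiplicative factor, and ignores the cost of the final recursive aggregation step. The function name, its parameters, and the example count of 2,400 signatures per block are all hypothetical placeholders.

```python
def block_proving_seconds(n_sigs, sec_per_sig, gpus=1,
                          prover_speedup=1.0, preprocessed=0.0):
    """Estimate wall-clock proving time left at block-building time.

    n_sigs          signatures in the block (hypothetical input)
    sec_per_sig     measured cost per signature on one GPU, e.g. 3.1
    gpus            GPU count, assuming ideal linear scaling
    prover_speedup  assumed factor from a dedicated AIR and prover
    preprocessed    fraction of signatures already proven in the mempool
    """
    if n_sigs < 0 or sec_per_sig < 0:
        raise ValueError("n_sigs and sec_per_sig must be non-negative")
    if gpus < 1 or prover_speedup <= 0:
        raise ValueError("gpus must be >= 1 and prover_speedup > 0")
    if not 0.0 <= preprocessed <= 1.0:
        raise ValueError("preprocessed must be between 0 and 1")
    remaining = n_sigs * (1.0 - preprocessed)
    # Final aggregation and recursion overhead are deliberately left out.
    return remaining * sec_per_sig / (prover_speedup * gpus)


N_SIGS = 2400  # placeholder block size, not a figure from the benchmark

baseline = block_proving_seconds(N_SIGS, 3.1)
eight_gpus = block_proving_seconds(N_SIGS, 3.1, gpus=8)
mostly_done = block_proving_seconds(N_SIGS, 3.1, gpus=8, preprocessed=0.9)

print(f"1 GPU:                  {baseline / 60:.0f} min")
print(f"8 GPUs:                 {eight_gpus / 60:.1f} min")
print(f"8 GPUs, 90% pre-proven: {mostly_done:.0f} s")
```

The real numbers depend on the open questions above, in particular how much a custom AIR helps once the non-SHA-256 overhead is included and how expensive the final aggregation step is.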

Kudinov and Nick's Bitcoin-optimized SPHINCS+ [3] reduces SHA-256
compression calls by roughly 3x, which would also reduce the number
of cycles a STARK prover needs per signature. That said, I lean
toward sticking with NIST-standardized SLH-DSA for the ecosystem
benefits (vetted implementations, HSM support, hardware acceleration
path) and letting miners run a larger GPU cluster to compensate, but
that is a trade-off worth discussing.

Best
remix7531

[0] https://groups.google.com/g/bitcoindev/c/wKizvPUfO7w
[1] https://github.com/remix7531/slh-dsa-stark-bench
[2] https://docs.starknet.io/learn/S-two-book/benchmarks
[3] https://eprint.iacr.org/2025/2203

Ethan Heilman

Apr 20, 2026, 4:13:06 PM

to remix7531, bitco...@googlegroups.com

How does this change if the Poseidon hash is used instead of SHA-256?
