Hi devs,
I've spent the last several months implementing and benchmarking optimization techniques for the post-quantum hash-based signature scheme SLH-DSA (formerly SPHINCS+), which is being considered as a candidate for a quantum-resistant soft-fork upgrade to Bitcoin (see BIP360).
As a result, I believe I now have what may be the fastest publicly available implementation of SLH-DSA (at least on my hardware), and possibly one of the fastest GPU implementations as well, though I've had difficulty finding comparable GPU implementations to benchmark against. Its speed comes from the Vulkan graphics and compute API, often used by video game developers to squeeze performance out of gaming PCs and mobile phones.
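To give a feel for why Vulkan helps: SLH-DSA signing and keygen decompose into many thousands of independent hash computations (WOTS+ chains and Merkle tree nodes), which map naturally onto GPU compute invocations. Below is a minimal sketch of the dispatch pattern only, with all Vulkan setup omitted; the function name and the workgroup size of 64 are illustrative choices, not taken from my actual code.

    /* Sketch: one GPU invocation per independent WOTS+ hash chain.
     * Assumes `cmd`, `pipeline`, `layout`, and `set` were created during
     * ordinary Vulkan setup (omitted). Workgroup size 64 is illustrative. */
    #include <vulkan/vulkan.h>

    void dispatch_wots_chains(VkCommandBuffer cmd, VkPipeline pipeline,
                              VkPipelineLayout layout, VkDescriptorSet set,
                              uint32_t num_chains)
    {
        vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
        vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, layout,
                                0, 1, &set, 0, NULL);
        /* Each invocation walks its own hash chain, with no need to
         * synchronize with its neighbors until results are gathered. */
        vkCmdDispatch(cmd, (num_chains + 63) / 64, 1, 1);
    }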
The code:
Using my CPU, this code can sign a message with SLH-DSA-SHA2-128s in just 11 milliseconds, and can generate keys in only 2 milliseconds (1 ms if batched). Verification throughput approaches that of ECDSA, at around 15,000 nanoseconds (15 microseconds) per verification if properly batched. On a GPU with working Vulkan drivers, everything runs even faster.
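To clarify what "properly batched" means here: submit many verifications in a single call, so the fixed cost of building and submitting GPU work is amortized across the whole batch. A hypothetical usage sketch follows; slhdsa_ctx_new, slhdsa_verify_batch, etc. are placeholder names for illustration, not my library's real API.

    /* Hypothetical batch-verification sketch. All slhdsa_* names are
     * illustrative placeholders, not the real API of the library above. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct slhdsa_ctx slhdsa_ctx;  /* opaque Vulkan-backed context */
    slhdsa_ctx *slhdsa_ctx_new(void);      /* pays the one-time setup cost */
    void slhdsa_ctx_free(slhdsa_ctx *ctx);
    void slhdsa_verify_batch(slhdsa_ctx *ctx,
                             const uint8_t *const *msgs,
                             const size_t *msg_lens,
                             const uint8_t *const *sigs,
                             const uint8_t *const *pubkeys,
                             bool *results, size_t n);

    #define BATCH 4096

    void verify_many(const uint8_t *const *msgs, const size_t *msg_lens,
                     const uint8_t *const *sigs, const uint8_t *const *pubkeys)
    {
        bool ok[BATCH];
        slhdsa_ctx *ctx = slhdsa_ctx_new();
        /* One call checks all BATCH signatures in a single dispatch,
         * amortizing the fixed submission overhead across the batch. */
        slhdsa_verify_batch(ctx, msgs, msg_lens, sigs, pubkeys, ok, BATCH);
        slhdsa_ctx_free(ctx);
    }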
For perspective, the fastest open-source SLH-DSA library I could find, PQClean, requires 94 milliseconds for SLH-DSA-SHA2-128s signing and 12 milliseconds for keygen on my CPU. PQClean can only achieve this speed on x86 CPUs, whereas Vulkan also works on ARM devices, including Apple silicon.
There are caveats. This technique is memory-hungry, requiring several megabytes of RAM for signing and keygen, so it will not help in resource-constrained environments like hardware wallets. Dedicated hash-accelerator chips or FPGAs would be more appropriate for those use cases.
Furthermore, there is a hefty startup penalty, owing to the need to compile shaders on-device at runtime, though this can be mitigated by on-disk caching (see the sketch below) and by proper context scoping (e.g. don't compile verification shaders if you only need signing shaders). For daemon programs like bitcoind or lnd, I believe this would not be a big issue, but it could be problematic for start-and-stop programs like CLI utilities.
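Vulkan conveniently provides machinery for the on-disk caching part: a VkPipelineCache can be serialized after the first run and fed back in at the next startup, so the compilation cost is only paid on the first launch (the driver validates the cache header itself, so a stale or corrupt file just degrades to an empty cache). A minimal sketch, with the file path and error handling left illustrative:

    /* Sketch: persisting a VkPipelineCache so pipeline/shader compilation
     * is only paid on the first launch. Error handling is illustrative. */
    #include <vulkan/vulkan.h>
    #include <stdio.h>
    #include <stdlib.h>

    VkPipelineCache load_pipeline_cache(VkDevice device, const char *path)
    {
        void *blob = NULL;
        size_t blob_size = 0;
        FILE *f = fopen(path, "rb");
        if (f) {
            fseek(f, 0, SEEK_END);
            blob_size = (size_t)ftell(f);
            rewind(f);
            blob = malloc(blob_size);
            if (!blob || fread(blob, 1, blob_size, f) != blob_size)
                blob_size = 0; /* fall back to an empty cache */
            fclose(f);
        }
        VkPipelineCacheCreateInfo info = {
            .sType = VK_STRUCTURE_TYPE_PIPELINE_CACHE_CREATE_INFO,
            .initialDataSize = blob_size,
            .pInitialData = blob,
        };
        VkPipelineCache cache = VK_NULL_HANDLE;
        vkCreatePipelineCache(device, &info, NULL, &cache);
        free(blob);
        return cache;
    }

    void save_pipeline_cache(VkDevice device, VkPipelineCache cache,
                             const char *path)
    {
        size_t size = 0;
        vkGetPipelineCacheData(device, cache, &size, NULL); /* query size */
        void *data = malloc(size);
        if (!data)
            return;
        if (vkGetPipelineCacheData(device, cache, &size, data) == VK_SUCCESS) {
            FILE *f = fopen(path, "wb");
            if (f) {
                fwrite(data, 1, size, f);
                fclose(f);
            }
        }
        free(data);
    }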
More research is needed to gather additional data, and to assess the viability of this technique on diverse platforms. If you are interested in collaborating, please email me :)
regards,
conduition