Post-Quantum BIP-86 Recovery via zk-STARK Proof of BIP-32 Seed Knowledge

Olaoluwa Osuntokun

12:18 AM (18 hours ago)
to Bitcoin Development Mailing List
Hi y'all,

I found some spare time this last weekend to dust off a little side project
I started last August: extend TinyGo [1] to be able to produce RISC-V ELF
binaries capable of being run as a guest on the risc0 platform to generate
zk-STARK proofs of arbitrary programs. Initially, I didn't really have a
clear end target application, it was mainly a technical challenge to force
me to learn a bit more about the RISC-V platform, and also the host/guest
architecture of risc0. Fast forward ~9 months, and an initial killer
use case popped into my mind: a zk-STARK proof that a Taproot output public
key was generated using BIP-32, via a given BIP-86 derivation path.

More formally:
```math
\mathcal{R} = \left\lbrace\;
(\overbrace{K,\, C}^{\textsf{public}} ;\; \underbrace{s,\, \mathbf{p}}_{\textsf{witness}})
\;\middle|\;
\begin{aligned}
  K &= \textsf{BIP86Taproot}\bigl(\textsf{BIP32Derive}(s,\, \mathbf{p})\bigr) \\
  C &= \textsf{SHA256}\bigl(\texttt{"bip32-pq-zkp:path:v1"} \;\|\; \mathbf{p}\bigr)
\end{aligned}
\;\right\rbrace
```

where $K$ is the Taproot output key, $C$ is the path commitment, $s$ is the
BIP-32 seed, and $\mathbf{p}$ is the derivation path.
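
As a rough illustration of the path commitment $C$, here is a minimal Go sketch. The domain tag comes from the relation above; the serialization of $\mathbf{p}$ (one big-endian uint32 per path element) and the helper name `pathCommitment` are assumptions made for this example, so the output need not match what the actual guest program computes.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hardened marks a BIP-32 hardened index (i + 2^31).
const hardened = uint32(0x80000000)

// pathCommitment computes SHA256(tag || p).
// ASSUMPTION: p is serialized as one big-endian uint32 per element. The real
// guest program may use a different encoding.
func pathCommitment(path []uint32) [32]byte {
	buf := []byte("bip32-pq-zkp:path:v1")
	for _, idx := range path {
		buf = binary.BigEndian.AppendUint32(buf, idx)
	}
	return sha256.Sum256(buf)
}

func main() {
	// m/86'/0'/0'/0/0, the first BIP-86 receive address.
	path := []uint32{86 | hardened, 0 | hardened, 0 | hardened, 0, 0}
	c := pathCommitment(path)
	fmt.Printf("C = %x\n", c)
}
```

Since the path stays in the witness, a verifier only ever sees the 32-byte digest.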


I was able to get everything working e2e over the weekend, after making
some tweaks to my initial architectural game plan!

The TL;DR is that:

  * Given that the Taproot commitment scheme is post-quantum secure [3], in
    the future we can deploy a soft fork to _disable_ the keyspend path,
    and force all Taproot spends to instead flow through the script path
    (not my idea, commonly discussed amongst developers, not sure who
    proposed it first). At that point, Taproot starts to resemble BIP-360.

  * That works for script path spends, but then leaves all the BIP-86
    wallets in a bad position, as they generated outputs that provably
    don't commit to a script path at all.

  * A 2023 paper (Protecting Quantum Procrastinators with Signature
    Lifting: A Case Study in Cryptocurrencies [4]) proposed a solution to this,
    namely _seed lifting_ (use BIP-32 as the one-way function to the
    Picnic PQ Signature scheme) to provide a post-quantum proof of secret
    information a quantum attacker wouldn't be able to easily obtain.

  * The downside of that is that it reveals the secret BIP 32 seed,
    exposing other non migrated UTXOs of a user.

  * With this project I've cobbled together a series of projects to be able
    to generate a zk-STARK proof that a Taproot output public key was
    generated via BIP-32 invocation of a BIP-86 derivation path.

  * In the future a variant of this scheme can be used to enable wallets
    that generated the private keys via BIP-86, to have a post quantum safe
    exit path in case they don't bother moving their coins in time to the
    yet-to-be-decided post quantum signature scheme.

To achieve this end, I needed to create/fork a series of repos:

  * tinygo-zkvm: https://github.com/Roasbeef/tinygo-zkvm
    * A fork of TinyGo that supports the flavor of RISC-V (rv32im) that
      risc0 requires to generate/execute a guest program to later be proved
      by the host.

  * risc0: https://github.com/Roasbeef/risc0
    * Mostly a bug fix to their c-guest example, along with some
      additional documentation on how to get things running. The repo is
      unmodified other than that. Recent updates to the repo made the
      entire process much easier (Go guest+host), more on that later.

  * go-zkvm: https://github.com/Roasbeef/go-zkvm
    * Go utilities to take a RISC-V ELF binary produced by tinygo-zkvm, and
      package it in the expected R0BF format, which combines the user
      generated RISC-V ELF (the thing that is executed to generate the
      proof) along with the v1compat ELF kernel, which is risc0's execution
      environment.

    * This also includes a Go host package, which loads the guest program,
      executes it, and generates a trace to later be proved. This is
      achieved via a C FFI compat layer between Go and the original Rust
      host/proving/verification code.

  * bip32-pq-zkp: https://github.com/Roasbeef/bip32-pq-zkp
    * The project that packages everything together, this contains the:
      * Guest Go program that defines the secret witness and
        claim/constraints of the proof.

      * The C FFI wrapper around the OG Rust host, which is used to load
        the guest program, execute it, generate a trace, then finally
        generate a proof.

Details of the final proof as generated on my MacBook (Apple Silicon M4
Max, 128 GB of RAM):
  * Takes ~55 seconds to generate the proof, including execution. This
    uses Metal for GPU acceleration on the platform.
  * Uses ~12 GB of RAM.
  * Final proof size is ~1.7 MB.
  * Verification takes ~1.8 seconds, and uses ~32 MB of memory.

On several layers, this demo is far from optimized (more on that later),
this is meant to serve as a PoC to demonstrate that with the latest
software+hardware, a proof of this complexity is well within reach.

For those curious re the e2e details I've generated this tutorial that
explains the entire system top to bottom:
https://github.com/Roasbeef/go-zkvm/blob/main/docs/tutorial.md.

If you got to this point in this mail, and don't care about the lower level
details, thanks for reading up until now, and feel free to return to
_The Net of a Million Lies_, or as better known in our Universe:
Monitoring the Situation and/or slopfotainment! 🫡

## Motivation + Background

As commonly known, in the case of an adversary that possesses a quantum
computer capable of breaking classical asymmetric cryptography, any coins
stored in UTXOs with a known public key are vulnerable. This is the case
for any P2PK outputs from waaaay back, and also any other outputs that have
revealed their public key. Pubkey reveal might happen due to address re-use
(spending from the same script twice), or Taproot outputs, which publish
the public key plainly in the pkScript.

As detailed in [3], for Taproot outputs, a widely circulated plan is
roughly to: disable the _keyspend_ path (requires a simple signature),
enforcing a new rule that all Taproot spends must then flow through the
script path. Spending via the script path requires an opening of the
Taproot commitment (C = I + H(I || H(M))), which was shown to be binding even
under classic assumptions, as H(M) (tapscript merkle root) is still a
collision-resistant function.
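
To make the hash side of that commitment concrete, here is a minimal Go sketch of the tweak computation, using the BIP-340 tagged hash with the "TapTweak" tag from BIP-341. It stops at the 32-byte tweak t: the final step (adding t*G to the internal key I) needs a secp256k1 library and is left out. The helper names and the placeholder inputs are assumptions for illustration only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// taggedHash implements the BIP-340 tagged hash:
// SHA256(SHA256(tag) || SHA256(tag) || msg...).
func taggedHash(tag string, msg ...[]byte) [32]byte {
	tagHash := sha256.Sum256([]byte(tag))
	h := sha256.New()
	h.Write(tagHash[:])
	h.Write(tagHash[:])
	for _, m := range msg {
		h.Write(m)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// tapTweak returns t = H_TapTweak(I || merkleRoot). Pass a nil merkleRoot
// for a BIP-86 style key-only output, which commits to no script tree.
func tapTweak(internalKey [32]byte, merkleRoot []byte) [32]byte {
	return taggedHash("TapTweak", internalKey[:], merkleRoot)
}

func main() {
	// Placeholder internal key: the x-coordinate of the secp256k1 generator.
	raw, _ := hex.DecodeString(
		"79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798")
	var internalKey [32]byte
	copy(internalKey[:], raw)

	// Placeholder merkle root (all zeros), purely for comparison.
	root := make([]byte, 32)

	fmt.Printf("key-only tweak:    %x\n", tapTweak(internalKey, nil))
	fmt.Printf("script-path tweak: %x\n", tapTweak(internalKey, root))
}
```

The two tweaks differ, which is the distinction the BIP-86 problem hinges on: a key-only output has no merkle root to open.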

That means any UTXO that _does_ commit to a script path has a future escape
hatch _if_ such a softfork would need to be deployed in the future.
However, what about all the other wallets that use BIP 86, and don't commit
to a script path at all? Under a strict version of this existing
proposal, those wallets would basically be locked forever.

The goal of this work is to demonstrate a practical solution (discussed
amongst devs, but never implemented AFAICT): generate a zk proof that an
output was generated using BIP-86. For the zk proof, we select zk-STARKs,
as they're plausibly post quantum since they rely only on symmetric
cryptography: layers of merkle trees over an execution trace, along with
some novel sampling/error-correction algorithms.

At this point, you may be asking: "if the quantum adversary can derive the
private key to a random taproot public key, then how exactly does this
help?". The answer lies in the structure of BIP-32! BIP-32 takes an initial
128-512-bit seed (with BIP-39, either 12 or 24 words), then runs it through
HMAC-SHA512 keyed by "Bitcoin seed" to produce the master extended private
key. An adversary who wants to forge this proof needs to find a _colliding_
seed: a different seed s' such that HMAC-SHA512("Bitcoin seed", s') produces
the same master key. The BHT algorithm (Brassard-Høyer-Tapp [6]) is the
best known quantum collision finder, and it runs in time proportional to the
cube root of the output space: 2^(n/3). For HMAC-SHA512's 512-bit output,
that's ~2^171 quantum operations, well above even NIST's highest
post-quantum security category. Therefore, if you generated a wallet using
BIP-32, you possess _another_ secret that a quantum adversary can't
efficiently reconstruct!

This demo focuses on the Taproot case, but the rough approach also applies
to any other output generated via BIP-32. BIP 32 was originally published in
2012, over 14 years ago. So safe to say that _most_ wallets were generated
under this scheme. However, Bitcoin Core only officially adopted BIP-32 in
2016/2018, moving away from their existing key pool structure. I can't say
how much BTC is held today in outputs generated with Bitcoin Core's original
key pool, but if you have coins generated via that mechanism, you may want
to consider migrating them to a BIP-32 wallet.

## TinyGo + RISC-V + risc0

Now for some of the lower level details. risc0 is a STARK based proving
system that takes a RISC-V ELF binary generated by a guest program (any
program generated using their flavor of rv32im can be proved), executes
that in a host environment, generates a trace, then produces a STARK proof
from that.

Today you can take some subset of Rust, compile it to an ELF using their
toolchain, then execute it, generate a trace, to finally prove+verify it
using their system.

This demo took a bit of a roundabout journey to achieve this, as after
all, the journey is most of the fun, ain't it!

For the past 10 years or so, my Bitcoin stack of choice (lnd/btcsuite) uses
a series of Go libraries, so I wanted to be able to re-use them, first for
this demo, then also in the future for other projects.

TinyGo is a special Go compiler based on LLVM, that targets mostly embedded
environments. You can use it to generate Go programs that can run on
microcontrollers, or on WebAssembly (producing a smaller binary than if
you used the normal stdlib path).

TinyGo supports RISC-V, but _not_ the 32-bit variant of RISC-V that risc0
relies on. So the first step here was to create a new target definition for
TinyGo: riscv32-unknown-none, which uses the base integer + multiply/divide
instructions with no compressed instructions, and 4 KB stacks for each
task. From there, I created a new linker script
(`targets/riscv32im-risc0-zkvm-elf.ld`) which creates a memory layout
identical to what risc0 expects. The final component was a new runtime
(`src/runtime/runtime_zkvm.go`), which implemented a few platform specific
syscalls for risc0 (putchar(), exit(), ticks(), and growHeap()).

When I tried to get this working last year, I had to also implement a number
of kernel syscalls (called ecalls in the platform [7]) to handle: read+write
to stdin/stdout, halting, and the journaling mechanism (the transcript of
execution committed to), which basically implement the kernel that the guest
executes in. Fast forward to 2026, and after pulling the latest version of
the repo, I realized that they now make a libzkvm_platform.a, which packages
up the kernel nicely to be linked against. So I threw out my custom kernel
code, and slotted that in instead.

The final component is a C FFI layer that enables me to use _both_ a Go
guest (the program to be proved) and a Go host (the thing that executes the
program and generates the final proof).

## BIP-32+Taproot zk-STARK Proof

With basic proofs working (like the classic: I know the factorization of a
number `n`), I was unblocked to generate the actual proof. The claim/proof
is represented with the following JSON artifact:
```
{
  "schema_version": 1,
  "image_id": "8a6a2c27dd54d8fa0f99a332b57cb105f88472d977c84bfac077cbe70907a690",
  "claim_version": 1,
  "claim_flags": 1,
  "require_bip86": true,
  "taproot_output_key": "00324bf6fa47a8d70cb5519957dd54a02b385c0ead8e4f92f9f07f992b288ee6",
  "path_commitment": "4c7de33d397de2c231e7c2a7f53e5b581ee3c20073ea79ee4afaab56de11f74b",
  "journal_hex": "010000000100000000324bf6fa47a8d70cb5519957dd54a02b385c0ead8e4f92f9f07f992b288ee64c7de33d397de2c231e7c2a7f53e5b581ee3c20073ea79ee4afaab56de11f74b",
  "journal_size_bytes": 72,
  "proof_seal_bytes": 1797880,
  "receipt_encoding": "borsh"
}
```

The `image_id` is basically a hash of the ELF, so you know what the prover
executed. There are then a few flags that control the claim version and
whether BIP-86 derivation is a part of the proof. BIP-86 was only adopted
post-Taproot, so if you have an existing BIP-44 path, you can opt to
claim that instead. The Taproot key we're generating the proof against is
also part of the _public data_, as it sits plainly on the chain for all to
see. We then also include a `path_commitment`, which is a commitment to the
exact BIP 86 path that the prover used. Finally, we also commit to the
journal hex, which is basically a commitment to the public claim.

Assuming you've built the project, you can generate the proof (even
passing in an arbitrary BIP-32 seed and derivation path) with:
```
make prove GO_GOROOT=/path/to/go1.24.4
```

Then verify it with:
```
make verify GO_GOROOT=/path/to/go1.24.4
```

The default prove target writes:
  * ./artifacts/bip32-test-vector.receipt
  * ./artifacts/bip32-test-vector.claim.json

The receipt is the STARK proof artifact. claim.json is the stable,
human-readable description of the public statement being proved.

## Application to a Future Keyspend Disabling Soft fork

As mentioned above, assuming the community is forced to deploy a keyspend
disabling soft fork in the future, we can also deploy some variant of
this proof to enable both BIP-86 wallets, and also any BIP-32 wallet, to
sweep their funds into a new PQ output.

In 2026, we've shown that this is achievable using 2-year-old consumer
hardware. I don't doubt that the upcoming advancements (eg: photonics, new
flavor of high bandwidth memory, etc) in hardware (driven by the fierce AI
race) will make such a proof even more feasible.

One thing to note is that this proof has a few layers of indirection,
mainly the RISC-V layer, which adds overhead that increases the total number
of steps, and therefore the size of the proof. A production grade
deployment would likely instead hand roll a custom STARK proof for this
exact statement, to achieve a faster and smaller proof.

## Future Work

In terms of future work, there are a number of interesting follow-up
projects that can be pursued from here.

One basic one is that the current proof doesn't actually commit to a
spending txid and/or sighash. That can be trivially incorporated into the
proof. Going a step further, the execution of the guest program can even
_generate_ a valid Schnorr signature to permit spending.

Looking to the memory+computational requirements necessary to generate the
proof, I've left two low hanging fruits:

 1. First, we can speed up the Elliptic Curve operations the proof requires
    (scalar base mult, then addition, or more performantly Double Scalar
    Multiplication via the Strauss-Shamir trick). For this we can use the
    syscalls/precompile in the risc0 env for big integer arithmetic:
    sys_bigint and sys_bigint2. With this, the guest calls into the kernel
    to use an optimized/accelerated circuit for the modular arithmetic,
    reducing cycles, steps, and thus proof size.

 2. Second, right now the entire claim is a single proof. Instead, we can
    first break that up using their recursive proof/composition syscalls:
    sys_verify_integrity+sys_verify_integrity2. We can then assemble a
    series of these proofs into a _single_ statement, which can save block
    space by aggregating N proofs into a single proof.

-- Laolu

[1]: https://tinygo.org/

[2]: https://risczero.com/

[3]: https://eprint.iacr.org/2025/1307

[4]: https://eprint.iacr.org/2023/362

[5]: https://microsoft.github.io/Picnic/

[6]: https://en.wikipedia.org/wiki/BHT_algorithm

[7]: https://github.com/Roasbeef/go-zkvm/blob/main/docs/ecall-reference.md

conduition

12:10 PM (6 hours ago)
to Olaoluwa Osuntokun, Bitcoin Development Mailing List
Hi Laolu,

Great work getting this working in the real world. I've heard many people on Delving and the mailing list conjecture based on this idea, but you're the first person I've seen who's willing to put their money where their mouth is, and actually build a prototype. Bravo!

It seems to me the circuit (guest program) could be simplified. Notice how the guest code computes the entire HD wallet key path, including hardened and non-hardened derivation steps, and also computes the taproot output key with key-tweaking. I'd argue these steps are extraneous to the core hard relation you want the STARK to prove, and could be safely removed to reduce proof size and improve performance.

In reality, you needn't go so far as to prove (1) "I know a BIP39 seed which derives this taproot output key". You need only prove this much more general statement (2): "I know a BIP32 xpriv which derives this xpub via one or more hardened steps". The latter statement (2) still cannot be forged by a quantum adversary even if they know your account-level xpub, but it entails far less computation to prove and verify. The rest of the original statement (1) can be done externally outside the circuit.

Example. If I have a wallet with a taproot address at m/86'/0'/0'/1/2, I could prove I know the xpriv at m/86'/0' which derives the xpub at m/86'/0'/0'. Then I provide the remaining key path elements /1/2 in the witness. Note, I do not mean we derive the xpriv at m/86'/0' inside the guest program. I mean the prover derives m/86'/0' first (in the host), and then writes that xpriv into the guest program's inputs. The guest program derives and outputs the xpub at m/86'/0'/0'. The verifier may check the STARK output (xpub) is correctly computed, then use the given key-path to manually derive the taproot address from the xpub themselves, outside the circuit, and validate that address against the UTXO I'm spending. The verifier thus has confirmed the prover knew an xpriv which (through a hardened derivation step) derives the correct taproot output key.

This change significantly reduces the size of the circuit. At a glance, I see the original guest program performs 6 HMAC-SHA512 calls (1 for the master key, 5 for the BIP32 derivation steps), two SHA256 compression calls (for the taptweak hash), and two point multiplications. With this simplified variant, we are invoking only a single HMAC-SHA512 call and a single point multiplication. I can't say for sure, but I expect this will improve your proof size and runtime significantly.

This change also makes the circuit more generally applicable to other rescue contexts. For instance, it could be applied to BIP340 xonly keys inside a taproot script tree, or in a P2(W)SH address to an ECDSA public key, or to P2(W)PKH addresses.

Concerned about publishing xpubs? Remember that we are assuming regular EC spending is locked in this context, so it is safe-ish to share account xpubs with quantum attackers. At best the xpub can be used for surveillance but not forgery. If one would prefer not to share the account-level xpub on-chain for privacy reasons, the proof could be extended to also derive the unhardened child xpub at /1/2 inside the guest program (but we still do not need to do the taproot key tweaking in the guest program).

We should also talk scaling efficiency. Given the cost of STARKs, this style of proof should be able to authorize spends for more than one UTXO. Say you have a wallet with 10 different UTXOs held by distinct addresses in the same BIP44 account. One single STARK proof could authorize spending all 10 of them, by simply committing all 10 input signature hashes into the journal, and labeling the inputs with the corresponding 10 BIP32 key paths somehow. The verifier would need to check the proof only once and not 10 times. The 10 UTXO spends could be validated using the common xpub from the STARK proof's journal.

For a slightly related work proving a similar relation for hashed addresses, using different STARK technology stacks, see this delving post.

However, all this said, my personal preference for long-term procrastinator rescue is still for commit/reveal strategies which prove essentially the same statement about BIP32 in a two-step procedure. They get the job done with much lighter cryptographic machinery and much smaller witnesses: a few hundred bytes over two transactions, compared to a few million bytes in one transaction with STARKs. Boris Nagaev and I discussed this on the list a while back. That said, commit/reveal requires more careful design and seems to demand the use of external quantum-safe coins to make the commitment in the first place, so perhaps the cost would be worth it to some people? IDK. What do you think of commit/reveal compared to STARKs for this purpose?

regards,
conduition


conduition

4:30 PM (2 hours ago)
to conduition, Olaoluwa Osuntokun, Bitcoin Development Mailing List
Oh, I've been a fool, a foolish fool.

We don't even need to do point multiplication in the circuit at all.

I'm amending my prior suggestion slightly: The circuit (guest program) could take in an xpriv (e.g. at m/86'/0') and output a child xpriv (e.g. at m/86'/0'/0') to the journal (instead of outputting a child xpub).

This is safe because remember, EC spending has been disabled in this context, and to a quantum attacker, an xpub is computationally equivalent to its xpriv. So why bother hiding it? The child xpriv doesn't give an observer anything they can't already do with the equivalent xpub. 

The guest program then is basically the BIP32 CKDpriv algorithm, restricted to a single hardened derivation step. The verifier gets the child xpriv, but can't use it to forge new proofs. Honest verifiers use the xpriv to derive the child address(es) as suggested in my last message, to authenticate spending.

Designing the guest program like this will massively reduce your circuit complexity, because EC point multiplication is wayyyyy harder for the RISC0 compiler to arithmetize than a simple hash function. In my prior work with RISC0, I made a guest program which ran a SHA256 hash and an EC point multiplication. I found that pruning EC point arithmetic from my guest program improved prover runtime by a factor of over 100x.

If I am not fever-dreaming and this is indeed possible, then the new circuit's complexity will be dominated not by point multiplication, but by the HMAC-SHA512 call. Our new task is then to figure out how much we can internally optimize the HMAC-SHA512 call for STARK proving. Here's a few ideas.

If you bust open HMAC-SHA512, it looks like this:

HMAC_SHA512 = SHA512((K⊕0x5c) || SHA512((K⊕0x36) || msg))

...where in the context of BIP32 hardened CKD, the HMAC key K is the chaincode (padded with zeros to 128 bytes) and msg = (0x00 || sk || i) is the parent secret key and child index.

Since len(K) = 128 is the SHA512 block size, we need a total of 4 SHA512 compression calls:
  1. to compress (K⊕0x36)
  2. to compress the msg (and SHA512 padding/length)
  3. to compress (K⊕0x5c), and
  4. a final compression call to tie it all together.

The output of that last compression call is partitioned into the child chaincode, and a key delta which is added to the parent secret key (modulo the curve order), producing the child EC secret key. This last step is arithmetically simple; the SHA512 calls are where most of the arithmetic complexity lies.

The question then becomes, which of these compression calls can be done outside the circuit, and which are truly essential for security? 

Note how the parent secret key is the most important piece for soundness. The circuit needs to prove the parent secret key existed in the hash function preimage, and is correctly related to the child secret key via modular addition. So compression call (2) seems unavoidable. The others are less rigid.

I'd argue that if we really dig into the hard relation we're trying to prove here, we can reduce it to this statement:

Given a child xpriv with secret key k, chaincode c and index i, I know a preimage x and secret key sk such that:

I <- SHA512(<something> || SHA512(<something> || 0x00 || sk || i))
k == (int(I[:32]) + sk) % n
c == I[32:]

Seeing as the <something> slots are arbitrary, and we know in BIP32 they are always exactly one-block long, it seems easy to throw out the compression calls (1) and (3). The host can precompute the relevant SHA512 midstates outside the circuit, and pass the midstates into the guest program as secret inputs. The tradeoff is that this permits malicious provers the flexibility of choosing their starting midstates (though hash input length can be fixed at 192 bytes). I'm not entirely sure if this meaningfully weakens the verifier's soundness. Ethan Heilman might have opinions on this, he knows a lot more about attacking hash functions than I do. Intuitively, I doubt sampling random SHA512 midstates is that much better than sampling a random HMAC key (chaincode) K and computing the resulting midstates.

This reduces our circuit to, I think, the minimum acceptable security floor for provers: two SHA512 compression calls, which commit to a parent secret key.


regards,
conduition