On (in)ability to embed data into Schnorr


waxwing/ AdamISZ

Oct 1, 2025, 3:50:38 PM
to Bitcoin Development Mailing List
Hi all,

https://github.com/AdamISZ/schnorr-unembeddability/

Here I'm analyzing whether the following statement is true: "if you can embed data into a (P, R, s) tuple (Schnorr pubkey and signature, BIP340 style), without grinding or using a sidechannel to "inform" the reader, you must be leaking your private key".

See the abstract for a slightly more fleshed out context.

I'm curious about the case of P, R, s published in utxos to prevent usage of utxos as data. I think this answers in the half-affirmative: you can only embed data by leaking the privkey so that it (can) immediately fall out of the utxo set.

(To emphasize, this is different to the earlier observations (including by me!) that just say it is *possible* to leak data by leaking the private key; here I'm trying to prove that there is *no other way*).

However I still am probably in the large majority that thinks it's appalling to imagine a sig attached to every pubkey onchain.

Either way, I found it very interesting! Perhaps others will find the analysis valuable.

Feedback (especially of the "that's wrong/that's not meaningful" variety) appreciated.

Regards,
AdamISZ/waxwing

Greg Maxwell

Oct 1, 2025, 7:04:51 PM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
Intuitively it sounds likely -- just in that the available values are an image on the curve and a value summed with a hash dependent on everything else.  I think it would be hard to prove.

But is it even really worth the analysis when grinding gets you a 12% embedding rate in that signature at not that significant cost? (because you can independently grind the nonce and signature itself, or nonce and pubkey) -- and when beyond the cost of the additional signature (making the output 3x its cost) requiring signing when forming the address completely kills public derivation, multisig with cold keys. etc?  ... and then any of whatever spam concerns people have would likely be exacerbated by the spammers using more resources due to the embedding rate?

Also, re: a leaked private key dropping the output from the utxo set: not so if the key is part of an explicit multisig. E.g. 2 of 2 with leaked key and a secure one.




--
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/0f6c92cc-e922-4d9f-9fdf-69384dcc4086n%40googlegroups.com.

Andrew Poelstra

Oct 1, 2025, 7:20:25 PM
to Bitcoin Development Mailing List
On Wed, Oct 01, 2025 at 10:10:16PM +0000, Greg Maxwell wrote:
> Intuitively it sounds likely -- just in that the available values are an
> image on the curve and a value summed with a hash dependent on everything
> else. I think it would be hard to prove.
>
> But is it even really worth the analysis when grinding gets you a 12%
> embedding rate in that signature at not that significant cost? (because you
> can independently grind the nonce and signature itself, or nonce and
> pubkey) -- and when beyond the cost of the additional signature (making the
> output 3x its cost) requiring signing when forming the address completely
> kills public derivation, multisig with cold keys. etc? ... and then any of
> whatever spam concerns people have would likely be exacerbated by the
> spammers using more resources due to the embedding rate?
>

Some time ago, I talked to Ethan Heilman about this in the context of PQ
signatures, and he made the interesting point that you can think of
12% embedding rate as representing an 8x discount for real signatures vs
embedded data. And that maybe that's okay, incentive-wise.

Needing to grind out portions of 32-byte blocks probably also reduces
the risk from people trying to embed virus signatures or other malicious
data.

As for waxwing's original question -- I also intuitively believe that
the only way to embed data in a Schnorr signature is by grinding or
revealing your key ... and I'm not convinced you can do it even by
revealing your key. (R is an EC point that you can't force to be any
particular value except by making a NUMS point, which you then can't use
to sign; and s = k + ex where e is a hash of kG (among other things)
so I don't think you can force that value at all.)

--
Andrew Poelstra
Director, Blockstream Research
Email: apoelstra at wpsoftware.net
Web: https://www.wpsoftware.net/andrew

The sun is always shining in space
-Justin Lewis-Webster


waxwing/ AdamISZ

Oct 1, 2025, 9:49:23 PM
to Bitcoin Development Mailing List
Hi Greg, Andrew, list,

Answers to Greg then Andrew:

> E.g. 2 of 2 with leaked key and a secure one.

That's a very good point! I was narrowly focused on the signature scheme, but Bitcoin is more than a signature scheme!

>   But is it even really worth the analysis when grinding gets you a 12% embedding rate in that signature at not that significant cost? (because you can independently grind the nonce and signature itself, or nonce and pubkey) -- and when beyond the cost of the additional signature (making the output 3x its cost) requiring signing when forming the address completely kills public derivation, multisig with cold keys. etc?  ... and then any of whatever spam concerns people have would likely be exacerbated by the spammers using more resources due to the embedding rate?

I certainly don't think it's worth *doing* (hence my use of the term "appalling idea" :) ), as per the things you mention there.

I wrote the document as a mostly academic investigation. It would be nice to be surer what the limits are, although I suspect we're all reasonably confident of what is/isn't possible.

>  12% embedding rate
Where do you get that number from? 33% for embedding 256 bits in (P, R, s) (but as per this discussion, according to me, at the cost of key leakage). If we include the other bytes in a (taproot anyway) utxo that's not much less, I guess 30% ish. I could try to guess but it'd be easier if you told me :)

to Andrew:

> As for waxwing's original question -- I also intuitively believe that
> the only way to embed data in a Schnorr signature is by grinding or
> revealing your key ... and I'm not convinced you can do it even by
> revealing your key. (R is an EC point that you can't force to be any
> particular value except by making a NUMS point, which you then can't use
> to sign; and s = k + ex where e is a hash of kG (among other things)
> so I don't think you can force that value at all.)

Ah, I see what you're saying, it's a subtly different target. ECDSA allows that s be controlled, Schnorr doesn't, but I set up the game as "adversary must be able to publish a function f such that f(any published R, s, (e)) = data", i.e. not just f = identity function. That was why I wrote in the introduction (copied here for convenience:)

"Data can effectively be embedded in signatures by using a publically-inferrable nonce, as was noted \href{https://groups.google.com/g/bitcoindev/c/d6ZO7gXGYbQ/m/Y8BfxMVxAAAJ}{here} and was later fleshed out in detail \href{https://blog.bitmex.com/the-unstoppable-jpg-in-private-keys/}{here} (\textbf{note}: both these sources discuss nonce-reuse but it's worse than that: any \emph{publically inferrable} nonce can achieve the same thing, such as, the block hash of the parent block; this will have the same embedding rate and cannot be disallowed)."

It may be a different target "politically" :) but I was only thinking technically, in terms of how people might end up using outputs. From a technical point of view it makes no difference if f is the identity or something more complex (as long as it's efficiently computable).
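
A minimal sketch of the "publicly inferable nonce" embedding described above, for readers who want to see the algebra run. It assumes plain Schnorr over secp256k1 with an untagged SHA256 challenge (not exact BIP340 encoding), and the names `embed`, `extract` and `public_seed` are hypothetical. The data is used directly as the private key x; the nonce k is derived from a public value, so anyone can solve s = k + e*x for x.

```python
import hashlib

# secp256k1 parameters (field prime, group order, generator).
FIELD_P = 2**256 - 2**32 - 977
ORDER_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Add two affine points; None is the point at infinity."""
    if a is None or b is None:
        return b if a is None else a
    if a[0] == b[0] and (a[1] + b[1]) % FIELD_P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, FIELD_P) % FIELD_P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % FIELD_P, -1, FIELD_P) % FIELD_P
    x = (lam * lam - a[0] - b[0]) % FIELD_P
    return x, (lam * (a[0] - x) - a[1]) % FIELD_P

def mul(k, pt):
    """Double-and-add scalar multiplication (not constant time)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt, k = add(pt, pt), k >> 1
    return acc

def to_scalar(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER_N

def challenge(R, pub, msg: bytes) -> int:
    # Simplified e = H(R.x || P.x || m); BIP340 uses a tagged hash instead.
    return to_scalar(R[0].to_bytes(32, "big") + pub[0].to_bytes(32, "big") + msg)

def embed(data: bytes, public_seed: bytes, msg: bytes):
    """Use the 32-byte data as the private key and a publicly derivable nonce."""
    x = int.from_bytes(data, "big")   # assumes 0 < x < ORDER_N
    k = to_scalar(public_seed)        # e.g. derived from a parent block hash
    pub, R = mul(x, G), mul(k, G)
    return pub, R, (k + challenge(R, pub, msg) * x) % ORDER_N

def verify(pub, R, s, msg: bytes) -> bool:
    return mul(s, G) == add(R, mul(challenge(R, pub, msg), pub))

def extract(pub, R, s, public_seed: bytes, msg: bytes) -> bytes:
    """The reader's f: recompute k, then x = (s - k) / e mod n."""
    k = to_scalar(public_seed)
    x = (s - k) * pow(challenge(R, pub, msg), -1, ORDER_N) % ORDER_N
    return x.to_bytes(32, "big")
```

The tuple verifies as an ordinary signature, yet the same 32 bytes that carry the data also give anyone control of the output, which is the "half-affirmative" trade-off under discussion.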

Cheers,
AdamISZ/waxwing

waxwing/ AdamISZ

Oct 2, 2025, 12:17:21 PM
to Bitcoin Development Mailing List
> >  12% embedding rate
> Where do you get that number from? 33% for embedding 256 bits in (P, R, s) (but as per this discussion, according to me, at the cost of key leakage). If we include the other bytes in a (taproot anyway) utxo that's not much less, I guess 30% ish. I could try to guess but it'd be easier if you told me :)

Thinking about it again: to publish data, you have to publish a transaction! I guess the most economical, paying taproot to taproot, is about 192 bytes with script path plus the posited extra 64 for the (R,s) in the output, so yeah that'd be 32 out of 256, 12.5%. Isn't the figure a bit different for key path though, because no control block? Well it hardly matters, it's some small fraction in that range.

An interesting mechanical detail in this near-absurd scenario is that if you wanted to repeatedly publish off the same (presumably a few multiples of dust level) output, you couldn't also do the leak single key thing, since you'd lose control to re-spend. So that'd place us in the "explicit multisig" scenario that Greg mentioned, which I think would only make sense with legacy script? Kind of a different scenario, also it would be really weird to update legacy script to take into account a new "you must sign the pubkeys" rule. Though I guess in this fictional scenario, it might happen like that. If you did do it with legacy, you'd be publishing bare 2 of 2 multisig. If you did it with taproot due to how that works, the script is not published until the output is spent, so I think that's outside what I was considering ("data in utxo set"). (I guess you could also use something like a hash lock which might be more efficient). So anyway if you wanted to do this repeatedly and minimize cost, for whatever strange reason, you'd be adding another 50-100 bytes each time bringing that % down to like 10% or less.

But that all became way too hypothetical to even analyze properly :)

Anyway just to reemphasize I certainly wasn't advocating this sig-attaching system, but it seems important to know what the result of it would be: we would still not have changed the obvious reality that embedding data in witness gives more space for data, and is more economical, and we would only reduce by a big factor how much can be embedded in outputs (anything from 8% to 15% embedding rate seems possible depending on the hypothetical details), while having to screw up much of Bitcoin's functionality in the process.

Cheers,
AdamISZ/waxwing

Greg Maxwell

Oct 2, 2025, 5:59:41 PM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
I just meant in the purely grinding non-key leaking case you could get 4 bytes into the nonce pretty easily and 4 bytes into either the pubkey or signature out of a 64 byte signature.  Obviously the delivered embedding rate in a whole txn will be lower, but maybe not that much thanks to multisig outputs.
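
To make the grinding case concrete, here is a rough sketch (not from the thread) that grinds a nonce until the x-coordinate of R = k*G starts with chosen bytes. The function name `grind_nonce` and the starting value are hypothetical. Each target byte multiplies the expected work by 256, so 4 bytes costs about 2^32 point additions; the sketch is only practical for one or two bytes in pure Python.

```python
# secp256k1 parameters (field prime and generator).
FIELD_P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Add two affine points; None is the point at infinity."""
    if a is None or b is None:
        return b if a is None else a
    if a[0] == b[0] and (a[1] + b[1]) % FIELD_P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, FIELD_P) % FIELD_P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % FIELD_P, -1, FIELD_P) % FIELD_P
    x = (lam * lam - a[0] - b[0]) % FIELD_P
    return x, (lam * (a[0] - x) - a[1]) % FIELD_P

def mul(k, pt):
    """Double-and-add scalar multiplication (not constant time)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt, k = add(pt, pt), k >> 1
    return acc

def grind_nonce(target: bytes, k_start: int):
    """Step k upward until R.x begins with `target`; returns (k, R, tries)."""
    k, R, tries = k_start, mul(k_start, G), 1
    while R[0].to_bytes(32, "big")[:len(target)] != target:
        k, R, tries = k + 1, add(R, G), tries + 1   # one point addition per try
    return k, R, tries

def expected_tries(n_bytes: int) -> int:
    """Average number of candidates needed to fix n_bytes of a uniform value."""
    return 256 ** n_bytes
```

Under BIP340's even-y convention a nonce can be negated without changing R's x-coordinate, so grinding the x-coordinate is unaffected by that rule. Grinding 4 bytes into R and 4 more into P or s gives the 8 out of 64 bytes mentioned above.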



Peter Todd

Oct 3, 2025, 11:51:52 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
On Wed, Oct 01, 2025 at 07:24:50AM -0700, waxwing/ AdamISZ wrote:
> Hi all,
>
> https://github.com/AdamISZ/schnorr-unembeddability/
>
> Here I'm analyzing whether the following statement is true: "if you can
> embed data into a (P, R, s) tuple (Schnorr pubkey and signature, BIP340
> style), without grinding or using a sidechannel to "inform" the reader, you
> must be leaking your private key".
>
> See the abstract for a slightly more fleshed out context.
>
> I'm curious about the case of P, R, s published in utxos to prevent usage
> of utxos as data. I think this answers in the half-affirmative: you can
> only embed data by leaking the privkey so that it (can) immediately fall
> out of the utxo set.
>
> (To emphasize, this is different to the earlier observations (including by
> me!) that just say it is *possible* to leak data by leaking the private
> key; here I'm trying to prove that there is *no other way*).

You can probably use timelock encryption to ensure that the leak of the private
key only happens in the future, after the funds are recovered by the owner in a
subsequent transaction.

--
https://petertodd.org 'peter'[:-1]@petertodd.org

waxwing/ AdamISZ

Oct 4, 2025, 2:40:43 AM
to Bitcoin Development Mailing List
Hi Peter,

> You can probably use timelock encryption to ensure that the leak of the private
> key only happens in the future, after the funds are recovered by the owner in a
> subsequent transaction.

Another very interesting point, there, to get around the issue of key leakage ... albeit I don't see a usecase, maybe I'm just not imaginative enough, very possible.

If someone wants to keep something in the utxo set "forever", it doesn't help. If they want the property of "immediately accessible in the utxo set" (like "deposit into some fancy system with a blob of data"; I emphasize "deposit" because that would explain why not "just put it in the witness", your current outputs don't support that; correct me if my reasoning is wrong here), then I guess they don't get that, either: the data is accessible "intermediate term" instead.

Cheers,
AdamISZ/waxwing

waxwing/ AdamISZ

Oct 6, 2025, 9:06:29 AM
to Bitcoin Development Mailing List
Yes, sorry, reading fail on my part (somehow missed that you were explicitly referring to grinding in the comment).

Still don't think the 12% figure is a good one though? In (P,R,s) it's 8 out of 96 (and as discussed, worse if the whole tx is (realistically) included), 1/4 the rate you get from direct key leakage. (Plus the perhaps trivial point that it does actually require work, which might conceivably matter at scale?). I'm not sure why one would not include P in the measure?

Even an explicit multisig that does not sacrifice control of the output would be of the order of double the embedding rate, without having to do work. (P,R,s x 2 = 192 and embed 32 for a 1/6 rate; vs. grinding all 4 P,R values for a 1/12 rate).
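
The fractions quoted in this exchange can be checked mechanically. The sketch below just restates the byte counts from the thread (32-byte P, R and s; 4 bytes ground per value); the variable names are illustrative only.

```python
from fractions import Fraction

FIELD = 32                 # bytes in each of P, R, s
TUPLE = 3 * FIELD          # one (P, R, s) tuple = 96 bytes
GROUND = 4                 # bytes assumed grindable per value

# Grinding counted against the 64-byte signature only (the "12%" figure).
grind_sig_only = Fraction(2 * GROUND, 2 * FIELD)

# Grinding counted against the whole (P, R, s) tuple.
grind_tuple = Fraction(2 * GROUND, TUPLE)

# Direct key leakage: 32 bytes of data per (P, R, s) tuple.
leak_tuple = Fraction(FIELD, TUPLE)

# Explicit 2-of-2: two tuples, one leaked key, control of the output retained.
leak_multisig = Fraction(FIELD, 2 * TUPLE)

# Grinding all four P and R values across the same two tuples.
grind_multisig = Fraction(4 * GROUND, 2 * TUPLE)

for name, rate in [("grind, sig only", grind_sig_only),
                   ("grind, full tuple", grind_tuple),
                   ("key leak, full tuple", leak_tuple),
                   ("key leak, 2-of-2", leak_multisig),
                   ("grind, 2-of-2", grind_multisig)]:
    print(f"{name:22s} {rate}  ({float(rate):.1%})")
```

The difference between the 12.5% and 1/12 figures is only the denominator: whether P is counted alongside the 64-byte signature.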

Anthony Towns

Oct 7, 2025, 5:38:40 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
On Wed, Oct 01, 2025 at 07:24:50AM -0700, waxwing/ AdamISZ wrote:
> I'm curious about the case of P, R, s published in utxos to prevent usage
> of utxos as data. I think this answers in the half-affirmative: you can
> only embed data by leaking the privkey so that it (can) immediately fall
> out of the utxo set.

I think you can attack the setup here.

If you allow scriptPubKeys in the utxo set whose spending conditions
are HTLC/atomic-swap-like:

(pubkey A and preimage reveal of X)
OR (pubkey B and block height > H)

then you either set H to be arbitrarily far in the future and reveal
B's privkey, or choose an NUMS X with no known preimage, and reveal
A's privkey.

If you don't allow those things (eg, by requiring such constructions
also have a (pubkey musig(A,B)) path) then I think you rule out NUMS-IPK
constructions, and end up making things like vaults ("hotkey with delay,
coldkey anytime") difficult to send to ("I have to sign with my cold
key to request funds?"), or, depending on what the utxo R,s is signing,
encourage key reuse.

> (To emphasize, this is different to the earlier observations (including by
> me!) that just say it is *possible* to leak data by leaking the private
> key; here I'm trying to prove that there is *no other way*).

That seems right to me.

I think if the signature scheme supported pubkey recovery (ie, s*G = R +
H(R,m)*P, and our "m" didn't commit to P as well), you could get around
this by just having P be the data, with no one, including the "signer"
able to recover the private key.

> However I still am probably in the large majority that thinks it's
> appalling to imagine a sig attached to every pubkey onchain.

I think the only thing achieved by embedding data in the utxo set (vs
an OP_RETURN output or witness data) is to bloat the utxo set; and if
that's the goal, it can equally easily be done with spendable outputs
that the attacker simply chooses not to ever spend. So that doesn't seem
like a terribly interesting solution to anything.

As far as embedding data in signatures goes, I think the following
scheme would allow you to publish data in a cryptographically-secure way,
with minimal lost funds:

0) Setup secret keys p and q, and a 32-byte secret k. H(a,b,..) is sha256
of a,b,.. concatenated.

1) Split your data into N 31 byte blocks, a1, a2, .., aN.

2) Calculate r0 as H(k*G). Calculate r1, .., rN as:

r(i+1) = H(p, r(i)) + a(i+1)

3) Sign N+1 transactions in a chain spending pubkey p*G, using rN, r(N-1),
.., r1, r0 as nonces. All but the final tx should pay to a p*G output to
continue the chain; the final output should pay to q*G instead.

4) Once all transactions are sufficiently confirmed, spend the final
output with k as the secret nonce (and hence R=k*G as the public
nonce).

Recover the data using the following process:

1) From the final transaction, recover R=k*G, and calculate r0 as H(R).
Recover p from the previous transaction, p = (s0-r0)/H(r0*G, P, m0).

2) Recover ri from each signature; ri = si - H(Ri, P, mi)*p. Recover
the data ai as ai = ri - H(p,r(i-1)).

Dealing with the points being 32-bytes might require carrying over a
sign-bit; but that should be possible in the spare ~7 bits since each
block was only 31 bytes not 32 bytes. Left as an exercise for the
reader, etc.
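
The encode and recover steps above can be sketched as follows. This is an illustration under simplifying assumptions, not the exact scheme: it uses plain Schnorr with full (x, y) points and an untagged SHA256 challenge, so the sign-bit carry mentioned above does not arise, and it models the final spend simply as "K = k*G becomes public". The names `encode`, `decode` and `msgs` are hypothetical; `msgs[i]` is the message signed with nonce r(i).

```python
import hashlib

# secp256k1 parameters (field prime, group order, generator).
FIELD_P = 2**256 - 2**32 - 977
ORDER_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Add two affine points; None is the point at infinity."""
    if a is None or b is None:
        return b if a is None else a
    if a[0] == b[0] and (a[1] + b[1]) % FIELD_P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, FIELD_P) % FIELD_P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % FIELD_P, -1, FIELD_P) % FIELD_P
    x = (lam * lam - a[0] - b[0]) % FIELD_P
    return x, (lam * (a[0] - x) - a[1]) % FIELD_P

def mul(k, pt):
    """Double-and-add scalar multiplication (not constant time)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt, k = add(pt, pt), k >> 1
    return acc

def b32(v: int) -> bytes:
    return v.to_bytes(32, "big")

def h(*parts: bytes) -> int:
    """H(a, b, ..): sha256 of the concatenation, reduced mod the group order."""
    return int.from_bytes(hashlib.sha256(b"".join(parts)).digest(), "big") % ORDER_N

def challenge(R, pub, msg: bytes) -> int:
    return h(b32(R[0]), b32(pub[0]), msg)

def encode(p: int, k: int, blocks, msgs):
    """Return (P, K, sigs) where sigs[i] = (R_i, s_i) is signed with nonce r(i)."""
    pub, K = mul(p, G), mul(k, G)
    r = h(b32(K[0]))                       # r0 = H(k*G)
    nonces = [r]
    for a in blocks:                       # r(i) = H(p, r(i-1)) + a(i)
        r = (h(b32(p), b32(r)) + int.from_bytes(a, "big")) % ORDER_N
        nonces.append(r)
    sigs = []
    for r, m in zip(nonces, msgs):
        R = mul(r, G)
        sigs.append((R, (r + challenge(R, pub, m) * p) % ORDER_N))
    return pub, K, sigs

def decode(pub, K, sigs, msgs):
    """Recover p from the r0 signature, then peel each 31-byte block off its nonce."""
    r_prev = h(b32(K[0]))
    R0, s0 = sigs[0]
    p = (s0 - r_prev) * pow(challenge(R0, pub, msgs[0]), -1, ORDER_N) % ORDER_N
    blocks = []
    for (R, s), m in zip(sigs[1:], msgs[1:]):
        r = (s - challenge(R, pub, m) * p) % ORDER_N
        blocks.append(((r - h(b32(p), b32(r_prev))) % ORDER_N).to_bytes(31, "big"))
        r_prev = r
    return blocks
```

On chain the signatures would appear in the order rN, .., r1, r0; the lists here are indexed by nonce number to keep the arithmetic readable. Until K is revealed, every signature in the chain is an ordinary valid signature under P.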

I believe that the privkey p is secure prior to k*G being revealed,
since all the nonces are distinct hashes seeded by that privkey; and q
remains secure because k is never revealed.

If you wanted to not reuse the pubkey p*G repeatedly, you could tweak it
to be p0 = p, p(i+1) = p + H(k*G, p(i)), or similar. That would allow you
to use an n-of-n multisig to get multiple blocks in a single transaction
without seeming weird, eg.

I believe the only way to distinguish this from a normal transaction
pattern where a wallet has a change output, is via the final transaction
that reveals k*G, and detecting the relationship between k*G and the
spending conditions of the transaction that created the coin being spent.
That's already somewhat expensive to check for every spend, but could
be made more so by publishing k*G on some other medium (ie the data is
in the blockchain, but you obtain the txid and key to find the data
from elsewhere), or by revealing (k+x)*G where x is a random 20-bit
(?) number, and a significant but tractable amount of grinding is needed
to recover the desired k*G and decode the data -- the idea being that
that is tractable for someone who knows there is data at that txid,
but not tractable when performed on every signature in the blockchain
in order to filter data publication.

I think if you did 20 such transactions per block, each spending a single
20-of-20 tapscript multisig, you'd get 12400 bytes of data per block
(without violating standardness constraints), at a cost of ~11800vb, so
much less efficient than inscriptions, but slightly more efficient than
OP_RETURN, and significantly less detectable than either. I think Knots
default policy currently allows up to 50-of-50 multisig in tapscript,
which would give you 31kB of data in ~26.6kvB of tx weight in a block.
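
As a quick check of the capacity figures in the paragraph above, the arithmetic can be written out. The vbyte costs are taken from the text as given, not recomputed; the helper name `payload_bytes` is illustrative.

```python
BLOCK_BYTES = 31          # data bytes carried per signature nonce

def payload_bytes(txs_per_block: int, sigs_per_tx: int) -> int:
    """Data capacity when every signature in every tx carries one 31-byte block."""
    return txs_per_block * sigs_per_tx * BLOCK_BYTES

# 20 transactions per block, each spending a 20-of-20 tapscript multisig.
standard = payload_bytes(20, 20)
# Same 20 transactions with 50-of-50 multisig.
larger = payload_bytes(20, 50)

# Data bytes per vbyte, using the approximate costs quoted in the text.
standard_rate = standard / 11_800
larger_rate = larger / 26_600

print(standard, round(standard_rate, 2))
print(larger, round(larger_rate, 2))
```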

If you're regularly making payments from a particular wallet, I think
that procedure would allow you to encode data in your change outputs at
the rate of 32B/tx for no additional cost. Though the data would only be
recoverable once complete, and it's probably worth noting that I haven't
provided any security proofs...

Cheers,
aj

waxwing/ AdamISZ

Oct 7, 2025, 9:52:47 AM
to Bitcoin Development Mailing List
Hi aj,

Interesting points! Answers inline.



On Tuesday, October 7, 2025 at 6:38:40 AM UTC-3 Anthony Towns wrote:
> On Wed, Oct 01, 2025 at 07:24:50AM -0700, waxwing/ AdamISZ wrote:
> > I'm curious about the case of P, R, s published in utxos to prevent usage
> > of utxos as data. I think this answers in the half-affirmative: you can
> > only embed data by leaking the privkey so that it (can) immediately fall
> > out of the utxo set.
>
> I think you can attack the setup here.
>
> If you allow scriptPubKeys in the utxo set whose spending conditions
> are HTLC/atomic-swap-like:
>
> (pubkey A and preimage reveal of X)
> OR (pubkey B and block height > H)
>
> then you either set H to be arbitrarily far in the future and reveal
> B's privkey, or choose an NUMS X with no known preimage, and reveal
> A's privkey.

Yes. In the paper (and my OP email) I'm trying to narrow it down completely to a P, R, s structure. I guess if we try to be realistic about this "publish a signature in the output always" horrible scenario, it would have to just ditch the NUMS variant of taproot, and I agree, that is a very Bad Thing (TM). (uh sorry you discuss this in the next paragraph but, w/e).

Alternative examples like multisig or hash lock in script to get the data leakage without losing control of the output (necessarily) have been mentioned but I like your 2-branch setup as a good flexible example.

> If you don't allow those things (eg, by requiring such constructions
> also have a (pubkey musig(A,B)) path) then I think you rule out NUMS-IPK
> constructions, and end up making things like vaults ("hotkey with delay,
> coldkey anytime") difficult to send to ("I have to sign with my cold
> key to request funds?"), or, depending on what the utxo R,s is signing,
> encourage key reuse.
>
> > (To emphasize, this is different to the earlier observations (including by
> > me!) that just say it is *possible* to leak data by leaking the private
> > key; here I'm trying to prove that there is *no other way*).
>
> That seems right to me.
>
> I think if the signature scheme supported pubkey recovery (ie, s*G = R +
> H(R,m)*P, and our "m" didn't commit to P as well), you could get around
> this by just having P be the data, with no one, including the "signer"
> able to recover the private key.


Yes, basically. I discuss this in the paper w.r.t. ECDSA. Your description of the relevance of pubkey recovery is good, but there are some nuances. You can't quite (with ECDSA) get P to be the data and have a valid sig, but you can get 's' to be the data simply by backsolving for the private key x. Lack of "pubkey prefixing" in the very funky 'commitment to the nonce' in ECDSA causes that. And the second nuance, you did actually mention: you get "not leaking the key" for free, here. But it's still only a 32/96 bytes embedding rate though, the way I count it.
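
A small sketch of the ECDSA "backsolving" mentioned here, for illustration only. It assumes the message hash z does not commit to P, ignores Bitcoin's low-s standardness rule, and uses hypothetical names (`backsolve_key`, `ecdsa_verify`). From s = k^-1 (z + r*x) one solves x = (s*k - z) / r, so s can be set to any chosen 32-byte value below the group order.

```python
# secp256k1 parameters (field prime, group order, generator).
FIELD_P = 2**256 - 2**32 - 977
ORDER_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Add two affine points; None is the point at infinity."""
    if a is None or b is None:
        return b if a is None else a
    if a[0] == b[0] and (a[1] + b[1]) % FIELD_P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, FIELD_P) % FIELD_P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % FIELD_P, -1, FIELD_P) % FIELD_P
    x = (lam * lam - a[0] - b[0]) % FIELD_P
    return x, (lam * (a[0] - x) - a[1]) % FIELD_P

def mul(k, pt):
    """Double-and-add scalar multiplication (not constant time)."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt, k = add(pt, pt), k >> 1
    return acc

def backsolve_key(s_data: bytes, k: int, z: int):
    """Pick s first (the data), then solve the signing equation for the key x."""
    s = int.from_bytes(s_data, "big")          # assumes 0 < s < ORDER_N
    r = mul(k, G)[0] % ORDER_N                 # r depends only on the nonce
    x = (s * k - z) * pow(r, -1, ORDER_N) % ORDER_N
    return x, r, s

def ecdsa_verify(pub, r: int, s: int, z: int) -> bool:
    """Textbook ECDSA verification for message hash z."""
    w = pow(s, -1, ORDER_N)
    pt = add(mul(z * w % ORDER_N, G), mul(r * w % ORDER_N, pub))
    return pt is not None and pt[0] % ORDER_N == r
```

A reader sees the data directly in s and learns nothing about x unless k is also disclosed, which is the "not leaking the key for free" nuance. The Schnorr challenge hashes P, so the same trick is circular there: changing x changes e.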
 
> > However I still am probably in the large majority that thinks it's
> > appalling to imagine a sig attached to every pubkey onchain.
>
> I think the only thing achieved by embedding data in the utxo set (vs
> an OP_RETURN output or witness data) is to bloat the utxo set; and if
> that's the goal, it can equally easily be done with spendable outputs
> that the attacker simply chooses not to ever spend. So that doesn't seem
> like a terribly interesting solution to anything.

I think the logic of that is not quite right. Suppose I want to embed pictures into the unpruneable utxo set specifically (and not only 'in transactions'). The starting point here was me trying to write out how you can't embed data in known-privkey (Schnorr) P, R, s tuples.

And not only pictures; as Andrew pointed out above, there's always the concern of some kind of virus-y "naughty" data.

Very nice example. I am glad you took the trouble to write it out, because I agree that examples like that are worth working through because as you say they lean closer to being properly indistinguishable from ordinary transaction patterns.

My analysis was narrower: output-side embedding (in a theoretical future of P,R,s outputs). But that's a little confusing because (P, R, s) is still there whether some of it is put in witness or not. So everyone seems to agree that privkey reveal is necessary for that, but everyone is also pointing out that with Bitcoin's actual consensus scripting system, that doesn't quite mean what it seems! And the embedding rate is not very good. In this framing, not much has changed in your "chained" example: once the privkey p is revealed, you get the k value per chain link, so it's still roughly a 1/3 ratio, or more realistically, as you mention (and I did upthread), it's per *transaction* which is a much lower rate.

Your points about limits, standardness constraints are well taken; those are the kinds of things that do actually matter today, but that I was not thinking about.


Anthony Towns

Oct 8, 2025, 4:45:06 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
On Tue, Oct 07, 2025 at 05:05:24AM -0700, waxwing/ AdamISZ wrote:
> Yes, basically. I discuss this in the paper w.r.t. ECDSA. Your description
> of the relevance of pubkey recovery is good, but there are some nuances.
> You can't quite (with ECDSA) get P to be the data and have a valid sig, but
> you can get 's' to be the data simply by backsolving for the private key x.
> Lack of "pubkey prefixing" in the very funky 'commitment to the nonce' in
> ECDSA causes that. And the second nuance, you did actually mention: you get
> "not leaking the key" for free, here. But it's still only a 32/96 bytes
> embedding rate though, the way I count it.

You've got 4x 32-byte values to play with: s, r, p and m. The verification
equation determines one of these, reducing it to 3x. m isn't able to be
freely chosen, reducing it to 2x. And being able to reverse the equation
in order to calculate anything requires the receiver to know one of the
secrets, which reduces it to 1x. (Grinding can bump that back up to a
factor of 1.something) So that's the 32. On the other side, you need to
transmit everything but m which is otherwise determined by the setup,
so that's the 96.

> I think the logic of that is not quite right. Suppose I want to embed
> pictures into the unpruneable utxo set specifically (and not only 'in
> transactions').

Sure, but then I'll also suppose your goal is to harm Bitcoin by bloating
the utxo set. If that weren't one of your fundamental goals, you'd use
other, cheaper and easier, ways of encoding the data.

> Very nice example. I am glad you took the trouble to write it out, because
> I agree that examples like that are worth working through because as you
> say they lean closer to being properly indistinguishable from ordinary
> transaction patterns.

I think the (P,R,s) outputs could be an interesting design for a
non-programmable system that was intended purely for payments -- a
FEDwire/SWIFT replacement without the possibility of vaults, lightning,
etc. Presumably more mimblewimble friendly etc too. Presumably the "R,s"
values could also be a signature of P by the operator's well known pubkey,
giving you a KYC/CBDC-like system too.

You could get programmability back in this scenario by allowing P to sign
a script, which you then satisfy, rather than signing a payment directly
(ie, the graftroot approach).

Anyway, once you make the system programmable in interesting ways, I
think you get data embeddability pretty much immediately, and then it's
just a matter of trading off the optimal encoding rate versus how easily
identifiable your transactions can be. Forcing data to be hidden at a
cost of making it less efficient just leaves less resources available
to other users of the system, though, which doesn't seem like a win in
any way to me.

> Your points about limits, standardness constraints are well taken; those
> are the kinds of things that do actually matter today, but I was not
> thinking about.

Note that I mentioned the standardness constraints not because they're
limits today, but rather because they reflect the form existing txs take,
so mimicking that form would allow txs embedding data via this scheme to
be difficult to distinguish from other txs, and hence equally difficult
to censor/filter.

Cheers,
aj

waxwing/ AdamISZ

Oct 8, 2025, 9:49:04 AM
to Bitcoin Development Mailing List
Answers inline.

On Wednesday, October 8, 2025 at 5:45:06 AM UTC-3 Anthony Towns wrote:
> On Tue, Oct 07, 2025 at 05:05:24AM -0700, waxwing/ AdamISZ wrote:
> > Yes, basically. I discuss this in the paper w.r.t. ECDSA. Your description
> > of the relevance of pubkey recovery is good, but there are some nuances.
> > You can't quite (with ECDSA) get P to be the data and have a valid sig, but
> > you can get 's' to be the data simply by backsolving for the private key x.
> > Lack of "pubkey prefixing" in the very funky 'commitment to the nonce' in
> > ECDSA causes that. And the second nuance, you did actually mention: you get
> > "not leaking the key" for free, here. But it's still only a 32/96 bytes
> > embedding rate though, the way I count it.
>
> You've got 4x 32-byte values to play with: s, r, p and m. The verification
> equation determines one of these, reducing it to 3x. m isn't able to be
> freely chosen, reducing it to 2x. And being able to reverse the equation
> in order to calculate anything requires the receiver to know one of the
> secrets, which reduces it to 1x. (Grinding can bump that back up to a
> factor of 1.something) So that's the 32. On the other side, you need to
> transmit everything but m which is otherwise determined by the setup,
> so that's the 96.

Yeah I think so, roughly. They're not 100% watertight deductions but it seems correct from where I'm sitting.
(I would only nit that 'm' isn't in consideration as it's implicit, not published, in current signature usage; in a proposed signature-in-output, m would obviously be constrained to something with no wiggle room (and including P if we used ECDSA, but we wouldn't).)
 
> > I think the logic of that is not quite right. Suppose I want to embed
> > pictures into the unpruneable utxo set specifically (and not only 'in
> > transactions').
>
> Sure, but then I'll also suppose your goal is to harm Bitcoin by bloating
> the utxo set. If that weren't one of your fundamental goals, you'd use
> other, cheaper and easier, ways of encoding the data.

But the goal can be simply this: my data is more marketable if I can plausibly claim that it's embedded into bitcoin nodes for eternity (whether true or not, it's marketable). AFAIK this is indeed a thing, in the real world.
 


> > Very nice example. I am glad you took the trouble to write it out, because
> > I agree that examples like that are worth working through because as you
> > say they lean closer to being properly indistinguishable from ordinary
> > transaction patterns.
>
> I think the (P,R,s) outputs could be an interesting design for a
> non-programmable system that was intended purely for payments -- a
> FEDwire/SWIFT replacement without the possibility of vaults, lightning,
> etc. Presumably more mimblewimble friendly etc too. Presumably the "R,s"
> values could also be a signature of P by the operator's well known pubkey,
> giving you a KYC/CBDC-like system too.
>
> You could get programmability back in this scenario by allowing P to sign
> a script, which you then satisfy, rather than signing a payment directly
> (ie, the graftroot approach).


I like this line of thought, and indeed I'd forgotten about graftroot and the whole delegation angle.
(and just to repeat the point made earlier: we'd only need to sign over a message including P for ecdsa, but we wouldn't use that.)
I guess if you're discussing a hypothetical permissioned system though it's a whole different world, so I'm going to sidestep that one.

But it does sound interesting to do delegation and then ZkPOK outputs even in a Bitcoin world. Albeit it's a long way from where we are today.

Of course we're firmly pie in the sky again here, but I think it helps inform thinking about Bitcoin as it is concretely today.
 
> Anyway, once you make the system programmable in interesting ways, I
> think you get data embeddability pretty much immediately,

My main motivation in discussing this was indeed the extent to which you get embeddability even without any programmability; as we've established, it's not zero, and it's not restricted to grinding (exponential work). But in *pure* unprogrammable, ZkPOK outputs of form P, R,s and nothing else allowed, it *is*, I'm claiming, restricted to key leakage and doesn't surpass 33%.

> and then it's
> just a matter of trading off the optimal encoding rate versus how easily
> identifiable your transactions can be. Forcing data to be hidden at a
> cost of making it less efficient just leaves less resources available
> to other users of the system, though, which doesn't seem like a win in
> any way to me.
>
> > Your points about limits, standardness constraints are well taken; those
> > are the kinds of things that do actually matter today, but I was not
> > thinking about.
>
> Note that I mentioned the standardness constraints not because they're
> limits today, but rather because they reflect the form existing txs take,
> so mimicking that form would allow txs embedding data via this scheme to
> be difficult to distinguish from other txs, and hence equally difficult
> to censor/filter.

I see. Good point.
 