On (in)ability to embed data into Schnorr

waxwing/ AdamISZ

Oct 1, 2025, 3:50:38 PM
to Bitcoin Development Mailing List
Hi all,

https://github.com/AdamISZ/schnorr-unembeddability/


Here I'm analyzing whether the following statement is true: "if you can embed data into a (P, R, s) tuple (Schnorr pubkey and signature, BIP340 style), without grinding or using a sidechannel to "inform" the reader, you must be leaking your private key".

See the abstract for a slightly more fleshed out context.

I'm curious about the case of P, R, s published in utxos to prevent usage of utxos as data. I think this answers in the half-affirmative: you can only embed data by leaking the privkey so that it (can) immediately fall out of the utxo set.

(To emphasize, this is different to the earlier observations (including by me!) that just say it is *possible* to leak data by leaking the private key; here I'm trying to prove that there is *no other way*).

However I still am probably in the large majority that thinks it's appalling to imagine a sig attached to every pubkey onchain.

Either way, I found it very interesting! Perhaps others will find the analysis valuable.

Feedback (especially of the "that's wrong/that's not meaningful" variety) appreciated.

Regards,
AdamISZ/waxwing

Greg Maxwell

Oct 1, 2025, 7:04:51 PM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
Intuitively it sounds likely -- just in that the available values are an image on the curve and a value summed with a hash dependent on everything else. I think it would be hard to prove.

But is it even really worth the analysis when grinding gets you a 12% embedding rate in that signature at not that significant a cost (because you can independently grind the nonce and the signature itself, or the nonce and the pubkey)? And when, beyond the cost of the additional signature (making the output 3x its cost), requiring signing when forming the address completely kills public derivation, multisig with cold keys, etc.? And then whatever spam concerns people have would likely be exacerbated by the spammers using more resources due to the low embedding rate?

Also, re a leaked private key letting the output fall out of the utxo set: not so if it's part of an explicit multisig, e.g. 2 of 2 with a leaked key and a secure one.




--
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/0f6c92cc-e922-4d9f-9fdf-69384dcc4086n%40googlegroups.com.

Andrew Poelstra

Oct 1, 2025, 7:20:25 PM
to Bitcoin Development Mailing List
On Wed, Oct 01, 2025 at 10:10:16PM +0000, Greg Maxwell wrote:
> Intuitively it sounds likely -- just in that the available values are an
> image on the curve and a value summed with a hash dependent on everything
> else. I think it would be hard to prove.
>
> But is it even really worth the analysis when grinding gets you a 12%
> embedding rate in that signature at not that significant cost? (because you
> can independently grind the nonce and signature itself, or nonce and
> pubkey) -- and when beyond the cost of the additional signature (making the
> output 3x its cost) requiring signing when forming the address completely
> kills public derivation, multisig with cold keys. etc? ... and then any of
> whatever spam concerns people have would likely be exacerbated by the
> spammers using more resources due to the embedding rate?
>

Some time ago, I talked to Ethan Heilman about this in the context of PQ
signatures, and he made the interesting point that you can think of
12% embedding rate as representing an 8x discount for real signatures vs
embedded data. And that maybe that's okay, incentive-wise.

Needing to grind out portions of 32-byte blocks probably also reduces
the risk from people trying to embed virus signatures or other malicious
data.

As for waxwing's original question -- I also intuitively believe that
the only way to embed data in a Schnorr signature is by grinding or
revealing your key ... and I'm not convinced you can do it even by
revealing your key. (R is an EC point that you can't force to be any
particular value except by making a NUMS point, which you then can't use
to sign; and s = k + ex where e is a hash of kG (among other things)
so I don't think you can force that value at all.)

--
Andrew Poelstra
Director, Blockstream Research
Email: apoelstra at wpsoftware.net
Web: https://www.wpsoftware.net/andrew

The sun is always shining in space
-Justin Lewis-Webster


waxwing/ AdamISZ

Oct 1, 2025, 9:49:23 PM
to Bitcoin Development Mailing List
Hi Greg, Andrew, list,

Answers to Greg then Andrew:

> E.g. 2 of 2 with leaked key and a secure one.

That's a very good point! I was narrowly focused on the signature scheme, but Bitcoin is more than a signature scheme!

>   But is it even really worth the analysis when grinding gets you a 12% embedding rate in that signature at not that significant cost? (because you can independently grind the nonce and signature itself, or nonce and pubkey) -- and when beyond the cost of the additional signature (making the output 3x its cost) requiring signing when forming the address completely kills public derivation, multisig with cold keys. etc?  ... and then any of whatever spam concerns people have would likely be exacerbated by the spammers using more resources due to the embedding rate?

I certainly don't think it's worth *doing* (hence my use of the term "appalling idea" :) ), as per the things you mention there.

I wrote the document as a mostly academic investigation. It would be nice to be surer what the limits are, although I suspect we're all reasonably confident of what is/isn't possible.

>  12% embedding rate
Where do you get that number from? 33% for embedding 256 bits in (P, R, s) (but as per this discussion, according to me, at the cost of key leakage). If we include the other bytes in a (taproot anyway) utxo that's not much less, I guess 30% ish. I could try to guess but it'd be easier if you told me :)

to Andrew:

> As for waxwing's original question -- I also intuitively believe that
> the only way to embed data in a Schnorr signature is by grinding or
> revealing your key ... and I'm not convinced you can do it even by
> revealing your key. (R is an EC point that you can't force to be any
> particular value except by making a NUMS point, which you then can't use
> to sign; and s = k + ex where e is a hash of kG (among other things)
> so I don't think you can force that value at all.)

Ah, I see what you're saying, it's a subtly different target. ECDSA allows that s be controlled, Schnorr doesn't, but I set up the game as "adversary must be able to publish a function f such that f(any published R, s, (e)) = data", i.e. not just f = identity function. That was why I wrote in the introduction (copied here for convenience:)

"Data can effectively be embedded in signatures by using a publically-inferrable nonce, as was noted \href{https://groups.google.com/g/bitcoindev/c/d6ZO7gXGYbQ/m/Y8BfxMVxAAAJ}{here} and was later fleshed out in detail \href{https://blog.bitmex.com/the-unstoppable-jpg-in-private-keys/}{here} (\textbf{note}: both these sources discuss nonce-reuse but it's worse than that: any \emph{publically inferrable} nonce can achieve the same thing, such as, the block hash of the parent block; this will have the same embedding rate and cannot be disallowed)."

It may be a different target "politically" :) but I was only thinking technically, in terms of how people might end up using outputs. From a technical point of view it makes no difference if f is the identity or something more complex (as long as it's efficiently computable).
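
For readers who want to see the mechanics, here is a minimal Python sketch of the key-leak embedding described above: the 32-byte payload is used as the private key x, and the nonce k is derived from a public value, so anyone can solve s = k + e*x for x. It is a toy and not BIP340: the challenge is a plain SHA-256 over full (x, y) points, with no tagged hashes or even-y rule, and `public_hint`, `sign`, `verify` and `extract` are names invented for this illustration.

```python
import hashlib

# secp256k1 parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Add two curve points; None is the point at infinity."""
    if A is None: return B
    if B is None: return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, A):
    """Scalar multiplication by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A, k = add(A, A), k >> 1
    return R

def ser(pt):
    """Serialize a point as x || y (a simplification of BIP340's x-only form)."""
    return pt[0].to_bytes(32, "big") + pt[1].to_bytes(32, "big")

def H(*parts):
    """SHA-256 of the concatenated inputs, reduced mod N."""
    return int.from_bytes(hashlib.sha256(b"".join(parts)).digest(), "big") % N

def sign(x, k, msg):
    """Toy Schnorr signature: returns (P, R, s) with s = k + e*x."""
    Pk, R = mul(x, G), mul(k, G)
    return Pk, R, (k + H(ser(R), ser(Pk), msg) * x) % N

def verify(Pk, R, s, msg):
    """Check s*G == R + e*P."""
    return mul(s, G) == add(R, mul(H(ser(R), ser(Pk), msg), Pk))

def extract(Pk, R, s, msg, public_hint):
    """Reader side: rebuild k from public data, then solve s = k + e*x for x."""
    k = H(b"nonce", public_hint)
    if mul(k, G) != R:
        return None                      # this signature was not built from the hint
    e = H(ser(R), ser(Pk), msg)
    return ((s - k) * pow(e, -1, N) % N).to_bytes(32, "big")

data = b"32 bytes of arbitrary payload!!!"           # payload doubles as the private key
public_hint = hashlib.sha256(b"stand-in for a parent block hash").digest()
msg = b"stand-in for whatever the output signature commits to"

Pk, R, s = sign(int.from_bytes(data, "big"), H(b"nonce", public_hint), msg)
print(verify(Pk, R, s, msg), extract(Pk, R, s, msg, public_hint))
```

The function f in the "game" above is `extract` here: it is not the identity, but it is efficiently computable from published values only, which is all the argument needs.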

Cheers,
AdamISZ/waxwing

waxwing/ AdamISZ

Oct 2, 2025, 12:17:21 PM
to Bitcoin Development Mailing List
> >  12% embedding rate
> Where do you get that number from? 33% for embedding 256 bits in (P, R, s) (but as per this discussion, according to me, at the cost of key leakage). If we include the other bytes in a (taproot anyway) utxo that's not much less, I guess 30% ish. I could try to guess but it'd be easier if you told me :)

Thinking about it again: to publish data, you have to publish a transaction! I guess the most economical, paying taproot to taproot, is about 192 bytes with script path plus the posited extra 64 for the (R,s) in the output, so yeah that'd be 32 out of 256, 12.5%. Isn't the figure a bit different for key path though, because no control block? Well it hardly matters, it's some small fraction in that range.

An interesting mechanical detail in this near-absurd scenario is that if you wanted to repeatedly publish off the same (presumably a few multiples of dust level) output, you couldn't also do the leak single key thing, since you'd lose control to re-spend. So that'd place us in the "explicit multisig" scenario that Greg mentioned, which I think would only make sense with legacy script? Kind of a different scenario, also it would be really weird to update legacy script to take into account a new "you must sign the pubkeys" rule. Though I guess in this fictional scenario, it might happen like that. If you did do it with legacy, you'd be publishing bare 2 of 2 multisig. If you did it with taproot due to how that works, the script is not published until the output is spent, so I think that's outside what I was considering ("data in utxo set"). (I guess you could also use something like a hash lock which might be more efficient). So anyway if you wanted to do this repeatedly and minimize cost, for whatever strange reason, you'd be adding another 50-100 bytes each time bringing that % down to like 10% or less.

But that all became way too hypothetical to even analyze properly :)

Anyway just to reemphasize I certainly wasn't advocating this sig-attaching system, but it seems important to know what the result of it would be: we would still not have changed the obvious reality that embedding data in witness gives more space for data, and is more economical, and we would only reduce by a big factor how much can be embedded in outputs (anything from 8% to 15% embedding rate seems possible depending on the hypothetical details), while having to screw up much of Bitcoin's functionality in the process.

Cheers,
AdamISZ/waxwing

Greg Maxwell

Oct 2, 2025, 5:59:41 PM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
I just meant in the purely grinding non-key leaking case you could get 4 bytes into the nonce pretty easily and 4 bytes into either the pubkey or signature out of a 64 byte signature.  Obviously the delivered embedding rate in a whole txn will be lower, but maybe not that much thanks to multisig outputs.
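
To give a feel for what grinding involves, the following Python sketch grinds a chosen prefix into the x-coordinate of the nonce point R = k*G on secp256k1 by stepping k and adding G once per attempt. It only covers the nonce half of the approach described above (grinding s or P would be a separate search), it uses a 1-byte prefix so that it finishes quickly, and `grind_nonce` and the fixed starting scalar are assumptions made for the demo.

```python
# secp256k1 parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Add two curve points; None is the point at infinity."""
    if A is None: return B
    if B is None: return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, A):
    """Scalar multiplication by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A, k = add(A, A), k >> 1
    return R

def grind_nonce(prefix, k_start):
    """Step k upward until the x-coordinate of R = k*G starts with `prefix`."""
    k, R, tries = k_start, mul(k_start, G), 1
    while R[0].to_bytes(32, "big")[:len(prefix)] != prefix:
        k, R, tries = k + 1, add(R, G), tries + 1    # one point addition per attempt
    return k, R, tries

# A real signer would start from a secret random k; a fixed start keeps the demo reproducible.
k, R, tries = grind_nonce(b"\xab", k_start=0xC0FFEE)
print(f"1-byte prefix found after {tries} attempts")

for nbytes in (1, 2, 4):
    print(f"{nbytes} byte(s): about 2^{8 * nbytes} attempts expected")
print(f"4 + 4 ground bytes in a 64-byte (R, s): {8 / 64:.1%}")
```

Each extra ground byte multiplies the expected number of attempts by 256, so 4 bytes per field costs on the order of 2^32 cheap point additions.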



Peter Todd

Oct 3, 2025, 11:51:52 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
On Wed, Oct 01, 2025 at 07:24:50AM -0700, waxwing/ AdamISZ wrote:
> Hi all,
>
> https://github.com/AdamISZ/schnorr-unembeddability/
>
> Here I'm analyzing whether the following statement is true: "if you can
> embed data into a (P, R, s) tuple (Schnorr pubkey and signature, BIP340
> style), without grinding or using a sidechannel to "inform" the reader, you
> must be leaking your private key".
>
> See the abstract for a slightly more fleshed out context.
>
> I'm curious about the case of P, R, s published in utxos to prevent usage
> of utxos as data. I think this answers in the half-affirmative: you can
> only embed data by leaking the privkey so that it (can) immediately fall
> out of the utxo set.
>
> (To emphasize, this is different to the earlier observations (including by
> me!) that just say it is *possible* to leak data by leaking the private
> key; here I'm trying to prove that there is *no other way*).

You can probably use timelock encryption to ensure that the leak of the private
key only happens in the future, after the funds are recovered by the owner in a
subsequent transaction.

--
https://petertodd.org 'peter'[:-1]@petertodd.org

waxwing/ AdamISZ

Oct 4, 2025, 2:40:43 AM
to Bitcoin Development Mailing List
Hi Peter,

> You can probably use timelock encryption to ensure that the leak of the private
> key only happens in the future, after the funds are recovered by the owner in a
> subsequent transaction.

Another very interesting point, there, to get around the issue of key leakage ... albeit I don't see a usecase, maybe I'm just not imaginative enough, very possible.

If someone wants to keep something in the utxo set "forever", it doesn't help. If they want the property of "immediately accessible in the utxo set" (like "deposit into some fancy system with a blob of data"; I emphasize "deposit" because that would explain why not "just put it in the witness", your current outputs don't support that; correct me if my reasoning is wrong here), then I guess they don't get that, either: the data is accessible "intermediate term" instead.

Cheers,
AdamISZ/waxwing

waxwing/ AdamISZ

Oct 6, 2025, 9:06:29 AM
to Bitcoin Development Mailing List
Yes, sorry, reading fail on my part (somehow missed that you were explicitly referring to grinding in the comment).

Still don't think the 12% figure is a good one though? In (P,R,s) it's 8 out of 96 (and, as discussed, worse if the whole tx is (realistically) included), 1/4 the rate you get from direct key leakage. (Plus the perhaps trivial point that it does actually require work, which might conceivably matter at scale?) I'm not sure why one would not include P in the measure?

Even an explicit multisig that does not sacrifice control of the output would be of the order of double the embedding rate, without having to do work. (P,R,s x 2 = 192 and embed 32 for a 1/6 rate; vs. grinding all 4 P,R values for a 1/12 rate).
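
The embedding rates quoted so far in the thread can be put side by side with a few lines of Python. The labels are mine and the byte counts are the ones stated in the messages above; nothing here is derived independently.

```python
from fractions import Fraction

# embedded bytes / published bytes, using the figures quoted in the thread
rates = {
    "grind 4+4 bytes, (R, s) only": Fraction(8, 64),
    "grind 4+4 bytes, (P, R, s)": Fraction(8, 96),
    "leak key, (P, R, s)": Fraction(32, 96),
    "explicit 2-of-2, one leaked key": Fraction(32, 192),
    "explicit 2-of-2, grind 4 values x 4 bytes": Fraction(16, 192),
    "leak key, whole ~256-byte tx": Fraction(32, 256),
}

for label, rate in rates.items():
    print(f"{label:45s} {str(rate):5s} ({float(rate):.1%})")
```

The 12% and 1/12 figures differ only in whether P is counted in the denominator, which is the disagreement in the message above.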

Anthony Towns

Oct 7, 2025, 5:38:40 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
On Wed, Oct 01, 2025 at 07:24:50AM -0700, waxwing/ AdamISZ wrote:
> I'm curious about the case of P, R, s published in utxos to prevent usage
> of utxos as data. I think this answers in the half-affirmative: you can
> only embed data by leaking the privkey so that it (can) immediately fall
> out of the utxo set.

I think you can attack the setup here.

If you allow scriptPubKeys in the utxo set whose spending conditions
are HTLC/atomic-swap-like:

(pubkey A and preimage reveal of X)
OR (pubkey B and block height > H)

then you either set H to be arbitrarily far in the future and reveal
B's privkey, or choose an NUMS X with no known preimage, and reveal
A's privkey.

If you don't allow those things (eg, by requiring such constructions
also have a (pubkey musig(A,B)) path) then I think you rule out NUMS-IPK
constructions, and end up making things like vaults ("hotkey with delay,
coldkey anytime") difficult to send to ("I have to sign with my cold
key to request funds?"), or, depending on what the utxo R,s is signing,
encourage key reuse.

> (To emphasize, this is different to the earlier observations (including by
> me!) that just say it is *possible* to leak data by leaking the private
> key; here I'm trying to prove that there is *no other way*).

That seems right to me.

I think if the signature scheme supported pubkey recovery (ie, s*G = R +
H(R,m)*P, and our "m" didn't commit to P as well), you could get around
this by just having P be the data, with no one, including the "signer"
able to recover the private key.

> However I still am probably in the large majority that thinks it's
> appalling to imagine a sig attached to every pubkey onchain.

I think the only thing achieved by embedding data in the utxo set (vs
an OP_RETURN output or witness data) is to bloat the utxo set; and if
that's the goal, it can equally easily be done with spendable outputs
that the attacker simply chooses not to ever spend. So that doesn't seem
like a terribly interesting solution to anything.

As far as embedding data in signatures goes, I think the following
scheme would allow you to publish data in a cryptographically-secure way,
with minimal lost funds:

0) Set up secret keys p and q, and a 32-byte secret k. H(a,b,..) is sha256
of a,b,.. concatenated.

1) Split your data into N 31 byte blocks, a1, a2, .., aN.

2) Calculate r0 as H(k*G). Calculate r1, .., rN as:

r(i+1) = H(p, r(i)) + a(i+1)

3) Sign N+1 transactions in a chain spending pubkey p*G, using rN, r(N-1),
.., r1, r0 as nonces. All but the final tx should pay to a p*G output to
continue the chain; the final output should pay to q*G instead.

4) Once all transactions are sufficiently confirmed, spend the final
output with k as the secret nonce (and hence R=k*G as the public
nonce).

Recover the data using the following process:

1) From the final transaction, recover R=k*G, and calculate r0 as H(R).
Recover p from the previous transaction, p = (s0-r0)/H(r0*G, P, m0).

2) Recover ri from each signature; ri = si - H(Ri, P, mi)*p. Recover
the data ai as ai = ri - H(p,r(i-1)).

Dealing with the points being 32-bytes might require carrying over a
sign-bit; but that should be possible in the spare ~7 bits since each
block was only 31 bytes not 32 bytes. Left as an exercise for the
reader, etc.
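
A rough Python sketch of the encode and recover steps above, working only at the level of scalars and signatures (no actual transactions). It assumes a simplified Schnorr challenge e = H(R, P, m) over full (x, y) points, so it sidesteps the sign-bit issue just mentioned rather than solving it; `encode`, `decode` and the placeholder messages are invented names, and the nonces follow the recovery formula, i.e. r(i) = H(p, r(i-1)) + a(i).

```python
import hashlib

# secp256k1 parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Add two curve points; None is the point at infinity."""
    if A is None: return B
    if B is None: return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, A):
    """Scalar multiplication by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A, k = add(A, A), k >> 1
    return R

def i2b(i):
    return i.to_bytes(32, "big")

def ser(pt):
    return i2b(pt[0]) + i2b(pt[1])

def H(*parts):
    """SHA-256 of the concatenated inputs, reduced mod N."""
    return int.from_bytes(hashlib.sha256(b"".join(parts)).digest(), "big") % N

def sign(x, k, msg):
    """Toy Schnorr: returns (R, s) with s = k + H(R, P, m)*x."""
    R = mul(k, G)
    return R, (k + H(ser(R), ser(mul(x, G)), msg) * x) % N

def encode(p, q, k, blocks):
    r = [H(ser(mul(k, G)))]                                   # r0 = H(k*G)
    for a in blocks:                                          # r(i) = H(p, r(i-1)) + a(i)
        r.append((H(i2b(p), i2b(r[-1])) + int.from_bytes(a, "big")) % N)
    msgs = [b"chain tx %d" % j for j in range(len(r))]        # placeholder sighashes
    chain = [(m, *sign(p, ri, m)) for m, ri in zip(msgs, reversed(r))]  # nonces rN .. r0
    return chain, sign(q, k, b"final spend")                  # final spend reveals R = k*G

def decode(Pk, chain, final):
    r0 = H(ser(final[0]))                                     # r0 = H(R) from the final tx
    m0, R0, s0 = chain[-1]                                    # the tx signed with nonce r0
    assert mul(r0, G) == R0
    p = (s0 - r0) * pow(H(ser(R0), ser(Pk), m0), -1, N) % N   # p = (s0 - r0)/e0
    r = [r0] + [(s - H(ser(R), ser(Pk), m) * p) % N for m, R, s in reversed(chain[:-1])]
    return [i2b((r[i] - H(i2b(p), i2b(r[i - 1]))) % N)[1:] for i in range(1, len(r))]

payload = b"data hidden in Schnorr nonces, 31 bytes per signature.".ljust(62, b"\0")
blocks = [payload[i:i + 31] for i in range(0, len(payload), 31)]
p, q, k = (H(b"demo secret " + t) for t in (b"p", b"q", b"k"))
chain, final = encode(p, q, k, blocks)
print(b"".join(decode(mul(p, G), chain, final)).rstrip(b"\0"))
```

Because each block is under 2^248, adding it to the hash and subtracting the hash again mod the group order returns it exactly, which is why 31-byte blocks are convenient.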

I believe that the privkey p is secure prior to k*G being revealed,
since all the nonces are distinct hashes seeded by that privkey; and q
remains secure because k is never revealed.

If you wanted to not reuse the pubkey p*G repeatedly, you could tweak it
to be p0 = p, p(i+1) = p + H(k*G, p(i)), or similar. That would allow you
to use an n-of-n multisig to get multiple blocks in a single transaction
without seeming weird, eg.

I believe the only way to distinguish this from a normal transaction
pattern where a wallet has a change output, is via the final transaction
that reveals k*G, and detecting the relationship between k*G and the
spending conditions of the transaction that created the coin being spent.
That's already somewhat expensive to check for every spend, but could
be made more so by publishing k*G on some other medium (ie the data is
in the blockchain, but you obtain the txid and key to find the data
from elsewhere), or by revealing (k+x)*G where x is a random 20-bit
(?) number, and a significant but tractable amount of grinding is needed
to recover the desired k*G and decode the data -- the idea being that
that is tractable for someone who knows there is data at that txid,
but not tractable when performed on every signature in the blockchain
in order to filter data publication.

I think if you did 20 such transactions per block, each spending a single
20-of-20 tapscript multisig, you'd get 12400 bytes of data per block
(without violating standardness constraints), at a cost of ~11800vB, so
much less efficient than inscriptions, but slightly more efficient than
OP_RETURN, and significantly less detectable than either. I think Knots
default policy currently allows up to 50-of-50 multisig in tapscript,
which would give you 31kB of data in ~26.6kvB of tx weight in a block.
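
The payload figures in the paragraph above follow from simple multiplication, sketched here in Python. The vbyte costs are the estimates quoted in the message and are taken as given, not recomputed; `payload` is just a helper name.

```python
BLOCK_BYTES = 31   # payload per signature in the chained-nonce scheme

def payload(txs_per_block, sigs_per_tx):
    """Bytes of data carried if every signature hides one 31-byte block."""
    return txs_per_block * sigs_per_tx * BLOCK_BYTES

# (multisig size, quoted cost in vbytes for 20 such transactions)
for sigs, vbytes in ((20, 11_800), (50, 26_600)):
    data = payload(20, sigs)
    print(f"{sigs}-of-{sigs}: {data} bytes in ~{vbytes} vB, {data / vbytes:.2f} bytes/vB")
```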

If you're regularly making payments from a particular wallet, I think
that procedure would allow you to encode data in your change outputs at
the rate of 32B/tx for no additional cost. Though the data would only be
recoverable once complete, and it's probably worth noting that I haven't
provided any security proofs...

Cheers,
aj

waxwing/ AdamISZ

Oct 7, 2025, 9:52:47 AM
to Bitcoin Development Mailing List
Hi aj,

Interesting points! Answers inline.



On Tuesday, October 7, 2025 at 6:38:40 AM UTC-3 Anthony Towns wrote:
> On Wed, Oct 01, 2025 at 07:24:50AM -0700, waxwing/ AdamISZ wrote:
> > I'm curious about the case of P, R, s published in utxos to prevent usage
> > of utxos as data. I think this answers in the half-affirmative: you can
> > only embed data by leaking the privkey so that it (can) immediately fall
> > out of the utxo set.
>
> I think you can attack the setup here.
>
> If you allow scriptPubKeys in the utxo set whose spending conditions
> are HTLC/atomic-swap-like:
>
> (pubkey A and preimage reveal of X)
> OR (pubkey B and block height > H)
>
> then you either set H to be arbitrarily far in the future and reveal
> B's privkey, or choose an NUMS X with no known preimage, and reveal
> A's privkey.

Yes. In the paper (and my OP email) I'm trying to narrow it down completely to a P, R, s structure. I guess if we try to be realistic about this "publish a signature in the output always" horrible scenario, it would have to just ditch the NUMS variant of taproot, and I agree, that is a very Bad Thing (TM). (uh sorry you discuss this in the next paragraph but, w/e).

Alternative examples like multisig or hash lock in script to get the data leakage without losing control of the output (necessarily) have been mentioned but I like your 2-branch setup as a good flexible example.

> If you don't allow those things (eg, by requiring such constructions
> also have a (pubkey musig(A,B)) path) then I think you rule out NUMS-IPK
> constructions, and end up making things like vaults ("hotkey with delay,
> coldkey anytime") difficult to send to ("I have to sign with my cold
> key to request funds?"), or, depending on what the utxo R,s is signing,
> encourage key reuse.
>
> > (To emphasize, this is different to the earlier observations (including by
> > me!) that just say it is *possible* to leak data by leaking the private
> > key; here I'm trying to prove that there is *no other way*).
>
> That seems right to me.
>
> I think if the signature scheme supported pubkey recovery (ie, s*G = R +
> H(R,m)*P, and our "m" didn't commit to P as well), you could get around
> this by just having P be the data, with no one, including the "signer"
> able to recover the private key.


Yes, basically. I discuss this in the paper w.r.t. ECDSA. Your description of the relevance of pubkey recovery is good, but there are some nuances. You can't quite (with ECDSA) get P to be the data and have a valid sig, but you can get 's' to be the data simply by backsolving for the private key x. Lack of "pubkey prefixing" in the very funky 'commitment to the nonce' in ECDSA causes that. And the second nuance, you did actually mention: you get "not leaking the key" for free, here. But it's still only a 32/96 bytes embedding rate though, the way I count it.
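
As I read the remark about backsolving, the ECDSA case can be sketched in Python as follows: pick the data as s, pick a secret nonce k, and solve s = (z + r*x)/k for the private key x. This only works because the message hash z does not depend on the pubkey; the secp256k1 arithmetic is standard, while `ecdsa_verify`, the placeholder message and the derivation of k are assumptions for the demo, and Bitcoin's low-s policy is ignored.

```python
import hashlib

# secp256k1 parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Add two curve points; None is the point at infinity."""
    if A is None: return B
    if B is None: return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, A):
    """Scalar multiplication by double-and-add."""
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A, k = add(A, A), k >> 1
    return R

def H(data):
    """SHA-256 as an integer mod N."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def ecdsa_verify(Pk, z, r, s):
    """Standard ECDSA check: x-coordinate of (z/s)*G + (r/s)*P equals r mod N."""
    w = pow(s, -1, N)
    X = add(mul(z * w % N, G), mul(r * w % N, Pk))
    return X is not None and X[0] % N == r

data = b"this 32-byte string becomes 's'!"
s = int.from_bytes(data, "big")                    # the signature's s value IS the data
z = H(b"message hash that does not commit to the pubkey")
k = H(b"secret nonce, never revealed")             # stays private, so x does not leak
r = mul(k, G)[0] % N
x = (s * k - z) * pow(r, -1, N) % N                # backsolve s = (z + r*x)/k for x
Pk = mul(x, G)
print(ecdsa_verify(Pk, z, r, s), s.to_bytes(32, "big"))
```

With a BIP340-style challenge e = H(R, P, m) the same trick is circular, since x would have to be known before P = x*G can be hashed.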
 
> > However I still am probably in the large majority that thinks it's
> > appalling to imagine a sig attached to every pubkey onchain.
>
> I think the only thing achieved by embedding data in the utxo set (vs
> an OP_RETURN output or witness data) is to bloat the utxo set; and if
> that's the goal, it can equally easily be done with spendable outputs
> that the attacker simply chooses not to ever spend. So that doesn't seem
> like a terribly interesting solution to anything.

I think the logic of that is not quite right. Suppose I want to embed pictures into the unpruneable utxo set specifically (and not only 'in transactions'). The starting point here was me trying to write out how you can't embed data in known-privkey (Schnorr) P, R, s tuples.

And not only pictures; as Andrew pointed out above, there's always the concern of some kind of virus-y "naughty" data.

Very nice example. I am glad you took the trouble to write it out: examples like that are worth working through because, as you say, they come closer to being properly indistinguishable from ordinary transaction patterns.

My analysis was narrower: output-side embedding (in a theoretical future of P,R,s outputs). But that's a little confusing because (P, R, s) is still there whether some of it is put in witness or not. So everyone seems to agree that privkey reveal is necessary for that, but everyone is also pointing out that with Bitcoin's actual consensus scripting system, that doesn't quite mean what it seems! And the embedding rate is not very good. In this framing, not much has changed in your "chained" example: once the privkey p is revealed, you get the k value per chain link, so it's still roughly a 1/3 ratio, or more realistically, as you mention (and I did upthread), it's per *transaction* which is a much lower rate.

Your points about limits and standardness constraints are well taken; those are the kinds of things that do actually matter today, but which I was not thinking about.


Anthony Towns

Oct 8, 2025, 4:45:06 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
On Tue, Oct 07, 2025 at 05:05:24AM -0700, waxwing/ AdamISZ wrote:
> Yes, basically. I discuss this in the paper w.r.t. ECDSA. Your description
> of the relevance of pubkey recovery is good, but there are some nuances.
> You can't quite (with ECDSA) get P to be the data and have a valid sig, but
> you can get 's' to be the data simply by backsolving for the private key x.
> Lack of "pubkey prefixing" in the very funky 'commitment to the nonce' in
> ECDSA causes that. And the second nuance, you did actually mention: you get
> "not leaking the key" for free, here. But it's still only a 32/96 bytes
> embedding rate though, the way I count it.

You've got 4x 32-byte values to play with: s, r, p and m. The verification
equation determines one of these, reducing it to 3x. m isn't able to be
freely chosen, reducing it to 2x. And being able to reverse the equation
in order to calculate anything requires the receiver to know one of the
secrets, which reduces it to 1x. (Grinding can bump that back up to a
factor of 1.something) So that's the 32. On the other side, you need to
transmit everything but m which is otherwise determined by the setup,
so that's the 96.

> I think the logic of that is not quite right. Suppose I want to embed
> pictures into the unpruneable utxo set specifically (and not only 'in
> transactions').

Sure, but then I'll also suppose your goal is to harm Bitcoin by bloating
the utxo set. If that weren't one of your fundamental goals, you'd use
other, cheaper and easier, ways of encoding the data.

> Very nice example. I am glad you took the trouble to write it out, because
> I agree that examples like that are worth working through because as you
> say they lean closer to being properly indistinguishable from ordinary
> transaction patterns.

I think the (P,R,s) outputs could be an interesting design for a
non-programmable system that was intended purely for payments -- a
FEDwire/SWIFT replacement without the possibility of vaults, lightning,
etc. Presumably more mimblewimble friendly etc too. Presumably the "R,s"
values could also be a signature of P by the operator's well known pubkey,
giving you a KYC/CBDC-like system too.

You could get programmability back in this scenario by allowing P to sign
a script, which you then satisfy, rather than signing a payment directly
(ie, the graftroot approach).

Anyway, once you make the system programmable in interesting ways, I
think you get data embeddability pretty much immediately, and then it's
just a matter of trading off the optimal encoding rate versus how easily
identifiable your transactions can be. Forcing data to be hidden at a
cost of making it less efficient just leaves less resources available
to other users of the system, though, which doesn't seem like a win in
any way to me.

> Your points about limits, standardness constraints are well taken; those
> are the kinds of things that do actually matter today, but I was not
> thinking about.

Note that I mentioned the standardness constraints not because they're
limits today, but rather because they reflect the form existing txs take,
so mimicking that form would allow txs embedding data via this scheme to
be difficult to distinguish from other txs, and hence equally difficult
to censor/filter.

Cheers,
aj

waxwing/ AdamISZ

Oct 8, 2025, 9:49:04 AM
to Bitcoin Development Mailing List
Answers inline.

On Wednesday, October 8, 2025 at 5:45:06 AM UTC-3 Anthony Towns wrote:
> On Tue, Oct 07, 2025 at 05:05:24AM -0700, waxwing/ AdamISZ wrote:
> > Yes, basically. I discuss this in the paper w.r.t. ECDSA. Your description
> > of the relevance of pubkey recovery is good, but there are some nuances.
> > You can't quite (with ECDSA) get P to be the data and have a valid sig, but
> > you can get 's' to be the data simply by backsolving for the private key x.
> > Lack of "pubkey prefixing" in the very funky 'commitment to the nonce' in
> > ECDSA causes that. And the second nuance, you did actually mention: you get
> > "not leaking the key" for free, here. But it's still only a 32/96 bytes
> > embedding rate though, the way I count it.
>
> You've got 4x 32-byte values to play with: s, r, p and m. The verification
> equation determines one of these, reducing it to 3x. m isn't able to be
> freely chosen, reducing it to 2x. And being able to reverse the equation
> in order to calculate anything requires the receiver to know one of the
> secrets, which reduces it to 1x. (Grinding can bump that back up to a
> factor of 1.something) So that's the 32. On the other side, you need to
> transmit everything but m which is otherwise determined by the setup,
> so that's the 96.

Yeah I think so, roughly. It's not 100% watertight deductions but it seems correct from where I'm sitting.
(I would only nit that 'm' isn't in consideration as it's implicit, not published, in current signature usage; in a proposed signature-in-output, m would obviously be constrained to something with no wiggle room (and including P if we used ECDSA, but we wouldn't).)
 
> > I think the logic of that is not quite right. Suppose I want to embed
> > pictures into the unpruneable utxo set specifically (and not only 'in
> > transactions').
>
> Sure, but then I'll also suppose your goal is to harm Bitcoin by bloating
> the utxo set. If that weren't one of your fundamental goals, you'd use
> other, cheaper and easier, ways of encoding the data.

But the goal can be simply this: my data is more marketable if I can plausibly claim that it's embedded into bitcoin nodes for eternity (whether true or not, it's marketable). AFAIK this is indeed a thing, in the real world.
 


> > Very nice example. I am glad you took the trouble to write it out, because
> > I agree that examples like that are worth working through because as you
> > say they lean closer to being properly indistinguishable from ordinary
> > transaction patterns.
>
> I think the (P,R,s) outputs could be an interesting design for a
> non-programmable system that was intended purely for payments -- a
> FEDwire/SWIFT replacement without the possibility of vaults, lightning,
> etc. Presumably more mimblewimble friendly etc too. Presumably the "R,s"
> values could also be a signature of P by the operator's well known pubkey,
> giving you a KYC/CBDC-like system too.
>
> You could get programmability back in this scenario by allowing P to sign
> a script, which you then satisfy, rather than signing a payment directly
> (ie, the graftroot approach).


I like this line of thought, and indeed I'd forgotten about graftroot and the whole delegation angle.
(And just to repeat the point made earlier: we'd only need to sign over a message including P for ECDSA, but we wouldn't use that.)
I guess if you're discussing a hypothetical permissioned system though it's a whole different world, so I'm going to sidestep that one.

But it does sound interesting to do delegation and then ZkPOK outputs even in a Bitcoin world. Albeit it's a long way from where we are today.

Of course we're firmly pie in the sky again here, but I think it helps inform thinking about Bitcoin as it is concretely today.
 
> Anyway, once you make the system programmable in interesting ways, I
> think you get data embeddability pretty much immediately,

My main motivation in discussing this was indeed the extent to which you get embeddability even without any programmability; as we've established, it's not zero, and it's not restricted to grinding (exponential work). But with *pure* unprogrammable ZkPOK outputs of the form (P, R, s), and nothing else allowed, it *is*, I'm claiming, restricted to key leakage and doesn't surpass 33%.

and then it's
just a matter of trading off the optimal encoding rate versus how easily
identifiable your transactions can be. Forcing data to be hidden at a
cost of making it less efficient just leaves less resources available
to other users of the system, though, which doesn't seem like a win in
any way to me.

> Your points about limits, standardness constraints are well taken; those
> are the kinds of things that do actually matter today, but I was not
> thinking about.

Note that I mentioned the standardness constraints not because they're
limits today, but rather because they reflect the form existing txs take,
so mimicking that form would allow txs embedding data via this scheme to
be difficult to distinguish from other txs, and hence equally difficult
to censor/filter.

I see. Good point.
 

Tim Ruffing

Oct 31, 2025, 6:51:48 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
Hey Adam,

I think something is wrong here. 

Assume a group of order n=p*2^t where p is a large enough prime such
that the DL problem is hard. For example, Curve25519 has t=3 but the DL
problem is still hard. Or, assuming n+1 is also prime, work in the
multiplicative group of integers modulo n+1 (which has group order n
then). I'm not aware of any obstacles to constructing such groups for
sufficiently large values of t. 

The crucial point is that, in these groups, the Pohlig-Hellman
algorithm can be used to compute the t least significant bits of the
discrete logarithm k of a group element R efficiently. So to embed t
bits in a Schnorr signature (R, s), simply pick k such that its t least
significant bits are exactly these bits.
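
A minimal sketch of this embedding, in a toy group rather than anything cryptographically sized. It uses the multiplicative group modulo a prime q = p*2^t + 1 (so the group order is n = p*2^t, as in the second construction above) and recovers the t low bits of k from R = g^k with the 2-power part of Pohlig-Hellman. All function names, the choice t = 8 and the search start value are assumptions made for illustration; the parameters are far too small for the DL problem to be hard.

```python
import secrets

def is_prime(m):
    """Trial division; adequate for the toy sizes used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def find_toy_group(t, start=10007):
    """Search for an odd prime p such that q = p * 2^t + 1 is also prime."""
    p = start
    while not (is_prime(p) and is_prime(p * 2**t + 1)):
        p += 1
    return p, p * 2**t + 1

def find_generator(p, t, q):
    """Smallest g of full order n = q - 1 (prime factors of n are 2 and p)."""
    n = q - 1
    g = 2
    while pow(g, n // 2, q) == 1 or pow(g, n // p, q) == 1:
        g += 1
    return g

def nonce_with_payload(payload, p, t):
    """Pick k < n whose t least significant bits equal the payload."""
    high = secrets.randbelow(p - 1) + 1      # random upper part, 1 <= high < p
    return (high << t) | payload

def low_bits_of_dlog(R, g, p, t, q):
    """Recover k mod 2^t from R = g^k, one bit at a time."""
    h = pow(g, p, q)       # element of order exactly 2^t
    Rp = pow(R, p, q)      # equals h^(k mod 2^t)
    x = 0
    for i in range(t):
        # Strip the bits found so far (h^(2^t - x) is h^(-x)), then test bit i:
        # the result below is 1 if bit i is 0, and -1 if bit i is 1.
        rest = Rp * pow(h, 2**t - x, q) % q
        if pow(rest, 2 ** (t - 1 - i), q) != 1:
            x |= 1 << i
    return x

T_BITS = 8
p, q = find_toy_group(T_BITS)
g = find_generator(p, T_BITS, q)
payload = ord("Z")                            # one byte of embedded data
k = nonce_with_payload(payload, p, T_BITS)
R = pow(g, k, q)                              # the only value the reader sees
recovered = low_bits_of_dlog(R, g, p, T_BITS, q)
print(chr(recovered))
```

The reader needs only R and the public group parameters; no private key and no side channel is involved, which is what makes this a counterexample for any argument that does not use the prime order of the group.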

Of course, this does not work in BIP340 because it uses the secp256k1
group for which t=0, i.e., the group has prime order. But it appears
that the reasoning in your write up is not specific to prime-order
groups. Thus I conclude that something must be wrong or insufficient in
your argument.

Let me clarify that I do not claim that data can be embedded in a
BIP340 signature. I only claim that your arguments for why data can't
be embedded do not appear to be sound. I believe any proof that data
cannot be embedded in a Schnorr signature (or in a group element R) in
a prime-order group must somehow exploit the fact that all bits of k
are hard to compute from R; see Section 10 in Håstad-Näslund 2003 [1]
for a proof that this is the case for prime-order groups.

Best,
Tim

[1] https://www.csc.kth.se/~johanh/hnrsaacm.pdf

waxwing/ AdamISZ

Oct 31, 2025, 9:25:13 AM
to Bitcoin Development Mailing List
Hi Tim,

First, thanks for the considered reply! That is a very interesting point for sure.

I guess I have 2 or 3 responses:

First, my "theorem 1" was deliberately specific about BIP340. I am aware of the impact of Pohlig-Hellman on non prime order groups.

However, despite being able to "defend the thesis" in that literal sense, I still think your overall critique is valid. I think the "framework" (at least in the updated version of the paper; the first couple of drafts were a bit incoherent) makes sense, but it's too vague in the most important part of the reasoning, namely the invertibility of the functions described. W.r.t. the values P and R, I was assuming throughout pseudorandomness (uncontrollable output-ness) [1] of the mappings x -> P = xG and k -> R = kG. That assumption was both explicit and implicit in several steps (or perhaps leaps) I took (see e.g. how I refer to the function f(P, R, s) and in at least one place basically "ignore" the P, R dependency because they are uncontrollable). In my head, that was justifiable based on it being a prime order group, but at the very least, I should have been explicit.

> I believe any proof that data
> cannot be embedded in a Schnorr signature (or in a group element R) in
> a prime-order group must somehow exploit the fact that all bits of k
> are hard to compute from R; see Section 10 in Håstad-Näslund 2003 [1]
> for a proof that this is the case for prime-order groups.

Nice reference, thanks! I definitely wouldn't have found that. As per above, I just assumed this without justifying it, so my end conclusion that there is a reduction to hash preimage resistance is, I guess, incomplete.

[1] so .. k -> kG is kind of a pseudorandom function, or generator, right? If this is a DDH assumption, then perhaps that's what we should really reduce to (well, plus hash preimage resistance)?

Cheers,
Adam

Garlo Nicon

Oct 31, 2025, 9:25:30 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
> if you can embed data into a (P, R, s) tuple (Schnorr pubkey and signature, BIP340 style), without grinding or using a sidechannel to "inform" the reader, you must be leaking your private key

You can embed data into a valid signature. For example:

R=k*G
P=d*G
k=first_chunk_of_data
d=second_chunk_of_data


And then the keys are "weak", because people can use a "known plaintext attack" to get them. However, if you want to push random data that is unknown to the reader, then it is known only by the holder of the data.

Which means that the efficiency of this encoding is somewhere around 66%; by grinding SHA-256 hashes, it could probably reach around 70% in practice. Only the s-value needs any grinding; for the k-value and d-value you need only the data, and nothing else.

So, I guess it is a spectrum: something like 70% efficiency means, that you need "known plaintext attack" to get the data. And then, you can use less and less bits per public key, to make it arbitrarily weaker. Then, instead of relying on a timelock, you can rely on computation difficulty for the reader, for example: "how many bits I need to leak, to make it breakable by lattice attack".

waxwing/ AdamISZ

Nov 1, 2025, 11:47:38 AM
to Bitcoin Development Mailing List
Hi Garlo Nicon,

Before I answer your point I want to mention (to readers): probably some things remained tacit in this thread but are worth emphasizing:

1. It's always trivial to get a 100% embedding rate if it's OK to assume the embedder is choosing to share data off-blockchain with others (just xor the real signature with their chosen data and call that the key). This is of course a bit silly (though not entirely silly); if the purpose is to *communicate* then they can use the communication channel for the data, instead of the xor value, and forget about the blockchain. On the other hand, if their purpose is to publish data, and rely on the immutability and persistence of the blockchain, then there is the problem that the xor key can be lost; it's that offchain data that represents the actual semantics of what they published, and so they're in rather the same position as they would have been without the blockchain existing at all. (insert finesses/caveats but, basically).

2. All of the above theoretical analysis doesn't work for ECDSA *as an algorithm outside of Bitcoin*. You get 32 bytes of embedding without leaking the private key, there (the s-value can literally be made to say "hello world" 3 times or whatever). This is the non-pubkey-committing nature of standard ECDSA. I *think* you can make it behave the same as Schnorr in terms of pubkey-unembeddability-without-key-leakage by putting the pubkey in the message, but it's even harder to analyze than Schnorr (which is already hard).

3. In contrast to 2., the pubkey is in fact embedded in the message (indirectly), at least usually, in Bitcoin (except sighash_noinput type stuff which isn't live), so you can't put hello world in the signatures for now, at least AFAIK. Still even then you're stuck at a 33% rate if we include all of P, R, s, which seems reasonable (in fact, that's a generous measure). Again, I am ignoring grinding which always adds a bit more.
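
Point 1 above (the off-chain xor key) can be sketched in a few lines. The 96 random bytes stand in for a real (P, R, s) tuple, and the names are hypothetical; the only thing the sketch is meant to show is that the off-chain key is exactly as long as the data it "unlocks".

```python
import secrets

def xor_bytes(a, b):
    """Bytewise xor of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-in for an honest, unmodified (P, R, s) tuple: 3 * 32 bytes.
onchain = secrets.token_bytes(96)

# Whatever the embedder wants the tuple to "mean", padded to 96 bytes.
data = b"any 96 bytes the embedder likes".ljust(96, b".")

# The key is shared off-chain; it carries all of the semantics.
offchain_key = xor_bytes(onchain, data)

# A reader holding the key "decodes" the on-chain bytes.
recovered = xor_bytes(onchain, offchain_key)
print(recovered)
```

Since `offchain_key` has the same length as `data`, anyone able to transmit the key could have transmitted the data directly.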

Anyway, you say:

> So, I guess it is a spectrum: something like 70% efficiency means, that you need "known plaintext attack" to get the data. And then, you can use less and less bits per public key, to make it arbitrarily weaker. Then, instead of relying on a timelock, you can rely on computation difficulty for the reader, for example: "how many bits I need to leak, to make it breakable by lattice attack".

I think it's an interesting idea to use lattice attacks but I can't find a way to agree with 66 or 70%. Here's why:

We assume a "few" signatures are all on the same private key. If there are N such signatures, then once LLL or similar lattice method is successful, you retrieve the 1 private key (32 bytes) and the N * 27 bytes (or so; imagining 5 bytes are biased; it *can* go lower, requiring more signatures; doesn't change the situation).

So you successfully embedded 27N+32 bytes (all the nonces and the private key) into 64N + 32N [1], for a ratio that is a bit less than 33%. Compare with just using a repeated nonce in 2 equations, where you get 64 bytes (nonce, privkey) from 2*P + 2*(R,s), a total of 192 bytes, i.e. 33% exactly. Basically, at least in a bitcoin context, there is no gain in doing a partial exposure of the nonce; you may as well just reveal all of it, either by repetition or, as noted in the pdf, by using something public like a block hash. Notice that if my note [1] did not apply, then all of the above wouldn't be correct; the ratios would work differently.
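
The arithmetic above can be checked with exact fractions. This is only a worked calculation under the stated assumptions (32-byte P, 64-byte (R, s), 27 freely chosen bytes per nonce, one pubkey published per signature); the function names are made up for the example.

```python
from fractions import Fraction

PUBKEY_BYTES = 32   # P, published once per signature on-chain (note [1])
SIG_BYTES = 64      # (R, s)

def lattice_rate(num_sigs, free_nonce_bytes=27):
    """Rate when N biased-nonce signatures share one private key."""
    recovered = free_nonce_bytes * num_sigs + 32           # nonces + privkey
    published = (PUBKEY_BYTES + SIG_BYTES) * num_sigs      # 96N
    return Fraction(recovered, published)

def repeated_nonce_rate():
    """Rate when one nonce is reused across two signatures."""
    recovered = 32 + 32                                    # nonce + privkey
    published = 2 * PUBKEY_BYTES + 2 * SIG_BYTES           # 192
    return Fraction(recovered, published)

for n in (6, 7, 10, 100):
    print(n, float(lattice_rate(n)))
print("repeated nonce:", float(repeated_nonce_rate()))
```

Under these assumptions the lattice variant falls below 1/3 once N is at least 7 and tends towards 27/96 for large N, while the repeated-nonce variant sits at exactly 1/3.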

Can you let me know how you're getting 66%+? I'm guessing you're just saying "the k and the d values" but as per above I don't see it. Maybe write out concretely what the data-reader would be doing?

[1] It's easy to slip up here - I know I did - when considering publication *on bitcoin* compared with just publishing signatures. In the latter case, I can publish 100 signatures with the tacit assumption that they all refer to the same key (or, you can verify, to check). In bitcoin the pubkey is never tacit, it's always published in the scriptPubKey or scriptSig or whatever, so you can't gain efficiency from repeated uses of the same key (i.e. you can't write 64N + 32, it must be 64N + 32N for (P, R, s) tuples).

Cheers,
Adam

Garlo Nicon

Nov 2, 2025, 5:11:35 AM
to waxwing/ AdamISZ, Bitcoin Development Mailing List
> Can you let me know how you're getting 66%+?

You have three chunks, which are needed: (P,R,s). You can control "P" and "R" directly and fully, by feeding it with your data. That means, you can get 66%, because it is just 2/3, if you assume, that all values have the same size.

Then, to get 70% or more, grinding s-value is needed, which is doable, if you want to for example grind two or three bytes of s-value, and stop there. But let's assume, that you want to make it as fast as possible, so you don't grind anything, and then stop at 66%.


> Maybe write out concretely what the data-reader would be doing?

I already told you, when I said "known plaintext attack". If you want to put random data into private keys or signatures, then things are hard to break. However, if it is something useful for the reader, then usually, that kind of data are non-random. For example: some users store transactions inside OP_RETURNs, and they use ASCII hex representation. If they would use binary encoding, then they would save 50% space. But people simply don't care.

And the similar case is possible here: if you want to store random data, then it is hard to use this method. However, if you want to store ASCII text, where many words can be found in a dictionary, or where the format of the data is known upfront, or can be easily guessed, then the security of the keys, is comparable to the brainwallets.

Which means, that you can just put your data into the private key of the user, and a "signature nonce" (which is nothing else, but yet another private key, placed on secp256k1). And then, if you know, that your data, is for example "ASCII string", then it means, that each and every key, that you produce, simply leaks at least 32 bits per 256-bit key, if not more.

And then, if the attacker can get coins from brainwallets, then decoding such data is not much harder than that. If your data contains simple words, then even dictionary attacks can be used.

So, let's say that you want to encode 64 bytes in a signature:

d="This is a test of storing data i"=0x5468697320697320612074657374206f662073746f72696e6720646174612069
k="n private keys inside signatures"=0x6e2070726976617465206b65797320696e73696465207369676e617475726573
P=d*G=02A2EF730B26A905A7D91940E3A512C5771D8BC8BCCA153D714E328043856CBB2B
R=k*G=02E19FCA1025CFD67409309E2B1711D723BFB67EC520917D9A0AD9432414DA0D0A
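
A small sketch of the encoding step in this example, assuming plain big-endian interpretation of the 32 ASCII bytes (which is what the hex values above correspond to). It does not compute P or R; it only checks the text-to-scalar mapping, that both scalars are valid secp256k1 private keys, and how many bits of each key are fixed by the ASCII format. The helper names are hypothetical.

```python
# Order of the secp256k1 group.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def text_to_scalar(chunk):
    """Interpret exactly 32 ASCII characters as a big-endian scalar."""
    raw = chunk.encode("ascii")
    if len(raw) != 32:
        raise ValueError("need exactly 32 bytes per scalar")
    scalar = int.from_bytes(raw, "big")
    if not 0 < scalar < SECP256K1_N:
        raise ValueError("not a valid private key")
    return scalar

def scalar_to_text(scalar):
    """Inverse mapping, used by a reader who has recovered the scalar."""
    return scalar.to_bytes(32, "big").decode("ascii")

def forced_zero_bits(raw):
    """7-bit ASCII leaves the top bit of every byte at zero."""
    return sum(1 for byte in raw if byte < 0x80)

d = text_to_scalar("This is a test of storing data i")
k = text_to_scalar("n private keys inside signatures")

print(format(d, "064x"))
print(format(k, "064x"))
print(forced_zero_bits(d.to_bytes(32, "big")), "of 256 bits are known to be zero")
```

Any ASCII chunk is automatically below the group order, because its top bit is zero, so no range adjustment is needed for this kind of data.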


And then, s-value comes from SHA-256 hashing, so it is harder to control. But grinding a few bytes can give something around 70%. However, even if we stop at 66%, then still: useful data are regular. There are many patterns. If something is an ASCII string, then 1/8 bits are cleared, and it is known, which ones should be set to zero. If it is in English, then the entropy is even lower. Which means, that the private key is not directly "leaked", by being passed to the reader, but there is an assumption, that it will be easy enough to get.
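
As a rough illustration of what "grinding a few bytes" of the s-value costs, here is a sketch that fixes d and k as in the example and varies the signed message until the top byte of s hits a target. Several things are assumptions: the x-coordinates of P and R are copied from the example above and not recomputed, the message is a hash of a counter rather than a real transaction digest, and the challenge is computed with the BIP340 tagged hash. The function names are hypothetical.

```python
import hashlib
from itertools import count

# Order of the secp256k1 group.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

d = int.from_bytes(b"This is a test of storing data i", "big")
k = int.from_bytes(b"n private keys inside signatures", "big")
# x-only encodings taken from the example in this message (assumed correct).
P_X = bytes.fromhex("A2EF730B26A905A7D91940E3A512C5771D8BC8BCCA153D714E328043856CBB2B")
R_X = bytes.fromhex("E19FCA1025CFD67409309E2B1711D723BFB67EC520917D9A0AD9432414DA0D0A")

def tagged_hash(tag, msg):
    """BIP340-style tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_digest = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()

def grind_s(target_byte):
    """Try messages until the most significant byte of s equals target_byte."""
    for counter in count():
        # Stand-in for "change something in the transaction and re-hash".
        msg = hashlib.sha256(counter.to_bytes(8, "big")).digest()
        e = int.from_bytes(tagged_hash("BIP0340/challenge", R_X + P_X + msg), "big") % N
        s = (k + e * d) % N
        if s >> 248 == target_byte:
            return counter + 1, msg, s

tries, msg, s = grind_s(ord("A"))
print("tries:", tries)
print("s =", format(s, "064x"))

# Rate if b bytes of s are ground on top of the 64 bytes in d and k.
for b in (0, 2, 3):
    print(b, "ground bytes ->", round((64 + b) / 96, 4))
```

Each additional ground byte multiplies the expected number of tries by 256, which is why this route stays at a handful of bytes.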

Also, if the key won't be leaked, then it can be used as an advantage: first, NFTs can be minted, and transferred, and then, you can pass the data directly, and say: "See? You can confirm, that they are encoded into private keys properly". And as long as the data in question is difficult enough to fully guess, the key is not revealed, even if it is quite weak.

Which means that my answer to your question is: it is a spectrum. You can make a weak signature, have 33% encoding efficiency, and leak every private key immediately. But you can also make something in the spectrum between 33% and 66% that is "weak" but won't be broken "on the spot, immediately after being broadcast" (so you cannot really say that the keys are "leaked", because you need to know "something" about the plaintext inside the private keys, or about its format). And that is good for spammers, because then funds can be safely confirmed, and only later is it revealed: "hey, I encoded that data here, by wasting 3 MB of block space, to encode 2 MB of ASCII strings, here is your NFT, that you can buy here".


waxwing/ AdamISZ

Nov 2, 2025, 8:33:06 AM
to Bitcoin Development Mailing List
> I already told you, when I said "known plaintext attack". If you want to put random data into private keys or signatures, then things are hard to break. However, if it is something useful for the reader, then usually, that kind of data are non-random. For example: some users store transactions inside OP_RETURNs, and they use ASCII hex representation. If they would use binary encoding, then they would save 50% space. But people simply don't care.

> And the similar case is possible here: if you want to store random data, then it is hard to use this method. However, if you want to store ASCII text, where many words can be found in a dictionary, or where the format of the data is known upfront, or can be easily guessed, then the security of the keys, is comparable to the brainwallets.

> Which means, that you can just put your data into the private key of the user, and a "signature nonce" (which is nothing else, but yet another private key, placed on secp256k1). And then, if you know, that your data, is for example "ASCII string", then it means, that each and every key, that you produce, simply leaks at least 32 bits per 256-bit key, if not more.

Ah, right; I had originally written a response to this idea but then discarded it on the basis that it's kinda "obvious" that we shouldn't think about that, and focused on the more in-the-weeds concept of a lattice attack instead.

But it isn't obvious.

So let's think of the spectrum here. First, the most trivial nonce to break: one consisting of a single bit (OK technically you can't encode k=0, heh, but, whatever, put it in the second bit of the string). Obviously that is extractable, getting 32 bytes plus one bit. That one extra bit above the 33% is achievable because of "grinding" except here grinding is the most trivial version possible: trying 2 alternatives. This still fits my original claim, which is "33% plus whatever you can get from grinding, and you leak the secret key in the process".

Other end of the spectrum: not 1 bit or 5 bytes but say 20 bytes represent an actual message, and let's say the rest of the 256 bit k-string is zero. Now clearly one can't grind that, if it's random. Which brings us to your point about weakness: let's say the 20 bytes of message comes from a space of possible messages, known to all potential readers, whose size is actually 40 bits. Because they can grind 40 bits, they can retrieve the message, but that message is only 40 bits of information. E.g. most crude idea; a table of 2^40 messages, you are picking one .. notice it doesn't matter if the length of each message is 40 bits or 160 bits or 256 bits; you are only conveying 40 bits of *information* if you do this.
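
The table argument can be made concrete with a scaled-down sketch: a public table of 2^6 candidate messages (instead of 2^40, purely to keep the run time short), each 20 bytes long, placed in the low bytes of k with the rest zero. The reader grinds over the table against R = kG on secp256k1. The curve arithmetic is a plain affine implementation written for this example, and the table contents and function names are made up.

```python
import math

P_FIELD = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(a, b):
    """Affine point addition on y^2 = x^3 + 7; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD

def ec_mul(k, point=G):
    """Double-and-add scalar multiplication."""
    result, addend = None, point
    while k:
        if k & 1:
            result = ec_add(result, addend)
        addend = ec_add(addend, addend)
        k >>= 1
    return result

def embed(message):
    """Use the message bytes directly as the nonce k and publish R = kG."""
    return ec_mul(int.from_bytes(message, "big"))

def grind(R, table):
    """Reader side: try every message in the public table."""
    for candidate in table:
        if embed(candidate) == R:
            return candidate
    return None

# Public table known to all readers: 64 messages of 20 bytes each.
table = [f"message number {i:05d}".encode("ascii") for i in range(64)]
chosen = table[37]
R = embed(chosen)
found = grind(R, table)

print(found)
print("message length in bits:", len(found) * 8)
print("information conveyed in bits:", math.log2(len(table)))
```

The reader gets back a 160-bit string, but the embedder only selected one entry out of 64, so 6 bits were conveyed; with a 2^40 table the same reasoning gives 40 bits at the cost of up to 2^40 point multiplications for the reader.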

From this point of view it's pretty clear that we haven't changed the general conclusion: you only get 33% (say 32 bytes), *plus* whatever you can get from grinding, and since that's exponential work, it's never going to be very big, say 5 bytes or possibly 6? And you leak the key of course.

I do agree with you that there could be scenarios where this "mode" of publication/embedding might be the preferable one, because we're gliding over that line between "pure publication" and "publication with sidechannels". As I argued here and elsewhere, if there is a proper, viable, sidechannel, then most of this analysis doesn't apply but a sort of mixup where "if you know information X you can grind out more information Y from the onchain data" is possible.

But no, as per the above, you are definitely not conveying 66% (that is to say, 64 bytes out of 96) in the P, R, s tuple using this method. That'd only be true in the same sense as if the space of possible messages were just "hello world\n\n" and "goodbye world", and you then claimed you were sending 13 bytes because a reader can find the message.

Cheers,
AdamISZ/waxwing
