A safe way to remove objectionable content from the blockchain

Lazy Fair

Nov 20, 2025, 4:49:03 AM
to bitco...@googlegroups.com
I propose two changes to Bitcoin, one at the consensus level and one at the client level. The purpose is to support filtering of objectionable content after the content has been mined, allowing each node operator to keep only the data they find agreeable. In so doing, I hope we can satisfy all users and deal with their greatest concerns.

I do, however, acknowledge those who want to stop miners from mining non-monetary transactions because of the data storage and processing cost, and I recognise that this proposal does nothing to address those concerns.

*** Motivation ***

You can't just change or delete some data from the blockchain, because a hash of everything in a block is included in the next block. If you change the data, you change the hash. The design presented here is an attempt to achieve a compromise, where a person can have all of the benefits of running a full node, including the integrity of the ledger, yet without storing the objectionable content - and importantly without even being able to recreate that objectionable content from what data they still have.

*** Preliminary ***

Objectionable content is defined here as whatever you want it to be, and two users don't have to share the same views. One person might object to copyrighted material used without permission, another a negative depiction of the prophet Muhammad, and another video of the sexual abuse of children. The design presented below lets each person decide what to remove for themself (if anything), while those who want everything can still have it all.

The design lets a user remove any data, and deals with the impact on the matching of block hashes, data integrity and malleability. 

In the case of OP_RETURN data, the result should be no functional effect at all. Whether that's also possible for other data elements will depend on the semantics of that data.

*** Solution ***

This solution is based on two ideas, both aimed at maintaining data integrity through hashing, while removing some of the hash's input data stream.

*** First Idea ***

When performing a hash of some data (D), each chunk of data that's processed updates an internal state (S) of the hashing algorithm. If you know what the internal state is at point A and then at point B, then you can compute the final hash of D even without the data between A and B. This is the first idea. First you need to know what S(A) and S(B) are, and once you do, you can compute the hash of D, without the data between A and B. You run the hashing algorithm normally up to A, then you update the internal state from S(A) to S(B), then you continue hashing from B to the end of D.

The hash still works as an integrity check for the data before A, and the data after B: change any of this, and the final hash will change. Now you can safely change or delete the data in between, without breaking the integrity of the blockchain and proof of work - but only if you can securely obtain S(A) and S(B), and only if you don't need the data between A and B for anything else.
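The first idea can be sketched in Python as below. This is only an illustration under stated assumptions: `hashlib` does not expose SHA-256's internal state, so the compression function is written out by hand; the names `run`, `pad` and `digest_without_middle` are made up for this sketch; and A and B are assumed to be multiples of 64 bytes, so that the state consists only of the eight chaining words and no partially filled block.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = (0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19)

def _icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
PRIMES = [p for p in range(2, 312) if all(p % q for q in range(2, p))]
K = [_icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA-256 compression step: 8-word state + 64-byte block -> new state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h)))

def run(state, data):
    """Feed block-aligned data through the compression function."""
    for off in range(0, len(data), 64):
        state = compress(state, data[off:off + 64])
    return state

def pad(length):
    """Standard SHA-256 padding for a message of `length` bytes."""
    return b"\x80" + b"\x00" * ((55 - length) % 64) + (8 * length).to_bytes(8, "big")

def digest_without_middle(prefix, s_a, s_b, suffix, total_len):
    """SHA-256 of D given only D[:A], S(A), S(B), D[B:] and len(D)."""
    if len(prefix) % 64 or (total_len - len(suffix)) % 64:
        raise ValueError("this sketch assumes A and B are multiples of 64")
    # Without this check the prefix would not influence the result at all,
    # because hashing resumes from S(B).
    if run(IV, prefix) != s_a:
        raise ValueError("prefix does not lead to the claimed S(A)")
    return struct.pack(">8I", *run(s_b, suffix + pad(total_len)))

# Demo: drop bytes 64..191 of a 260-byte message and still reproduce its hash.
data = bytes(range(256)) + b"tail"
A, B = 64, 192
s_a = run(IV, data[:A])
s_b = run(s_a, data[A:B])
assert digest_without_middle(data[:A], s_a, s_b, data[B:], len(data)) \
    == hashlib.sha256(data).digest()
```

Note that the total length of D is still needed for the final padding, and that the result only says something about the removed bytes if S(B) itself comes from a source that can be trusted. Bitcoin hashes twice, but the second pass only covers the 32-byte digest, so it is unaffected by the gap.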

The easiest way to obtain S(A) and S(B) is to calculate them yourself, but that requires that you hold the objectionable data, at least for a time. That also requires finding someone else that holds the objectionable data. But what if instead, we could share S(A) and S(B) across the network, do it securely, and in a way where up to 100% of nodes could choose to drop the data in between, permanently, without breaking anything?

*** Second idea ***

It may seem like there is no one you can trust to tell you what S(A) and S(B) are. There is only one source of data that a Bitcoin node can trust, and that is the blockchain, as mined by miners, with the most proof of work, and verified locally. Therefore, the second idea is that S(A) and S(B) are trusted if (and only if) they are written into the blockchain, and verified by the network.

For example, we write data to the semantic effect of "In Transaction X: at byte offset A, the internal state of the hash function is S1; at byte offset B, the internal state of the hash function is S2." Miners then mine this statement into a block, and verifiers confirm that it is cryptographically accurate with respect to the data in Transaction X as described - or else they drop the new block as invalid.
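As a rough illustration of what such a statement and its verification might look like, here is a minimal Python sketch. Everything in it is hypothetical: the `MidstateClaim` record, its field names, and the `make_claim` and `verify_claim` helpers are invented for this example and do not correspond to any existing Bitcoin data structure. It also assumes that `raw_tx` is exactly the byte string hashed to form the transaction id and that A and B are multiples of 64 bytes.

```python
import hashlib
import struct
from dataclasses import dataclass

MASK = 0xFFFFFFFF
IV = (0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19)

def _icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
PRIMES = [p for p in range(2, 312) if all(p % q for q in range(2, p))]
K = [_icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA-256 compression step: 8-word state + 64-byte block -> new state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h)))

def run(state, data):
    """Feed block-aligned data through the compression function."""
    for off in range(0, len(data), 64):
        state = compress(state, data[off:off + 64])
    return state

@dataclass(frozen=True)
class MidstateClaim:
    """Hypothetical on-chain statement about Transaction X."""
    txid: bytes      # double SHA-256 of the full transaction bytes
    offset_a: int    # byte offset A
    state_a: tuple   # claimed hash state at A (S1)
    offset_b: int    # byte offset B
    state_b: tuple   # claimed hash state at B (S2)

def make_claim(raw_tx, a, b):
    """What the author of the statement would compute from the full data."""
    s_a = run(IV, raw_tx[:a])
    s_b = run(s_a, raw_tx[a:b])
    txid = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return MidstateClaim(txid, a, s_a, b, s_b)

def verify_claim(claim, raw_tx):
    """What a verifier that still holds the full transaction would check."""
    a, b = claim.offset_a, claim.offset_b
    if not (0 <= a < b <= len(raw_tx)) or a % 64 or b % 64:
        return False
    if hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest() != claim.txid:
        return False
    s_a = run(IV, raw_tx[:a])
    return s_a == claim.state_a and run(s_a, raw_tx[a:b]) == claim.state_b
```

The point of the sketch is that this check can only be performed by a node that holds the bytes between A and B at the time of verification.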

At this point, any node can choose to delete the data between offsets A and B. This can now be done with confidence, because the node can double-check the accuracy of S1 and S2, and the impact on the ledger, before it deletes the data. After that, it may also be able to share (with the agreement of the receiving node) this modified transaction as part of initial block download, along with S1 and S2, with any other nodes that don't want this objectionable content. The receiving nodes wouldn't necessarily be able to trust S1 and S2 immediately, but they would eventually, once they have the full blockchain.

*** Conclusion ***

This isn't a concrete proposal - it's not even close - but perhaps it might be the start of a fruitful conversation. I have more to say, but this email is long enough already. Email me if you're interested in discussing or developing these ideas together. I have a private Discord server, but I'm open to other suggestions, or just further discussion here.

Laissez faire, laissez passer.

Let it be, let it go.

Ethan Heilman

Nov 21, 2025, 6:25:48 PM
to Lazy Fair, bitco...@googlegroups.com
I'm not convinced your hash function approach fully does what you want it to, although it does seem doable with some additional constraints.

There is a solution that does everything you want and more: ZKPs.

ZKPs (zero-knowledge proofs) can prove that some data X hashes to some hash output Y while keeping the actual value X secret. Thus, everyone can be convinced that H(X) = Y even if X is deleted and no one knows what the value X was.

Even more exciting, ZKPs can prove the correctness and validity of the entire Bitcoin blockchain, so storing old transactions is no longer needed to convince others that the chain is correct. This would remove any harmful data. In 2017, ZeroSync compressed Bitcoin's blockchain into an 800 KB proof [0], which is constant in size regardless of the number of transactions or bytes compressed. This approach does not require any changes to Bitcoin, and you could implement a Bitcoin full node today that supports it.

We have had a solution to the problem of harmful data on the blockchain since 2017. It just requires time, money and motivated people to work on it.

[0]: Robin Linus and Lukas George, ZeroSync: Introducing Validity Proofs to Bitcoin, 2017, https://zerosync.com/zerosync.pdf

Greg Maxwell

Nov 21, 2025, 6:25:50 PM
to Lazy Fair, bitco...@googlegroups.com
If you find blindly trusting miners acceptable, just run SPV and then you don't store anything but block headers.

As an aside, allowing attackers access to manipulate a hash's midstate is dubious from a security perspective; at the very least, it's outside the scope normally analyzed for security.

Saint Wenhao

2:13 AM
to Greg Maxwell, Lazy Fair, bitco...@googlegroups.com
> allowing attackers access to manipulate a hash's midstate is dubious from a security perspective

It is unsafe, because the attacker can pick anything as the "middle state", run it through SHA-256, and get a valid result. For example: the hash of the Genesis Block is computed in this way:

hash0: 6a09e667 bb67ae85 3c6ef372 a54ff53a 510e527f 9b05688c 1f83d9ab 5be0cd19

01000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 3ba3edfd 7a7b12b2 7ac72c3e
67768f61 7fc81bc3 888a5132 3a9fb8aa

hash1: bc909a33 6358bff0 90ccac7d 1e59caa8 c3c8d8e9 4f0103c8 96b18736 4719f91b

4b1e5e4a 29ab5f49 ffff001d 1dac2b7c
80000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000280

hash2: af42031e 805ff493 a07341e2 f74ff581 49d22ab9 ba19f613 43e2c86c 71c5d66d

hash0: 6a09e667 bb67ae85 3c6ef372 a54ff53a 510e527f 9b05688c 1f83d9ab 5be0cd19

af42031e 805ff493 a07341e2 f74ff581
49d22ab9 ba19f613 43e2c86c 71c5d66d
80000000 00000000 00000000 00000000
00000000 00000000 00000000 00000100

hash4: 6fe28c0a b6f1b372 c1a6a246 ae63f74f 931e8365 e15a089c 68d61900 00000000
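The listing above can be cross-checked with Python's standard `hashlib`. This sketch rebuilds the 80-byte genesis header from the words shown in the listing and applies SHA-256 twice. It only reproduces the two digests, not the intermediate state after the first 64 bytes, because `hashlib` does not expose that state. The field comments follow the usual block header layout.

```python
import hashlib

# The 80-byte genesis block header, field by field (all little-endian).
header = bytes.fromhex(
    "01000000"                                                            # version
    + "00" * 32                                                           # previous block hash
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"  # merkle root
    + "29ab5f49"                                                          # timestamp
    + "ffff001d"                                                          # difficulty bits
    + "1dac2b7c"                                                          # nonce
)

first = hashlib.sha256(header).digest()   # digest after the first pass over 80 bytes
second = hashlib.sha256(first).digest()   # digest after the second pass over 32 bytes

print(first.hex())
print(second.hex())          # same byte order as the last line of the listing
print(second[::-1].hex())    # byte-reversed, as block explorers display it
```

The header is 80 bytes, so the first pass needs two 64-byte compression steps (64 bytes of data, then 16 bytes plus padding), and the second pass needs one. That is where the three compression steps in the listing come from.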


Now, let's assume that we want to skip the first 64 bytes. We get "bc909a33 6358bff0 90ccac7d 1e59caa8 c3c8d8e9 4f0103c8 96b18736 4719f91b" from the network as the midstate and obtain "af42031e 805ff493 a07341e2 f74ff581 49d22ab9 ba19f613 43e2c86c 71c5d66d" as the result, so we might conclude that our last data chunk is "4b1e5e4a 29ab5f49 ffff001d 1dac2b7c". However:

fake0: 189dcde9 da998d89 12414f36 fb7a1edd d48a4c3b c0237088 6beec03e 46b7bafb

4b1e5e4a 29ab5f49 ffff001d 1dac2b7c
80000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000280

fake1: 189dcde9 da998d89 12414f36 fb7a1edd d48a4c3b c0237088 6beec03e 46b7bafb


So the attacker can pick some data and compute a starting state that the compression function leaves unchanged when it processes that data. Then, instead of shrinking, the data can be expanded to arbitrary size.

Producing any chosen difference between the input state and the output state is possible as well. For example, if we want the output to be the input incremented by one:

fake2: f530fddf 74afe6c6 6004c3c0 c230b193 853774a9 6ab4c304 9d09ddde d9982546

4b1e5e4a 29ab5f49 ffff001d 1dac2b7c
80000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000280

fake3: f530fddf 74afe6c6 6004c3c0 c230b193 853774a9 6ab4c304 9d09ddde d9982547
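Both constructions above follow from the way SHA-256's compression function is built (the Davies-Meyer construction): the new state is the old state plus the output of an invertible block cipher keyed by the message block. Running the 64 rounds backwards from a chosen difference therefore yields a starting state with exactly that difference. The sketch below is illustrative only; `state_for_difference` and the other helper names are invented for this example, and the compression function is hand-written because `hashlib` does not expose it.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = (0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19)

def _icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= n else (lo, mid - 1)
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
PRIMES = [p for p in range(2, 312) if all(p % q for q in range(2, p))]
K = [_icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def schedule(block):
    """Expand a 64-byte block into the 64-word message schedule."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def _t2(a, b, c):
    return (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))

def _t1_part(e, f, g):
    return (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g))

def compress(state, block):
    """One SHA-256 compression step: new state = state + E_block(state)."""
    w = schedule(block)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + _t1_part(e, f, g) + K[i] + w[i]) & MASK
        t2 = _t2(a, b, c) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return tuple((x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h)))

def state_for_difference(block, delta=(0,) * 8):
    """Return H with compress(H, block) == H + delta (word-wise, mod 2**32)."""
    w = schedule(block)
    a, b, c, d, e, f, g, h = delta
    for i in reversed(range(64)):
        # Undo round i: six of the old words are the new words shifted by one;
        # the old d and h are recovered by subtracting what the round added.
        t1 = (a - _t2(b, c, d)) & MASK
        a, b, c, d, e, f, g, h = (b, c, d, (e - t1) & MASK, f, g, h,
                                  (t1 - _t1_part(f, g, h) - K[i] - w[i]) & MASK)
    return (a, b, c, d, e, f, g, h)

# Sanity check of the hand-written compression function against hashlib.
_abc = b"abc" + b"\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
assert struct.pack(">8I", *compress(IV, _abc)) == hashlib.sha256(b"abc").digest()

# The padded last block of the genesis header, as shown in the listings above.
tail = bytes.fromhex("4b1e5e4a29ab5f49ffff001d1dac2b7c" + "80" + "00" * 45 + "0280")
fixed = state_for_difference(tail)                       # delta = 0: fixed point
plus_one = state_for_difference(tail, (0,) * 7 + (1,))   # last word incremented
print(" ".join(f"{x:08x}" for x in fixed))
print(" ".join(f"{x:08x}" for x in plus_one))
```

The two printed states can be compared against the values labelled fake0 and fake2 above. The cost is a single pass over the 64 rounds, so nothing here requires breaking SHA-256; it only requires that the starting state be freely choosable.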

Other attacks are possible as well. So I wouldn't trust middle hashes that much unless you have a strong cryptographic proof that they are safe in a given context.
