New Proposal: String Substring Search in Bitcoin Script - OP_ISSUBSTR


weichu deng

Mar 17, 2025, 12:32:57 PM
to Bitcoin Development Mailing List

Dear fellow Bitcoin developers,

 

I am pleased to present a new BIP proposal. This proposal introduces a new opcode for Bitcoin scripts: OP_ISSUBSTR.


Abstract

This BIP introduces two string opcodes, OP_ISSUBSTR and OP_ISSUBSTRVERIFY (similar to the relationship between OP_EQUAL and OP_EQUALVERIFY), to determine whether one string is a substring of another. As these opcodes do not alter any blockchain state, they are secure.

 

Specification

These opcodes check whether the second string on the stack is a substring of the first string. If the opcode is OP_ISSUBSTRVERIFY, the script fails when the condition is false, and no result is left on the stack.

 

Execution Process

  1. Take the two strings at the top of the stack.
  2. Use standard library functions to compare the two strings.
  3. Pop the two strings from the stack and push the result onto the stack.
  4. If the opcode is OP_ISSUBSTRVERIFY, do not push the result.
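
The steps above can be sketched in Python as a rough model, not the reference implementation from the BIP draft. The operand order (searched string pushed first, candidate substring on top) is an assumption taken from the examples later in this thread, and the function name `op_issubstr` and the use of a Python list as the stack are illustrative only.

```python
def op_issubstr(stack, verify=False):
    """Model OP_ISSUBSTR (verify=False) or OP_ISSUBSTRVERIFY (verify=True).

    Assumption: the string to search is pushed first and the candidate
    substring is on top of the stack.
    """
    if len(stack) < 2:
        raise ValueError("script failed: fewer than two stack items")
    needle = stack.pop()      # top item: candidate substring
    haystack = stack.pop()    # next item: string being searched
    found = needle in haystack
    if verify:
        # VERIFY variant: fail on false, push nothing on success.
        if not found:
            raise ValueError("script failed: OP_ISSUBSTRVERIFY")
        return
    # Plain variant: push a true/false value for later opcodes to consume.
    stack.append(b"\x01" if found else b"")


stack = [b"foobar", b"oba"]
op_issubstr(stack)
print(stack)
```
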

Motivation

The absence of string operations in Bitcoin scripts restricts its applicability. When developers need string operations for applications, they must simulate these functions through off-chain preprocessing or complex scripts, increasing development difficulty and potentially introducing centralized dependencies.

 

Early Bitcoin versions supported some string operations, such as OP_SUBSTR, which extracted a substring at a specified position and of a specified length from a string, replacing the original string. After a vulnerability caused by OP_LSHIFT (CVE-2010-5137), Bitcoin v0.3.10 and later versions disabled several opcodes, including OP_SUBSTR, to prevent similar overflow vulnerabilities. As Bitcoin adoption grows, the limitations of lacking string operations have become more evident. Our proposed OP_ISSUBSTR adds string search functionality to Bitcoin scripts without changing any state, making it safe.

 

Advantages

  1. Enhanced Script Functionality and Flexibility: Developers can process string logic on-chain without off-chain reliance. For example, in multi-signature wallets, developers can verify specific signer information or remarks directly in scripts using OP_ISSUBSTR to check transaction comments or signature fields for particular substrings.
  2. Support for String Searching: In some scenarios, developers need to verify if parts of a string match a format or contain specific data. For example, checking if a payment transaction's payee name matches a preset value.
  3. Conversion of Non-deterministic Algorithms to Deterministic Ones: Some signature algorithms or hash functions produce non-deterministic outputs. OP_ISSUBSTR allows developers to check if these outputs contain known substrings in scripts, converting non-deterministic results to deterministic ones. For example, verifying if a hash value contains a specific hexadecimal sequence (like "0000") to trigger contract logic.
  4. Simplified Address Verification Logic: Bitcoin addresses typically start with specific prefixes or suffixes. OP_ISSUBSTR enables direct address format verification in scripts. For example, checking if a transaction target address starts with "bc1" to ensure validity or detect "address pollution" attacks.
  5. Integration with Modern Programming Languages: Modern languages widely support string operations. OP_ISSUBSTR makes Bitcoin scripts more aligned with these languages, lowering the barrier for developers.

 

We have provided detailed documentation and a reference implementation in the BIP draft. You can read the full proposal here: https://github.com/Weichu-Deng/bips/blob/OP_ISSUBSTR/bip-yongdong%20wu-OP_ISSUBSTR.md


Thank you for your feedback! 

With respect,

Weichu Deng

 weich...@stu2024.jnu.edu.cn

 

Peter Todd

Mar 17, 2025, 1:01:16 PM
to weichu deng, Bitcoin Development Mailing List
On Mon, Mar 17, 2025 at 09:14:05AM -0700, weichu deng wrote:
>
>
> Dear fellow Bitcoin developers,
>
>
>
> I am pleased to present a new BIP proposal. This proposal introduces a new
> opcode for Bitcoin scripts: OP_ISSUBSTR.
>
>
> *Abstract*
>
> This BIP introduces two string opcodes, OP_ISSUBSTR and OP_ISSUBSTRVERIFY
> (similar to the relationship between OP_EQUAL and OP_EQUALVERIFY), to
> determine whether one string is a substring of another. As these opcodes do
> not alter any blockchain state, they are secure.

Bitcoin scripts are about validation. Not computation.

This means that substring search and concatenation are equivalent. For
every script that validates a substring search, you can instead
concatenate the substring with the rest of the string, and validate
equality instead.

Basically speaking:

foobar foo IsSubStr

is equivalent to:

foobar foo bar Cat Equal

A real-world example would be more complex. But I hope that illustrates
my point sufficiently.
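
A minimal Python sketch of this equivalence, assuming the spender supplies the surrounding pieces as witness data. The helper names `make_witness`, `check_with_issubstr` and `check_with_cat` are hypothetical and only model the two validation styles; they are not Script.

```python
def make_witness(full: bytes, needle: bytes):
    """Off-chain step: the spender does the search and splits the string."""
    i = full.find(needle)
    if i < 0:
        return None
    return full[:i], full[i + len(needle):]


def check_with_issubstr(full: bytes, needle: bytes) -> bool:
    """Validation in the style of: <full> <needle> IsSubStr."""
    return needle in full


def check_with_cat(full: bytes, prefix: bytes, needle: bytes, suffix: bytes) -> bool:
    """Validation in the style of Cat ... Equal: rebuild the string and compare."""
    return prefix + needle + suffix == full


full, needle = b"foobar", b"foo"
prefix, suffix = make_witness(full, needle)
print(check_with_issubstr(full, needle), check_with_cat(full, prefix, needle, suffix))
```

In this model the validator never searches: it only concatenates and compares, while the search happens once, off-chain, when the witness is built.
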

--
https://petertodd.org 'peter'[:-1]@petertodd.org

Erik Aronesty

Mar 18, 2025, 5:24:15 PM
to Bitcoin Development Mailing List
> foobar foo IsSubStr
>
> is equivalent to:
>
> foobar foo bar Cat Equal

Assuming bar is known (this excludes the nondeterministic example above).

weichu deng

Mar 18, 2025, 5:25:22 PM
to Bitcoin Development Mailing List
Hi, Peter Todd
Thanks for your feedback. I agree that "Bitcoin scripts are about validation. Not computation."
String search and concatenation are equivalent in some cases, such as in the example you provided.
However, it is still necessary to introduce the OP_ISSUBSTR operation separately.
One example is converting a non-deterministic signature to a deterministic one.
Another case is when the substring in question is located in the middle of the checked string.
CAT cannot replace ISSUBSTR for the following reasons:
  1. The security of CAT is still controversial. It can easily generate overly long strings, potentially causing a stack overflow. Additionally, whether OP_CAT will be restored is still under discussion.
  2. The other substring (bar) must be known in advance.

With respect,

Weichu Deng

weich...@stu2024.jnu.edu.cn

Rijndael

Mar 18, 2025, 10:28:25 PM
to weichu deng, Bitcoin Development Mailing List
Stack elements in Taproot are limited to 520 bytes. The current proposal for re-activating OP_CAT includes this restriction: creating a string longer than 520 bytes with CAT will cause the script to fail.
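
The size rule described above can be modelled with a short Python sketch. The 520-byte constant comes from the paragraph above; the function name `op_cat` and the list-as-stack representation are assumptions for illustration.

```python
MAX_ELEMENT_SIZE = 520  # stack element limit mentioned above


def op_cat(stack):
    """Model of CAT: join the top two items, failing if the result is too long."""
    if len(stack) < 2:
        raise ValueError("script failed: fewer than two stack items")
    if len(stack[-2]) + len(stack[-1]) > MAX_ELEMENT_SIZE:
        raise ValueError("script failed: result exceeds 520 bytes")
    top = stack.pop()
    below = stack.pop()
    stack.append(below + top)


# Repeated DUP CAT doubles the item until the limit stops it.
stack = [b"x"]
doublings = 0
try:
    while True:
        stack.append(stack[-1])  # DUP
        op_cat(stack)
        doublings += 1
except ValueError:
    pass
print(doublings, len(stack[0]))
```
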

With either CAT or ISSUBSTR, you can either commit to the substrings or provide them at spend-time as witness data (and allow them to be unfixed in the script).

Fixed: FOOBAR BAR ISSUBSTR == FOOBAR FOO BAR CAT EQ
Variable: [witness: FOOBAR] BAR ISSUBSTR == [witness: FOOBAR FOO] BAR CAT EQ


rijndael 


-- 
You received this message because you are subscribed to the Google Groups "Bitcoin Development Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoindev+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/bitcoindev/678d40e3-3e22-4d55-82c0-b25ccafb87ecn%40googlegroups.com.

weichu deng

Mar 19, 2025, 8:06:41 PM
to Bitcoin Development Mailing List

Hi Rijndael,

 

Thanks for your example:

[witness: foobar  foo] bar CAT EQ

 

Yes, the unfixed string can be checked against a target substring in your example. However, if the target substring is located in the middle of the unfixed string, how can it be checked? In other words, how can CAT provide the same function as “foobar ob ISSUBSTR” when “foobar” is unfixed?

 

For example, suppose that a lucky draw game has the rule: anyone whose public key contains the special substring "goodluck" wins a prize.

This game can easily be implemented with OP_ISSUBSTR as follows.

- LockScript: OP_DUP goodluck OP_ISSUBSTR...

- UnlockScript: signature publicKey

How to implement it with OP_CAT?

 

Regards

Weichu Deng

weich...@stu2024.jnu.edu.cn

Vojtěch Strnad

Mar 19, 2025, 9:02:05 PM
to Bitcoin Development Mailing List
Hi Weichu,

You can implement this game by having the user supply in the initial stack the two parts of their public key with the middle "goodluck" removed, and inserting the "goodluck" as part of the script:
  • script: "goodluck" OP_SWAP OP_CAT OP_CAT OP_CHECKSIG
  • initial stack: signature pubkey_left pubkey_right
Hope this helps.
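
The stack manipulation in that script can be traced with a small Python model. This is only a sketch: the interpreter `run`, the placeholder byte strings, and the omission of OP_CHECKSIG (which would need real keys and signatures) are all assumptions for illustration.

```python
def run(script, stack):
    """Tiny model of the three operations used in the script above."""
    for op in script:
        if isinstance(op, bytes):
            stack.append(op)                       # data push
        elif op == "OP_SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "OP_CAT":
            top = stack.pop()
            below = stack.pop()
            stack.append(below + top)
        else:
            raise ValueError(f"unsupported opcode in this sketch: {op}")
    return stack


# Placeholder values, not a real key or signature.
pubkey_left, pubkey_right = b"\x02LEFT", b"RIGHT"
initial_stack = [b"signature", pubkey_left, pubkey_right]

final = run([b"goodluck", "OP_SWAP", "OP_CAT", "OP_CAT"], initial_stack)
print(final)
```

After the two CATs, the stack holds the signature and the reassembled key with "goodluck" in the middle, which is what OP_CHECKSIG would then consume.
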

Vojtěch

Javier Mateos

Mar 31, 2025, 4:41:20 PM
to Bitcoin Development Mailing List
The solution of splitting the string and using OP_CAT only works if the exact position of the substring is known. How would a case be handled where the substring could be in any position?

Pieter Wuille

Apr 1, 2025, 10:49:07 AM
to Javier Mateos, Bitcoin Development Mailing List
On Monday, March 31st, 2025 at 4:41 PM, Javier Mateos <javier...@gmail.com> wrote:
> The solution of splitting the string and using OP_CAT only works if the exact position of the substring is known. How would a case be handled where the substring could be in any position?

Whoever produces the signature/witness for spending the coin always knows the position already, so the script can always be modified to instead take that position as an additional input.

This is a general principle: the point of scripts is verifying provided information, not computing it. As another example, this means that there is no need for a division or square root opcode if one has a multiplication opcode.
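
A short Python sketch of the "verify, don't compute" principle, under the assumption that the spender passes the answer (a position, a quotient and remainder, or a square root) as an extra input. The function names are hypothetical.

```python
def verify_substr_at(haystack: bytes, needle: bytes, pos: int) -> bool:
    """Check a claimed position instead of searching for the substring."""
    return (0 <= pos <= len(haystack) - len(needle)
            and haystack[pos:pos + len(needle)] == needle)


def verify_division(n: int, d: int, q: int, r: int) -> bool:
    """Check a claimed quotient and remainder using only multiplication."""
    return d > 0 and 0 <= r < d and q * d + r == n


def verify_isqrt(n: int, s: int) -> bool:
    """Check a claimed integer square root using only multiplication."""
    return s >= 0 and s * s <= n < (s + 1) * (s + 1)


print(verify_substr_at(b"foobar", b"ob", 2))
print(verify_division(17, 5, 3, 2))
print(verify_isqrt(50, 7))
```
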

--
Pieter

Martin Habovštiak

Apr 1, 2025, 6:38:31 PM
to Pieter Wuille, Javier Mateos, Bitcoin Development Mailing List
Hi,

I had been dismissing the proposal for the same reason you do, but it just occurred to me that substrings might be better than OP_CAT because it's possible to make them unabusable without any arbitrary limit on item size.

The idea is to store stack elements on the heap inside struct { ref_count, length, data[] } and put struct { pointer_to_item, position, length } on the stack. (Rust developers may be familiar with the `bytes` crate that does this.)
Substring operations would only duplicate the pointers with adjusted position and length so there's no way to blow up the stack using them.
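
A rough Python sketch of that layout, assuming an immutable shared buffer. The class name `Slice` and its methods are hypothetical, and Python's own object references stand in for the explicit ref_count field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    """Stack entry: a view (position, length) into a shared backing buffer."""
    buf: bytes    # shared heap item; Python reference counting replaces ref_count
    pos: int
    length: int

    def substr(self, start: int, length: int) -> "Slice":
        """Return a narrower view without copying any bytes."""
        if start < 0 or length < 0 or start + length > self.length:
            raise ValueError("script failed: substring out of range")
        return Slice(self.buf, self.pos + start, length)

    def to_bytes(self) -> bytes:
        """Materialize the view, e.g. when hashing it."""
        return self.buf[self.pos:self.pos + self.length]


item = Slice(b"x" * 500 + b"goodluck" + b"y" * 500, 0, 1008)
view = item.substr(500, 8)
print(view.to_bytes(), view.buf is item.buf)
```

Each new view costs a fixed amount of memory regardless of how large the underlying item is, which is the property described above.
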

Of course there's an exception if OP_SHA256 is used on a shorter slice but the same is true today - you can already write OP_ZERO OP_SHA256 OP_DUP OP_DUP...

Funnily, this can be used to optimize OP_DUP as well, which would now add a constant amount of memory, so the "exploit" above would need to use two bytes per every large object.

Anyway, while I would personally prefer not having arbitrary limits on item sizes, since the limit is already there, it might not matter. I guess it's something worth considering if any other future soft fork somehow enables larger items.

Have a nice day!

Martin


On Tue, Apr 1, 2025, 16:49 Pieter Wuille <bitco...@wuille.net> wrote:

Anthony Towns

Apr 9, 2025, 5:22:56 AM
to Pieter Wuille, Javier Mateos, Bitcoin Development Mailing List
I somewhat disagree with this: there are some concerns that are *easier*
to express with different opcodes, and I think that's a factor worth
considering.

This came up with the OP_CAT based proof-of-work faucet [0] -- the
idea there is that you provide a signature and some nonce data, and
when you combine the two and hash the result, that result begins with
some sufficient number of 0 bits (that then gets related back to a
CHECKSEQUENCEVERIFY delay).

OP_CAT is *sufficient* for testing this, because you just CAT the
signature and nonce together and hash them, and can then again CAT the
0-bits you expect together with some other data and check that all
of those combined match the hash you calculated earlier.

But it would be more efficient, and a little easier to code, if you
could instead have used SUBSTR/LEFT to pull the initial bytes from
the calculated hash and check that those have the expected number of
leading 0-bits. More efficient, because you don't have to supply all
the trailing bytes of the hash in the witness, and easier to code,
because it's a bit more natural to think of manipulating the hash you
calculated, rather than having to put user-provided data together and
check that that actually matched what you would have extracted from
the hash had you been able to do it that way.
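
The two approaches can be compared in a small Python sketch. It is a simplification, not the faucet's actual script: it assumes SHA-256, counts whole leading zero bytes rather than bits, and uses hypothetical helper names (`pow_check_with_left`, `pow_check_with_cat`, `grind`).

```python
import hashlib
from itertools import count


def pow_check_with_left(sig: bytes, nonce: bytes, zero_bytes: int) -> bool:
    """SUBSTR/LEFT style: take the first bytes of the computed hash."""
    digest = hashlib.sha256(sig + nonce).digest()
    return digest[:zero_bytes] == b"\x00" * zero_bytes


def pow_check_with_cat(sig: bytes, nonce: bytes, zero_bytes: int, tail: bytes) -> bool:
    """CAT style: the witness also supplies the rest of the hash (tail)."""
    digest = hashlib.sha256(sig + nonce).digest()
    return b"\x00" * zero_bytes + tail == digest


def grind(sig: bytes, zero_bytes: int) -> bytes:
    """Off-chain search for a nonce that meets the target."""
    for i in count():
        nonce = i.to_bytes(8, "big")
        if pow_check_with_left(sig, nonce, zero_bytes):
            return nonce


sig = b"placeholder-signature"  # not a real signature
nonce = grind(sig, 1)
tail = hashlib.sha256(sig + nonce).digest()[1:]
print(pow_check_with_left(sig, nonce, 1), pow_check_with_cat(sig, nonce, 1, tail))
print("extra witness bytes needed by the CAT version:", len(tail))
```
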

So just like we have unnecessary opcodes like "CHECKSIGVERIFY" (versus
"CHECKSIG VERIFY"), I think it's still worth evaluating "SUBSTR" and
"DIV" on questions of efficiency, even if they're not providing any
additional expressivity -- ie, does it provide a meaningful improvement
either for on-chain validation or when writing/maintaining scripts
compared to other ways of achieving the same goal?

(That said, I don't think "ISSUBSTR" is a great opcode; the original
SUBSTR that specifies exactly where in the original string the substring
appears seems more useful to me)

Cheers,
aj

[0] https://delvingbitcoin.org/t/proof-of-work-based-signet-faucet/937
