Hi Craig,
a few comments about the cost of label scanning, as this topic comes up
repeatedly:
> The problem with this approach is that scanning for each additional label
> adds incrementally and non-trivially to the computational burden.
I think this statement reflects a misconception, or is at best only half the
truth, likely based on the assumption that all existing and future SP wallets
would implement scanning using the same method.
> For each label, there is an EC point addition and comparison against all
> taproot outputs for an eligible transaction.
Iterating through labels and calculating the taproot output candidates to
match against is only one way to implement scanning. Another one, as laid out
in BIP-352 [1], is to do it backwards: for each taproot output, subtract the
base output candidate and check if the result is in the label cache. One
advantage of this approach is that its scanning cost is independent of the
number of labels to scan for, as the lookup time in the cache is constant on
average and efficient if a proper data structure is used.
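For illustration, here is a minimal sketch of what such a per-output check
could look like on top of the libsecp256k1 public API (all function names and
the cache interface are hypothetical, not taken from the BIP or the PR code):

/* Minimal sketch of the per-output label check (the "backwards" direction):
 * `output` is the taproot output lifted to a pubkey with even Y, `p_k` the
 * already-computed base output candidate P_k = B_spend + t_k*G. */
#include <secp256k1.h>

/* Hypothetical cache lookup: maps a serialized label point to its label
 * tweak, returns 1 on a hit. A possible khash-based implementation is
 * sketched further below (cache handle omitted here for brevity). */
int label_cache_lookup(const unsigned char *label_ser33,
                       unsigned char *tweak_out32);

static int check_output_for_labels(const secp256k1_context *ctx,
                                   const secp256k1_pubkey *output,
                                   const secp256k1_pubkey *p_k,
                                   unsigned char *label_tweak_out32) {
    secp256k1_pubkey neg_p_k = *p_k, neg_output = *output, diff;
    const secp256k1_pubkey *terms[2];
    unsigned char diff_ser[33];
    size_t len;
    int i;

    secp256k1_ec_pubkey_negate(ctx, &neg_p_k);
    secp256k1_ec_pubkey_negate(ctx, &neg_output);
    terms[1] = &neg_p_k;
    for (i = 0; i < 2; i++) {
        /* first try output - P_k, then -output - P_k (the other Y parity) */
        terms[0] = (i == 0) ? output : &neg_output;
        if (!secp256k1_ec_pubkey_combine(ctx, &diff, terms, 2))
            continue; /* point at infinity, i.e. output == +/-P_k */
        len = sizeof(diff_ser);
        secp256k1_ec_pubkey_serialize(ctx, diff_ser, &len, &diff,
                                      SECP256K1_EC_COMPRESSED);
        if (label_cache_lookup(diff_ser, label_tweak_out32))
            return 1; /* output pays to a labelled address with this tweak */
    }
    return 0;
}

Note the two lookups per output: the scanner only knows the output's
x-coordinate, so both Y parities have to be tried.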
To prove that point, I've extended the scanning benchmark of the libsecp SP
module PR #1765 [2] with an _actual_ label cache using a third-party hash
table implementation [3]:
https://github.com/theStack/secp256k1/commit/f9f41adcedaca98aa4f3f65e2782e25b2124bf85
The scanning time is measured with three different label cache sizes: no
entries (i.e. empty lookup function stub, no hash-map involved), one entry
("tiny"), and one million entries ("huge"). Running the benchmark for scanning
a transaction with 10 taproot outputs leads to the following output:
$ ./build/bin/bench sp_scan_with_labels
Benchmark , Min(us) , Avg(us) , Max(us)
sp_scan_with_label_cache_empty_L=0 , 61.3 , 61.4 , 61.4
sp_scan_with_label_cache_tiny_L=1 , 61.6 , 61.6 , 61.7
sp_scan_with_label_cache_huge_L=1000000 , 61.6 , 61.7 , 61.7
This shows that the cache lookup cost is negligible even for a million
labels, compared to the other, much more costly EC operations in the
scanning function.
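For reference, here is a rough sketch of how such a cache could be set up
with khash and rapidhash (hypothetical type and function names; the actual
benchmark commit may differ in the details). Keys are the serialized label
points, values the corresponding label tweaks:

#include <string.h>
#include "khash.h"      /* https://github.com/attractivechaos/klib */
#include "rapidhash.h"  /* https://github.com/Nicoshev/rapidhash */

typedef struct { unsigned char data[33]; } label_key;   /* compressed label point */
typedef struct { unsigned char data[32]; } label_tweak; /* scalar tweak for spending */

static khint_t label_key_hash(label_key k) {
    return (khint_t)rapidhash(k.data, sizeof(k.data));
}
static int label_key_equal(label_key a, label_key b) {
    return memcmp(a.data, b.data, sizeof(a.data)) == 0;
}
KHASH_INIT(labels, label_key, label_tweak, 1, label_key_hash, label_key_equal)

/* Insert a (label point -> label tweak) entry, e.g. at label creation. */
static void label_cache_insert(khash_t(labels) *cache,
                               const label_key *key, const label_tweak *tweak) {
    int absent;
    khint_t it = kh_put(labels, cache, *key, &absent);
    kh_value(cache, it) = *tweak;
}

/* O(1) average-time lookup, regardless of the number of cached labels;
 * this could back the label_cache_lookup() from the sketch above. */
static int label_cache_lookup_impl(khash_t(labels) *cache,
                                   const unsigned char *label_ser33,
                                   unsigned char *tweak_out32) {
    label_key k;
    khint_t it;
    memcpy(k.data, label_ser33, sizeof(k.data));
    it = kh_get(labels, cache, k);
    if (it == kh_end(cache))
        return 0;
    memcpy(tweak_out32, kh_value(cache, it).data, sizeof(label_tweak));
    return 1;
}

khash is an open-addressing hash table that keeps keys and values in flat
arrays, so lookups stay fast even for very large label sets.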
With only a handful of labels to scan for, the "iterate over labels" approach
is faster (certainly if only scanning for the change label), but at some
cross-over point the BIP-style scanning with label-cache lookup performs
better [4]. Beyond that point the scanning performance doesn't get worse
anymore, i.e. each additional label is essentially free.
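To get an intuition for where that cross-over point sits, here's a rough
back-of-the-envelope count of the dominating EC point additions per matching
round, ignoring the cheap byte comparisons and hash lookups (my own estimate,
matching the theory in [4]):

iterate-over-labels:  ~L point additions (one per label point)
label-cache lookup:   ~2*N point additions (two per output, one for each
                      Y parity, see the sketch above)

i.e. the cache-based variant should start to win as soon as L > 2*N.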
Given that this alternative scanning approach exists, I don't think it is
appropriate to say that label scanning doesn't scale in general. It is my
understanding that this BIP-style scanning might not work well for light
clients, as they usually don't have the (full) transaction output data to
start with, so it's not applicable to all types of wallets. Still, there
might be full-node wallets in the future with special use-cases involving
hundreds of thousands of labels; if they choose the scanning approach as
described in the BIP and implement the label cache using an efficient data
structure, they should be able to handle that without a noticeable decline
in scanning performance.
Hence, I'd be in favor of allowing label ranges in the SP descriptor format.
Even for use-cases with a smaller number of labels, it seems to make sense to
shorten the descriptor string in most cases. I can't think of any good reasons
for introducing gaps in the m values for label creation.
I'm still curious to hear some more concrete use-cases for labels in general,
and especially for large numbers of labels. (Admittedly, if we were absolutely
certain that there will never be any use for that, there would indeed not be
much point in allowing lots of labels, but we have no way of knowing, and
restricting potential use-cases doesn't seem like the right approach to me.)
Cheers,
Sebastian
[1] https://github.com/bitcoin/bips/blob/master/bip-0352.mediawiki#user-content-Scanning, see "check for labels" ff.
[2] https://github.com/bitcoin-core/secp256k1/pull/1765
[3] using the header-only libraries khash.h (for the hash-map) and rapidhash.h
(as hash function), see
https://github.com/attractivechaos/klib/blob/master/khash.h and
https://github.com/Nicoshev/rapidhash/blob/master/rapidhash.h
[4] in theory, whenever the number of labels L is larger than twice the number
of taproot outputs N (i.e. L > 2*N), BIP-style scanning should be faster; see
https://gist.github.com/theStack/25c77747838610931e8bbeb9d76faf78?permalink_comment_id=5897811#gistcomment-5897811
for benchmarks over an L/N matrix