FAQ on mempool limiting


Mike Hearn

Sep 9, 2015, 8:16:08 AM
to bitcoin-xt
I wrote an article on the new memory pool limiting feature here:


It contains some basic technical info and I also argue that random selection actually makes more sense than ordering by fee. I feel about 80% confident in this argument but there might be something I've overlooked. Please do argue with me if you'd like to see a switch to shoot-lowest-fee in future.

Robin

Sep 9, 2015, 8:49:08 AM
to bitco...@googlegroups.com
Considering your introduction:
"Every Bitcoin node on the network holds unconfirmed transactions in RAM and removes them once they are confirmed. From then on they’re held only on disk."

To prevent confusion, perhaps you should define what "dropped", "removing" and "forget" mean later on. Rejected? Written to disk? Silently ignored?
--
You received this message because you are subscribed to the Google Groups "bitcoin-xt" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bitcoin-xt+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mike Hearn

Sep 9, 2015, 8:52:24 AM
to bitcoin-xt
They are all synonyms for erased. Did you interpret it differently, or are you worried other people may be confused?

Robin

Sep 9, 2015, 8:57:29 AM
to bitco...@googlegroups.com
For Bitcoin functionality I think it's important to be clear what the impact will be on the wire.
Imo better to be explicit. Erased from RAM? Erased from the node completely? Before or after relaying? Does it mean legitimate TX can be lost in cyberspace forever?

I can always read the latest code, so I'm suggesting it to prevent accusations and panicking people.

NxtChg

Sep 9, 2015, 9:04:16 AM
to bitco...@googlegroups.com

>I wrote an article on the new memory pool limiting feature here...

1. Would be nice if you could add a paragraph with rationale for random removal vs simply dropping incoming txs until there's free space available.

2. If spam transactions often create long chains, maybe drop longest chains first?

Or do you think spammers will adapt and start sending normal txs, so it's not worth it? Still, might make it more difficult to create lots of spam txs...

In any case, that's the question that popped up in my head, so you could also address it just to make it clear for the readers why you don't drop such chains.


Tom Zander

Sep 9, 2015, 9:05:34 AM
to bitco...@googlegroups.com
Did you not feel the FAQ addresses that? Or do you fear others may not
understand it?

The topic is not easy to understand, but all of your points were addressed.
Tom Zander

Robin

Sep 9, 2015, 9:20:03 AM
to bitco...@googlegroups.com
I do think it's addressed, but too implicitly.
It's not very FAQ-like if you must read 6-7 paragraphs to understand the answer.

It would imo be better to be explicit and redundant by adding in a recap/tl;dr question after the long answer. So I would suggest: "Q: What happens to the TX that are removed?" "Q: Can my valid TX be lost because of this?"

Exactly because it's not easy to understand I don't think we should rely on having people read the full article paying very close attention so they can infer meanings from choice of words near the end.

Mike Hearn

Sep 9, 2015, 9:26:03 AM
to bitcoin-xt
I'll take a look at making it clearer a bit later.

--

Tom Harding

Sep 9, 2015, 9:47:42 AM
to bitco...@googlegroups.com

I think it's well reasoned.

It's not a proof, but those are pretty hard to come by in complex systems with countless humans and computers interacting.

My biggest worry is that usage grows too fast even for BIP 101 (ridiculous optimist here).  Then better handling of the full-pool situation will be needed.

Chris Chua

Sep 10, 2015, 8:55:33 AM
to bitcoin-xt
One possible middle ground for a mempool eviction policy would be to pick two random transactions, then evict the one with the lower fee/kB. This biases evictions towards low-fee transactions while retaining randomness. More specifically, the highest-fee transaction will never get evicted, while the lowest-fee transaction has twice the probability of eviction (compared to completely random selection).

This eviction policy has the advantage of discouraging low fee/kB spam - an attacker must use a fee of at least the median of everyone's mempools, or they will find their spam will get evicted most of the time. At the same time, the inherent randomness means that users won't suddenly see their transactions simultaneously drop off everyone's mempools. The disadvantage is that it's more complex than random eviction.

This approach can be generalised to bias the eviction further towards low-fee transactions: randomly pick n transactions, evict the lowest fee/kB one. The limit of this is setting n = entire mempool, where the policy reduces to "shoot-lowest-fee".
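Chris's rule is easy to sketch. The following is a minimal illustration (the `mempool` shape and names are made up for the example, not the actual bitcoin-xt data structures), assuming the pool is a dict mapping txid to fee/kB:

```python
import random

def evict_candidate(mempool, n=2):
    """Pick n random entries and return the one with the lowest fee/kB.

    `mempool` here is just a dict mapping txid -> fee/kB; the shape is
    illustrative. n=2 is the middle ground described above, and
    n=len(mempool) degenerates to "shoot-lowest-fee".
    """
    sample = random.sample(list(mempool), min(n, len(mempool)))
    return min(sample, key=lambda txid: mempool[txid])
```

With n equal to the pool size the lowest-fee entry is always chosen, matching the limit case described above.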

Gavin Andresen

Sep 10, 2015, 9:36:09 AM
to bitcoin-xt
On Thu, Sep 10, 2015 at 8:55 AM, Chris Chua <some...@gmail.com> wrote:
One possible middle ground for a mempool eviction policy would be to pick two random transactions, then evict the one with lower fee/kB.

That's a great idea, Chris!  Just barely more complex than evicting one random transaction...

RE: picking n and evicting the lowest fee/kB: spammers try to crowd out valid transactions by paying just a LITTLE more in fees than at-the-margin valid transactions. So I suspect keeping n small is the right thing to do.

--
--
Gavin Andresen

Mike Hearn

Sep 10, 2015, 10:00:31 AM
to bitcoin-xt
Agreed - great thinking Chris.

N=3 or 4 sounds reasonable. Though I also liked Tom's pick-a-random-byte suggestion. Perhaps both ideas can be combined.

--

Tom Harding

Sep 10, 2015, 2:57:40 PM
to bitcoin-xt

Agreed, the simplicity of the rules is as important as the "optimality" of the result, since the goals themselves can't be totally nailed down anyway.

From simple rules like this, ant colonies have complex emergent behavior.

One consequence would be that under a prolonged pool-full period, this rule tends toward highest-feerate.  A measurement I like better is: expectedBlocks(feerate) * sizeBytes.  This approximates the node's expected cost of keeping it in the mempool.

Robin

Sep 10, 2015, 9:06:46 PM
to bitco...@googlegroups.com

On 10/09/15 20:57, Tom Harding wrote:
>
> A measurement I like better is: expectedBlocks(feerate) * sizeBytes. This approximates the node's expected cost of keeping it in the mempool.
>

I think we can save the CPU cost of the expected-block projections, because those are based exclusively on the fee/kB: lower fee/kB means more blocks, higher fee/kB means fewer blocks. So wouldn't 1/feeRate * sizeBytes produce essentially the same bias?
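The suggested equivalence can be illustrated with a toy estimator (everything here, including the 1/feerate shape and the constants, is made up; the orderings coincide exactly when expectedBlocks is proportional to 1/feerate):

```python
# Hypothetical fee estimator: expected blocks to confirmation falls as
# feerate rises. The shape and constant are made up for illustration.
def expected_blocks(feerate):
    return 100.0 / feerate

def expected_cost(feerate, size_bytes):
    # expectedBlocks(feerate) * sizeBytes, as Tom proposed
    return expected_blocks(feerate) * size_bytes

def simple_cost(feerate, size_bytes):
    # the cheaper 1/feeRate * sizeBytes variant
    return (1.0 / feerate) * size_bytes

# (feerate, size) pairs, purely illustrative
txs = [(5.0, 250), (1.0, 500), (20.0, 1000), (2.0, 300)]
```

Sorting `txs` by either cost function gives the same order for this estimator; a real estimator that isn't proportional to 1/feerate could rank differently.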

More importantly though, I wonder which bias makes more sense.
Your suggestion chooses evictions that would be best for local resource management.
Using fee/kb would choose evictions that are best to ensure that: higher fee == more speed and reliability.

Both make a valid case and I suspect both have roughly the same complexity for wallets to take this into account.


Nick ODell

Sep 10, 2015, 10:21:14 PM
to bitcoin-xt
My concern with this change is that it encourages businesses to broadcast transactions in an antisocial manner. If I connect to all of the Bitcoin XT nodes every ten minutes, and send them the transactions I want confirmed, it will mean that my transactions have a higher chance of getting into the next XT block.

I agree that letting the OS manage memory by killing Bitcoin is a worse solution.

Christophe Biocca

Sep 11, 2015, 8:37:32 PM
to bitcoin-xt
Such a concern (and the ability to make a node waste bandwidth that Peter Todd seems really worried about) can be addressed by making the eviction picking be deterministic for a given node:

1. Generate a random 256 bit string S.
2. Sort all transactions by `txid xor S`.
3. This is your eviction list, in order. A new transaction gets inserted in the right position (and is rejected if its `txid xor S` is smaller than that of the lowest transaction in the set).

Because S is node-specific, we still keep the nice behaviour that an evicted transaction has a high chance of surviving on other nodes.

But if a transaction gets evicted by a node, rebroadcasting the same transaction won't put it back in.

This is also compatible with Chris Chua's idea (we just compare the n-lowest transactions and drop the one with the lowest fee).
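The three steps above can be sketched like this (a toy illustration; the function names are made up, and a real node would keep the mempool in a structure already sorted by this key rather than re-sorting):

```python
import os

# Step 1: node-local secret, generated once at startup.
S = os.urandom(32)

def eviction_key(txid: bytes) -> bytes:
    """Step 2's sort key: txid XOR S. Because S never changes, a
    rebroadcast transaction always lands at the same position it was
    evicted from, so re-sending can't get it back into a full pool."""
    return bytes(a ^ b for a, b in zip(txid, S))

def eviction_order(txids):
    # Step 3: the eviction list, first-to-be-evicted first.
    return sorted(txids, key=eviction_key)
```

Since XOR with S is its own inverse, the mapping is a bijection on txids: each node gets a uniformly shuffled but fixed total order.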

Robin

Sep 12, 2015, 5:41:56 AM
to bitco...@googlegroups.com

On 12/09/15 02:37, Christophe Biocca wrote:
> Such a concern (and the ability to make a node waste bandwidth that Peter Todd seems really worried about) can be addressed by making the eviction picking be deterministic for a given node:
>
> 1. Generate a random 256 bit string S.
> 2. Sort all transactions by `txid xor S`.
> 3. This is your eviction list, in order. A new transaction gets inserted in the right position (and is rejected if `txid xor S` is smaller than that lowest transaction in the set).
>
> Because S is node-specific, we still keep the nice behaviour of transactions having a higher chance of survival.
>
> But if a transaction gets evicted by a node, rebroadcasting the same transaction won't put it back in.
>
> This is also compatible with Chris Chua's idea (we just compare the n-lowest transactions and drop the one with lowest fee).
>

It's an interesting approach, but it has the downside of making it possible that TX we've never seen before are rejected immediately.
To prevent this, you would need to broadcast it even though you won't store it in your mempool for the DoS protection and you'd be back at 'wasting bandwidth'.

Another way would be to actually send a rejection message to the sender of the TX and expect them to broadcast it to other nodes.
That could cause trouble with older versions (unless you want to assume they will have multiple connections anyway).

--

What about a bloom filter + (optional) misbehave tracking?
Compared to sorting through tens of thousands of TX it would be cheaper in terms of CPU.

1. We start with a new bloom filter and an eviction counter.
2. Every time an eviction is triggered, the bloom filter is updated with the evicted TX and the eviction count goes +1.
3. If a TX comes in that matches the bloom filter, reject it and add a small misbehave value to that peer.
4. Reset the bloom filter when the blockchain is updated, or when the eviction counter reaches a certain value.
(That certain value is to make sure the false-positive rate of the bloom filter stays small.)
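The four steps above can be sketched as follows (a toy Bloom filter; the bit count, hash count and reset threshold are illustrative and not tuned for a real 50k-TX mempool, and the misbehave bookkeeping is left out):

```python
import hashlib

class EvictionFilter:
    """Toy Bloom filter remembering recently evicted txids."""

    def __init__(self, bits=1 << 16, hashes=4, reset_after=1000):
        self.bits = bits
        self.hashes = hashes
        self.reset_after = reset_after   # step 4's "certain value"
        self.bitmap = bytearray(bits // 8)
        self.evictions = 0               # step 1's eviction counter

    def _positions(self, txid):
        # Derive `hashes` bit positions from the txid.
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + txid).digest()
            yield int.from_bytes(digest[:8], 'big') % self.bits

    def add_evicted(self, txid):
        # Step 2: record the evicted TX and bump the counter.
        for pos in self._positions(txid):
            self.bitmap[pos // 8] |= 1 << (pos % 8)
        self.evictions += 1
        if self.evictions >= self.reset_after:
            # Step 4: reset before false positives pile up.
            self.bitmap = bytearray(self.bits // 8)
            self.evictions = 0

    def probably_evicted(self, txid):
        # Step 3: if all positions are set, reject the incoming TX.
        return all(self.bitmap[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(txid))
```

A blockchain update (the other reset trigger in step 4) would simply call the same reset logic.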

While there is a false-positive chance, I think it can be kept under control better than with Christophe's ordering: the bloom filter's worst case can be kept in check with a good reset threshold, while Christophe's worst case is that the lowest `txid xor S` is very high (2^256 is an astronomical number, so your entire mempool could easily land in the high end of the range).

Since it should be very hard for a peer to predict what our bloom filter looks like, or when it gets updated, this will deter spam-broadcasting TX or trying to game the filter.

A side effect of adding the misbehave system to this could be that very long running nodes may ban peers for accumulated false-positives and bad luck.
While I don't have exact numbers of how long this would take, it would definitely be long enough for them to build a stable list of other connections they can fall back to.


Christophe Biocca

Sep 12, 2015, 10:56:35 AM
to bitcoin-xt, bea...@oscp.info
> It's an interesting approach, but it has the downside of making it possible that TX we've never seen before are rejected immediately. 
> To prevent this, you would need to broadcast it even though you won't store it in your mempool for the DoS protection and you'd be back at 'wasting bandwidth'. 

Isn't that the case already with Gavin's code? Low priority low fee transactions will not make it in and will be dropped immediately. Are those relayed at all? If you take the n-lowest transactions and compare them together before making an eviction decision (based on fee and priority), there's still a way for a tx to get in if its fee is high enough.

> worst case is that the lowest `txid xor S` is very high (2^256 is an astronomical number that could easily fit your entire mempool in the higher numbers)

That only happens if txids are already clustered together (since `xor S` flips the same bits in the same way). The pathological case is that all txids in the network's mempools share a 16-bit prefix or something. Then they'll be at the upper end of the range on half of the nodes and at the lower end on the other half, which guarantees a reasonable outcome. There are more robust techniques (such as sha256(txid + S)) which completely randomize each node's order, at the cost of slightly more CPU per transaction (which isn't really an issue).

The core of the idea is to pre-determine all eviction decisions at boot (in a way that's different for each node), so that no amount of deliberate or accidental re-submitting will change what we do.

> A side effect of adding the misbehave system to this could be that very long running nodes may ban peers for accumulated false-positives and bad luck.
> While I don't have exact numbers of how long this would take, it would definitely be long enough for them to build a stable list of other connections they can fall back to.

The only downside I can see is that if a peer chooses you as their "trickle node" in a DOS situation, they'll send you their entire mempool. If you've got low overlap (because everyone's pools got out of sync), then you'll either really increase their "misbehaving" count, or let through a lot of transactions you've previously rejected.

Tom Zander

Sep 12, 2015, 12:46:13 PM
to bitco...@googlegroups.com
On Friday 11. September 2015 17.37.32 Christophe Biocca wrote:
> Such a concern (and the ability to make a node waste bandwidth that Peter
> Todd seems really worried about) can be addressed by making the eviction
> picking be deterministic for a given node:

I don't want it to be deterministic :)

> But if a transaction gets evicted by a node, rebroadcasting the same
> transaction won't put it back in.

This would be faulty behavior. If there is space now, then the transaction
should be accepted.

We also want the memory pool to keep getting refreshed so we don't end up
sitting on a transaction for days that apparently the miners don't like. So
this random eviction (and face it, next year the chance is 1 in 400000) is the
healthy thing to do.


The complaints that Peter made really just stem from a different way of
looking at zero-conf and related technologies. The re-sending of a 500-byte
transaction is in the end just a rounding error; it doesn't even show up on
bandwidth measurements.
--
Tom Zander

Christophe Biocca

Sep 12, 2015, 1:52:47 PM
to bitcoin-xt, t...@thomaszander.se
> I don't want it to be deterministic :)

Why? Is there a particular reason? I can see why we don't want to do sort-by-fee: It drops perfectly good transactions from the entire network and thus they never get a chance to confirm a handful of blocks later. But idempotent behaviour is usually a good idea.

> This would be faulty behavior. If there is space now, then the transaction should be accepted.

It would get accepted. The sorting just gives you a guarantee that unless space was freed up since eviction, it won't re-accept the transaction.

> We also want the memory pool to keep getting refreshed so we don't end up sitting on a transaction for days that apparently the miners don't like. So this random eviction (and face it, next year the chance is 1 in 400000) is the healthy thing to do.

But random eviction is not something that happens outside of DOS situations. If your goal is to clean up txes eventually, then you need the mempool janitor jgarzik wrote, or something like it.

> The re-sending of a 500 bytes transaction is in the end just a rounding-error on the bandwidth measurements. It doesn't even show up on measurements.

I'm not worried about that one, I'm trying to deal with the concern Nick ODell wrote, where people have an incentive to broadcast transactions repeatedly to get theirs to stick in mempools. It seems reasonable to me that we shouldn't give an advantage to a frequently-broadcast transaction.

Tom Zander

Sep 12, 2015, 2:11:11 PM
to bitcoin-xt
On Saturday 12. September 2015 10.52.46 Christophe Biocca wrote:
> > I don't want it to be deterministic :)
>
> Why? Is there a particular reason? I can see why we don't want to do
> sort-by-fee: It drops perfectly good transactions from the entire network
> and thus they never get a chance to confirm a handful of blocks later. But
> idempotent behaviour is usually a good idea.

Because this is not a centralized system.

One node dropping a random transaction means there is a pretty darn good
chance that one of the 6000 other nodes still has it.

AI researchers found that the best way to solve issues is to do the logical
thing, but with randomness mixed in. As long as you then use a feedback loop
that checks for greatest success, you get the benefit that attackers can't
predict what you will do. You also get a self-correcting system: when your
choice wasn't good, someone else will have kept the transaction.

Animals (and humans) use randomness all the time. It's really not a weird
thing.


> > This would be faulty behavior. If there is space now, then the
> > transaction should be accepted.
>
> It would get accepted. The sorting just gives you a guarantee that *unless*
> space was freed up since eviction, it won't re-accept the transaction.

But this avoids the self-correcting behavior that in a distributed setting
helps avoid randomness being destructive.

> > We also want the memory pool to keep getting refreshed so we don't end up
> > sitting on a transaction for days that apparently the miners don't like.
> > So
> > this random eviction (and face it, next year the chance is 1 in 400000) is
>> the healthy thing to do.
>
> But random eviction is not something that happens outside of DOS
> situations. If your goal is to clean up txes eventually, then you need the
> mempool janitor jgarzik wrote, or something like it.

We are talking about the use case of an overfull mempool, where space is
scarce. When that's not happening and we have plenty of space, it doesn't
really matter.


> > The re-sending of a 500 bytes transaction is in the end just a
> > rounding-error on the bandwidth measurements. It doesn't even show up on
> > measurements.
>
> I'm not worried about that one, I'm trying to deal with the concern Nick
> ODell wrote, where people have an incentive to broadcast transactions
> repeatedly to get theirs to stick in mempools. It seems reasonable to me
> that we shouldn't give an advantage to a frequently-broadcast transaction.

That concern doesn't really bother me. If you end up sending the same thing
every ten minutes, I think you will get banned for some time by the 5 year old
code that satoshi wrote.

--
Tom Zander

Christophe Biocca

Sep 12, 2015, 4:16:24 PM
to bitcoin-xt, t...@thomaszander.se
It's pretty clear we're talking past each other, so I'll make one last attempt. Sorry about spamming everyone.

> One node dropping a random transaction means there is a pretty darn good chance that one of the 6000 other nodes still has it.

I agree, which is why "S" is picked randomly and hence is different for each node. This guarantees each node drops different transactions when filling up.

> you get to benefit that attackers can't predict what you will do

Also covered, because "S" is internal, the attacker has no visibility into what the node is dropping and why. It's the exact same type of protection as random picking (which uses a deterministic RNG with a random seed) gives you.

> But this avoids the self-correcting behavior that in a distributed setting helps avoid randomness being destructive. 

What self correcting behaviour? Receiving a transaction twice means it has 2 times the chance of making it into the mempool as a transaction received once. Is that good? If so, why? My proposal removes that behaviour and nothing else.

It seems to me that if the goal is to maximize the chance of a transaction surviving somewhere on the network, having each node order things differently is good, but having a single node have no consistency with its past actions makes it easier, not harder, to mess with that node's mempool.

Robin

Sep 12, 2015, 6:49:32 PM
to bitco...@googlegroups.com
On 12/09/15 16:56, Christophe Biocca wrote:
> Isn't that the case already with Gavin's code? Low priority low fee transactions will not make it in and will be dropped immediately. Are those relayed at all? If you take the n-lowest transactions and compare them together before making an eviction decision (based on fee and priority), there's still a way for a tx to get in if its fee is high enough.
Wow I see, unless it's a priority transaction your fee needs to be >= estimateFee for 2 blocks. I'm not sure I agree with that mechanism. The fee estimations were increased a fair bit by the recent spam.
In any of the solutions, for the sake of relaying, I would suggest avoiding dropping them immediately. In particular, dropping low-fee TX would drive up the required fees and make the network susceptible to fee manipulation with these attacks. Imo this should be mostly left up to the miners.

> That only happens if txids are already clustered together (since `xor S` flips the same bits in the same way). The pathological case is that the entire network mempool transactions' txids share a 16 bit prefix or something. Then they'll be at the upper end of the range on half of the nodes, and at the lower end on the other half, which guarantees a reasonable outcome. There are more robust techniques that can be employed (such as sha256(txid + S)) which completely randomize the order that each node has, at the cost of slightly more CPU per transaction (which isn't really an issue).
Agreed, normally speaking there should be a good spread, boosting the spread with extra hashing would make it possible to further even it out.
The extra CPU I think we should avoid if possible. Precisely because TX volume is the thing we hope to grow as Bitcoin becomes more and more successful.

> The only downside I can see is that if a peer chooses you as their "trickle node" in a DOS situation, they'll send you their entire mempool. If you've got low overlap (because everyone's pools got out of sync), then you'll either really increase their "misbehaving" count, or let through a lot of transactions you've previously rejected.
Actually a lower overlap would cause less trouble, because TX we haven't seen would only be in our bloom by false-positive, so not increasing misbehaving by much. It's only those we've previously evicted.
But you're right with the 50k TX setting the bloom would easily take in 20-40k TX worth of rejections. So misbehave is probably not a good addition to this. The bloom would still work without it though.


Tom Zander

Sep 12, 2015, 7:09:14 PM
to bitco...@googlegroups.com
On Saturday 12. September 2015 13.16.24 Christophe Biocca wrote:
> It's pretty clear we're talking past each other,

Well, yes and no.

You seem to be arguing that your suggested solution is just as good as the one
we have.
Whereas I'm arguing that the solution we have is at least as good as the one
you suggested, maybe better. But lets not argue about little details. They
would need to be tested.

See, the same argument, kinda...

At the end of the day I'd love to see code that improves a little bit based on
the concepts we already have.
Going in a different direction, well, I guess we need a really compelling
reason to do that.
And we seem to agree that your idea is not bad, but not compellingly better.
--
Tom Zander

NxtChg

Sep 13, 2015, 2:20:52 AM
to bitco...@googlegroups.com
>And we seem to agree that your idea is not bad, but not compellingly better.

I don't agree. His idea _is_ compellingly better.

You seem to have missed his point and then just dismissed it instead of trying to understand.

Both his idea and the bloom filter idea solve the main problem that people will rebroadcast their transactions over and over to improve their chances.

The solution we have is _not_ better, if rebroadcasting is a concern.

Now, whether it's a valid concern or not is another topic, but if it is, both ideas are better than what we have now.

Robin

Sep 13, 2015, 12:12:00 PM
to bitco...@googlegroups.com
Alright, I believe we shouldn't drag this discussion on for too long and implement something sooner rather than later, so let me attempt to give an overview/summary and work towards a conclusion.

--

I think the priorities of whichever suggestions make it in should be:
  1. Must be cheap and simple to implement for all parties. (Including code that wants to reduce the risk that valid TX they generate are evicted and dropped.)
  2. Should minimise the influence our resource management has on the contents of the network-wide mempool.
  3. If we cannot avoid influencing it, prioritize algorithms that maximise the odds for that influence to be negated by other nodes.
  4. If we cannot avoid influencing it even with multiple nodes, prioritize higher fees.

Do you agree on these priorities?

--

Based on those priorities, my take on some of the suggestions.

Moving from #transactions limit to a #bytes limit to trigger evictions (from #61):

  1. No difference from current code.
  2. No difference from current code.
  3. No difference from current code.
  4. No difference from current code.

Why use it? Because it will be easier for users to set the right value that won't crash their nodes.
RAM is measured in bytes, so the mempool limit should be too.

Picking 2-4 random samples instead of 1 to compare by fee/kB before evicting (by Chris Chua):

  1. O(1) algorithm, like current code.
  2. No difference from current code, unless the sample size is too large.
  3. No difference from current code, unless the sample size is too large.
  4. Significantly better than current code!

This improves the current code without drawbacks. Love this idea.

Random seed based deterministic ordering for evictions (by Christophe Biocca):

  1. The sorting makes accepting new TX to the mempool more CPU intensive, considering thousands of elements in the set are to be expected. The increased complexity seems like a reasonable trade-off.
  2. Significantly better, because it can replace the current dropping of new TX when mempool is full.
  3. Significantly better, because the random seed will assure more negation is possible than current fee estimation code based solution.
  4. Regression from current code, because it will primarily sort on the random seed as opposed to fees.

While it does come with complexity and CPU cost, it's also a significant improvement in reducing the influence we have on the network-wide mempool. Would lean in favour of implementing it, but it's a tough trade-off.

Bloom filter based re-insert prevention (by me, without misbehave version):

  1. Checking against the bloom filter makes accepting new TX to the mempool marginally more CPU intensive, though still O(1). The increased complexity seems like a reasonable trade-off.
  2. Significantly better, because it can replace the current dropping of new TX when mempool is full.
  3. Significantly better, because new TX will be relayed first before being eligible for eviction (except for false-positives).
  4. Regression from current code, because it will prioritize #2 and #3. Can be solved with Chris Chua's suggestion though.

This is an alternative solution to Christophe Biocca's seed based ordering. The advantages are largely the same, but I think this is cheaper to implement. Complexity is similar.

--

Because of this, my current favourite combination is:

  • Moving from #transactions limit to a #bytes limit to trigger evictions.
  • Picking 2-4 random samples instead of 1 to compare by fee/kb before evicting.
  • Bloom filter based re-insert prevention (without misbehave version).

Perhaps I'm biased to pick my own alternative over Christophe's but there you have my reasoning behind it.

NxtChg

Sep 13, 2015, 12:25:30 PM
to bitco...@googlegroups.com
>Alright, I believe we shouldn't drag this discussion on for too
>long and implement something sooner rather than later

First, we should make sure this is indeed a valid concern :)

I personally doubt that.

1) in case of an attack the number of spam txs is much higher than the number of real txs, this means the chances of eviction are much greater for spam txs.

(but this requires either somebody good with math or a simulation to confirm)

2) your transaction will still live in other nodes, again this needs to be estimated/simulated.

3) are there any penalties for rebroadcasting currently in the code?

4) how often will spam attacks happen, especially once the block limit is raised?

Some other arguments pro/con?

Robin

Sep 13, 2015, 12:57:30 PM
to bitco...@googlegroups.com
On 13/09/15 18:25, NxtChg wrote:
> > Alright, I believe we shouldn't drag this discussion on for too
> > long and implement something sooner rather than later
>
> First, we should make sure this is indeed a valid concern :)
>
> I personally doubt that.
The way I interpret the current XT code, it should still be possible to crash nodes if you can avoid the #transactions limit with gigantic TX instead.
And if XT is universally deployed, the way evictions currently work is open to fee manipulation.
Not particularly desirable if you ask me.


> 1) in case of an attack the number of spam txs is much higher than the number of real txs, this means the chances of eviction are much greater for spam txs.
>
> (but this requires either somebody good with math or a simulation to confirm)
The problem with the "spam txs" classification is that they are still valid transactions, or they wouldn't be considered for the mempool in the first place.
Ideally, that they are used in an attack shouldn't matter, they should be processed as usual because they are valid.
That's what (as I see it) creates a need for a local bias that doesn't turn into a global bias.


> 2) your transaction will still live in other nodes, again this needs to be estimated/simulated.
This relies completely on the assumption that your source of random numbers is uniformly random and highly unlikely to be the same as other nodes.
If you cannot rely on your source of randomness, there would be much more serious problems relating to the cryptography itself rather than anti-DoS.
If you want simulations on this, you would have to look into the properties of /dev/random and the likes. But I think this is overkill.

Another assumption is that the node you're refusing TX from has access to other nodes they can relay to.
Again if this is not the case there are bigger problems, (like the zero-trust principles being thrown out the window).

If you're running for example a service for mobile clients, you would better just have sufficient RAM to never evict.


> 3) are there any penalties for rebroadcasting currently in the code?
Well at least NOT the misbehave system. This is only used against invalid TX or blocks. I'm not aware of any other penalties in the current code. Perhaps someone else knows.


> 4) how often will spam attacks happen, especially once the block limit is raised?
I don't think you can answer that question without asking: do they stand to gain anything from an attack?
If there are reasonably effective anti-DoS systems in place they are much less likely to gain anything from an attack and thus the attacks are less likely to happen.
Though your question does have a valid implication: How much does it cost us to have these protections when we're not under attack?

My bloom filter can be heavily optimised and even skipped altogether when evictions are not happening.
Christophe's seed ordering, on the other hand, will need to be done either all the time, even when not close to the limit, or lazily, meaning you'll have to sort tens of thousands of TXs all at once and get a crazy CPU spike.
(Unless I'm misunderstanding; Christophe, want to comment on this?)

Some other arguments pro/con?
You tell me :] I'm up for discussing it.

Tom Zander

unread,
Sep 13, 2015, 1:12:58 PM9/13/15
to bitco...@googlegroups.com
On Sunday 13. September 2015 19.25.28 NxtChg wrote:
> >Alright, I believe we shouldn't drag this discussion on for too
> >long and implement something sooner rather than later
>
> First, we should make sure this is indeed a valid concern :)
>
> I personally doubt that.

agreed.

> 1) in case of an attack the number of spam txs is much higher than the
> number of real txs, this means the chances of eviction are much greater for
> spam txs.
>
> (but this requires either somebody good with math or a simulation to
> confirm)

Doesn't really need a whole lot of math; in the past we've seen that well over
99% of the transactions were "spam".
As such the chance of evicting a not-spam transaction is < 1%.
And statistics tells us that if we repeat that 1000 times, the chance each
single time is still the same < 1%.

So your intuition is valid.

If we end up using the 4-samples, throw-away-the-lowest-fee solution decided on earlier, the chances of evicting the wrong one drop even further.
It now requires us to choose 4 valid transactions at random in order to make
the wrong decision, which means that the chance is pow(1%, 4) = 1e-8.
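Tom's figures are easy to sanity-check in a few lines (a sketch; the 99% spam ratio and the assumption that real txs outbid spam on fees come from the discussion above):

```python
import random

P_REAL = 0.01          # assumption from above: 99% of a flooded mempool is spam
TRIALS = 200_000

# Plain random eviction picks a real tx with probability P_REAL:
pool = ['spam'] * 99 + ['real']
hits = sum(random.choice(pool) == 'real' for _ in range(TRIALS)) / TRIALS
print(f"plain random, wrong-eviction rate: {hits:.4f}")  # close to 0.01

# Sample-4-evict-lowest-fee only evicts a real tx when all four
# sampled txs are real (real txs outbid spam on fee by assumption):
print(f"sample-of-4, wrong-eviction chance: {P_REAL ** 4:.0e}")  # 1e-08
```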

> 2) your transaction will still live in other nodes, again this needs to be
> estimated/simulated.

With 6000 nodes the chance of them all evicting your correct transaction
becomes a statistical impossibility.

--
Tom Zander

Robin

unread,
Sep 13, 2015, 2:10:59 PM9/13/15
to bitco...@googlegroups.com
On 13/09/15 19:10, Tom Zander wrote:
> On Sunday 13. September 2015 19.25.28 NxtChg wrote:
>>> Alright, I believe we shouldn't drag this discussion on for too
>>> long and implement something sooner rather than later
>> First, we should make sure this is indeed a valid concern :)
>>
>> I personally doubt that.
> agreed.

If you look at logs spitting out nothing but this:

2015-09-13 18:04:43 mempool full, dropped bee9552e6ae0094eaa00b9075ec3aa1ac1d7f6e50ab3695353f916a1443bac91
2015-09-13 18:04:43 mempool full, dropped 31190a29ce566b63dc8397d400da0d1091d9f9de07f12efaa2d54dc57a1a11b2
2015-09-13 18:04:43 mempool full, dropped 104c7a40c9d3fe0fd0bcf6e017643f42683a52d12cffcdfa08acd3946addeb64
2015-09-13 18:04:43 mempool full, dropped 151e1cb845b272e1db644bb62e006ddba2c132c6f29bbaea67ee8908d5431497
2015-09-13 18:04:43 mempool full, dropped dc889ecf970743da71a7264b53cd553922a8e7f0a5417a3331eb9f93bb17198a
2015-09-13 18:04:43 mempool full, dropped ed3429ef58a22993234122b9e388f3d32cf5025fb884ad8f351a6d65ec91f808
2015-09-13 18:04:43 mempool full, dropped 151e1cb845b272e1db644bb62e006ddba2c132c6f29bbaea67ee8908d5431497
2015-09-13 18:04:43 mempool full, dropped 7282ea6db14442bba512ceaf6bc20779e6c79da60de97b91877a5f1ae6ac24c0
2015-09-13 18:04:44 mempool full, dropped 270beee5cfd45120aec1cef664bf810ed1da2fbfc7d3943bb71c5191bcc13b70
2015-09-13 18:04:44 mempool full, dropped 73115fa1df216484eb66f9ec148dffb7a980518d8aa8ef952f5ea88322f36ffc
2015-09-13 18:04:44 mempool full, dropped 151e1cb845b272e1db644bb62e006ddba2c132c6f29bbaea67ee8908d5431497
2015-09-13 18:04:44 mempool full, dropped 151e1cb845b272e1db644bb62e006ddba2c132c6f29bbaea67ee8908d5431497
2015-09-13 18:04:44 mempool full, dropped bae25dc2f77a587c0e2b88af1e926e329a816fa7cd1e9c7ab8286a4b4fa9652d
2015-09-13 18:04:44 mempool full, dropped 82a9f63688035665c1fd829ad836b1c65fc6b62098f23fbb8f1a540f6776386a
2015-09-13 18:04:44 mempool full, dropped ec4c3c3095c3dc3ba6e183b41d4ae1e4d34bd8e959dcb88b7f15da58be6202a1
2015-09-13 18:04:44 mempool full, dropped ed3429ef58a22993234122b9e388f3d32cf5025fb884ad8f351a6d65ec91f808
2015-09-13 18:04:44 mempool full, dropped 151e1cb845b272e1db644bb62e006ddba2c132c6f29bbaea67ee8908d5431497
2015-09-13 18:04:44 mempool full, dropped 104c7a40c9d3fe0fd0bcf6e017643f42683a52d12cffcdfa08acd3946addeb64
2015-09-13 18:04:45 mempool full, dropped 151e1cb845b272e1db644bb62e006ddba2c132c6f29bbaea67ee8908d5431497
2015-09-13 18:04:45 mempool full, dropped 270beee5cfd45120aec1cef664bf810ed1da2fbfc7d3943bb71c5191bcc13b70
2015-09-13 18:04:45 mempool full, dropped 6c59845d41aa168b79e05b66e6a222d9f3d5b8eb109bd05f769548ed76046b33
2015-09-13 18:04:45 mempool full, dropped 7a8aef5ed7b8faaab0eaf349ee66be0f6ed18e87fccf024142aeef4afd47e3c9
2015-09-13 18:04:45 mempool full, dropped ab822ea9ec7836dab2114487c0ad90b40acaaa1f673ed9149cb3184af58c9d72
2015-09-13 18:04:45 mempool full, evicted e20ce0e23e4c8f64656021347905a3962dfcc59d2a2ff22d1ff3db03a00aad5c

You don't suppose we should make sure those don't inflate prices or vanish from the network?
These are still valid TX, spam or not.

Christophe Biocca

unread,
Sep 13, 2015, 2:12:28 PM9/13/15
to bitcoin-xt, bea...@oscp.info
Robin,

I completely agree with the general arguments you've made and the set of priorities you've identified. I do think it's important that, if an attacker's transactions are worth including by any miner, then they will be included. It helps ensure that DoS-attack attempts are expensive.



> While Christophe's seed ordering will need to be done either all the time, even when not close to the limit, or lazily meaning you'll have to sort 10.000's of TX all at once and get a crazy CPU spike. (Unless I'm misunderstanding, Christophe want to comment on this?)

Technically, you'd use a min-heap or somesuch, since 90% of the activity occurs at the bottom of it. That's still O(log n) per transaction insertion though. Lazily heapifying an unordered list is O(n), which isn't crazy, but isn't great either.
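For what it's worth, a heap-based version might look roughly like this (a sketch, not XT code; a random priority per tx stands in for Christophe's seed ordering, keeping inserts at O(log n) with no one-off heapify spike):

```python
import heapq
import random

class RandomOrderMempool:
    """Mempool with a random but pre-decided eviction order.

    Each tx gets a random priority at insertion time, so eviction
    costs O(log n) per insert instead of a one-off O(n) heapify
    (or a full sort) once the limit is hit."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.heap = []    # (random priority, txid); lowest evicted first
        self.txs = {}     # txid -> tx data

    def add(self, txid, tx):
        """Insert a tx; returns the list of txids evicted to make room."""
        if txid in self.txs:
            return []
        heapq.heappush(self.heap, (random.random(), txid))
        self.txs[txid] = tx
        evicted = []
        while len(self.txs) > self.max_size:
            _, victim = heapq.heappop(self.heap)
            evicted.append(victim)
            del self.txs[victim]
        return evicted

pool = RandomOrderMempool(max_size=2)
pool.add("tx1", "raw1")
pool.add("tx2", "raw2")
dropped = pool.add("tx3", "raw3")  # one of the three gets evicted at random
print(dropped)
```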

As long as the Misbehaving() call is removed, I have no real objections to your approach (and it can be improved by combining with Chris Hua's idea, as well). I just lack the ability to evaluate the claims made about the bloom filter performance, recall, and false positive rate, since it's not something I'm well versed in. If you're confident it's achievable then I say it's worth implementing.

My reason for favoring the sort approach is that we could just cannibalize the sort-by-fee code. Once you have a sort in place, changing the criteria (or set of criteria) is easy.

Robin

unread,
Sep 13, 2015, 2:56:16 PM9/13/15
to bitco...@googlegroups.com
On 13/09/15 20:12, Christophe Biocca wrote:
> I just lack the ability to evaluate the claims made about the bloom filter performance, recall, and false positive rate, since it's not something I'm well versed in. If you're confident it's achievable then I say it's worth implementing.

The bloom filter properties are well defined. Imo a very interesting thing to learn about, since blooms are already in use for various different things in the bitcoin code and elsewhere (DHT for example).
https://en.wikipedia.org/wiki/Bloom_filter lists formulas to determine false-positive rates and how to find an optimum. As well as performance details.

The short version (if I've gotten it right myself, but that can be verified :P).
Recall: 100%
Performance:
query: O(k) where k=#hashing functions used
insert: O(k) where k=#hashing functions used
False-positive rate: The more RAM or CPU you're willing to allocate, the lower.
RAM: Negligible. Just 3 bytes allows you to store 100000's of elements with FP < 1e^-6, given an optimal k. Using more RAM you can lower your k and save CPU instead.


NxtChg

unread,
Sep 13, 2015, 3:14:58 PM9/13/15
to bitco...@googlegroups.com

>RAM: Negligible. Just 3 bytes allows you to store 100000's of elements with FP < 1e^-6, given an optimal k.

This can't be right, even from very basic considerations: 3 bytes can store 16 M unique numbers. 100 K / 16 M is not 1e-6.

16 items I would believe in :)

Bloom filters are not a panacea; they are actually quite expensive, especially with low FP rates.

I have a tool somewhere, which I wrote to calc bloom filter parameters; I can double-check for you later.

Also, your bloom filter will have to store txs for quite some time (hours? days?), this means it must be quite big.

NxtChg

unread,
Sep 13, 2015, 3:41:37 PM9/13/15
to bitco...@googlegroups.com

>RAM: Negligible. Just 3 bytes allows you to store 100000's of
>elements with FP < 1e^-6, given an optimal k. Using more RAM you
>can lower your k and save CPU instead.

Actually, there are some calculators online:

http://hur.st/bloomfilter?n=100000&p=1.0E-6

So it's not 3 bytes, it's 350 KB :)

Anyway, 1 M items will be ~3.5 MB, or 3 MB for 1e-5, so yeah, bloom size will be negligible.
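Those numbers drop straight out of the standard bloom filter formulas (the same ones the Wikipedia page linked earlier lists); a quick check:

```python
from math import ceil, log

def bloom_params(n, p):
    """Optimal bit count m and hash count k for n elements
    at false-positive rate p (standard bloom filter formulas)."""
    m = ceil(-n * log(p) / log(2) ** 2)   # total bits
    k = round(m / n * log(2))             # hash functions
    return m, k

m, k = bloom_params(n=100_000, p=1e-6)
print(f"{m / 8 / 1024:.0f} KB, {k} hash functions")  # → 351 KB, 20 hash functions
```

That also explains where the "20 hashes" figure later in the thread comes from: k is about 20 at these parameters.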

I forgot the mempool limit is in transactions, and transactions can be quite big in Bitcoin :)

Limiting by bytes seems like a better idea.

Robin

unread,
Sep 13, 2015, 3:58:37 PM9/13/15
to bitco...@googlegroups.com
Yep, got it wrong after all. I quickly looked at the formulas and did it from the top of my head.
It would have been easier for me to read if they had put it in code like this site does.

Negligible nonetheless. I would like to try this tool with manual k values, since CPU will be more of a concern than RAM here.


Christophe Biocca

unread,
Sep 13, 2015, 5:56:04 PM9/13/15
to bitcoin-xt, bea...@oscp.info
Seems like we'd be ok with a 1-5% or even higher false positive rate here, no?
Given that our bloom filter will have different false positives than our neighbours', and that every relayer (including the original sender) relays to many different nodes at once, txs getting wrongly filtered wouldn't do much harm.

NxtChg

unread,
Sep 14, 2015, 5:55:18 AM9/14/15
to bitco...@googlegroups.com
> Would like to try this tool with manual k values. Since CPU will be more of a concern than RAM here.

You don't have to calculate 20 hashes. Make the bloom size such that each bucket has 2^N items, for example 64 K, then use 2-byte pairs from the hash.

So you will need a hash of 40 bytes, which you can easily make with just 2 x 32. Or maybe you could even fit it into 32 bytes, because there are so many buckets.
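A sketch of that slicing trick, assuming a partitioned layout where each 2-byte slice indexes its own 2^16-bit segment (the layout is my interpretation of the suggestion, not code from XT):

```python
import hashlib

K = 20                 # number of "hash functions" (16-bit slices)
SEG_BITS = 1 << 16     # each slice indexes its own 64K-bit segment

def bit_positions(key: bytes):
    """Derive K 16-bit indices from two SHA-256 digests (64 bytes,
    40 of which are used) instead of running 20 separate hashes."""
    digest = (hashlib.sha256(key).digest()
              + hashlib.sha256(b'\x01' + key).digest())
    for i in range(K):
        idx = int.from_bytes(digest[2 * i: 2 * i + 2], 'big')
        yield i * SEG_BITS + idx   # bit index within the i-th segment

class SlicedBloom:
    def __init__(self):
        self.bits = bytearray(K * SEG_BITS // 8)   # 160 KB total

    def add(self, key):
        for pos in bit_positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in bit_positions(key))

f = SlicedBloom()
f.add(b'some txid')
print(b'some txid' in f)  # → True
```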

These parameters can be fine-tuned, but I still think implementing this is not worth it, this is a typical PTS - Peter Todd Scenario: possible, but highly improbable :)

@Christophe: Yep, 1e-6 is overkill.

Robin

unread,
Sep 14, 2015, 6:13:12 AM9/14/15
to bitco...@googlegroups.com
Keep in mind as well that the filter doesn't need to fit the whole mempool. My original suggestion was to make it hard to predict and large enough to address re-broadcasting. A filter that can fit 20k items with a low FP rate and then resets is fine too. In fact we can pick a random number between 15k and 20k as our reset count to further scramble things. Not much is left of a rebroadcast attack if it needs to come up with >15k *other* valid TXs to flush the filter.
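Sketched out, that rotating filter could look like this (the 15k–20k window is the suggestion above; a plain set stands in for the real bloom filter for brevity):

```python
import random

class RotatingEvictionFilter:
    """Remembers recently evicted txids, then resets after a
    randomized number of insertions so an attacker can't predict
    when the filter flushes. A set stands in for the bloom filter."""

    def __init__(self, lo=15_000, hi=20_000):
        self.lo, self.hi = lo, hi
        self._reset()

    def _reset(self):
        self.items = set()
        self.limit = random.randint(self.lo, self.hi)

    def add(self, txid):
        if len(self.items) >= self.limit:
            self._reset()    # forget old evictions, pick a fresh limit
        self.items.add(txid)

    def __contains__(self, txid):
        return txid in self.items

f = RotatingEvictionFilter()
f.add("txid1")
print("txid1" in f)  # → True
```

An attacker who wants a specific txid flushed out has to push at least `lo` other entries through the filter first.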

Nick ODell

unread,
Sep 15, 2015, 8:18:05 PM9/15/15
to bitcoin-xt, bea...@oscp.info
Robin,

I agree with those priorities, those summaries of the approaches, and that suggested implementation.

I think the CPU cost of using the bloom filter probably doesn't matter - I'm betting transaction processing times will be dominated by looking up the inputs and checking the signatures.

One more issue/suggested fix:

Someone can re-sign the same transaction repeatedly, giving it different txids, meaning that one of them will probably bypass your filter. Instead of checking against the bloom filter by txid, you can take the outpoint of the first input and check that. You can still generate many versions of the same transaction, but doing so is slightly harder.
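Keying the filter on the first input's outpoint could look like this (a sketch; the tx dict layout is invented for illustration):

```python
def filter_key(tx) -> bytes:
    """Key the eviction filter on the first input's outpoint
    (previous txid + output index) rather than on the txid, so
    re-signing the tx under a new txid doesn't dodge the filter."""
    prev_txid, prev_index = tx["inputs"][0]      # first input's outpoint
    return prev_txid + prev_index.to_bytes(4, "little")

tx_a = {"txid": b"\x01" * 32, "inputs": [(b"\xaa" * 32, 0)]}
tx_b = {"txid": b"\x02" * 32, "inputs": [(b"\xaa" * 32, 0)]}  # re-signed copy

print(filter_key(tx_a) == filter_key(tx_b))  # → True, despite the new txid
```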

Robin

unread,
Sep 16, 2015, 2:21:30 AM9/16/15
to bitco...@googlegroups.com
Good point. Even though this still lets you re-order inputs, the number of inputs you can use is smaller than the number of txids you can generate.

Nick ODell <nick...@gmail.com> wrote on 16 September 2015 at 02:18:05 CEST:

Dave Scotese

unread,
Sep 16, 2015, 1:47:53 PM9/16/15
to bitcoin-xt
Instead of adding transactions to the bloom filter, you could add TXOUTs that are being spent in the transaction.  Thus, once you evict a transaction, you'd make it impossible to accept any transaction that spends any of the TXOUTs that you've already evicted.

Feel free to point out that I totally misunderstood something :-)

notplato




--
I like to provide some work at no charge to prove my value. Do you need a techie? 
I own Litmocracy and Meme Racing (in alpha).
I'm the webmaster for The Voluntaryist which now accepts Bitcoin.
I also code for The Dollar Vigilante.
"He ought to find it more profitable to play by the rules" - Satoshi Nakamoto

Nick ODell

unread,
Sep 20, 2015, 7:09:12 PM9/20/15
to bitcoin-xt
Small documentation issue: the commit description says that it will keep 10 blocks worth of transactions, but the code actually keeps 25.


On Wednesday, September 9, 2015 at 6:16:08 AM UTC-6, Mike Hearn wrote:
I wrote an article on the new memory pool limiting feature here:


It contains some basic technical info and I also argue that random selection actually makes more sense than ordering by fee. I feel about 80% confident in this argument but there might be something I've overlooked. Please do argue with me if you'd like to see a switch to shoot-lowest-fee in future.