Re: [cockroach-db] intent resolution bogging down large counts. why?

Skip to first unread message

Nathan VanBenschoten

Oct 15, 2018, 10:48:59 AM10/15/18
to Jordan Lewis, CockroachDB
I just ran this myself and found that the first table only spans a single range while the second spans two after they are both populated. The effect of this is that only the second insertion triggers async intent resolution. This might not finish before the `COUNT(*)` so in all likelihood it is running into intents that it's forced to resolve. It's possible that we're hitting the same degenerate case that we see in We should verify. Or the `COUNT(*)` stuck behind intent resolution in the command queue, which has been discussed in

The QueryIntent overhead is concerning for an implicit transaction but is explained by `maxInsertBatchSize`. Since we break the implicit transaction into multiple kv operations, we end up using transactional pipelining, which requires this second pass. We can test this out using the `kv.transaction.write_pipelining_max_batch_size` cluster setting. When I set that below `maxInsertBatchSize`, the statement speeds up by about 40%. That's probably a good change to make before 2.1, because bulk operations shouldn't be using transactional pipelining.

On Mon, Oct 15, 2018 at 8:37 AM Jordan Lewis <> wrote:
I noticed the following strange performance-related behaviour last week: it seems that inserting data in a transaction and the first `COUNT(*)` on that data gets disproportionately slower the larger the data gets.

Take a look at the timings in this transcript:

ro...@> create table a (a int primary key);

Time: 5.078672ms

ro...@> insert into a select * from generate_series(0,100000);
INSERT 100001

Time: 4.766588225s

ro...@> create table b (a int primary key);

Time: 4.780241ms

ro...@> insert into b select * from generate_series(0,1000000);
INSERT 1000001

Time: 1m9.055013685s

ro...@> select count(*) from a;
(1 row)

Time: 64.667288ms

ro...@> select count(*) from b;
(1 row)

Time: 22.201852475s

As you can see, both the `INSERT` and subsequent `COUNT` on the data is more than 10x slower, even though the data is only 10x bigger. The second `COUNT` is *343 times slower* than than the first `COUNT`, despite being on a dataset only 10x greater in size! And the example above was a modest one - I've seen the same test take over a minute for that second `COUNT`. The same also goes for the INSERTS - the second one is more than 10x slower than the first, despite just being 10x larger. What's going on?

I did a little bit of investigation into this with profiling. I found that, in the second profile, a huge amount of time and allocations were attributed to `MVCCMetadata.Unmarshal`, which was generated by several different paths through the intent resolution code - the scan path was doing it directly, but also 3 commands were responsible: ResolveIntentRange, ResolveIntent, and QueryIntent.

QueryIntent, for example, seems to call `MVCCGet` on a single key, which needs to allocate an `MVCCMetadata` and `TxnMeta` field, just to throw it away later. And in this profile, it seems like we were spending a huge amount of time doing those things.

My understanding of intent resolution is rather weak. What's going on here? Are were essentially resolving 1000000 intents during the scan, and that's why we're slow? But why wasn't that happening in the 100000 case? And if that's what's happening, is there some kind of pathological one-at-a-time behavior that's causing things to flow through `QueryIntent` instead of the equivalent batch method?

Or, finally, is this some kind of well-understood anomaly that occurs in large transactions only and that we don't care about? (I think that will have to be documented if so - as a user this behavior is surprising and concerning.)

- Jordan

Jordan Lewis

Oct 16, 2018, 9:38:58 AM10/16/18
to Nathan VanBenschoten,

Nathan VanBenschoten

Nov 26, 2018, 3:09:00 PM11/26/18
to Jordan Lewis,
The QueryIntent overhead should be addressed by These bulk inserts that miss the 1PC fast-path will no longer be accidentally causing the writes to be pipelined. The only remaining cause of perf degradation should be when the writes begin to span ranges. There's an ongoing effort to make async intent resolution faster and cheaper, but this is fairly fundamental to our architecture.
Reply all
Reply to author
0 new messages