pt-table-checksum and chunking


Maarten van Baarsel

Apr 3, 2012, 12:06:22 PM
to percona-d...@googlegroups.com
Hi,

I've found that the chunking in pt-table-checksum (2.0.3) needs some adjustment,
or perhaps some extra tuning knobs. If you're interested, you can read about my
experience below:


The size and row count of my tables vary hugely. Some tables have 1900 rows but
are 1.5GB in size, some have 1M rows but are small in size, and some have
200k rows and are 22G. This is the result of having a few customers with
really different needs.


The first problem that I have is that --chunk-size-limit needs to be at least
10 in my case, because if I don't set it that high the InnoDB
row-count-guessing causes some tables not to be checksummed, like this:

04-03T16:16:36 Skipping table aalborg.it_attachment because on the master it
would be checksummed in one chunk but on these replicas it has too many rows:
139927 rows on erika
The current chunk size limit is 100000 rows (chunk size=10000 * chunk size
limit=10).

The table has about 50K rows.
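For illustration, here is the arithmetic of that safeguard as I understand it from the message above (a sketch of the documented behaviour, not the tool's actual code):

```python
# Sketch of the skip safeguard quoted above (my reading of the docs,
# not pt-table-checksum's real implementation): a table that would be
# checksummed in one chunk on the master is skipped if any replica's
# estimated row count exceeds chunk_size * chunk_size_limit.

def would_skip(replica_estimated_rows, chunk_size=10000, chunk_size_limit=10):
    limit = chunk_size * chunk_size_limit   # 100000 in the example above
    return replica_estimated_rows > limit

print(would_skip(139927))  # erika's InnoDB estimate -> True, table is skipped
print(would_skip(50000))   # the actual row count would have been fine -> False
```

The problem in this thread is exactly that the replica's InnoDB estimate (139927) feeds this check, not the true row count (~50K).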

So, perhaps the table on the master should always be chunked if the row count
is bigger than chunk-size-limit, even though it isn't larger than
chunk-size-limit * chunk-size?

Or could the chunk-size be treated as a somewhat harder limit?

The second thing I run into is that if I let the default algorithm run its
course the 200k/22G database ends up in one (huge!) chunk, perhaps because the
tables preceding it (alphabetically?) are relatively small and run quickly.
This causes pt-table-sync (when needed) to run for hours on that table when
used with --replicate.


Could the default algorithm perhaps 'reset' the chunk-size on each new table?

thanks,
Maarten.

David Juntgen

Apr 3, 2012, 12:10:16 PM
to percona-d...@googlegroups.com
I too ran into this same issue. If you set --chunk-size-limit=0, the limit is lifted.

--Dave


--
You received this message because you are subscribed to the Google Groups "Percona Discussion" group.
To post to this group, send email to percona-d...@googlegroups.com.
To unsubscribe from this group, send email to percona-discuss...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/percona-discussion?hl=en.




--
David W. Juntgen

Baron Schwartz

Apr 3, 2012, 12:29:15 PM
to percona-d...@googlegroups.com
> So, perhaps the table on the master should be chunked always if the row count
> is bigger than chunk-size-limit, even though it isn't larger than
> chunk-size-limit * chunk-size?

chunk-size-limit is typically a very small number, so it isn't the number you think it is :-)

> The second thing I run into is that if I let the default algorithm run its
> course the 200k/22G database ends up in one (huge!) chunk, perhaps because the
> tables preceding it (alphabetically?) are relatively small and run quickly.
> This causes pt-table-sync (when needed) to run for hours on that table when
> used with --replicate.
>
> Could the default algorithm perhaps 'reset' the chunk-size on each new table?

We would quickly run into problems where the "reset" value is not optimal and every table initially starts out with bad performance.

I think that what you probably need is a ceiling on the chunk size. This might just be another way of saying that you might want to set your --chunk-time to something smaller. I would like you to try that and see if that works for you before we consider making the tool more complex.

- Baron

Maarten van Baarsel

Apr 4, 2012, 3:53:02 AM
to percona-d...@googlegroups.com
On 3/4/2012 18:10, David Juntgen wrote:

> I too ran into this same issue. if you set --chunk-size-limit=0, the
> limit is lifted.

Thanks for the suggestion; I tried that before, but I got this:

https://bugs.launchpad.net/percona-toolkit/+bug/938660

M.

Maarten van Baarsel

Apr 4, 2012, 5:20:25 AM
to percona-d...@googlegroups.com
On 3/4/2012 18:29, Baron Schwartz wrote:
>> So, perhaps the table on the master should be chunked always if the
>> row count is bigger than chunk-size-limit, even though it isn't
>> larger than chunk-size-limit * chunk-size?
>
> chunk-size-limit is typically a very small number, so it isn't the
> number you think it is :-)

I admit to guessing the behaviour of the tool based on looking at the
--progress output. But with "--chunk-size-limit 10 --chunk-size 10000" I
saw tables with 50K rows being chunked in one piece, and tables with
300K rows in 30 pieces, like this:

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
04-04 0 0 48749 1 0 0.975 borg.it_issue
04-04 0 0 127185 15 0 1.303 borg.it_issuemail

(slightly edited to have it not wrap)

I'd have expected the 48K rows table to have at least 4 or 5 chunks too,
based on the --chunk-size I gave.

>> The second thing I run into is that if I let the default algorithm
>> run its course the 200k/22G database ends up in one (huge!) chunk,
>> perhaps because the tables preceding it (alphabetically?) are
>> relatively small and run quickly. This causes pt-table-sync (when
>> needed) to run for hours on that table when used with --replicate.
>>
>> Could the default algorithm perhaps 'reset' the chunk-size on each
>> new table?
>
> We would quickly run into problems where the "reset" value is not
> optimal and every table initially starts out with bad performance.

But the current behaviour only works well when the checksummed tables
are homogeneous in average rowsize.

A switch for it, perhaps? (Kitchen sink looming.)

> I think that what you probably need is a ceiling on the chunk size.
> This might just be another way of saying that you might want to set
> your --chunk-time to something smaller. I would like you to try that
> and see if that works for you before we consider making the tool more
> complex.

That wouldn't catch the case of the sudden 200K/22G database, right? It
would only discover after starting the query that it's taking 800+
seconds (which is how I discovered this, by the way: by looking at the
slow query log with pt-query-digest).

A small test confirms my thoughts (a 50K/15G table takes 380+ secs to
checksum, whether I set --chunk-time to 0.1 or 0.01 [1]).

I think I'd want an absolute limit on the chunk-size (on the master
only), indeed.

Maarten.

[1] but there is no discernible difference at all in the tools' output,
which is weird, so perhaps I'm making a mistake?

pt-table-checksum \
  --recursion-method dsn=h=127.0.0.1,P=3306,D=maatkit,t=pt_check_slave_delay_dsns \
  --function MURMUR_HASH \
  --replicate maatkit.pt_checksum \
  --ignore-tables "$ignore_tables,security_log" \
  --ignore-databases "$ignore_databases,mysql,maatkit" \
  --no-check-replication-filters \
  --chunk-size-limit 30 --chunk-time 0.01 $quiet --max-lag 600 \
  h=127.0.0.1,P=3306

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
04-04T 0 0 0 1 0 0.067 borg.a_files
04-04T 0 0 3399 1 0 0.323 borg.a_group
04-04T 0 0 27424 1 0 0.321 borg.a_groupprop
04-04T 0 0 223208 1 0 1.404 borg.a_history
04-04T 0 0 7951 1 0 0.142 borg.a_item
04-04T 0 0 10558 1 0 0.324 borg.a_item_group
04-04T 0 0 156 1 0 0.073 borg.a_user
04-04T 0 0 839 1 0 0.323 borg.a_user_user
04-04T 0 0 119 1 0 0.106 borg.a_userprop

0.1 run:

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
04-04T 0 0 0 1 0 0.062 borg.a_files
04-04T 0 0 3399 1 0 0.069 borg.a_group
04-04T 0 0 27424 1 0 0.069 borg.a_groupprop
04-04T 0 0 223208 1 0 1.400 borg.a_history
04-04T 0 0 7951 1 0 0.319 borg.a_item
04-04T 0 0 10558 1 0 0.064 borg.a_item_group
04-04T 0 0 156 1 0 0.077 borg.a_user
04-04T 0 0 839 1 0 0.069 borg.a_user_user
04-04T 0 0 119 1 0 0.058 borg.a_userprop

Baron Schwartz

Apr 4, 2012, 9:22:18 AM
to percona-d...@googlegroups.com
Maarten,

>> chunk-size-limit is typically a very small number, so it isn't the
>> number you think it is :-)
>
> I admit to guessing the behaviour of the tool based on looking at the
> --progress output. But with "--chunk-size-limit 10 --chunk-size 10000" I
> saw tables with 50K rows being chunked in one piece, and tables with
> 300K rows in 30 pieces, like this:
>
> TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
> 04-04 0 0 48749 1 0 0.975 borg.it_issue
> 04-04 0 0 127185 15 0 1.303 borg.it_issuemail
>
> (slightly edited to have it not wrap)
>
> I'd have expected the 48K rows table to have at least 4 or 5 chunks too,
> based on the --chunk-size I gave.

The tool estimates the size of the table with EXPLAIN and if it looks like it will fit into a single chunk that's not too large, based on --chunk-size-limit multiplied by --chunk-size, it will do it as a single chunk. You told the tool that 100000 rows in a single chunk is fine, so it obeyed you.

It looks to me like the behavior might need to be modified slightly. Instead of doing tables in a single chunk, we might still want to nibble them even when they look smaller than a chunk is permitted to be. However, there will be another edge case that will immediately make performance decrease. This is when a server has hundreds of thousands of tables. The time taken to run the nibbling queries for all of those tables will be much higher than just checksumming whole tables at a time. And we do have users with hundreds of thousands or millions of tables.
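Baron's description of the single-chunk decision can be sketched roughly like this (an illustration of my reading of his explanation, not the tool's actual code):

```python
# Sketch of the decision described above (assumption: the real tool
# EXPLAINs the table and compares the row estimate against
# chunk_size * chunk_size_limit; it also adapts chunk size dynamically,
# which this simplification ignores).

def plan_chunks(estimated_rows, chunk_size, chunk_size_limit):
    if estimated_rows <= chunk_size * chunk_size_limit:
        return 1                              # one whole-table chunk
    # otherwise nibble in roughly chunk_size-row pieces
    return -(-estimated_rows // chunk_size)   # ceiling division

# With --chunk-size 10000 --chunk-size-limit 10, as in the thread:
print(plan_chunks(48749, 10000, 10))   # 48749 <= 100000 -> 1 chunk
print(plan_chunks(127185, 10000, 10))  # over the limit -> ~13 chunks
```

This is why the 48K-row table above came out as a single chunk: 48749 is under the permitted 100000-row single chunk. (The real tool's dynamic sizing explains the 15 chunks seen for the 127K-row table rather than exactly 13.)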

>
>>> The second thing I run into is that if I let the default algorithm
>>> run its course the 200k/22G database ends up in one (huge!) chunk,
>>> perhaps because the tables preceding it (alphabetically?) are
>>> relatively small and run quickly. This causes pt-table-sync (when
>>> needed) to run for hours on that table when used with --replicate.
>>>
>>> Could the default algorithm perhaps 'reset' the chunk-size on each
>>> new table?
>>
>> We would quickly run into problems where the "reset" value is not
>> optimal and every table initially starts out with bad performance.
>
> But the current behaviour only works well when the checksummed tables
> are homogeneous in average rowsize.
>
> A switch for it, perhaps? (kitchensink looming).

I am not convinced of the need for adding more complexity -- read on for why:

>> I think that what you probably need is a ceiling on the chunk size.
>> This might just be another way of saying that you might want to set
>> your --chunk-time to something smaller. I would like you to try that
>> and see if that works for you before we consider making the tool more
>> complex.
>
> That wouldn't catch the case of the sudden 200K/22G database, right? It
> would only discover after starting the query that it's taking 800+
> seconds (which is how I discovered this by the way, by looking at the
> slow query log with pt-query-digest ;)
>
> A small test confirms my thoughts (50K/15G table takes 380+ secs to
> checksum, whether I set --chunk-time on 0.1 or 0.01 [1].

You specified --chunk-size explicitly. That makes the tool ignore --chunk-time, as stated by the documentation. Maybe we need to make --chunk-size and --chunk-time mutually exclusive so you can't be confused by --chunk-time being ignored. I believe that if you set --chunk-time to a small value and do NOT set the --chunk-size-limit multiplier to a large value, you will not see 800-second queries. I think you are defeating the tool's built-in safeguards.
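For readers unfamiliar with --chunk-time: the adaptive sizing Baron alludes to can be sketched as follows (this is an assumption about the general shape of the algorithm, not the tool's actual update rule):

```python
# Sketch of --chunk-time-driven adaptive sizing (hypothetical update
# rule; the real tool's averaging may differ): each chunk's observed
# rows/second determines the size of the next chunk, so chunk sizes
# converge toward roughly chunk_time seconds of work apiece.

def next_chunk_size(prev_rows, prev_seconds, chunk_time=0.5):
    rate = prev_rows / prev_seconds        # observed rows per second
    return max(1, int(rate * chunk_time))  # aim for chunk_time seconds

# A 10000-row chunk that took 2 s suggests ~2500 rows for a 0.5 s target:
print(next_chunk_size(10000, 2.0))  # -> 2500
```

Setting --chunk-size explicitly disables this feedback loop, which is the point Baron is making.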

Maarten van Baarsel

Apr 4, 2012, 11:54:05 AM
to percona-d...@googlegroups.com
On 04-04-2012 15:22:18, Baron Schwartz wrote:

Baron,

>> I'd have expected the 48K rows table to have at least 4 or 5 chunks too,
>> based on the --chunk-size I gave.
>
> The tool estimates the size of the table with EXPLAIN and if it looks like
> it will fit into a single chunk that's not too large, based on
> --chunk-size-limit multiplied by --chunk-size, it will do it as a single
> chunk. You told the tool that 100000 rows in a single chunk is fine, so it
> obeyed you.

Thanks, so that works as I expected. The reason I'm trying both --chunk-size
and --chunk-size-limit though is because the automatic algorithm decided to
chunk the 22G table in one go.

And I was providing a larger-than-default chunk-size-limit because if I don't,
tables get skipped because of the row-count estimates on the slaves.

> It looks to me like the behavior might need to be modified slightly.
> Instead of doing tables in a single chunk, we might still want to nibble
> them even when they look smaller than a chunk is permitted to be. However,
> there will be another edge case that will immediately make performance
> decrease. This is when a server has hundreds of thousands of tables. The
> time taken to run the nibbling queries for all of those tables will be
> much higher than just checksumming whole tables at a time. And we do have
> users with hundreds of thousands or millions of tables.

Perhaps only nibble them if the estimated size is between chunk-size and
chunk-size * chunk-size-limit, and then in chunk-size-sized nibbles?
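The proposal in the paragraph above could look something like this (a hypothetical behaviour change, not anything the tool does today):

```python
# Sketch of the proposed rule (hypothetical): always nibble a table in
# chunk_size-row pieces once its estimate exceeds chunk_size, instead
# of allowing single chunks up to chunk_size * chunk_size_limit rows.

def proposed_chunks(estimated_rows, chunk_size):
    if estimated_rows <= chunk_size:
        return 1
    return -(-estimated_rows // chunk_size)   # ceiling division

# The 48K-row table from earlier in the thread, with --chunk-size 10000:
print(proposed_chunks(48749, 10000))  # -> 5 chunks instead of 1
```

This would match the "4 or 5 chunks" I expected earlier, at the cost of more nibbling queries on servers with very many tables, as Baron points out.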

>> A small test confirms my thoughts (50K/15G table takes 380+ secs to
>> checksum, whether I set --chunk-time on 0.1 or 0.01 [1].
>
> You specified --chunk-size explicitly. This makes the tool ignore the
> --chunk-time, as stated by the documentation. Maybe we need to make
> --chunk-size and --chunk-time mutually exclusive so you can't be confused
> by --chunk-time being ignored.

I've supplied the command line I ran as a post scriptum, and I do not believe I
had a --chunk-size in there (it did have a --chunk-size-limit).

Ah, further testing reveals that setting chunk-size-limit indeed also defeats
chunk-time; if I remove that option, tables are chunked.

I'd have to experiment to see what chunk-time would work for us, I see that
0.05 is a little too small.

> I believe that if you set --chunk-time to a small value and do NOT set the
> --chunk-size-limit multiplier to a large value, you will not see 800-second
> queries. I think you are defeating the tool's built-in safeguards.

But if I don't set chunk-size-limit, tables get skipped from checksumming
because the estimate of the number of rows on the slave is more than twice the
estimate of the number of rows on the master (which isn't actually the case;
the slaves are mostly in sync). I do admit to trying to defeat that safeguard :)

(Haven't seen it skipping tables yet with the smaller chunk-time)

Anyway, I'll report back after a full run with only chunk-time set!

Maarten.


Baron Schwartz

Apr 4, 2012, 12:17:42 PM
to percona-d...@googlegroups.com
Maarten,

On Apr 4, 2012, at 11:54 AM, Maarten van Baarsel wrote:

>>> A small test confirms my thoughts (50K/15G table takes 380+ secs to
>>> checksum, whether I set --chunk-time on 0.1 or 0.01 [1].
>>
>> You specified --chunk-size explicitly. This makes the tool ignore the
>> --chunk-time, as stated by the documentation. Maybe we need to make
>> --chunk-size and --chunk-time mutually exclusive so you can't be confused
>> by --chunk-time being ignored.
>
> I've supplied the commandline I ran as a post scriptum, and I do not believe I
> had a --chunk-limit in there (it did have a chunk-size-limit).
>
> Ah, further testing reveals that setting chunk-size-limit indeed also defeats
> chunk-time, if I remove that option tables are chunked.

That looks like a bug to me. I don't think that is intended. But I would have to review the tool's internals to be sure.

>
>> I believe that if you set --chunk-time to a small value and do NOT set the
>> --chunk-size-limit multiplier to a large value, you will not see 800-second
>> queries. I think you are defeating the tool's built-in safeguards.
>
> But if I don't set chunk-size-limit tables get skipped from checksumming
> because the estimate of the number of rows on the slave is more than twice the
> estimate of the number of rows on the master (which isn't the case, the slaves
> are mostly in sync). I do admit to trying to defeat that safeguard :)

It sounds like you are hitting a catch-22 too, though. I am listening -- I am not just telling you you're doing it wrong :-)

Please report back and then we can consider what's the best way to make the tool magically do the right thing. I do want it to be simple for the users, even if that means it's more complex internally, within reason.

Thanks,
Baron

Maarten van Baarsel

Apr 5, 2012, 5:07:24 AM
to percona-d...@googlegroups.com
On 4/4/2012 18:17, Baron Schwartz wrote:

Baron,

>> Ah, further testing reveals that setting chunk-size-limit indeed
>> also defeats chunk-time, if I remove that option tables are
>> chunked.
>
> That looks like a bug to me. I don't think that is intended. But I
> would have to review the tool's internals to be sure.

Should I file a bug report for this?

> It sounds like you are hitting a catch-22 too, though. I am listening
> -- I am not just telling you you're doing it wrong :-)

No worries, I appreciate the time :)

> Please report back and then we can consider what's the best way to
> make the tool magically do the right thing. I do want it to be simple
> for the users, even if that means it's more complex internally,
> within reason.

The full run last night with --chunk-time 0.25 skipped only two tables
(out of 1000):

04-05T03:36:12 Skipping table cbr.blob because on the master it
would be checksummed in one chunk but on these replicas it has too many
rows:
22128 rows on erika.ii.nl
The current chunk size limit is 10254 rows (chunk size=5127 * chunk size
limit=2.0).

04-05T04:39:22 Skipping table wikidb.text because on the master it
would be checksummed in one chunk but on these replicas it has too many
rows:
17499 rows on piro.ii.nl
The current chunk size limit is 16582 rows (chunk size=8291 * chunk size
limit=2.0).

The overall behaviour seems a bit saner. It took a little over 2 hours
(which I think is amazing for over 110G of data), and the 22G table got
chunked into 3047 pieces now. Took 970 secs (a bit more than earlier),
but I think that is to be expected.

Having given praise, there still are a few problems:

This one took 240 secs for 1 chunk:

04-05T 0 0 40580 44 0 23.722 cpr.it_mail
04-05T 0 0 394 1 0 0.318 cpr.it_mailattachment
04-05T 0 0 382 1 0 0.378 cpr.it_mailbulk
04-05T 0 0 11319 1 0 0.334 cpr.it_mailimportlog
04-05T 0 0 35474 5 0 0.510 cpr.it_mailuserread
04-05T 0 0 701 1 0 0.322 cpr.it_queue
04-05T 0 0 12849 1 0 241.126 cpr.mg_attachment

(3.6GB table)

166 secs:

04-05T 0 0 23709 6 0 6.682 cpr.bedrijf
04-05T 0 0 23144 7 0 6.864 cpr.bedrijf_20120117
04-05T 0 0 13 1 0 0.308 cpr.dbv5_access
04-05T 0 0 2 1 0 0.371 cpr.f_auth
04-05T 0 0 3629 1 0 166.688 cpr.files

(2.1GB)

115:

04-05T 0 1 1 0 0.070 baer.hs_item
04-05T 0 0 0 1 0 0.363 baer.hs_itemfile
04-05T 0 0 522 1 0 0.348 baer.job_addresses
04-05T 0 0 5195 1 0 0.392 baer.job_contacts
04-05T 0 0 1904 1 0 115.391 baer.job_documentation

(1.6GB)

So there's some tuning yet left to do. If you want, I can e-mail you the
full output off-list; I'm not comfortable archiving all my table names
for all eternity in the Google archives.

Perhaps check the database file size or average row size from
information_schema.tables for some new heuristic? A new option like
chunk-time that strives for a fixed number of bytes checksummed per
query instead of a fixed number of seconds per query?
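The bytes-per-chunk idea could be sketched like this (a hypothetical heuristic, not an existing pt-table-checksum option; the byte budget and column name avg_row_length follow information_schema.tables):

```python
# Sketch of the proposed bytes-based heuristic (hypothetical option):
# use avg_row_length from information_schema.tables to convert a byte
# budget per query into a per-table row count for chunking.

def rows_for_byte_budget(avg_row_length, chunk_bytes=64 * 1024 * 1024):
    # avg_row_length as reported by information_schema.tables (bytes/row)
    return max(1, chunk_bytes // avg_row_length)

# A blob-heavy table (~1 MB/row) gets tiny chunks; a narrow table, huge ones:
print(rows_for_byte_budget(1_000_000))  # -> 67 rows per chunk
print(rows_for_byte_budget(100))        # -> 671088 rows per chunk
```

That would keep checksum queries on tables like cpr.mg_attachment (12849 rows, but enormous rows) from running for minutes.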

thanks,
Maarten.

Baron Schwartz

Apr 5, 2012, 8:13:15 AM
to percona-d...@googlegroups.com
Maarten,

Thanks for the added info. I'm in conference mode now and will have to defer further follow-up till afterwards.

- Baron


--
Chief Performance Architect at Percona <http://www.percona.com/>
+1 (888) 401-3401 x507
Calendar: <https://tungle.me/percona_baron>
Percona Live MySQL Conference & Expo <http://www.percona.com/live>

Mrten

Apr 16, 2012, 3:52:55 PM
to percona-d...@googlegroups.com
On 5-4-2012 14:13, Baron Schwartz wrote:
> Maarten,
>
> Thanks for the added info. I'm in conference mode now and will have to defer further follow-up till afterwards.

I wanted to report that the original problem that prompted this thread
(pt-table-sync taking ages to run) has been solved by the smaller
chunks on the tables that had differences.

thanks so far,
Maarten.
