Help in understanding gcache

164 views
Skip to first unread message

PATRICKZ...@comcast.net

unread,
Jan 31, 2012, 10:49:34 AM1/31/12
to codersh...@googlegroups.com
Hi,

I have a 3-node Percona Xtradb Cluster with wsrep 22.1 (5.5.17-22.1-log) and have a question about what I *think* might be gcache usage.

I have a single table with about 10 million rows with a total ibd file size of 650MB.  I run a DELETE limited to 6 million rows (yes - not very real-world practical but I'm testing and educating myself on Galera behavior) - it takes just under a minute for the delete to complete - no issues as of yet.

I then log in to node2 and see the following:
|  7 | system user     |           | NULL | Sleep   |  618 | Delete_rows_log_event::find_row(1791322) | NULL             |         0 |             0 |         1 |

Node3 - same-ish result:
|  3 | system user     |           | NULL    | Sleep   | 626 | Delete_rows_log_event::find_row(1791322) | NULL             |         0 |             0 |         1 |

On my first pass through this scenario - all my gcache settings were the default (128MB for gcache.size and gcache.page size) and I saw a gcache.page.000000 file created on nodes 2 & 3 - size = 396MB.  I let the delete_rows_log_event run over night (~12 hours) and when I logged on this morning, it was still running.  Killed the cluster, set it up again but this time with a gcache.size of 2GB.

Re-ran my huge delete test - took about 55 seconds again on node1 - but the delete_rows_log_event is still running after 20 minutes.  I didn't clean up the gcache.page.000000 file (should I have?) and it's still the same size and I'm still seeing the same behavior.

Nothing in any of the error logs for the three nodes since the cluster restart.

Anyone have any insight as to what is happening here?  Are there other parameters to tweak in order to get this delete accomplished on the other two nodes?

Thanks in advance
Patrick

Alex Yurchenko

unread,
Jan 31, 2012, 11:43:15 AM1/31/12
to codersh...@googlegroups.com
Hi Patrick,

This requires some thinking, but I can tell you right away that this is
not related to gcache in any way.

Actually, it seems that delete event is already being applied by
mysqld, so it is past Galera. I suppose that the problem here is in
processing ROW-based replication events by mysqld/InnoDB.

Perhaps if you could try the same test, but set binlog_format=STATEMENT
before executing DELETE? STATEMENT-level replication is not fully
supported yet (and LIMIT may be not deterministic in this case), but at
least it could narrow down the search. If binlog_format=STATEMENT makes
a difference, I'd suggest that running 600 deletes of 10000 rows in
autocommit mode could be much faster.

As for the gcache settings, you may want to set gcache.size parameter
to something like the size of your database - this will be used for
incremental state transfer.

Regards,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

PATRICKZ...@comcast.net

unread,
Jan 31, 2012, 3:09:18 PM1/31/12
to Alex Yurchenko, codersh...@googlegroups.com
Thanks for the explanation Alex - RBR makes sense as to why I'm seeing what I'm seeing.  Would obviously discourage any end user from executing such large deletes ;-)

Patrick


From: "Alex Yurchenko" <alexey.y...@codership.com>
To: codersh...@googlegroups.com
Sent: Tuesday, January 31, 2012 11:43:15 AM
Subject: Re: [codership-team] Help in understanding gcache
--
You received this message because you are subscribed to the Google Groups "codership" group.
To post to this group, send email to codersh...@googlegroups.com.
To unsubscribe from this group, send email to codership-tea...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/codership-team?hl=en.

Reply all
Reply to author
Forward
0 new messages