Galera cluster slow when gcache page is created


Erik

Feb 5, 2013, 11:20:48 AM2/5/13
to codersh...@googlegroups.com
Hi,

We run a 3 node Galera cluster. Almost all MySQL traffic is routed to 1 of the nodes (for the time being).

Occasionally we see serious drops in MySQL performance on the client side.
Examining the logs shows that these moments coincide with gcache pages being created,
e.g.:
node1:
130205 16:29:53 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000000 of size 134217728 bytes
130205 16:30:14 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000000
node2:
130205 16:29:53 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
130205 16:29:59 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000001
node3:
130205 16:29:53 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000001 of size 134217728 bytes
130205 16:30:00 [Note] WSREP: Deleted page /var/lib/mysql/gcache.page.000001

(node1 receives all the traffic)

Why are those gcache.page files created? Can this be avoided, or at least scheduled for other moments?

some config which could be relevant:
innodb-buffer-pool-size=10964M
innodb-flush-log-at-trx_commit=2
innodb-file-per-table=1
innodb_log_file_size=256M
innodb-log-files-in-group=3
innodb-thread-concurrency=0
innodb_autoinc_lock_mode=2

binlog_format=ROW
key_buffer_size = 256M
max-allowed-packet = 128M
sort-buffer-size = 512K
read-buffer-size = 256K
read-rnd-buffer-size = 512K
query-cache-type = 0
query-cache-size = 0
table-open_cache=1024

wsrep_slave_threads=1

server mem=16GB

Any ideas/comments? Thanks!

regards,
Erik

Ilias Bertsimas

Feb 5, 2013, 11:46:15 AM2/5/13
to codersh...@googlegroups.com
Hi Erik,

I have not seen that message in Galera for a while, since an early release. Which version are you running?
Can you provide the output of SHOW GLOBAL STATUS LIKE 'wsrep_%'; from the mysql CLI on one of the nodes?
Also, can you give us the wsrep options you use?

I can see from your config that you use 1 slave thread, which means there is no parallel apply of replication events on the slaves.
I may be wrong, but I think those pages are created as a buffer to store writesets when, due to the serial nature of the replication, the slave thread cannot apply the writesets fast enough.

Kind Regards,
Ilias.

Erik

Feb 5, 2013, 11:57:23 AM2/5/13
to codersh...@googlegroups.com
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_local_state_uuid     | 9ba720b3-8d53-11e1-0800-52407463bac7 |
| wsrep_protocol_version     | 3                                    |
| wsrep_last_committed       | 49432608                             |
| wsrep_replicated           | 6971                                 |
| wsrep_replicated_bytes     | 77420952                             |
| wsrep_received             | 2300                                 |
| wsrep_received_bytes       | 94702372                             |
| wsrep_local_commits        | 6971                                 |
| wsrep_local_cert_failures  | 0                                    |
| wsrep_local_bf_aborts      | 0                                    |
| wsrep_local_replays        | 0                                    |
| wsrep_local_send_queue     | 0                                    |
| wsrep_local_send_queue_avg | 0.038007                             |
| wsrep_local_recv_queue     | 0                                    |
| wsrep_local_recv_queue_avg | 0.936805                             |
| wsrep_flow_control_paused  | 0.002711                             |
| wsrep_flow_control_sent    | 2                                    |
| wsrep_flow_control_recv    | 10                                   |
| wsrep_cert_deps_distance   | 169.722689                           |
| wsrep_apply_oooe           | 0.010210                             |
| wsrep_apply_oool           | 0.001032                             |
| wsrep_apply_window         | 1.012734                             |
| wsrep_commit_oooe          | 0.000000                             |
| wsrep_commit_oool          | 0.000000                             |
| wsrep_commit_window        | 1.004015                             |
| wsrep_local_state          | 4                                    |
| wsrep_local_state_comment  | Synced (6)                           |
| wsrep_cert_index_size      | 442                                  |
| wsrep_cluster_conf_id      | 49                                   |
| wsrep_cluster_size         | 3                                    |
| wsrep_cluster_state_uuid   | 9ba720b3-8d53-11e1-0800-52407463bac7 |
| wsrep_cluster_status       | Primary                              |
| wsrep_connected            | ON                                   |
| wsrep_local_index          | 2                                    |
| wsrep_provider_name        | Galera                               |
| wsrep_provider_vendor      | Codership Oy <in...@codership.com>    |
| wsrep_provider_version     | 2.1dev(r109)                         |
| wsrep_ready                | ON                                   |
+----------------------------+--------------------------------------+
Percona-XtraDB-Cluster-server-5.5.23-23.5.333.rhel6.x86_64
Percona-XtraDB-Cluster-galera-2.0-1.109.rhel6.x86_64

Ilias Bertsimas

Feb 5, 2013, 12:25:57 PM2/5/13
to codersh...@googlegroups.com
Hi Erik,

It seems you have quite an old version of PXC; your version was released on May 14, 2012. I would suggest upgrading to the latest, as a lot of bug fixes and performance improvements have been implemented since then.
Is there any reason you have only 1 slave thread?
I can see from the wsrep variables you provided that wsrep_flow_control_paused is non-zero, which means replication is being paused because the nodes can't keep up with the workload.
When replication is paused you can easily see degraded performance, as query processing stalls while it happens.
Also, it seems you use more than one server for writes, as I can see incoming traffic on that node.
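A quick way to check this on each node (a sketch; these are status counters, and the paused fraction resets with FLUSH STATUS):

```sql
-- wsrep_flow_control_paused is the fraction of time (0.0-1.0) replication
-- was paused since the last FLUSH STATUS; the _sent/_recv counters show
-- how often this node asked for, or was told about, a pause.
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';
```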

I would advise you to upgrade your PXC to the latest version if possible, and also to increase the slave threads if there is no specific reason to keep them at 1. That will probably help the other nodes keep up with the replication. You can try 1 thread per CPU core, but as
it is mentioned on the Galera wiki, 4 threads can fully saturate a core, so you can go higher if you want.
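In my.cnf that would look something like this (the value 8 is only an illustration for an 8-core box; tune it for your hardware):

```ini
# Illustrative my.cnf fragment -- not a recommended value, just an example.
[mysqld]
# Number of threads applying replicated writesets in parallel.
wsrep_slave_threads = 8
```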

Kind Regards,
Ilias.

Teemu Ollakka

Feb 5, 2013, 12:36:25 PM2/5/13
to codersh...@googlegroups.com

Hi,

One possible explanation for gcache page creation is bulk inserts/deletes whose write sets grow so large that they do not fit into the gcache. To avoid gcache page creation, increase the gcache size by setting gcache.size in wsrep_provider_options in the config file. For reference see:
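For example, something along these lines in my.cnf (512M is just an illustrative size; note that wsrep_provider_options is a single semicolon-separated string, so preserve any options you already set there):

```ini
# Illustrative my.cnf fragment: enlarge the gcache ring buffer so large
# writesets fit without spilling to gcache.page.* files.
[mysqld]
wsrep_provider_options = "gcache.size=512M"
```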

Alex Yurchenko

Feb 6, 2013, 7:05:21 AM2/6/13
to codersh...@googlegroups.com
Both Teemu and Ilias seem to be right here on all counts, so I'll
just try to wrap it up:

You seem to be running with default gcache.size of 128M which allows
allocation of no more than 64M for a writeset (it is a ring buffer).
Anything bigger will create a page.
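As a back-of-the-envelope sketch of that threshold (the 100 bytes/row figure below is purely an assumption for illustration):

```python
# Default gcache.size is 128M; since the gcache is a ring buffer, a single
# writeset can occupy at most half of it before spilling to a page file.
gcache_size = 128 * 1024 * 1024
max_writeset = gcache_size // 2           # 64 MiB ceiling per writeset

rows = 1_000_000                          # e.g. a 1M-row DELETE
bytes_per_row = 100                       # assumed average row image size
writeset_size = rows * bytes_per_row      # ~95 MiB

creates_page = writeset_size > max_writeset
print(creates_page)  # True: this transaction spills to a gcache page
```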

You seem to be occasionally producing some heavy transactions which
modify a lot of rows. A 1M-row DELETE on a non-trivial table will create
a writeset bigger than 64M, and so a new page will be created for it.
But that is not the cause of the slowdown, given how fast page creation
is compared to processing all that data.

You have single-threaded slaves, so they process one writeset at
a time; when they hit this huge writeset, the master has to wait for
them. This causes the slowdown.

Unfortunately, increasing the number of slave threads won't help you
much here, because commits must happen in order, and so those additional
slave threads will eventually block waiting for that huge writeset
to commit.

The real solution is to change your application or habits and not
produce such huge writesets, e.g. see
http://www.xaprb.com/blog/2013/01/28/deleting-millions-of-rows-in-small-chunks-with-common_schema/
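The usual pattern is to delete in bounded chunks so that each transaction's writeset stays small (the table and column names here are hypothetical):

```sql
-- Repeat this statement until ROW_COUNT() returns 0;
-- each iteration produces only a small writeset.
DELETE FROM my_big_table
 WHERE created_at < '2013-01-01'
 LIMIT 10000;
```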

And upgrading to the latest release will make your slaves about 30%
faster at processing such writesets.
--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011