Monitoring Galera


Danil Kazachkov

Jun 24, 2014, 6:01:43 AM
to codersh...@googlegroups.com

Hello!
Could you share your experience of monitoring Galera Cluster? Are there any existing programs or plugins for this? I need to:
1) List all currently running queries.
2) List all currently running replication processes (SST and IST).
3) Monitor resource allocation among queries and replication processes.
4) Monitor network connections and their relationship to running queries.

Thanks in advance!

Graham Green

Jun 24, 2014, 12:41:22 PM
to codersh...@googlegroups.com
Severalnines' ClusterControl is pretty good if you're looking for something rolled up in a neat package. There is a free community version (which lacks some functionality) and a paid product. I've been using the community edition for a while, and it has become a pretty robust solution over the last 6 months.

FromDual has written plugins for Nagios, but I have no experience with them.

Percona has monitoring tools which can be integrated into Nagios and Cacti. Again, I have no experience with these, but based on my experience with other Percona products they should be pretty robust.

I hope this helps.

Daniel Black

Jun 24, 2014, 6:14:58 PM
to Danil Kazachkov, codersh...@googlegroups.com

On general graphing I've got a munin mysql plugin update https://github.com/munin-monitoring/munin/pull/164 in progress.

It certainly does primary/non-primary and cluster size alerts/graphs, and flow control. However, it currently needs https://github.com/codership/galera/pull/57 and https://github.com/codership/galera/pull/50 to be merged to produce more meaningful graphs (rather than gauges that average over the uptime). The max/min values add measurement of brief, volatile bulk changes that occur between status probes.

Improvements welcome.

----- Original Message -----
> Hello!
> Could you share the experience of monitoring Galera Cluster? Is it
> maybe some existing programs or plugins? I need to monitor:
> 1) List all queries, currently running.

Given binary row replication, the query text generally isn't available in Galera. In fact, I've been having trouble seeing anything meaningful (https://mariadb.atlassian.net/browse/MDEV-6327).

> 2) List all replication processes currently running (SST and IST).

Watching the Galera node state will give you an idea of which SSTs are taking place and when. I haven't seen a way to monitor IST.
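
A minimal sketch of what "watching node state" could look like in a script: the wsrep_local_state status variable reports a number, and the mapping below follows the Galera documentation (1 Joining, 2 Donor/Desynced, 3 Joined, 4 Synced). The function names and the sample probe values are made up for illustration; fetching the value from the server is left out.

```python
# Numeric values reported by SHOW STATUS LIKE 'wsrep_local_state'.
WSREP_LOCAL_STATES = {
    1: "Joining",
    2: "Donor/Desynced",
    3: "Joined",
    4: "Synced",
}


def describe_state(value):
    """Translate a wsrep_local_state value (int or string) into a name."""
    try:
        return WSREP_LOCAL_STATES.get(int(value), "Unknown")
    except (TypeError, ValueError):
        return "Unknown"


def state_transitions(samples):
    """Collapse a series of probed states into the sequence of changes."""
    changes = []
    for value in samples:
        name = describe_state(value)
        if not changes or changes[-1] != name:
            changes.append(name)
    return changes


# Hypothetical probe results for a node that rejoins the cluster.
print(state_transitions(["1", "1", "3", "4", "4"]))
```

A donor would show up in the same series as a temporary move from "Synced" to "Donor/Desynced" and back.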

> 3) Monitor resource allocation among queries and replication
> processes.

Index size, wsrep_cert_deps_distance and flow control graphs will give you an idea of how each node is keeping up. wsrep_flow_control_paused/wsrep_flow_control_paused_ns graphs show when pausing occurs.
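
To illustrate the difference between a gauge averaged over the uptime and a per-interval graph, here is a rough sketch that turns two probes of the paused-time counter (named wsrep_flow_control_paused_ns in the Galera documentation) into the fraction of the probe interval spent paused. The function name, the restart handling and the sample numbers are assumptions for illustration only.

```python
def paused_fraction(prev_ns, curr_ns, interval_seconds):
    """Fraction of the probe interval during which flow control paused replication.

    prev_ns and curr_ns are two successive readings of the cumulative
    paused-time counter, in nanoseconds.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_ns = curr_ns - prev_ns
    if delta_ns < 0:
        # Assumption: a counter that went backwards means the server
        # restarted, so only the time accumulated since then is counted.
        delta_ns = curr_ns
    fraction = delta_ns / (interval_seconds * 1e9)
    # Clamp to [0, 1] to absorb probe timing jitter.
    return min(max(fraction, 0.0), 1.0)


# Hypothetical readings taken 300 seconds apart: 30 s of pausing in between.
print(paused_fraction(120_000_000_000, 150_000_000_000, 300))
```

Graphing this value per probe shows when the pausing actually happened, which a single average since startup hides.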

Resource allocation among queries happens mainly in the InnoDB engine. Munin has graphs for general InnoDB status. See my previous post to the list about Galera memory usage and Alexey's explanation.

> 4) Monitor network connections and relationship to running queries.

wsrep_replicated_bytes/wsrep_received_bytes show network traffic within the cluster. Other munin plugins can monitor general or specific network traffic. I've no idea how to map this to running queries or how it would be useful.
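
As a sketch of how those two counters could be turned into traffic rates, the function below takes two snapshots of the status variables (as name-to-string dictionaries, the way SHOW STATUS rows are usually read) and returns bytes per second for each. The function name, the snapshot format and the sample values are assumptions for illustration.

```python
TRAFFIC_COUNTERS = ("wsrep_replicated_bytes", "wsrep_received_bytes")


def traffic_rates(prev, curr, interval_seconds):
    """Return bytes/second for each traffic counter between two snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name in TRAFFIC_COUNTERS:
        before = int(prev[name])
        after = int(curr[name])
        delta = after - before
        if delta < 0:
            # Assumption: the counter was reset (e.g. by a restart),
            # so only the bytes counted since the reset are used.
            delta = after
        rates[name] = delta / interval_seconds
    return rates


# Hypothetical snapshots taken 60 seconds apart.
earlier = {"wsrep_replicated_bytes": "1000", "wsrep_received_bytes": "5000"}
later = {"wsrep_replicated_bytes": "4000", "wsrep_received_bytes": "11000"}
print(traffic_rates(earlier, later, 60))
```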

--
Daniel Black, Engineer @ Open Query (http://openquery.com.au)
Remote expertise & maintenance for MySQL/MariaDB server environments.

Guillaume Coré

Jun 25, 2014, 5:22:21 AM
to codersh...@googlegroups.com
I wrote a naive (but simple and functional) Nagios plugin:

https://github.com/fridim/nagios-plugin-check_galera_cluster


This is what we use here. I based my script on the documentation page « Monitoring the Cluster ».

It's very basic but can be easily improved if you need additional checks.
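
For readers who want to see the general shape of such a check, here is a rough sketch. It is not the code from the plugin linked above: the function name, the messages and the expected_size parameter are assumptions for illustration. It only relies on the wsrep_cluster_status, wsrep_cluster_size and wsrep_ready status variables and on the usual Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_galera(status, expected_size):
    """Evaluate wsrep status variables and return (exit_code, message).

    status is a dict of status variable names to string values, as read
    from SHOW STATUS LIKE 'wsrep_%'.
    """
    try:
        size = int(status["wsrep_cluster_size"])
        cluster_status = status["wsrep_cluster_status"]
        ready = status["wsrep_ready"]
    except (KeyError, ValueError):
        return UNKNOWN, "UNKNOWN: wsrep status variables missing or unreadable"

    if cluster_status != "Primary":
        return CRITICAL, "CRITICAL: node is in a non-primary component (%s)" % cluster_status
    if ready != "ON":
        return CRITICAL, "CRITICAL: wsrep_ready is %s" % ready
    if size < expected_size:
        return WARNING, "WARNING: cluster size %d, expected %d" % (size, expected_size)
    return OK, "OK: cluster size %d, status Primary" % size


# Hypothetical status of a node in a three-node cluster that lost one member.
code, message = check_galera(
    {"wsrep_cluster_size": "2", "wsrep_cluster_status": "Primary", "wsrep_ready": "ON"},
    expected_size=3,
)
print(code, message)
```

A real plugin would print the message and exit with the returned code so that Nagios can pick up the state.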

I hope this helps.

Guillaume

ad...@extremeshok.com

Jun 25, 2014, 5:29:56 AM
to Guillaume Coré, codersh...@googlegroups.com
Could you guys please post some sample screenshots?

I would really like to see how it compares.

I have written custom scripts for monit and m/monit that do all the monitoring and tracking of my clusters.


Danil Kazachkov

Jun 25, 2014, 10:10:43 AM
to codersh...@googlegroups.com, lemd...@gmail.com, daniel...@openquery.com
 
Thanks for the answer! I have a few questions:

1) > Given binary row replication a query isn't generally available in galera. In fact I've been having trouble seeing anything meaningful (https://mariadb.atlassian.net/browse/MDEV-6327). 
Does the above mean that in fact we can view queries only with the help of the "SHOW FULL PROCESSLIST" command?

2) > Watching Galera node state will give you an idea as to what SST are in place and when. I haven't seen how to monitor IST. 

Doesn't Galera really offer any means at all to monitor the state of IST? Otherwise, how does a Galera node copy data from gcache?

3) > wsrep_replicated_bytes/wsrep_received_bytes show network traffic within cluster. Other munin plugins can monitor general or specific network traffic. I've no idea how to map this to running queries or how it would be useful. 

Re: how it would be useful:

Let's look at the following use case:

1. Queries take too much time to be executed. 
2. We look at which process at the server is causing the low performance (for example, it's MySQL using 100% of the server's processor power).
3. We look inside MySQL to see which query is taking so much processor power.

In this case mapping server's resources to running queries can definitely be useful.

Daniel Black

Jun 25, 2014, 10:48:00 PM
to Danil Kazachkov, codersh...@googlegroups.com


----- Original Message -----
> Thanks for the answer! I have a few questions:
>
>
> 1) > Given binary row replication a query isn't generally available in
> galera. In fact I've been having trouble seeing anything meaningful (
> https://mariadb.atlassian.net/browse/MDEV-6327 ).
> Does the above mean that in fact we can view queries only with the
> help of the "SHOW FULL PROCESSLIST" command?

No. SHOW FULL PROCESSLIST is equally blank.

> 2) > Watching Galera node state will give you an idea as to what SST
> are in place and when. I haven't seen how to monitor IST.
>
> Doesn't Galera really offer any means at all to monitor the state of
> IST? Otherwise, how does a Galera node copy data from gcache?

See the attached mysql_wsrep_local_state-day.png. It shows a node offline, then joining for a bit before becoming joined.

Its donor shows a corresponding drop to state 2.

IST is a really fast part of an SST recovery, so I wouldn't bother.

Are you perhaps confusing IST with replication/certification/application delay?

> 3) > wsrep_replicated_bytes/wsrep_received_bytes show network traffic
> within cluster. Other munin plugins can monitor general or specific
> network traffic. I've no idea how to map this to running queries or
> how it would be useful.
>
>
> Re: how it would be useful:
>
>
> Let's look at the following use case:
>
>
> 1. Queries take too much time to be executed.
> 2. We look at which process at the server is causing the low
> performance (for example, it's MySQL using 100% of the server's
> processor power).
> 3. We look inside MySQL to see which query is taking so much processor
> power.
>
> In this case mapping server's resources to running queries can
> definitely be useful.

Use the slow query log.
mysql_wsrep_local_state-day.png

Danil Kazachkov

Jun 26, 2014, 3:12:05 AM
to codersh...@googlegroups.com, lemd...@gmail.com, daniel...@openquery.com

----- Original Message -----
> Thanks for the answer! I have a few questions:

>
> 2) > Watching Galera node state will give you an idea as to what SST
> are in place and when. I haven't seen how to monitor IST.
>
> Doesn't Galera really offer any means at all to monitor the state of
> IST? Otherwise, how does a Galera node copy data from gcache?

>see attached mysql_wsrep_local_state-day.png. It shows a node offline, and then joining for a bit before becoming joined.

>Its donor shows a corresponding drop to state 2.

>IST is a really fast part of a SST recover so I wouldn't bother.

>Are you perhaps confusing IST with replication/certification/application delay?

Isn't IST a kind of replication? And how do you monitor replication and certification delays?

Daniel Black

Jun 26, 2014, 3:38:43 AM
to Danil Kazachkov, codersh...@googlegroups.com

> >Are you perhaps confusing IST with
> >replication/certification/application delay?
>
> Is not IST a kind of replication? And how do you monitor delays of
> replication and certification?

As attached.


--
mysql2_wsrep_distance-day.png
mysql_wsrep_queue-day.png

Danil Kazachkov

Jun 26, 2014, 4:43:31 AM
to codersh...@googlegroups.com, lemd...@gmail.com, daniel...@openquery.com
Thanks for your help!
Could you tell me where I can find more information about Galera variables? I have looked at http://www.percona.com/doc/percona-xtradb-cluster/5.6/wsrep-status-index.html#wsrep_cluster_size, but there was too little information about the variables.

Daniel Black

Jun 26, 2014, 5:09:09 PM
to Danil Kazachkov, codersh...@googlegroups.com


----- Original Message -----
> Thanks for your help!
> Are you telling me where I can find more information about Galera
> variables?

I was giving you some inferences you can make by graphing variables.
> , but there were too little info about variables.

http://galeracluster.com/documentation-webpages/galerastatusvariables.html or the more up to date https://github.com/codership/galera/blob/master/docs/source/galerastatusvariables.rst

If there isn't sufficient information here look at the code and/or ask the list (and not me explicitly).