Galera Cluster uptime


Karl Erik Levik

Apr 25, 2018, 11:17:55 AM
to codership

Is there a way to show the cluster's uptime?


Obviously, we can easily see the uptime for each individual node, but that's not the same as the uptime of the cluster.


And this is also not the same as the largest uptime among the nodes, since the cluster uptime could be longer.


(This question was originally asked on DBA Stack Exchange.)


Regards,

Karl


Brian :

Apr 25, 2018, 3:14:38 PM
to codership
Hi Karl

Unless there is a Galera status variable for this (I don't think there is; see
https://mariadb.com/kb/en/library/galera-cluster-status-variables/),
the best way to measure this would be to monitor the availability of
the cluster with a monitoring tool (Zabbix, Nagios, insert your favorite
here). So if you are using a DB proxy to distribute load between
your Galera members, then measure uptime by monitoring SQL requests to
the proxy. Time since last failure = cluster uptime.
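
The "time since last failure" idea can be sketched in a few lines of Python. This is only an illustration, assuming the monitoring tool can export its probe history as (timestamp, ok) pairs in chronological order; the function name and the input format are hypothetical and not part of Zabbix, Nagios, or Galera.

```python
from datetime import datetime, timezone


def uptime_since_last_failure(probes, now):
    """Return how long the cluster has answered probes without a failure.

    probes: iterable of (timestamp, ok) tuples in chronological order
            (hypothetical export format from a monitoring tool).
    now:    the datetime to measure against.
    Returns a timedelta, or None if the most recent probe failed
    or there are no probes at all.
    """
    up_since = None
    for ts, ok in probes:
        if not ok:
            # Any failed probe resets the uptime counter.
            up_since = None
        elif up_since is None:
            # First successful probe after a failure (or at the start).
            up_since = ts
    return None if up_since is None else now - up_since


if __name__ == "__main__":
    history = [
        (datetime(2018, 4, 25, 10, 0, tzinfo=timezone.utc), True),
        (datetime(2018, 4, 25, 10, 1, tzinfo=timezone.utc), False),
        (datetime(2018, 4, 25, 10, 2, tzinfo=timezone.utc), True),
    ]
    print(uptime_since_last_failure(history, datetime.now(timezone.utc)))
```

Note that this measures uptime from the first successful probe after the last failure, so its resolution is limited by the probe interval.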

shinguz

Apr 25, 2018, 3:33:38 PM
to codership
Hi Karl

I see a way, but not an easy one: the status information that reflects a "cluster uptime" is wsrep_cluster_conf_id (logged as conf_id in the error log), which increases from 1 without bound over the lifetime of a Galera Cluster until the next full cluster stop (followed by a bootstrap, which starts again with conf_id = 1).

So if you search for conf_id in your cluster error logs (on all 3 nodes), the most recent conf_id = 1 gives you the time the cluster was last bootstrapped, which is the start of the "cluster uptime":

shell> grep -e conf_id -e 'Quorum results' laptop4_error.log
2018-02-08T18:09:55.827674Z 0 [Note] WSREP: Quorum results:
        conf_id    = 1,
2018-02-08T18:09:57.400076Z 0 [Note] WSREP: Quorum results:
        conf_id    = 2,
2018-02-08T18:12:18.580247Z 0 [Note] WSREP: Quorum results:
        conf_id    = 3,
2018-02-08T18:12:36.130530Z 0 [Note] WSREP: Quorum results:
        conf_id    = 4,
...
2018-02-14T07:09:30.083750Z 0 [Note] WSREP: Quorum results:
        conf_id    = 18,
2018-02-16T07:27:40.561137Z 0 [Note] WSREP: Quorum results:
        conf_id    = 19,

If you want to institutionalize this a bit more, you could write a watchdog that catches conf_id = 1 together with its timestamp; then you have the "cluster uptime" as well.
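
A minimal sketch of such a watchdog's parsing step, in Python. It assumes the error log uses the same format as the excerpt above (an ISO timestamp on the "Quorum results:" line, with conf_id on a following line); the function names are made up for illustration, and other Galera or MySQL versions may log these lines differently.

```python
import re
from datetime import datetime, timezone

# Timestamp at the start of a "Quorum results:" line, as in the excerpt above.
QUORUM_RE = re.compile(r"^(\S+Z)\s.*WSREP: Quorum results:")
# The conf_id line that belongs to the preceding quorum result.
CONF_ID_RE = re.compile(r"^\s*conf_id\s*=\s*(-?\d+)")


def last_bootstrap_time(log_lines):
    """Return the timestamp of the most recent 'conf_id = 1', or None."""
    pending_ts = None      # timestamp of the quorum result being read
    last_bootstrap = None  # timestamp of the latest conf_id = 1 seen so far
    for line in log_lines:
        m = QUORUM_RE.match(line)
        if m:
            pending_ts = datetime.strptime(
                m.group(1), "%Y-%m-%dT%H:%M:%S.%fZ"
            ).replace(tzinfo=timezone.utc)
            continue
        m = CONF_ID_RE.match(line)
        if m and pending_ts is not None:
            if int(m.group(1)) == 1:
                last_bootstrap = pending_ts
            pending_ts = None
    return last_bootstrap


def cluster_uptime(log_lines, now=None):
    """Return the time elapsed since the last bootstrap, or None if unknown."""
    now = now or datetime.now(timezone.utc)
    start = last_bootstrap_time(log_lines)
    return None if start is None else now - start


if __name__ == "__main__":
    # "laptop4_error.log" is the file name from the grep example above.
    with open("laptop4_error.log") as f:
        print(cluster_uptime(f))
```

Since a node that joined after the bootstrap never logs conf_id = 1 for the current cluster, the sketch would have to be run against the logs of all nodes, taking the most recent result.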

Regards,
Oli

Karl Erik Levik

Apr 27, 2018, 4:07:40 PM
to codership
Thank you, Oli and Brian!

Much appreciated.

Kind regards,
Karl

