Monitoring on wsrep_cert_deps_distance


Wayne Gemmell

Oct 17, 2024, 5:46:00 AM
to codersh...@googlegroups.com
Hi All

How can I use wsrep_cert_deps_distance for monitoring? My send and receive queues are generally at zero and wsrep_cert_deps_distance hovers around 50. Is this healthy? Should I just alert when the value increases significantly? I'm not sure what counts as a significant increase: perhaps 10 above baseline for 10 minutes, or similar?

Regards
Wayne Gemmell
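
The rule floated in the question ("10 above baseline over 10 minutes") can be sketched as below. This is only an illustration of that kind of sustained-deviation check, not a recommendation to alert on this particular variable. The class name `BaselineAlert`, its parameters, and the choice to require every sample in the window to exceed the threshold are all assumptions made for the example.

```python
from collections import deque


class BaselineAlert:
    """Flag a metric that stays above baseline + margin for a whole window.

    Hypothetical helper: the defaults mirror the "10 above baseline over
    10 minutes" idea from the question, nothing more.
    """

    def __init__(self, baseline, margin=10.0, window_seconds=600):
        self.baseline = baseline
        self.margin = margin
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record one sample; return True if the alert condition holds."""
        self.samples.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        # Keep the newest sample at or before the cutoff, so we can tell
        # whether the window is fully covered by data.
        while len(self.samples) >= 2 and self.samples[1][0] <= cutoff:
            self.samples.popleft()
        covered = self.samples[0][0] <= cutoff
        threshold = self.baseline + self.margin
        return covered and all(v > threshold for _, v in self.samples)
```

Feeding it one sample per minute, `add()` only returns True once a full ten minutes of samples have all been above 60 (for a baseline of 50); a single sample back under the threshold resets the condition.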

cyusedfzfb

Oct 17, 2024, 6:46:14 AM
to codersh...@googlegroups.com

Hi,

Just out of curiosity: what are you monitoring for in your Galera cluster?

What we are monitoring:

* the number of GRA* files existing in the mysql datadir (should generally NOT increase)
* the number of flow controls sent by each node (should not be high and more or less equal between nodes)
* the local recv queue size (should not be high and more or less equal between nodes)
* the cert.index.size on each node (should not be high and more or less equal between nodes)
* the uuid state of the cluster as a whole (should almost never change)
* and the cluster size (should not change for longer periods of time)

Anything we miss..? Should we include your mentioned wsrep_cert_deps_distance?

Sorry for jumping into your thread, but it seemed related enough to be ok...

MJ
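
The "more or less equal between nodes" checks in the list above can be sketched as a comparison against the cluster median. The function assumes the per-node values have already been collected (for example from `SHOW GLOBAL STATUS LIKE 'wsrep_%'`) into plain dictionaries. The mapping of the list items to `wsrep_flow_control_sent`, `wsrep_local_recv_queue` and `wsrep_cert_index_size` is my reading of the post, and `find_outliers`, `ratio` and `floor` are hypothetical names and thresholds chosen for illustration.

```python
from statistics import median

# Status variables assumed to correspond to the items in the list above.
COMPARED = (
    "wsrep_flow_control_sent",
    "wsrep_local_recv_queue",
    "wsrep_cert_index_size",
)


def find_outliers(nodes, ratio=2.0, floor=10.0):
    """Return (node, variable, value) for values far above the cluster median.

    nodes maps a node name to a dict of status values for that node.
    A value is flagged when it exceeds max(floor, ratio * median).
    """
    outliers = []
    for var in COMPARED:
        values = {name: float(status[var]) for name, status in nodes.items()}
        limit = max(floor, ratio * median(values.values()))
        for name, value in sorted(values.items()):
            if value > limit:
                outliers.append((name, var, value))
    return outliers
```

The `floor` is there so that a node with 3 flow-control messages is not flagged just because the others have 1; sensible values depend entirely on the workload.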


Wayne Gemmell

Oct 17, 2024, 7:37:26 AM
to cyusedfzfb, codersh...@googlegroups.com
Hi

I'm trying to figure that out. We check:
- wsrep_cert_deps_distance (not sure why yet)
- length of send and receive queues.
- replication latency.
- wsrep_apply_window
- wsrep_local_replays
- uuid (must not change)
- protocol (must not change)
- cluster size (must not change)
I don't have notifications on everything because I'm still gathering baselines and deciding what is important and what isn't.
Thanks for your list though. There are some interesting things there to add to my monitoring. It would be nice if anyone reading this could add anything we've missed.

Regards
Wayne
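
The "must not change" items in the list above lend themselves to a simple comparison between two consecutive status samples. A minimal sketch follows, assuming that "uuid", "protocol" and "cluster size" refer to the status variables `wsrep_cluster_state_uuid`, `wsrep_protocol_version` and `wsrep_cluster_size`; the post does not name them, so that mapping is a guess. `changed_invariants` is a hypothetical helper, and both samples are taken to be plain dictionaries of status values.

```python
# Status variables assumed to be the "must not change" items from the list.
MUST_NOT_CHANGE = (
    "wsrep_cluster_state_uuid",
    "wsrep_protocol_version",
    "wsrep_cluster_size",
)


def changed_invariants(previous, current):
    """Return {variable: (old, new)} for each invariant that differs.

    A variable missing from one sample counts as changed, since
    dict.get() returns None for it.
    """
    return {
        var: (previous.get(var), current.get(var))
        for var in MUST_NOT_CHANGE
        if previous.get(var) != current.get(var)
    }
```

An empty result means nothing changed between the two samples; anything else is a candidate for a notification.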

Alexey Yurchenko

Oct 17, 2024, 10:07:40 AM
to codership
Hi,

wsrep_cert_deps_distance tells how many writesets can be applied concurrently (i.e. touch independent rows) ON AVERAGE. So it is a metric that says something about your load and about how many appliers you may want to configure, but it is hardly worth monitoring constantly. For one, it is an AVERAGE measure, so you won't see acute changes there.

Kind regards,
Alex
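
The point about averages hiding acute changes can be illustrated with a toy calculation. The sketch below uses a plain cumulative mean over made-up per-writeset distances; how Galera actually computes and resets this average is not described in this thread, so treat the averaging method and the numbers as assumptions.

```python
def cumulative_means(samples):
    """Return the running mean after each sample."""
    total = 0.0
    means = []
    for count, value in enumerate(samples, start=1):
        total += value
        means.append(total / count)
    return means


# Made-up data: 1000 writesets with distance 50, then a burst of 20
# writesets that must be applied serially (distance 1).
history = [50.0] * 1000 + [1.0] * 20
means = cumulative_means(history)

print(f"before burst: {means[999]:.2f}")
print(f"after burst:  {means[-1]:.2f}")
```

Even though the last 20 writesets are fully serialized, the average only moves from 50 to roughly 49, which would be lost in normal noise.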

Wayne Gemmell

Oct 17, 2024, 10:18:08 AM
to Alexey Yurchenko, codership
Thank you. That's great.
