Monitoring wsrep_local_recv_queue_avg to detect a slow node


beres...@gmail.com

Feb 2, 2018, 4:53:24 AM
to codership
According to the documentation ( http://galeracluster.com/documentation-webpages/detectingaslownode.html ), there are two variables that can be used to detect a slow node: wsrep_local_recv_queue_avg and wsrep_flow_control_sent. Both of these variables show the values since the last FLUSH STATUS command ( http://galeracluster.com/documentation-webpages/galerastatusvariables.html?highlight=flush%20status#wsrep-local-recv-queue-avg ).

Here is the question: servers usually run for months without a status flush, except perhaps after maintenance involving a reboot. Shouldn't we call FLUSH STATUS after checking these variables so that the next run gets more recent values? It makes sense to flush after detecting flow control messages and fixing the problem, to reset the counter to zero, but what about the average of the local receive queue? Since the server has been running for a long time, wouldn't the average stay low even when something goes wrong?
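
To illustrate the concern with made-up numbers: the sketch below assumes the average is a plain mean over queue-length samples (an assumption for illustration, not a statement about how Galera computes it internally). A long quiet period followed by a short burst of queueing barely moves the lifetime average, while the average over the burst alone is large.

```python
# Hypothetical queue-length samples: a long quiet period, then a short spike.
# All numbers are invented for illustration only.
quiet_samples = [0.0] * 1_000_000   # months of an empty receive queue
spike_samples = [50.0] * 1_000      # a brief period with ~50 queued write-sets


def mean(samples):
    """Plain arithmetic mean of a non-empty list of samples."""
    return sum(samples) / len(samples)


# Average since "server start": the spike is diluted by the quiet period.
lifetime_avg = mean(quiet_samples + spike_samples)

# Average over the recent window only: the spike is clearly visible.
window_avg = mean(spike_samples)

print(f"lifetime average: {lifetime_avg:.4f}")
print(f"recent-window average: {window_avg:.1f}")
```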

alexey.y...@galeracluster.com

Feb 2, 2018, 12:29:20 PM
to beres...@gmail.com, codership
Indeed, the average values are calculated over the whole time since the
last FLUSH STATUS, so if you want better temporal resolution, you
have to create it yourself )
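
One way to create that resolution, as a minimal sketch rather than an official recipe: read the two variables on every monitoring run and issue FLUSH STATUS right afterwards, so that each reading covers only the interval since the previous run. The function names, the `cursor` argument (any Python DB-API cursor connected to the node), and the 0.5 queue threshold are assumptions made for this example; pick a threshold that fits your own workload.

```python
from typing import Dict

# Status variables named in the "detecting a slow node" documentation.
WATCHED = ("wsrep_local_recv_queue_avg", "wsrep_flow_control_sent")


def sample_and_reset(cursor) -> Dict[str, float]:
    """Read the watched wsrep variables, then reset them with FLUSH STATUS.

    `cursor` is assumed to be a DB-API cursor whose rows come back as
    (variable_name, value) pairs, as SHOW STATUS normally returns them.
    """
    cursor.execute("SHOW GLOBAL STATUS LIKE 'wsrep_%'")
    values = {
        name: float(value)
        for name, value in cursor.fetchall()
        if name in WATCHED
    }
    # Reset the counters so the next call reports only the new interval.
    cursor.execute("FLUSH STATUS")
    return values


def looks_slow(values: Dict[str, float], queue_threshold: float = 0.5) -> bool:
    """Flag the node if the interval average exceeds the (arbitrary,
    hypothetical) threshold or if any flow control message was sent."""
    queue_avg = values.get("wsrep_local_recv_queue_avg", 0.0)
    fc_sent = values.get("wsrep_flow_control_sent", 0.0)
    return queue_avg > queue_threshold or fc_sent > 0
```

Two caveats with this approach: FLUSH STATUS requires the RELOAD privilege, and it resets other status counters as well, so anything else that reads cumulative status values on the same node will see them restart.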
