Prometheus plugin+Grafana and negative rate for outgoing (ack:ed) messages


thoma...@gmail.com

Apr 4, 2023, 8:39:02 AM
to rabbitmq-users
Hello RabbitMQ users,

I am scratching my head trying to understand Prometheus data from one of our RabbitMQ production servers.
Plotting the "rabbitmq_channel_messages_delivered_ack_total" metric from the Prometheus plugin gives the following graph:
mb_outgoingrate_grafana.png
That would be something like 100M messages/s, which is not a realistic number in our environment.

If I compare with the RabbitMQ Management data I have this:
mb_outgoingrate_GUI.png
These are realistic figures: around 10K/s at the peaks.

To find out whether the plugin's metrics or Grafana is at fault, I wrote a small Python script that just reads the "rabbitmq_channel_messages_delivered_ack_total" metric every 20 s (script is attached).
This is a snippet of the output:
14:02:43
HTTP Status Code: 200, request duration: 0.40266
Messages_acked_total: 4789324422, Change: 139130, Rate: 6956.5
14:03:03
HTTP Status Code: 200, request duration: 3.413231
Messages_acked_total: 4789466153, Change: 141731, Rate: 7086.55
14:03:26
HTTP Status Code: 200, request duration: 0.510739
Messages_acked_total: 4789305173, Change: -160980, Rate: -8049.0
14:03:47
HTTP Status Code: 200, request duration: 0.551016
Messages_acked_total: 4789490706, Change: 185533, Rate: 9276.65
14:04:08
HTTP Status Code: 200, request duration: 0.547189
Messages_acked_total: 4789686584, Change: 195878, Rate: 9793.9
14:04:28
HTTP Status Code: 200, request duration: 0.335923
Messages_acked_total: 4789816045, Change: 129461, Rate: 6473.05
14:04:48
HTTP Status Code: 200, request duration: 0.453783
Messages_acked_total: 4789942095, Change: 126050, Rate: 6302.5
14:05:09
HTTP Status Code: 200, request duration: 0.586313
Messages_acked_total: 4790040922, Change: 98827, Rate: 4941.35
14:05:30
HTTP Status Code: 200, request duration: 0.379032
Messages_acked_total: 4790151538, Change: 110616, Rate: 5530.8
14:05:50
HTTP Status Code: 200, request duration: 0.357471
Messages_acked_total: 4790090501, Change: -61037, Rate: -3051.85
14:06:10
HTTP Status Code: 200, request duration: 0.399605
Messages_acked_total: 4790193293, Change: 102792, Rate: 5139.6
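
For readers without the attachment, here is a minimal sketch of this kind of polling loop. It is not the attached script itself. The URL (the plugin's default port 15692 on localhost), the helper names, and the choice to sum all samples of the metric are assumptions for illustration. The parser also assumes that sample lines carry no trailing timestamp.

```python
import time
import urllib.request

METRIC = "rabbitmq_channel_messages_delivered_ack_total"
URL = "http://localhost:15692/metrics"  # assumption: default rabbitmq_prometheus port
INTERVAL = 20  # seconds between reads, as in the output above


def sum_metric(text, name=METRIC):
    """Sum every sample of `name` in a Prometheus text-format payload."""
    total = 0
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The sample name ends at the first '{' (labels) or the first space.
        sample_name = line.split("{", 1)[0].split(" ", 1)[0]
        if sample_name == name:
            # Assumption: the value is the last token (no timestamp follows).
            total += int(float(line.rsplit(" ", 1)[1]))
    return total


def change_and_rate(previous, current, interval=INTERVAL):
    """Return (change, change / interval) between two counter readings."""
    change = current - previous
    return change, change / interval


def poll(url=URL, interval=INTERVAL, iterations=10):
    """Read the metric `iterations` times and print change and rate."""
    previous = None
    for _ in range(iterations):
        print(time.strftime("%H:%M:%S"))
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
            body = resp.read().decode("utf-8")
        duration = time.monotonic() - start
        print(f"HTTP Status Code: {status}, request duration: {duration:.6f}")
        current = sum_metric(body)
        if previous is not None:
            # Rate uses the nominal interval, not the measured elapsed time.
            change, rate = change_and_rate(previous, current, interval)
            print(f"Messages_acked_total: {current}, Change: {change}, Rate: {rate}")
        previous = current
        time.sleep(interval)
```

Calling `poll()` against a node with the Prometheus plugin enabled should print lines in the same shape as the snippet above. A negative "Change" means the value read was lower than the one read 20 s earlier.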

As it turns out, the "spikes" in the Grafana plot match the negative rate figures that I sometimes get.
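
One possible reading of why a small dip becomes a huge spike, assuming the Grafana panel uses PromQL `rate()` or `increase()`: those functions treat any decrease in a counter as a reset to zero, so the whole post-dip value is counted as new increase. The sketch below mimics only that reset handling on four readings from the output above. The function names are hypothetical and PromQL's extrapolation is ignored.

```python
def naive_increase(samples):
    """Last reading minus first reading, with no reset handling."""
    return samples[-1] - samples[0]


def reset_aware_increase(samples):
    """Increase over `samples`, treating every decrease as a counter reset.

    On a decrease the counter is assumed to have restarted from zero,
    so the full current value is added instead of the (negative) delta.
    """
    total = 0
    previous = samples[0]
    for current in samples[1:]:
        if current < previous:
            total += current  # assumed reset: counted from 0 up to current
        else:
            total += current - previous
        previous = current
    return total


# Readings taken from 14:02:43 to 14:03:47 in the output above.
readings = [4789324422, 4789466153, 4789305173, 4789490706]

print("naive increase:      ", naive_increase(readings))
print("reset-aware increase:", reset_aware_increase(readings))
```

With a counter near 4.8 billion, a single dip adds roughly 4.8 billion to the computed increase. Spread over a window of about a minute, that is on the order of the 100M messages/s seen in the graph.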

How come I get a negative rate? Any ideas on how to work around this?
The server is a standalone RabbitMQ node running 3.10.5 on Erlang 24.3.4.3.

Best Regards,

Thomas

getMBprometheusdata_cleaned.py