How to interpret the RabbitMQ Message stats?


Olivier Pessin

Aug 14, 2018, 4:10:31 AM
to rabbitmq-users
Hello,

(NB: I posted the same question on StackOverflow a few days ago, but there is no answer yet.)

I want to collect and store historical queue metrics for "Enqueued", "Dequeued" and "Size" (terminology I used with ActiveMQ). The moving charts provided by the management plugin are not enough for the monitoring I need to do.


So with RabbitMQ, I'm getting data from https://rabbitmq-server:15672/api/queues/myvhost


This returns JSON. For a queue, I get real production data like:

"messages":0,                    // for "Size"
"message_stats":{
         "deliver_get":171528,   // for "Dequeued"
         "ack":162348,
         "redeliver":9513,
         "deliver_no_ack":0,
         "deliver":171528,
         "get":0,
         "publish":51293         // for "Enqueued"
(...)
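For anyone who wants to reproduce the polling described above, here is a minimal sketch using only the Python standard library. The endpoint is the one from this post; the function names, the placeholder credentials and the choice of basic authentication are assumptions for illustration. Missing counters default to 0, since message_stats is absent on idle queues and counters that do not apply may be omitted.

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_queues(base_url, vhost, user, password):
    """GET {base_url}/api/queues/{vhost} and return the parsed JSON list."""
    # The vhost is a single path segment, so "/" must be sent as "%2F".
    url = f"{base_url}/api/queues/{urllib.parse.quote(vhost, safe='')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def snapshot(queue):
    """Pick the counters discussed in this thread out of one queue object."""
    # message_stats is missing on queues that have seen no traffic yet,
    # and counters that do not apply may be omitted, hence the defaults.
    stats = queue.get("message_stats", {})
    return {
        "name": queue.get("name"),
        "messages": queue.get("messages", 0),        # "Size": ready + unacknowledged
        "publish": stats.get("publish", 0),          # "Enqueued"
        "deliver_get": stats.get("deliver_get", 0),  # "Dequeued"
        "ack": stats.get("ack", 0),
        "redeliver": stats.get("redeliver", 0),
    }
```

Usage (hypothetical credentials): `for q in fetch_queues("https://rabbitmq-server:15672", "myvhost", "user", "password"): print(snapshot(q))`. Storing each snapshot with a timestamp gives the raw history needed for later analysis.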


I'm particularly surprised by the publish counter:

  • Its value can even decrease between two measurements taken a couple of minutes apart (see the sample chart around 17:00).
  • As you can see in my data, deliver_get is significantly larger than publish.

https://rabbitmq-server:15672/doc/stats.html doesn't give enough detail to explain what I'm seeing. Also, the message_stats object I get is missing some counters, such as confirm and return, which could be related to enqueuing.


Are there relationships between these metrics (for example, deliver_get + messages = redeliver + publish, although that one doesn't hold for my figures)?


Is there more detailed documentation about these metrics?


[sample chart]



Any clue would be appreciated.


Regards,

Olivier

Michael Klishin

Aug 14, 2018, 2:20:06 PM
to rabbitm...@googlegroups.com
There's more than one way to consume messages, so there's more than one metric: deliver_get is for basic.get
("pull" [1]), deliver is for deliveries to a consumer ("push" [3]) with manual acknowledgement mode, and deliver_no_ack is the same
thing but with automatic acknowledgements. I'd need to take a look at the code to see what "publish" is.
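The management plugin's stats reference describes deliver_get as the sum of deliver, deliver_no_ack, get and get_no_ack, and the figures in the first post are consistent with that (171528 = 171528 + 0 + 0). A minimal sketch that splits the total into push and pull parts, assuming the message_stats dict shape shown earlier in the thread (the function name is hypothetical):

```python
# Counters that, per the management plugin's stats reference, make up deliver_get.
CONSUMPTION_COUNTERS = ("deliver", "deliver_no_ack", "get", "get_no_ack")


def deliver_get_breakdown(message_stats):
    """Split deliver_get into push (consumer) and pull (basic.get) deliveries."""
    parts = {name: message_stats.get(name, 0) for name in CONSUMPTION_COUNTERS}
    push = parts["deliver"] + parts["deliver_no_ack"]  # basic.consume subscriptions
    pull = parts["get"] + parts["get_no_ack"]          # basic.get polling
    return {
        "push": push,
        "pull": pull,
        "sum": push + pull,
        "reported": message_stats.get("deliver_get", 0),
    }
```

With the sample data, push is 171528, pull is 0, and the sum matches the reported deliver_get, so all of those deliveries went to subscribed consumers.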

Are you taking into account that those metrics are computed from a sliding window of samples? They are primarily meant
for computing rates. Monitoring cumulative values would have to take the sliding window duration into account and can turn out to be
less accurate than you expect.
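If you collect cumulative counters yourself, charting rates still means turning pairs of samples into per-second averages. A minimal sketch, assuming each sample is a (Unix timestamp, flat dict of counters) pair; treating a decreasing counter as a reset and returning None, rather than a negative rate, is a design assumption, not documented RabbitMQ behaviour:

```python
def average_rate(prev, curr, key):
    """Average per-second rate of a cumulative counter between two samples.

    prev and curr are (unix_time_seconds, counters_dict) pairs.
    A counter that goes down is treated as a reset (assumption), and
    None is returned instead of a negative rate.
    """
    (t0, s0), (t1, s1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    delta = s1.get(key, 0) - s0.get(key, 0)
    if delta < 0:
        return None
    return delta / (t1 - t0)
```

Skipping intervals that contain a decrease keeps a single drop, like the one around 17:00 in the chart mentioned above, from showing up as a large negative spike.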

Most monitoring systems around messaging chart rates and gauges, less often cumulative values.

"confirm" and "return" are rates of publisher confirms and returns (which clarifies that you are looking at channel stats, something that wasn't explicitly stated).
If you don't use publisher confirms [2] and don't have unroutable messages published with the mandatory flag set to true, they won't be present/cannot be computed.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Guillermo Cola

Jan 13, 2020, 12:45:21 PM
to rabbitmq-users
Hello!

I'm dealing with exactly the same issue. I'm not able to deduce the relationship between these metrics. Did you find a solution or more documentation for this?

Thanks,

Guillermo

Michael Klishin

Jan 14, 2020, 4:54:25 PM
to rabbitmq-users
There are two metrics for consumption rates (we really discourage the use of basic.get, by the way) and one for publishing.

If all published messages are routed to a single queue, the total number of published messages will equal Ready + Unacknowledged + Consumed.
The math is greatly complicated by the fact that things are entirely asynchronous: even with just one publisher, one consumer and one queue,
the numbers at a specific point in time won't add up exactly (but over a long enough period they will give a good enough approximation).
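The identity above can be checked directly. A minimal sketch, assuming a single queue: Ready and Unacknowledged can be read from the queue object's messages_ready and messages_unacknowledged fields, while which counter best represents "Consumed" (for example ack + deliver_no_ack + get_no_ack) is an assumption left to the caller. The 1% tolerance is an arbitrary, hypothetical threshold reflecting that the values are read at slightly different moments:

```python
def balance_gap(published, ready, unacked, consumed):
    """Published - (Ready + Unacknowledged + Consumed) for a single queue.

    Zero means every published message is accounted for; a small
    non-zero gap is expected because the values are not read atomically.
    """
    return published - (ready + unacked + consumed)


def roughly_balanced(published, ready, unacked, consumed, tolerance=0.01):
    """True when the gap is within `tolerance` (a fraction of published).

    The 1% default is a hypothetical threshold, not a RabbitMQ setting.
    """
    gap = abs(balance_gap(published, ready, unacked, consumed))
    return gap <= tolerance * max(published, 1)
```

For example, 1000 published messages with 20 ready, 5 unacknowledged and 975 consumed give a gap of 0; with 970 consumed the gap is 5, still within 1%.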

What is your end goal, Guillermo? What metrics are you considering to use to get there and why?



Michael Klishin

Jan 14, 2020, 7:26:32 PM
to rabbitmq-users
…and there are two metrics for consumption rates, of course, because there are two ways to consume from a RabbitMQ queue
using AMQP 0-9-1: a subscription [1] or by fetching on demand (polling, highly discouraged). And since there is only one
way of publishing, there is only one publishing metric for, say, an exchange's publishing (ingress) rate.

I also did not mention that a message can be routed to N queues. Such a message would be counted N times in consumer delivery rates,
since each queue holds a completely independent copy.

Guillermo Cola

Jan 15, 2020, 9:09:51 AM
to rabbitmq-users
Hello Michael,

Thanks for your answer.

First of all, I want to be sure about this, because we are inferring it from the definition of each metric and may be wrong: is "published" the number of messages that entered the queue, and "delivered" the number of messages that left the queue?

Our end goal is to know programmatically (not by checking the UI) whether the delivery rate (messages leaving the queue) goes up after some changes we are making to our system. We don't want to work with the rates provided by RabbitMQ; we prefer logging the raw totals of messages entering and leaving the queue and doing the analysis ourselves later.

So we tried this, but we found that the change in Message Count between two moments didn't match the change in Published - Delivered over the same interval (there were no redelivered messages in that period; we checked), so we were wondering whether some other metric plays a part in this equation, because it seemed we were "losing track" of some messages.

Thanks,

Guillermo

Michael Klishin

Jan 15, 2020, 12:57:37 PM
to rabbitmq-users
On Wed, Jan 15, 2020 at 5:10 PM Guillermo Cola <gc...@sessionm.com> wrote:
First of all, I want to be sure about this, because we are inferring it from the definition of each metric and may be wrong: is "published" the number of messages that entered the queue, and "delivered" the number of messages that left the queue?

Correct. They are usually called ingress (inbound) and egress (outgoing) rates in some communities.
 

Our end goal is to know programmatically (not by checking the UI) whether the delivery rate (messages leaving the queue) goes up after some changes we are making to our system. We don't want to work with the rates provided by RabbitMQ; we prefer logging the raw totals of messages entering and leaving the queue and doing the analysis ourselves later.

And why is that? Have you considered using Prometheus and Grafana? I don't see many reasons to take in
raw stats and do your own aggregation, but if you do, at least use the Prometheus format. The plugin will
change its output format to be more efficient in the next patch release, but if you use our Grafana dashboards,
you won't have to do any work to upgrade [1].
 

So we tried this, but we found that the change in Message Count between two moments didn't match the change in Published - Delivered over the same interval (there were no redelivered messages in that period; we checked), so we were wondering whether some other metric plays a part in this equation, because it seemed we were "losing track" of some messages.


They do not add up because distributed systems, and messaging systems specifically, are inherently concurrent and volatile. While you are measuring
one thing, other things happen, and they won't coordinate with your measurements (nor would you want them to, as that would ruin throughput by several orders of magnitude).
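Besides timing skew, one bookkeeping detail may account for part of the gap in the question above: a queue's `messages` field counts both ready and unacknowledged messages, so a message that has been delivered but not yet acknowledged is still included in it. Comparing the change in `messages` with Published - Delivered therefore drifts whenever the number of unacknowledged messages changes. Messages can also leave a queue without being delivered (TTL expiry, dead-lettering, length limits, purges), which these counters do not record as deliveries. A minimal sketch, assuming flat snapshot dicts holding `messages` plus the message_stats counters, and assuming outflow = ack + deliver_no_ack + get_no_ack (this mapping is an assumption, not a documented identity):

```python
def delta(before, after, key):
    """Change of one cumulative counter between two snapshots."""
    return after.get(key, 0) - before.get(key, 0)


def depth_residual(before, after):
    """Change in queue depth not explained by counted inflow and outflow.

    `before` and `after` are flat dicts with the queue's "messages" field
    and its message_stats counters (hypothetical shape). Outflow counts
    only messages that actually left the queue (assumption): acknowledged
    deliveries plus automatic-acknowledgement deliveries and gets.
    """
    inflow = delta(before, after, "publish")
    outflow = (delta(before, after, "ack")
               + delta(before, after, "deliver_no_ack")
               + delta(before, after, "get_no_ack"))
    return delta(before, after, "messages") - (inflow - outflow)
```

For example, with `before = {"messages": 10, "publish": 100, "ack": 90, "deliver": 95}` and `after = {"messages": 15, "publish": 120, "ack": 105, "deliver": 115}`, the residual is 0, whereas comparing the depth change (5) with Published - Delivered (20 - 20 = 0) leaves 5 messages apparently unaccounted for; those are the extra unacknowledged messages.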

I really don't see the need to reinvent any wheels here. If you must, base it on the Prometheus plugin.
It's the closest you can get to working with internal stats tables directly.
