Monitor RabbitMQ queues and exchanges

Sarah Johan

Sep 18, 2018, 2:13:54 AM
to rabbitmq-users
I am interested in using Splunk to monitor queue depths and message timings on a RabbitMQ install. I've found the AMQP modular input plugin, but it seems that this plugin actually pulls messages from the queues, which I don't want to do. The Rabbit management UI has some out-of-the-box graph widgets which would be useful, but I'd like to provide a single dashboard for monitoring this data in Splunk.

What's the best way to approach this scenario? Is this plugin correct? Do I need to observe the Rabbit logs (they are pretty verbose and would consume a fair chunk of index volume)?

Michael Klishin

Sep 18, 2018, 3:37:56 AM
to rabbitm...@googlegroups.com
You need a Splunk version of [1][2]. As far as I can tell after a quick search, [3] seems to be a commonly recommended option.
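For illustration, a minimal Python sketch of that approach: poll the HTTP management API (which only reads counters and never consumes messages) and forward the figures to a Splunk HTTP Event Collector. The host names, credentials, and HEC token below are placeholders.

```python
# Read queue depths from the RabbitMQ management API (no messages are
# consumed) and forward them to a Splunk HTTP Event Collector (HEC).
import requests

RABBIT_API = "http://localhost:15672/api/queues"  # management plugin endpoint
SPLUNK_HEC = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

def poll_queues():
    resp = requests.get(RABBIT_API, auth=("guest", "guest"))
    resp.raise_for_status()
    for q in resp.json():  # one JSON object per queue
        event = {
            "queue": q["name"],
            "vhost": q["vhost"],
            "messages": q.get("messages", 0),  # total queue depth
            "messages_ready": q.get("messages_ready", 0),
            "messages_unacknowledged": q.get("messages_unacknowledged", 0),
        }
        requests.post(
            SPLUNK_HEC,
            headers={"Authorization": f"Splunk {HEC_TOKEN}"},
            json={"event": event, "sourcetype": "rabbitmq:queue"},
        ).raise_for_status()

if __name__ == "__main__":
    poll_queues()
```

Scheduled every 30-60 seconds, this keeps a depth history in Splunk without touching the queues themselves.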


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Alceu Rodrigues de Freitas Junior

Sep 25, 2018, 1:05:25 PM
to rabbitm...@googlegroups.com
Hello Sarah,

Not entirely related to the subject, but I think it is worth a thought.

Why monitor the queue depth? What are you trying to achieve there?

If it is all about identifying whether there is a problem with the producer and/or consumer, wouldn't it make more sense to monitor the queue's input/output rates than its depth?

Ideally, the two rates should match. But a producer rate that is temporarily higher than the consumer rate (without being enough to push RabbitMQ to resource exhaustion) is not necessarily a problem, especially if it lasts only a short period of time.
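As a minimal sketch of that idea: the management API already exposes per-queue rates, so a poller can compare ingress against egress instead of alerting on absolute depth (the message_stats field names below are the management plugin's; host and credentials are placeholders).

```python
# Compare per-queue ingress vs. egress rates from the management API.
import requests

def queue_rates(api="http://localhost:15672/api/queues", auth=("guest", "guest")):
    for q in requests.get(api, auth=auth).json():
        stats = q.get("message_stats", {})
        # Rates are computed by the management plugin over its sample window.
        ingress = stats.get("publish_details", {}).get("rate", 0.0)
        egress = stats.get("deliver_get_details", {}).get("rate", 0.0)
        yield q["name"], ingress, egress

for name, in_rate, out_rate in queue_rates():
    if in_rate > out_rate:
        print(f"{name}: producing faster than consuming "
              f"({in_rate:.1f}/s in vs {out_rate:.1f}/s out)")
```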

On the other hand, what does queue depth actually mean? What would be the "best" value to monitor for?

If you have messages that should be picked up from the queue within a certain time interval, you should probably be setting TTLs and dead-letter queues, as in the sketch below.
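For example, a minimal pika sketch (queue and exchange names are placeholders): messages older than 60 seconds are dead-lettered to a separate exchange instead of accumulating.

```python
# Declare a queue with a per-message TTL and a dead-letter exchange, so
# expired messages are routed aside instead of piling up in the queue.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# Dead-letter exchange and the queue that collects expired messages.
ch.exchange_declare(exchange="dlx", exchange_type="fanout")
ch.queue_declare(queue="work.expired", durable=True)
ch.queue_bind(queue="work.expired", exchange="dlx")

# The working queue: messages older than 60 s are moved to "dlx".
ch.queue_declare(
    queue="work",
    durable=True,
    arguments={
        "x-message-ttl": 60000,           # milliseconds
        "x-dead-letter-exchange": "dlx",
    },
)
conn.close()
```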


Michael Klishin

Sep 25, 2018, 1:08:32 PM
to rabbitm...@googlegroups.com
Team RabbitMQ has seen plenty of issues that users could have avoided had runaway queue growth been monitored. Running out of disk space is a very common example.

We are extending our monitoring docs [1] to focus on the specific areas that, in our opinion, must be monitored.

So if you ask me, TTL and length limits can work very well, but they are orthogonal to monitoring. Monitoring helps you reason about the state of your system so that when things do not work the way they usually do, you can identify the problem more quickly.
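To make the length-limit half concrete, a minimal sketch that applies one as a policy through the management API (the pattern, limits, host, and credentials are placeholders):

```python
# Apply message-ttl and max-length to matching queues via a policy.
import requests

policy = {
    "pattern": "^work\\.",                       # queues the policy matches
    "definition": {"max-length": 100000, "message-ttl": 60000},
    "apply-to": "queues",
}
requests.put(
    "http://localhost:15672/api/policies/%2F/depth-limits",  # %2F = vhost "/"
    auth=("guest", "guest"),
    json=policy,
).raise_for_status()
```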

Alceu Rodrigues de Freitas Junior

Sep 25, 2018, 7:12:06 PM
to rabbitm...@googlegroups.com
Good point, Michael... even though I could argue that running out of disk space should also be monitored in parallel, having both metrics stored somewhere would indeed help troubleshoot a problem much more quickly.

Regarding the rates I mentioned, my experience comes from development teams being "obsessed" with some "magic" number of messages in the queue when the overall system was actually just under pressure. And usually, when a team asks for such metrics to be collected and stored, they want them at very short time intervals (like 1 minute), which I believe could be expensive for RabbitMQ, especially when using a cluster.

I recently read the book "Practical Monitoring", which suggests the same thing (avoiding static thresholds).

Michael Klishin

Sep 25, 2018, 8:52:21 PM
to rabbitm...@googlegroups.com
We recommend a polling interval of no less than 30 seconds (60 is optimal for nearly every monitoring tool we have tried or used ourselves) [1].

Metrics interpretation and the avoidance of false positives is definitely something that deserves a small book of its own :)

Alceu Rodrigues de Freitas Junior

Sep 26, 2018, 9:40:56 AM
to rabbitm...@googlegroups.com
Thanks Michael,

How much does it cost RabbitMQ in terms of CPU to count messages in large queues at such short intervals? For instance, to be able to use 30 seconds, I would need to make sure that the returned values are not cached (the management_db_cache_multiplier parameter).

Thanks again,
Alceu

Michael Klishin

Sep 26, 2018, 9:43:46 AM
to rabbitm...@googlegroups.com
Message counters are continuously maintained and, assuming queue indices are available (which for a running node is always the case), do not involve message store scans. So it's pretty efficient (O(1) in the best case).