Migrating Monitoring from Management Plugin to Prometheus Plugin

Ilia Kurenkov

Feb 8, 2023, 4:48:02 PM
to rabbitmq-users
I tried to draw parallels between the metrics coming from the Management Plugin and the newer Prometheus Plugin.

Just based on the available documentation for the metrics, it seems there's a significant difference in the kinds of metrics the two plugins expose. Is this true? I'm not versed in Erlang, so I felt lost trying to find the answer in the source code myself.

Thanks for your help!

Ilia Kurenkov

Feb 8, 2023, 4:51:04 PM
to rabbitmq-users
As a related follow-up:

It seems that the Management Plugin endpoint exposes metrics about all nodes in the cluster, whereas the Prometheus endpoint only reports objects from the node it's on, with the exception of things like quorum queues.

If I have a cluster of many nodes and I want to monitor them with the Prometheus plugin, what's a good way to automatically discover these nodes? I wouldn't want to manually have to add endpoints for each one.

Michal Kuratczyk

Feb 9, 2023, 3:01:55 AM
to rabbitm...@googlegroups.com
Hi,

Yes, there is an overlap in the metrics but there are also different metrics exposed only by one of the endpoints. If there is anything available through the Management API that you wish was available through the Prometheus endpoint, that'd be valuable feedback.

And yes, you get metrics from a single node when you query the Prometheus endpoint. This is one of the most important distinctions and improvements. In a large system, with many queues and connections, the Management API can be very slow to respond because it needs to query all details from all nodes. The Prometheus endpoint responds with local data only. Prometheus (and other monitoring systems that support the Prometheus endpoint format) has service discovery mechanisms:

Best,



--
Michał
RabbitMQ team

Ilia Kurenkov

Feb 9, 2023, 4:05:16 AM
to rabbitmq-users
Thanks Michal for the response!

Regarding the metrics overlap, could you point me to some resources from which I can assemble a list of matches and mismatches? I have created a list by comparing descriptions, but that experience makes me think that documentation from the developers, or even the source code, would be a better source of truth.
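One way to build part of such a list without reading the Erlang source is to let the endpoints describe themselves. The sketch below is an illustration, not an official tool: it assumes the Prometheus plugin listens on its default port 15692 and relies only on the standard Prometheus text format, in which each metric family is announced by a `# HELP <name> <description>` line. It extracts name/description pairs and compares two endpoints by metric name. It cannot compare against the Management API, which returns JSON rather than Prometheus text, so that side of the mapping still has to be done by hand.

```python
from __future__ import annotations

import urllib.request


def fetch_exposition(url: str, timeout: float = 10.0) -> str:
    """Download the Prometheus text exposition served at `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_inventory(exposition: str) -> dict[str, str]:
    """Map each metric name to its HELP description.

    Relies on the standard Prometheus text format, where every metric
    family is announced by a line of the form "# HELP <name> <text>".
    """
    inventory: dict[str, str] = {}
    prefix = "# HELP "
    for line in exposition.splitlines():
        if line.startswith(prefix):
            name, _, description = line[len(prefix):].partition(" ")
            inventory[name] = description
    return inventory


def compare(a: dict[str, str], b: dict[str, str]) -> tuple[set[str], set[str], set[str]]:
    """Return (only in a, only in b, in both), keyed by metric name."""
    names_a, names_b = set(a), set(b)
    return names_a - names_b, names_b - names_a, names_a & names_b
```

For example, `metric_inventory(fetch_exposition("http://localhost:15692/metrics"))` lists what the aggregated endpoint exposes on a local node; the host name is an assumption, and pointing a second call at another Prometheus endpoint of the same node and passing both results to `compare` shows which names differ.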

Michal Kuratczyk

Feb 9, 2023, 4:55:47 AM
to rabbitm...@googlegroups.com
We don't have such a list. Many of the metrics are collected in a different way, so it's not a simple change of metric name that you can find in the source code.
We published dashboards that should get you started with Prometheus-based monitoring. https://grafana.com/orgs/rabbitmq/dashboards
There are also a few markdown files with lists of metrics per endpoint (we actually have a few Prometheus endpoints):




Matt Green

Feb 9, 2023, 5:09:06 AM
to rabbitm...@googlegroups.com
I too found the process of working out where to get stats from, and how, a little confusing. However, I think I've got the hang of it now, and getting stats from the new "detailed" endpoint is much quicker than using the Management UI. I don't know if it helps, but I use "rabbitmqctl cluster_status" to get a view of which nodes are in the cluster and whether there are any problems, like a node being down or a split brain.

I then use the list of nodes from there to know which nodes to query for Prometheus stats. The results get uploaded to a DB where I can run reports or raise alerts.
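The node-list approach above can be sketched as follows. This is a minimal illustration under several assumptions: that `rabbitmqctl cluster_status --formatter json` returns a JSON object with a `running_nodes` list of names like `rabbit@host`, that the host part of each node name is resolvable from wherever the script runs, and that the Prometheus plugin listens on its default port 15692. The optional `family` query parameter for the detailed endpoint is shown with `queue_coarse_metrics` as an example value; check the metric families your RabbitMQ version supports.

```python
from __future__ import annotations

import json
import subprocess
from urllib.parse import urlencode


def cluster_status_json() -> str:
    """Run `rabbitmqctl cluster_status` with JSON output and return stdout."""
    result = subprocess.run(
        ["rabbitmqctl", "cluster_status", "--formatter", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def scrape_targets(status_json: str, port: int = 15692,
                   path: str = "/metrics/detailed",
                   family: str | None = None) -> list[str]:
    """Build one Prometheus scrape URL per running node.

    Assumes the JSON has a "running_nodes" list of names like "rabbit@host".
    """
    status = json.loads(status_json)
    query = "?" + urlencode({"family": family}) if family else ""
    targets = []
    for node in status.get("running_nodes", []):
        host = node.split("@", 1)[-1]  # drop the "rabbit@" node-name prefix
        targets.append(f"http://{host}:{port}{path}{query}")
    return targets
```

Running `scrape_targets(cluster_status_json(), family="queue_coarse_metrics")` on a cluster member would then yield one URL per running node, which can be fed to whatever collector stores the results. Only running nodes are included, so a node that is down simply drops out of the list rather than producing a failed scrape.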

As I say, I'm very pleased with how much faster this is than scraping the Management UI. If I were to wish for anything at the moment, it would be for a list of current connections open by users. Getting it from the Management UI is either slow or just hangs (at least for me), so if that were available from Prometheus it would help.

Cheers,

Matthew.

Michal Kuratczyk

Feb 9, 2023, 5:15:41 AM
to rabbitm...@googlegroups.com
Do you mean a count of connections per user or more than that?




Matt Green

Feb 9, 2023, 5:28:01 AM
to rabbitm...@googlegroups.com
I mean exactly that :-) I tried using "rabbitmqctl list_connections user" but it seems to hang (I would have then had to process the output to get a count per user).
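The post-processing step mentioned above (turning one username per connection into a count per user) is small once the output is captured. A minimal sketch, assuming the captured text contains one username per line and that any banner or table-header lines have already been removed; it does nothing about the command hanging, which is the underlying problem in this thread.

```python
from collections import Counter
from typing import Iterable


def connections_per_user(lines: Iterable[str]) -> Counter:
    """Count connections per user from one-username-per-line output.

    Blank lines are ignored; banner or header lines must be stripped
    beforehand (assumption about how the output was captured).
    """
    return Counter(line.strip() for line in lines if line.strip())
```

For example, `connections_per_user(captured_text.splitlines()).most_common(10)` would list the ten users holding the most connections, where `captured_text` is a hypothetical variable holding the command's output.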

Ilia Kurenkov

Feb 10, 2023, 11:49:46 AM
to rabbitmq-users
Oh, Matt, interesting regarding the cluster info!

I'm particularly curious if that data can be used for auto-discovery of the nodes :)

Michal Kuratczyk

Feb 13, 2023, 11:06:31 AM
to rabbitm...@googlegroups.com
Prometheus has multiple service discovery mechanisms. Why are you looking for something different?

As for connections per user, I guess it could be added. We already track this information (because you can set the maximum number of connections per user), so it's just a matter of exposing it.
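To illustrate what exposing such a count in the Prometheus text format could look like, here is a hedged sketch. The metric name `example_connections_per_user` and the `user` label are hypothetical placeholders, not something RabbitMQ exports; the sketch only shows the exposition format (HELP and TYPE lines, one gauge sample per label value, with label values escaped as the format requires).

```python
from __future__ import annotations


def render_connections_per_user(counts: dict[str, int],
                                metric: str = "example_connections_per_user") -> str:
    """Render per-user connection counts in Prometheus text exposition format.

    The metric name is a hypothetical placeholder, not a RabbitMQ metric.
    """
    def escape(value: str) -> str:
        # Label values must escape backslash, double quote and newline.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    lines = [
        f"# HELP {metric} Open connections per user.",
        f"# TYPE {metric} gauge",
    ]
    for user in sorted(counts):  # stable output order
        lines.append(f'{metric}{{user="{escape(user)}"}} {counts[user]}')
    return "\n".join(lines) + "\n"
```

A gauge fits here because the number of open connections can go up and down, unlike a counter, which only increases.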

Best,




Ilia Kurenkov

Feb 15, 2023, 10:15:58 AM
to rabbitmq-users
We're getting RabbitMQ to talk to Datadog, so we can't really take advantage of Prometheus service discovery.