Which RabbitMQ healthcheck should I use

Prateek Khatri

Oct 12, 2022, 5:42:47 AM
to rabbitmq-users
Hi,
I'm trying to set up a RabbitMQ cluster where we're using Consul for service discovery and cluster formation.

There are several health checks available in RabbitMQ [1][2]. My specific questions are:

1. Should I use rabbitmqctl/CLI-based health checks or HTTP API-based ones? (Both are possible with the Consul agent health check mechanism.)

2. Is the aliveness check /api/overview enough, or should I use a combination of the following:
  • /api/health/checks/alarms
  • /api/health/checks/port-listener/port
  • /api/health/checks/virtual-hosts
  • /api/health/checks/node-is-mirror-sync-critical
The objective in this case is to make the node available for receiving connections from clients via Consul DNS.

Another way to put this question is: which combination of health checks can be used to determine whether a RabbitMQ broker is ready to serve clients?
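
For illustration, the API-based variant registered with the Consul agent could look roughly like the following HTTP check (the management port, the default guest:guest credentials encoded in the Authorization header, and the interval are placeholders, not a recommendation):

{
  "check": {
    "id": "rabbitmq-alarms",
    "name": "RabbitMQ alarms",
    "http": "http://localhost:15672/api/health/checks/alarms",
    "header": { "Authorization": ["Basic Z3Vlc3Q6Z3Vlc3Q="] },
    "interval": "10s",
    "timeout": "5s"
  }
}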


Michal Kuratczyk

Oct 12, 2022, 6:00:50 AM
to rabbitm...@googlegroups.com
Hi,

In the Kubernetes Operator we decided to use a TCP probe on the AMQP port (5672/5671). CLI-based checks are fairly expensive because they join the Erlang cluster as nodes.
Due to RabbitMQ's Erlang nature, there is no perfect healthcheck: components can fail independently (for example, a particular queue or vhost can be down while the others
are up), so no single healthcheck can tell you all the details. A TCP check is cheap and tells you that RabbitMQ is generally up and not in maintenance mode,
which is usually all you need to know before allowing a new connection.
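
In a Consul-based setup like yours, that would be just a TCP check on the AMQP listener, along these lines (the check id, name and intervals are only examples):

{
  "check": {
    "id": "rabbitmq-amqp",
    "name": "RabbitMQ AMQP listener",
    "tcp": "localhost:5672",
    "interval": "10s",
    "timeout": "2s"
  }
}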

Best,

--
Michał
RabbitMQ team

Prateek Khatri

Oct 13, 2022, 12:20:04 AM
to rabbitmq-users
Thanks, Michal.
By TCP check, I hope you mean the port check API: /api/health/checks/port-listener/port?
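
For reference, I assume that endpoint would be called with a concrete port number, e.g. (default management port and credentials used as placeholders):

curl -u guest:guest http://localhost:15672/api/health/checks/port-listener/5672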

Vilius Šumskas

Oct 13, 2022, 2:10:13 AM
to rabbitm...@googlegroups.com

Hi,


We tried to use TCP-based checks but then realized that in our case it's not enough, so now we use the CLI:


rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms


This correctly prevents clients from connecting to nodes which have active alarms (we have a load balancer between the clients and the cluster).
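
In a Consul setup like the one described in this thread, the same pair of commands could presumably be wired up as an agent script check, roughly like this (a sketch only; it assumes script checks are enabled on the agent and that the agent runs on the RabbitMQ node itself):

{
  "check": {
    "id": "rabbitmq-cli",
    "name": "RabbitMQ running and no local alarms",
    "args": ["/bin/sh", "-c", "rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms"],
    "interval": "30s",
    "timeout": "10s"
  }
}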


--

    Vilius

Prateek Khatri

Oct 13, 2022, 2:22:54 AM
to rabbitmq-users
What's the HTTP API equivalent of rabbitmq-diagnostics -q check_running?

Michal Kuratczyk

Oct 13, 2022, 2:58:28 AM
to rabbitm...@googlegroups.com
Hi,

@Prateek I just meant checking whether a TCP port is open, the same way load balancers often do; Kubernetes calls that a TCP readiness probe.
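
In Kubernetes terms, something along these lines on the RabbitMQ container (the port depends on your listener configuration; the timings are just examples):

# TCP readiness probe: only checks that the AMQP port accepts connections
readinessProbe:
  tcpSocket:
    port: 5672        # or 5671 for TLS listeners
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5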

@Vilius Not allowing connections to nodes with alarms can also prevent these alarms from being cleared, since consumers are usually needed to do that and
you may prevent them from connecting. But if it works for your use case - great.

With quorum queues, classic queues v2 (especially the bits to be released in 3.12) and streams, I think memory alarms will mostly be a thing of the past for most people.
Each of these queue types has lower and/or more predictable memory usage than classic queues v1 (especially non-lazy ones), so as long as you scale your cluster correctly,
you should see far fewer surprises/alarms.

Best,




--
Michał
RabbitMQ team

Vilius Šumskas

Oct 13, 2022, 4:20:07 PM
to rabbitm...@googlegroups.com

Sorry for the slight hijack of the thread, and sorry if my question is stupid, but in the case of a cluster, won't alarms be cleared anyway by clients connecting to a different node?


We are using classic v1 mirrored queues at the moment.


--

    Vilius

Michal Kuratczyk

Oct 13, 2022, 5:29:57 PM
to rabbitm...@googlegroups.com
Sure, if the queues are replicated then a consumer connecting to a different node could help clear the alarm as well.
On the other hand, if the alarms are common/likely enough to justify more complex readiness checks, it feels like something that should be addressed, not just avoided by healthchecks.

Best,



--
Michał
RabbitMQ team

Vilius Šumskas

Oct 14, 2022, 2:18:48 AM
to rabbitm...@googlegroups.com

In part I agree.


In our case, it's a lack of limits on the app architecture side. We have hundreds of external producers and some of them just suddenly can spike in the amount of packets they send. This usually consumes all available RAM on the node, even if under normal conditions the usage is 3x less.


Our engineers and business side are working with the customers on a final solution, but it will probably take months to complete, hence our checks.

Michal Kuratczyk

Oct 18, 2022, 1:48:20 PM
to rabbitm...@googlegroups.com
Hi,

"some of them just suddenly can spike in the amount of packets they send. This usually consumes all available RAM on the node, even if under normal conditions the usage is 3x less."

Do you know why? I'd focus on this. Please start a new thread or ping me on Slack and I'm very happy to help.
If there is something to improve in RabbitMQ, let's improve it.

Best,



--
Michał
RabbitMQ team

thoma...@gmail.com

Jul 21, 2023, 7:57:20 AM
to rabbitmq-users
Hi.

I have been considering using the CLI commands @Vilius referenced (as in the documentation) to enhance our health check. Obviously a load balancer can't run the commands (as far as I know), so what techniques are being used to make the results available?


Chad

Vilius Šumskas

Jul 21, 2023, 8:24:55 AM
to rabbitm...@googlegroups.com

Hi,


We use a load balancer integrated with Kubernetes. All the load balancer needs to do in that case is query the health of the RabbitMQ nodes from the Kubernetes API. Health itself is tracked by running readiness and liveness probes on the Kubernetes side.
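
As a rough sketch (the diagnostics commands come from earlier in this thread; the thresholds and the liveness command are just examples, not our exact manifests):

# Readiness: node running and no local alarms; liveness: node responds to ping
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms
  initialDelaySeconds: 20
  periodSeconds: 30
  timeoutSeconds: 10
livenessProbe:
  exec:
    command: ["rabbitmq-diagnostics", "-q", "ping"]
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 15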


--

    Vilius
