Hello,
we have an externally exposed RabbitMQ 3.9 cluster which is used by hundreds of external clients. Since we don’t have much control over which application versions they run, for backward compatibility and convenience the cluster is exposed through a Google TCP load balancer under a single DNS hostname. Under normal circumstances this works great.
The issue arises when one of the nodes hits a resource limit and a blocking alarm kicks in (usually due to RAM usage). The load balancer uses Kubernetes health checks to decide whether it can forward requests to a node, and the health check runs “rabbitmq-diagnostics check_running” every few minutes. Since “rabbitmq-diagnostics check_running” does not return a bad status while an alarm is in effect, the load balancer keeps sending requests to a node that is not healthy at all.
Now, how do I check whether a cluster node is healthy *and* whether an alarm is in effect? From the documentation it looks like I should run “rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms”, but I’m not sure what the performance penalty of running check_local_alarms on top of check_running is. Currently I run check_running every 4 minutes; I had to increase the interval from 30 seconds to 4 minutes because check_running turned out to be fairly CPU-expensive.
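For reference, this is roughly the probe script I have in mind (just a sketch; it assumes rabbitmq-diagnostics is on the PATH inside the container and that the load balancer runs it as a Kubernetes exec probe):

    #!/bin/sh
    # Probe sketch: the node must be running *and* have no local
    # resource alarms (memory/disk) before it receives traffic.
    # Exit 0 = healthy, non-zero = take the node out of rotation.
    rabbitmq-diagnostics -q check_running || exit 1
    rabbitmq-diagnostics -q check_local_alarms || exit 1
    exit 0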
Or maybe there is a better way to handle our problem? How do others handle alarms in applications that only see one DNS address because of a load balancer? Should we perhaps skip the load balancer altogether and hardcode all cluster node DNS addresses in the application for direct use?
--
Best Regards,
Vilius Šumskas
Rivile
IT manager
Thank you Luke! I will definitely look into using the HTTP API as a health check. It could be problematic though, since we use a lot of vhosts.
In the meantime, as a temporary duct-tape fix, I will try “rabbitmq-diagnostics -q check_running && rabbitmq-diagnostics -q check_local_alarms”. Is check_local_alarms more expensive resource-wise than check_running (not counting Erlang VM startup time)?
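If the HTTP API route works out, I imagine the probe would boil down to something like this (a sketch only; the management plugin on port 15672 and the “monitoring” user are assumptions on my side, not something we have set up yet):

    # Returns 200 when no local alarms are in effect on the target node,
    # 503 when an alarm is active; -f makes curl exit non-zero on 5xx.
    curl -f -u monitoring:secret http://localhost:15672/api/health/checks/local-alarms

If I read the docs right, there is also an /api/health/checks/virtual-hosts endpoint that checks all vhosts on the node in one call, which might sidestep the “many vhosts” issue.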
--
Vilius