rabbitmq-diagnostics command crashing sometimes


prasanth kumar

Nov 30, 2022, 11:59:59 PM
to rabbitmq-users
Hi Team,

We have a 3-node RabbitMQ cluster and 20+ other applications running in one VM.

Our startup script runs "rabbitmq-diagnostics check_running" every 20 seconds to make sure RabbitMQ is running.

Sometimes the command crashes with Erlang errors that we cannot interpret. When this happens our service gets restarted, which sometimes results in an outage. Please help us understand the issue.

The server logs show no issue; only this command is affected.
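
A minimal sketch of how the polling described above could tolerate an isolated CLI failure instead of restarting the service on the first one. The command is the one from this post; the function names, the timeout, and the threshold of three consecutive failures are illustrative assumptions, not anything RabbitMQ prescribes.

```python
import subprocess
import time

# Command taken from the post; everything else here is a hypothetical wrapper.
CHECK_CMD = ["rabbitmq-diagnostics", "check_running"]


def run_check(cmd=CHECK_CMD, timeout=15):
    """Return True if the command exits with status 0, False otherwise."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # The CLI could not be started or did not finish in time.
        return False
    return result.returncode == 0


def monitor(check=run_check, interval=20, max_failures=3, sleep=time.sleep):
    """Poll `check` every `interval` seconds and return the failure count
    once `max_failures` consecutive checks have failed."""
    failures = 0
    while True:
        if check():
            failures = 0  # any success resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                return failures
        sleep(interval)
```

Calling `monitor()` blocks until three checks in a row have failed, so a single crash of the CLI tool would not by itself trigger a restart. This only hides the symptom; it does not explain why the command crashes.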

Below is the error:


    :erlang.open_port({:spawn_executable, '/usr/bin/sh'}, [{:args, ['-ex', '-c', '. "/etc/rabbitmq/rabbitmq-env.conf" && echo "-----BEGIN VARS LIST FOR PID 47800-----" && set']}, {:env, [{'SYS_PREFIX', []}, {'RABBITMQ_HOME', '/usr/lib/rabbitmq/lib/rabbitmq_server-3.8.8'}, {'CONFIG_FILE', '/etc/rabbitmq/rabbitmq'}, {'ADVANCED_CONFIG_FILE', '/etc/rabbitmq/advanced.config'}, {'MNESIA_BASE', '/var/lib/rabbitmq/mnesia'}, {'ENABLED_PLUGINS_FILE', '/etc/rabbitmq/enabled_plugins'}, {'PLUGINS_DIR', '/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.8.8/plugins'}, {'CONF_ENV_FILE_PHASE', 'rabbtimq-prelaunch'}]}, :binary, :use_stdio, :stderr_to_stdout, :exit_status])
    src/rabbit_env.erl:1506: :rabbit_env.do_load_conf_env_file/3
    (stdlib) lists.erl:1263: :lists.foldl/3
    src/rabbit_env.erl:69: :rabbit_env.get_context/0
    (rabbitmqctl) lib/rabbitmq/cli/core/config.ex:90: RabbitMQ.CLI.Core.Config.get_system_option/2
    (rabbitmqctl) lib/rabbitmq/cli/core/config.ex:23: RabbitMQ.CLI.Core.Config.get_option/2
    (rabbitmqctl) lib/rabbitmqctl.ex:246: RabbitMQCtl.merge_defaults_node/1
    (rabbitmqctl) lib/rabbitmqctl.ex:240: RabbitMQCtl.merge_all_defaults/1

Thanks in advance,
Prasanth.

Luke Bakken

Dec 1, 2022, 3:53:58 PM
to rabbitmq-users
Hi Prasanth,

Any time you ask a question about RabbitMQ we need to know important details like...
  • RabbitMQ version (I see 3.8.8 in your output; this version is unsupported)
  • Erlang version
  • Operating system running RabbitMQ, and its version
  • Full configuration files and environment variables (attached to your question, not pasted!)
Is the output you provided the full output, with no edits?

How often does this command fail vs succeed?

Thanks,
Luke

prasanth kumar

Dec 1, 2022, 10:09:30 PM
to rabbitm...@googlegroups.com
Hi Luke,

We are running RabbitMQ 3.8.8 on Erlang 22.3.4.7, on Red Hat 7.

It seems this version is unsupported on your end :-).

The error I pasted is the full output of the command. The command runs every 20 seconds, and we see this crash at least 6-7 times a day, at random, on all the RabbitMQ nodes.

We started seeing this issue after resizing the VM (increasing its size); the setup now has 64 CPU cores. Before the resize we did not see this issue. We checked CPU usage and found no spike at the times of the crashes.

Env file:
HOME=/var/lib/rabbitmq
RABBITMQ_CONFIG_FILE=/etc/rabbitmq/rabbitmq.conf
RABBITMQ_ADVANCED_CONFIG_FILE=/etc/rabbitmq/advanced.config
RABBITMQ_MNESIA_BASE=/Database/rabbitmq/mnesia
RABBITMQ_LOG_BASE=/DGlogs/rabbitmq
RABBITMQ_USE_LONGNAME=false

Thanks & regards,
Prasanth


Michal Kuratczyk

Dec 2, 2022, 2:26:28 AM
to rabbitm...@googlegroups.com
Hi,

If I understood correctly, you are running 20+ applications and a RabbitMQ cluster in one VM? If so, the command likely fails when some system limit is reached: the number of open files, the number of running processes, or something similar. At that point the command is just trying to start a shell process, so anything that could prevent a process from starting is the most likely culprit. If the limit is not at the system level, it could also be at the Erlang level (in that case I'd expect you to find something about it in the logs). For example, the limit on the number of open ports could be exceeded (see https://www.erlang.org/doc/man/erl.html). I've never seen this being a problem, but you can check how many you are using with "rabbitmqctl eval 'length(erlang:ports()).'" (of course the result will vary over time).
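
One way to look at the system-level limits mentioned above is to compare a process's open file descriptors with its soft limits. The following is only a sketch, assuming a Linux host with `/proc` mounted and enough privileges to read another process's entries; the function names are hypothetical.

```python
import os


def read_limits(pid):
    """Parse the soft limits for open files and processes from
    /proc/<pid>/limits. A value of None means 'unlimited'."""
    wanted = {"Max open files": "open_files", "Max processes": "processes"}
    limits = {}
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            for label, key in wanted.items():
                if line.startswith(label):
                    # The first column after the label is the soft limit.
                    soft = line[len(label):].split()[0]
                    limits[key] = None if soft == "unlimited" else int(soft)
    return limits


def count_open_fds(pid):
    """Count the file descriptors currently open in the given process."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    pid = os.getpid()  # replace with the pid of the process to inspect
    print(read_limits(pid), count_open_fds(pid))
```

Run against the pid of the RabbitMQ node (the `beam.smp` process) and of the other applications on the VM, numbers close to the limit around the time of a crash would support this hunch. Sampling over time matters, since a single reading says little.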

Lastly, in the Kubernetes Operator we decided to use a TCP probe as a health check. Perhaps it would be a good choice for you as well, and it would also work around the issue.
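
A TCP probe of this kind can be sketched in a few lines. This is not the Operator's implementation, just an illustration: the function name is hypothetical, and the host and port are assumptions (5672 is the default AMQP port; use whatever listener the node is actually configured with).

```python
import socket


def tcp_probe(host="localhost", port=5672, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Because the probe only opens a socket, it does not start the CLI tool or spawn a shell, so it does not go through the code path shown in the stack trace. It also tells you less: an open port means the listener accepts connections, not that every part of the node is healthy.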

Best,



--
Michał
RabbitMQ team

prasanth kumar

Dec 5, 2022, 12:05:34 AM
to rabbitmq-users
Hi Michal,

Thank you for your response.

Could you please let us know whether there is an API equivalent to 'rabbitmq-diagnostics check_running' for monitoring the RabbitMQ server, or which REST API is recommended as a replacement to avoid this issue in the future?


Regards,
Prasanth

Michal Kuratczyk

Dec 5, 2022, 2:34:51 AM
to rabbitm...@googlegroups.com
Hi,

We still don't know what causes the problem, so I can't tell you how to avoid it. If my hunch is correct and you are running out of file descriptors or reaching some other system-level limit, you will have other problems soon, so changing the way you perform a health check won't change much.

As I said, when developing the Kubernetes Operator, we decided to perform a TCP port check, not an HTTP API request.

Best,



--
Michał
RabbitMQ team