rabbitmqctl status failing with distribution failed after upgrading to 3.7.20 from 3.6.4


Karl Jackson

Nov 21, 2019, 5:36:16 PM
to rabbitmq-users
I was running RabbitMQ 3.6.4 with Erlang 18.3. I upgraded Erlang and then upgraded RabbitMQ to 3.7.20. I am still able to send and receive messages from a named queue.

However, when I now try to run rabbitmqctl status, I get the following error:

Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:"rabbitmqcli-3486-rabbit@node1", :shortnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}


The system is running Red Hat Enterprise Linux Server release 7.7 (Maipo). I put SELinux in permissive mode and it still failed. How do I troubleshoot the issue?

Karl Jackson

Wesley Peng

Nov 21, 2019, 5:40:36 PM
to rabbitmq-users
Hi

This may be due to name resolution issues. All long and short hostnames should resolve through DNS or the hosts file.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/f62ab4f7-b0f0-4ce1-9887-73dbf664be8b%40googlegroups.com.

Karl Jackson

Nov 21, 2019, 5:51:49 PM
to rabbitmq-users
The hostname command returns the short hostname. nslookup responds with the FQDN.

I tried running rabbitmqctl status --longnames, but I got the same error:

Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:"rabbitmqcli-12085-rabbit@longhostname", :longnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}


Thank you,
Karl Jackson


Luke Bakken

Nov 21, 2019, 5:54:59 PM
to rabbitmq-users
Hi Karl,

Are you running RabbitMQ with short or long host names? Can you attach all of your RabbitMQ configuration files in full (rabbitmq-env.conf, rabbitmq.config / rabbitmq.conf, etc.)?

Karl Jackson

Nov 21, 2019, 6:33:57 PM
to rabbitmq-users
Here are the files.

rabbitmq.conf:

[
  {kernel,
    [{inet_dist_listen_min, 55700},
     {inet_dist_listen_max, 56000}
    ]
  },
  {rabbit,
    [{vm_memory_high_watermark, 0.8},
     {loopback_users, []}]
  }
].


rabbitmq-env.conf:

#example rabbitmq-env.con file entries
#set the port address
NODE_PORT=35672


Thank you,

Karl Jackson

Luke Bakken

Nov 21, 2019, 6:49:42 PM
to rabbitmq-users
Hi Karl,

What is the output when you run these commands?

rabbitmqctl -n "rabbit@$(hostname -s)" status
rabbitmqctl -n rabbit@localhost status
ping -c2 localhost
ping -c2 "$(hostname -s)"
ping -c2 "$(hostname)"

Thanks,
Luke

Karl Jackson

Nov 21, 2019, 8:17:41 PM
to rabbitmq-users
Here is the output of the commands.

rabbitmqctl -n "rabbit@$(hostname -s)" status
Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:"rabbitmqcli-29483-rabbit@jackson1", :shortnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}

[root@jackson1 adm.kjj3]# rabbitmqctl -n rabbit@localhost status
Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:"rabbitmqcli-29913-rabbit@jackson1", :shortnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}

[root@jackson1 adm.kjj3]# ping -c2 localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.043 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.043 ms

--- localhost ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.043/0.043/0.043/0.000 ms

[root@jackson1 adm.kjj3]# ping -c2 "$(hostname -s)"
PING jackson1.byu.edu (10.11.14.29) 56(84) bytes of data.
64 bytes from jackson1.byu.edu (10.11.14.29): icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from jackson1.byu.edu (10.11.14.29): icmp_seq=2 ttl=64 time=0.044 ms

--- jackson1.byu.edu ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.041/0.042/0.044/0.006 ms

[root@jackson1 adm.kjj3]# ping -c2 "$(hostname)"
PING jackson1.byu.edu (10.11.14.29) 56(84) bytes of data.
64 bytes from jackson1.byu.edu (10.11.14.29): icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from jackson1.byu.edu (10.11.14.29): icmp_seq=2 ttl=64 time=0.044 ms

--- jackson1.byu.edu ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.032/0.038/0.044/0.006 ms



Thank you,

Karl Jackson

Luke Bakken

Nov 22, 2019, 10:07:33 AM
to rabbitmq-users
Hi Karl,

You can see that both hostname and hostname -s return the same value, which is unusual.

Please attach your RabbitMQ log file. I'm most interested to know what node name is being used by RabbitMQ. I'm guessing that it will be rab...@jackson1.byu.edu

If that is the case, it is technically a "long name", but you do not have RabbitMQ configured to use long names. Add the following line to /etc/rabbitmq/rabbitmq-env.conf:

USE_LONGNAME=true

Restart RabbitMQ, rerun this command, and I think it should work:

rabbitmqctl status

Thanks,
Luke

Karl Jackson

Nov 25, 2019, 12:00:15 PM
to rabbitmq-users
I added the change to rabbitmq-env.conf, so it now contains:

#example rabbitmq-env.con file entries
#set the port address
NODE_PORT=35672
USE_LONGNAME=true


I am still getting the error:


Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:"rabbitmqcli-2790-rabbit@jackson1.byu.edu", :longnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}


Thank you,

Karl Jackson

Luke Bakken

Nov 25, 2019, 12:25:36 PM
to rabbitmq-users
Hi Karl,

You must have missed my request ... could you please attach your complete RabbitMQ log file? At this point I'm mystified as to what is going on.

Did you restart RabbitMQ after adding that configuration setting?

Thanks,
Luke



Karl Jackson

Nov 25, 2019, 2:11:07 PM
to rabbitmq-users
Hi Luke,

I made the change to the file and then started rabbitmq-server. I didn't see your post. I am attaching the log.

Thank you,
Karl Jackson


rabbit@jackson1.log

Luke Bakken

Nov 25, 2019, 3:14:58 PM
to rabbitmq-users
Hi Karl,

Turns out my guess was incorrect. RabbitMQ is using the short node name:

rabbit@jackson1

You can remove the USE_LONGNAME setting from rabbitmq-env.conf

I would like to see the output of this command:


erl -A0 -noinput -boot start_clean -eval 'net_kernel:start([list_to_atom("rabbit-gethostname-" ++ os:getpid()), shortnames]), [_, H] = string:tokens(atom_to_list(node()), "@"), io:format("~s~n", [H]), init:stop().'

If you see the same nodistribution error, please run epmd -daemon first and re-try.

Karl Jackson

Nov 25, 2019, 3:50:26 PM
to rabbitmq-users
I removed the USE_LONGNAME from rabbitmq-env.conf and restarted rabbitmq-server.

The output of the erl command is:

[root@jackson1 adm.kjj3]# erl -A0 -noinput -boot start_clean -eval 'net_kernel:start([list_to_atom("rabbit-gethostname-" ++ os:getpid()), shortnames]), [_, H] = string:tokens(atom_to_list(node()), "@"), io:format("~s~n", [H]), init:stop().'
jackson1


Thank you,
Karl Jackson

Luke Bakken

Nov 25, 2019, 4:37:51 PM
to rabbitmq-users
Hey Karl,

I'm just about out of ideas for your system. I can reproduce the nodistribution error locally if epmd isn't running, but that doesn't seem to be the case in your environment.

If you run rabbitmqctl status, do you still see node1 as the node name in the output, like in your earlier message?

rabbitmqcli-NNNN-rabbit@node1

What is the output of this command?

echo $HOSTNAME

Thanks -
Luke

Karl Jackson

Nov 25, 2019, 4:56:29 PM
to rabbitmq-users
Hi Luke,

Yes. The output is the same:

[root@jackson1 adm.kjj3]# rabbitmqctl status
Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:"rabbitmqcli-20305-rabbit@jackson1", :shortnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}

The output for echo $HOSTNAME is:

[root@jackson1 adm.kjj3]# echo $HOSTNAME
jackson1


Thank you,
Karl Jackson


Luke Bakken

Nov 25, 2019, 5:18:42 PM
to rabbitmq-users
Hi Karl,

Does the host name jackson1 resolve in DNS? What is the output of this command?

ping -c2 jackson1

Thanks,
Luke

Karl Jackson

Nov 25, 2019, 6:34:24 PM
to rabbitmq-users
Hi Luke,

The output is

[root@jackson1 adm.kjj3]# ping -c2 jackson1
PING jackson1.byu.edu (10.11.14.29) 56(84) bytes of data.
64 bytes from jackson1.byu.edu (10.11.14.29): icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from jackson1.byu.edu (10.11.14.29): icmp_seq=2 ttl=64 time=0.049 ms


Thank you,

Karl

Luke Bakken

Nov 25, 2019, 7:57:38 PM
to rabbitmq-users
Thanks Karl.

I'd like to check what interface RabbitMQ and epmd are listening on. Would you mind running this and returning the output?

sudo netstat -pan | egrep '^tcp'

If you want to remove output not related to beam.smp or epmd, feel free. Or, use the "Reply privately to author" feature in the Google Groups UI to send me the output directly.

Do you have any iptables or other firewall rules in place?

This is an "above average" difficulty problem to diagnose. Thanks for your patience.

Luke Bakken

Nov 26, 2019, 12:07:35 PM
to rabbitmq-users
Hi Karl,

Thanks for the message showing that epmd and RabbitMQ are listening on the expected interfaces.

Can you run this command?

epmd -names

Can you use nc or telnet to connect to port 4369?

nc jackson1 4369

All of my research indicates that this is an issue with contacting epmd during the start of the Erlang VM that rabbitmqctl runs.

Thanks -
Luke

Karl Jackson

Nov 26, 2019, 12:37:02 PM
to rabbitmq-users
Hi Luke,

Here is the output you requested

[root@jackson1 adm.kjj3]# epmd -names
epmd: up and running on port 4369 with data:
name rabbit at port 55700

MacBook-Pro-4:Downloads kjj3$ telnet jackson1 4369
Trying 10.11.14.29...
Connected to jackson1.byu.edu.
Escape character is '^]'.


Thank you,
Karl

Luke Bakken

Nov 26, 2019, 12:49:52 PM
to rabbitmq-users
Hi Karl,

Sorry, I should have asked you to connect to port 4369 on the same server running rabbitmqctl. You appear to have connected to it from an OS X machine.

Please also use the same account (root) that you are using to run rabbitmqctl.
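For readers following along, the local check requested here could be scripted roughly as below. This is only a sketch: it assumes bash and the coreutils timeout command are installed, and check_port is a hypothetical helper name, not something shipped with RabbitMQ or Erlang. It must be run on the RabbitMQ host itself, as the same user that runs rabbitmqctl.

```shell
#!/bin/sh
# Sketch: test TCP reachability of epmd (port 4369) from the RabbitMQ host.
# Assumptions: bash and coreutils' timeout are available; check_port is a
# hypothetical helper, not part of RabbitMQ.

check_port() {
  # $1 = host, $2 = TCP port.
  # Succeeds if a TCP connection can be opened within 3 seconds.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Try the loopback name plus the short and full host names, as in the thread.
for host in localhost "$(hostname -s)" "$(hostname)"; do
  if check_port "$host" 4369; then
    echo "epmd reachable via $host:4369"
  else
    echo "epmd NOT reachable via $host:4369"
  fi
done
```

A successful connection only shows that something accepts TCP connections on that port; epmd -names is still needed to confirm that the rabbit node is registered.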

Thanks -
Luke

Luke Bakken

Nov 26, 2019, 1:08:17 PM
to rabbitmq-users
Hi Karl,

In addition to what I asked below, let's take a second to discuss how you've customized the ports on which RabbitMQ listens.

You've set the default node port to 35672. This means that the Erlang distribution port will be computed from that value.

But in your environment, epmd reports the distribution port as 55700, when according to the start script it should be 55672 (35672 + 20000). Something is not adding up.

In addition to the Erlang distribution port used by RabbitMQ, the rabbitmqctl command also uses its own range of distribution ports.

In your case, the CLI distribution port range is 65672 through 65682, which is invalid because TCP port numbers cannot exceed 65535.
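The arithmetic behind these numbers can be sketched as follows. The +20000 offset is stated above; the +10000 and +10 offsets for the CLI range are an assumption inferred from the 65672 through 65682 figures, so check the rabbitmq-env script of your RabbitMQ version before relying on them. The function names are illustrative only.

```shell
#!/bin/sh
# Sketch of the default port arithmetic discussed in this thread.
# Assumption: the offsets (+20000, +10000, +10) are inferred from the numbers
# quoted above; they are not read from any RabbitMQ file here.

dist_port()         { echo $(( $1 + 20000 )); }                      # Erlang distribution port
ctl_dist_port_min() { echo $(( $(dist_port "$1") + 10000 )); }       # first CLI distribution port
ctl_dist_port_max() { echo $(( $(ctl_dist_port_min "$1") + 10 )); }  # last CLI distribution port
port_is_valid()     { [ "$1" -ge 1 ] && [ "$1" -le 65535 ]; }        # valid TCP port range

report() {
  node_port=$1
  lo=$(ctl_dist_port_min "$node_port")
  hi=$(ctl_dist_port_max "$node_port")
  printf 'NODE_PORT=%s DIST_PORT=%s CTL_DIST_PORT=%s-%s' \
    "$node_port" "$(dist_port "$node_port")" "$lo" "$hi"
  # The upper CLI port is the largest derived value, so checking it suffices.
  if port_is_valid "$hi"; then
    printf ' (ok)\n'
  else
    printf ' (INVALID: above 65535)\n'
  fi
}

report 5672    # default AMQP port
report 35672   # the custom port from this thread
```

With the default node port 5672 every derived port stays below 65535; with 35672 the CLI range overflows, which matches the invalid range described above.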

Here's what I recommend doing, if you can bring down RabbitMQ on this system for a while:

* Disable all firewall rules and flush the iptables rule set. Ensure that no firewall is in place.

* Move all RabbitMQ configuration from /etc/rabbitmq to a temp directory.

* Stop the RabbitMQ server service.

* Kill all epmd and beam.smp processes.

* As root, start a new shell and run RabbitMQ in the foreground by running the rabbitmq-server command.

* Run netstat and ensure that the following processes are listening on 0.0.0.0:

epmd on port 4369
beam.smp on port 15672, 5672 and 25672

* Run rabbitmqctl status from another fresh root shell.

Thanks,
Luke


Karl Jackson

Nov 26, 2019, 2:03:04 PM
to rabbitmq-users
Hi Luke,

I did as you asked. Here are the results:

[root@jackson1 adm.kjj3]# telnet jackson1 4369
Trying 10.11.14.29...
Connected to jackson1.
Escape character is '^]'.
^CConnection closed by foreign host.

[root@jackson1 adm.kjj3]# netstat -pan|grep 0.0.0.0
tcp        0      0 0.0.0.0:46781           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:42817           0.0.0.0:*               LISTEN      1316/rpc.statd
tcp        0      0 127.0.0.1:199           0.0.0.0:*               LISTEN      1251/snmpd
tcp        0      0 0.0.0.0:25672           0.0.0.0:*               LISTEN      6704/beam.smp
tcp        0      0 0.0.0.0:22222           0.0.0.0:*               LISTEN      1257/sshd
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 0.0.0.0:4369            0.0.0.0:*               LISTEN      6658/epmd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1704/master
udp        0      0 0.0.0.0:111             0.0.0.0:*                           1/systemd
udp        0      0 0.0.0.0:161             0.0.0.0:*                           1251/snmpd
udp        0      0 0.0.0.0:33986           0.0.0.0:*                           -
udp        0      0 127.0.0.1:323           0.0.0.0:*                           889/chronyd
udp        0      0 0.0.0.0:50577           0.0.0.0:*                           1316/rpc.statd
udp        0      0 0.0.0.0:622             0.0.0.0:*                           1296/rpcbind
udp        0      0 127.0.0.1:659           0.0.0.0:*                           1316/rpc.statd

[root@jackson1 adm.kjj3]# netstat -pan|grep 5672
tcp        0      0 0.0.0.0:25672           0.0.0.0:*               LISTEN      6704/beam.smp
tcp6       0      0 :::5672                 :::*                    LISTEN      6704/beam.smp


Nothing is listening on 15672.


I can now run rabbitmqctl status, but I am not able to connect to the management console through a browser.


Thank you,

Karl Jackson

Luke Bakken

Nov 26, 2019, 2:37:26 PM
to rabbitmq-users
Hi Karl,

That confirms my theory about using port 35672 as RabbitMQ's AMQP port - it results in invalid numbers being used for other listening ports.

Moving everything out of /etc/rabbitmq also moves the /etc/rabbitmq/enabled_plugins file, which is why the management interface isn't available. You can move that file back.

If you must use 35672 as the AMQP port, you'll have to define other port ranges in /etc/rabbitmq/rabbitmq-env.conf like this:

NODE_PORT=35672
DIST_PORT=25672
CTL_DIST_PORT_MIN=25680
CTL_DIST_PORT_MAX=25690

Save the above and restart RabbitMQ. With the firewall still disabled, ensure that the rabbitmqctl status and rabbitmqctl list_queues commands work.

Then, ensure that your firewall rules allow the following and re-enable it:

* TCP connections to port 35672, since this is what AMQP clients will be using.

* TCP connections to port 4369 (epmd), from local and remote hosts. Remote hosts are necessary for clustering.

* TCP connections to port 25672, from local and remote hosts. This is the Erlang distribution port that RabbitMQ uses to communicate with the CLI tools (rabbitmqctl) and with other nodes if you are running a cluster.

* TCP connections to ports in the range 25680 to 25690. These connections can be limited to the local IP addresses (127.0.0.1 and 10.11.14.29). RabbitMQ uses these ports to connect back to the CLI tool and stream data to it. However, you must allow remote connections if you use the -n argument to connect to a remote RabbitMQ node (for example, in a cluster).

* TCP connections to port 15672 for the management UI.
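As a rough illustration of the list above, the sketch below prints (but does not run) firewall-cmd invocations for those ports. It assumes firewalld is the firewall in use, which is the RHEL 7 default but is not confirmed anywhere in this thread; with plain iptables the equivalent rules would look different. The PORTS variable and firewall_commands function are illustrative names.

```shell
#!/bin/sh
# Dry-run sketch: prints firewall-cmd invocations for the ports listed above.
# Assumption: firewalld manages the firewall on this host; the thread does not
# say which firewall is actually in use.

# AMQP, epmd, Erlang distribution, CLI distribution range, management UI.
PORTS="35672 4369 25672 25680-25690 15672"

firewall_commands() {
  for port in $PORTS; do
    echo "firewall-cmd --permanent --add-port=${port}/tcp"
  done
  # Permanent rules take effect only after a reload.
  echo "firewall-cmd --reload"
}

firewall_commands
```

After reviewing the printed commands, they can be run as root. Note that these rules open each port to every source address; restricting the 25680 to 25690 range to local addresses, as suggested above, needs additional source-based rules that are not shown here.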

Let me know how that goes -
Luke

Karl Jackson

Nov 26, 2019, 3:32:12 PM
to rabbitmq-users
Hi Luke,

Those changes made the difference; things are now working.

Thank you for your help. 

Karl Jackson