RabbitMQ user/privilege creation intermittently failing with "Distribution failed" error

Pooja Ghumre

Aug 5, 2019, 3:39:29 PM
to rabbitmq-users
Hello. We have been using rabbitmq-server version 3.7.13 for quite some time now, but recently we have been intermittently running into the error below during user creation or permission updates. What does this error indicate?
Ansible task to assign permissions to rabbitmq user 'resmgr' on node rabbit@masterbroker:
FAILED! => {"changed": false, "cmd": "/usr/sbin/rabbitmqctl -q -n '' set_permissions -p / resmgr '^(pf9-changes)$' '^(pf9-changes)$' '^$'", "msg": "Distribution failed:********@masterbroker\", :shortnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}", "rc": 78, "stderr": "Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:\"rabbitmqcli-26007-rabbit@masterbroker\", :shortnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}\n", "stderr_lines": ["Distribution failed: {{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:\"rabbitmqcli-26007-rabbit@masterbroker\", :shortnames, 15000], false]}, :permanent, 1000, :supervisor, [:erl_distribution]}}"], "stdout": "", "stdout_lines": []}

Pooja Ghumre

Aug 5, 2019, 8:32:30 PM
to rabbitmq-users
Wanted to mention that retrying the set_permissions command worked on the same machine. Any clue why this fails intermittently?

[root@sandbox pooja]# rpm -qa | grep rabbitmq
rabbitmq-server-3.7.13-1.el7.noarch
[root@sandbox pooja]# rpm -qa | grep erlang
erlang-20.3-1.el7.centos.x86_64


[root@sandbox pooja]# rabbitmqctl list_user_permissions resmgr
Listing permissions for user "resmgr" ...

[root@sandbox pooja]# /usr/sbin/rabbitmqctl -q -n 'rabbit@masterbroker'  set_permissions -p / resmgr '^(pf9-changes)$' '^(pf9-changes)$' '^$'
[root@sandbox pooja]# rabbitmqctl list_user_permissions resmgr
Listing permissions for user "resmgr" ...
vhost configure write read
/ ^(pf9-changes)$ ^(pf9-changes)$ ^$


Thanks!

Pooja Ghumre

Aug 7, 2019, 7:36:46 PM
to rabbitmq-users
Hi,

Any idea what the "{{:shutdown, {:failed_to_start_child, :net_kernel, {:EXIT, :nodistribution}}}, {:child, :undefined, :net_sup_dynamic, {:erl_distribution, :start_link, [[:\"rabbitmqcli-26007-rabbit@masterbroker\", :shortnames, 15000], false]}" error means?

Appreciate any pointers in this regard.

Thanks!




Luke Bakken

Aug 8, 2019, 12:46:58 PM
to rabbitmq-users
Hi Pooja,

You can see in the output that the node name argument (-n) to rabbitmqctl is an empty string:

/usr/sbin/rabbitmqctl -q -n ''

This is almost certainly an error in your Ansible code. I would start by investigating how that -n argument is constructed.

Thanks,
Luke

Pooja Ghumre

Aug 8, 2019, 4:31:20 PM
to rabbitmq-users
Hey Luke,

Thanks. I suspected the same, so I tried the command with an empty node name, but that produced a different error:

# /usr/sbin/rabbitmqctl -q -n '' set_permissions -p / resmgr '^(pf9-changes)$' '^(pf9-changes)$' '^$'
Unsupported node name: node name head (the part before the @) is invalid. Only alphanumerics, _ and - characters are allowed.

Then I noticed that the node name was probably redacted when the error I posted earlier was logged. The node name does appear later in the same error string:
"rabbitmqcli-26007-rabbit@masterbroker" and "Distribution failed:********@masterbroker"

Any other pointers?


Thanks,
Pooja

Luke Bakken

Aug 8, 2019, 5:23:42 PM
to rabbitmq-users
Hello,


Or, DNS lookup of the host name masterbroker could be failing intermittently. I've found that this issue usually has to do with DNS resolution issues (in the rare instances I have seen it).
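
One way to test the intermittent-DNS theory is to resolve the host name in a loop and count the failures. The following is only a minimal sketch and not something taken from this thread: the function name `probe_resolution` and its parameters are made up for illustration. By default it uses Python's `socket.gethostbyname`, which goes through the system resolver but is not necessarily the exact lookup path the Erlang runtime takes.

```python
import socket
import time


def probe_resolution(host, attempts=20, delay=0.0, resolver=socket.gethostbyname):
    """Resolve `host` repeatedly and return a list of (attempt, error) failures.

    `resolver` is injectable so the loop can be exercised without a network;
    this helper is hypothetical and not part of RabbitMQ or its tooling.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            resolver(host)
        except OSError as exc:
            # socket.gaierror and socket.herror are both subclasses of OSError.
            failures.append((attempt, str(exc)))
        if delay:
            time.sleep(delay)
    return failures
```

Calling `probe_resolution("masterbroker", attempts=50, delay=0.1)` on the affected machine returns an empty list if every lookup succeeded. A non-empty list would support the DNS theory, while an empty one does not rule it out.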

If this issue persists and you can't find a cause, you could switch to using the HTTP API for doing these permission operations.
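
As a rough illustration of the HTTP API route, the sketch below builds the request that sets permissions for a user. It assumes the management plugin is enabled and reachable (15672 is its default port) and that you have credentials for a user allowed to manage permissions. The helper name and the `guest`/`guest` credentials in the usage note are placeholders, not values from this thread.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_set_permissions_request(base_url, vhost, user, configure, write, read,
                                  admin_user, admin_password):
    """Build (but do not send) a PUT request for /api/permissions/<vhost>/<user>.

    Hypothetical helper; base_url and the credentials are assumptions that
    depend on your deployment.
    """
    # vhost and user are single path segments, so "/" must become "%2F".
    path = "/api/permissions/{}/{}".format(
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(user, safe=""),
    )
    body = json.dumps({"configure": configure, "write": write, "read": read})
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

To actually send it, pass the result to `urllib.request.urlopen(request, timeout=10)`, for example with `build_set_permissions_request("http://localhost:15672", "/", "resmgr", "^(pf9-changes)$", "^(pf9-changes)$", "^$", "guest", "guest")`. This path does not start an Erlang distribution client the way rabbitmqctl does.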

FWIW, I searched google using these search terms to get ideas as to what may be causing this -


Thanks,
Luke

Pooja Ghumre

Aug 8, 2019, 8:51:02 PM
to rabbitm...@googlegroups.com
Thanks Luke!

I can check on the iptables rules. Does rabbitmqctl use ports in the range below for this command?

35672-35682: used by CLI tools (Erlang distribution client ports) for communication with nodes and is allocated from a dynamic range (computed as server distribution port + 10000 through server distribution port + 10010). 

Do we need to stop any other services from using these ports?
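
For reference, the arithmetic in the documentation excerpt above works out as follows, assuming the default distribution port of 25672 (the AMQP port 5672 plus 20000). The function name is made up for illustration.

```python
def cli_port_range(distribution_port=25672):
    """Return (first, last) of the CLI tools' client port range, inclusive.

    Follows the rule quoted above: distribution port + 10000 through
    distribution port + 10010. The default of 25672 is an assumption that
    holds only when the distribution port has not been changed.
    """
    return distribution_port + 10000, distribution_port + 10010


first, last = cli_port_range()
print(f"CLI tools client ports: {first}-{last} ({last - first + 1} ports)")
```

If the node's distribution port has been changed from the default, pass the configured value instead.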

Thanks,
Pooja


--
You received this message because you are subscribed to a topic in the Google Groups "rabbitmq-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rabbitmq-users/ZkCCbQtDA-4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/328f93f4-8ee8-4b6f-8133-83072545480a%40googlegroups.com.

Luke Bakken

Aug 9, 2019, 11:24:01 AM
to rabbitmq-users
Hi Pooja,

No, the permissions commands don't use those ports. I doubt this is a firewall issue. At the moment my best guess is that it is DNS related. Is there an entry in /etc/hosts for masterbroker? Does it use 127.0.0.1 as the IP address?
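
Both questions can be answered from the hosts file itself. The sketch below is a hypothetical helper that parses hosts-format text and returns every address mapped to a given name. It compares names case-sensitively for simplicity, which real resolvers do not.

```python
def hosts_addresses(hosts_text, name):
    """Return every address that hosts-format text maps `name` to, in file order."""
    addresses = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        # Each entry is an address followed by one or more names or aliases.
        address, *names = line.split()
        if name in names:
            addresses.append(address)
    return addresses
```

For example, `hosts_addresses(open("/etc/hosts").read(), "masterbroker")` lists the addresses for the host name used in this thread. An empty list means the name has no entry in the file, so the lookup would have to be answered by another source such as DNS.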

Pooja Ghumre

Aug 9, 2019, 11:28:53 AM
to rabbitm...@googlegroups.com
Hi Luke,

We saw the same error intermittently for both the set_permissions and create_user commands, so I suspected that some port was probably in use elsewhere. There is an entry for masterbroker in /etc/hosts, as shown below:

[root@sandbox ~(admin)]# cat /etc/hosts
127.0.0.1 masterbroker localhost localhost.localdomain localhost4 localhost4.localdomain4

We are not using LONG_NAMES in RabbitMQ currently. During clustering, we change the masterbroker entry to a private IP instead of 127.0.0.1, and the other RabbitMQ nodes use rabbit@brokerX, with entries for all nodes and their private IPs in the /etc/hosts file. However, this error is seen even before we get to clustering.


Thanks,
Pooja



Luke Bakken

Aug 9, 2019, 11:34:26 AM
to rabbitmq-users
Hi Pooja -

This error is still intermittent, correct? What is your complete /etc/resolv.conf file?

Pooja Ghumre

Aug 9, 2019, 11:37:29 AM
to rabbitm...@googlegroups.com
Hi Luke,

Yes, this error is intermittent, but last week we hit it 6 times in a row when deploying a node with RabbitMQ. Also, we have been on this same 3.7.13 version of RabbitMQ for almost a year now but only started running into this error recently.

Here is the resolv.conf file as requested:

# cat /etc/resolv.conf
; Created by cloud-init on instance boot automatically, do not edit.
;
; generated by /usr/sbin/dhclient-script
search snn1.pf.io
nameserver 131.153.252.6
nameserver 131.153.252.7

Thanks,
Pooja



Luke Bakken

Aug 9, 2019, 11:42:00 AM
to rabbitmq-users
Hello,

I should have asked for one more file - /etc/nsswitch.conf

For what it's worth, this issue is most likely not related to the RabbitMQ version. Have you recently upgraded Erlang or your operating system version?

Thanks -
Luke

Pooja Ghumre

Aug 9, 2019, 11:46:51 AM
to rabbitm...@googlegroups.com
No, we haven't updated the Erlang version either; it's still at 20.3. The OS version has also been CentOS 7.6.1810 for a while.

$ cat /etc/nsswitch.conf

# /etc/nsswitch.conf
#
# An example Name Service Switch config file. This file should be
# sorted with the most-used services at the beginning.
#
# The entry '[NOTFOUND=return]' means that the search for an
# entry should stop if the search in the previous entry turned
# up nothing. Note that if the search failed due to some other reason
# (like no NIS server responding) then the search continues with the
# next entry.
#
# Valid entries include:
#
# nisplus Use NIS+ (NIS version 3)
# nis Use NIS (NIS version 2), also called YP
# dns Use DNS (Domain Name Service)
# files Use the local files
# db Use the local database (.db) files
# compat Use NIS on compat mode
# hesiod Use Hesiod for user lookups
# [NOTFOUND=return] Stop searching if not found so far
#

# To use db, put the "db" in front of "files" for entries you want to be
# looked up first in the databases
#
# Example:
#passwd:    db files nisplus nis
#shadow:    db files nisplus nis
#group:     db files nisplus nis

passwd:     files sss
shadow:     files sss
group:      files sss
#initgroups: files

#hosts:     db files nisplus nis dns
hosts:      files dns

# Example - obey only what nisplus tells us...
#services:   nisplus [NOTFOUND=return] files
#networks:   nisplus [NOTFOUND=return] files
#protocols:  nisplus [NOTFOUND=return] files
#rpc:        nisplus [NOTFOUND=return] files
#ethers:     nisplus [NOTFOUND=return] files
#netmasks:   nisplus [NOTFOUND=return] files

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files sss

netgroup:   files sss

publickey:  nisplus

automount:  files
aliases:    files nisplus
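
The entry in the file above that governs host name lookups is `hosts: files dns`, meaning /etc/hosts is consulted first and DNS second. The sketch below is a hypothetical helper for pulling that order out of an nsswitch.conf-style text. Whether the Erlang runtime follows the same order depends on how its own resolver is configured.

```python
def hosts_lookup_order(nsswitch_text):
    """Return the sources on the active `hosts:` line, in order, or [] if absent.

    Action items such as [NOTFOUND=return] are returned as plain tokens;
    this sketch does not interpret them.
    """
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # commented-out entries are ignored
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []
```

For example, `hosts_lookup_order(open("/etc/nsswitch.conf").read())` returns the sources in the order the system resolver tries them.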


Thanks,
Pooja

