Issues with 'rabbitmqctl list_queues' timing out in RMQ 3.7.2


Gavin Williams

Jan 24, 2018, 7:30:14 AM
to rabbitmq-users
Afternoon all, 

I've installed a 3-node RMQ 3.7.2 cluster running on CentOS 7.3 with Erlang 20.2.2, hosted in an OpenStack-based cloud.

The cluster installed and formed correctly, and I can see 3 nodes happily reporting in the cluster. 
However, when I try to run `rabbitmqctl list_queues`, the command fails with:
Error: operation list_queues on node rabbit@mqb01 timed out. Timeout value used: 60.0. Some queue(s) are unresponsive, use list_unresponsive_queues command.

I've opened the documented ports as per https://www.rabbitmq.com/networking.html, and I've set kernel.inet_dist_listen_min/max to 25672 in rabbitmq.config.
All ports have been confirmed as open via Telnet between the 3 hosts. 
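For anyone repeating the Telnet check above, here is a minimal sketch in shell. It assumes bash and the coreutils `timeout` command are installed, and the function name `check_port` is made up for this illustration. It only shows that a listening port accepts connections, which, as later replies in this thread find, was not the whole story.

```shell
#!/bin/sh
# check_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: exits 0 if a TCP connection to HOST:PORT can be
# opened within the timeout, non-zero otherwise.
# Assumes bash (for /dev/tcp) and coreutils `timeout` are available.
check_port() {
    host=$1
    port=$2
    secs=${3:-3}
    # bash treats /dev/tcp/HOST/PORT as a TCP connection attempt;
    # inside `bash -c`, $0 is the host and $1 is the port.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `check_port mqb02 25672 && echo open` run from mqb01 would test the inter-node port mentioned above; the host names are the ones used in this thread.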

From running tcpdump, I can see packets as per the attached screenshot.
The client makes an EPMD_PORT2_REQ call and gets back an EPMD_PORT2_RESP with a random port value, which I suspect is why things are timing out.

So how do I need to configure RabbitMQ 3.7.2 to get it to only use the specified fixed ports?

Regards
Gavin 

Screen Shot 2018-01-24 at 12.27.19.png

Michael Klishin

Jan 24, 2018, 4:52:58 PM
to rabbitm...@googlegroups.com
Have you tried providing a different `--timeout` value? Depending on how many queues
you have, listing them can take more than 60 seconds, although in 3.7 that is done
by streaming results from multiple nodes in parallel.

See also server logs and `rabbitmq-diagnostics maybe_stuck` output.
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gavin Williams

Jan 25, 2018, 4:08:02 AM
to rabbitmq-users
Michael

I haven't tried increasing the timeout. There are only 2 queues in this cluster currently... 

However I did open all ports between the 3 cluster nodes, and the command now returns in under 1 second. 
So it's definitely a TCP port blocking issue... 

Nothing of any use in the server logs. 
Diagnostics show:
rabbitmq-diagnostics maybe_stuck
Asking node rabbit@mqb01 to detect potentially stuck Erlang processes...
2018-01-25 09:06:41 There are 374 processes.
2018-01-25 09:06:41 Investigated 1 processes this round, 5000ms to go.
2018-01-25 09:06:41 Investigated 1 processes this round, 4500ms to go.
2018-01-25 09:06:42 Investigated 1 processes this round, 4000ms to go.
2018-01-25 09:06:42 Investigated 1 processes this round, 3500ms to go.
2018-01-25 09:06:43 Investigated 1 processes this round, 3000ms to go.
2018-01-25 09:06:43 Investigated 1 processes this round, 2500ms to go.
2018-01-25 09:06:44 Investigated 1 processes this round, 2000ms to go.
2018-01-25 09:06:44 Investigated 1 processes this round, 1500ms to go.
2018-01-25 09:06:45 Investigated 1 processes this round, 1000ms to go.
2018-01-25 09:06:45 Investigated 1 processes this round, 500ms to go.
2018-01-25 09:06:46 Found 1 suspicious processes.
2018-01-25 09:06:46 [{pid,<9177.1.0>},
                     {registered_name,erts_code_purger},
                     {current_stacktrace,
                         [{erts_code_purger,wait_for_request,0,[]}]},
                     {initial_call,{erts_code_purger,start,0}},
                     {message_queue_len,0},
                     {links,[]},
                     {monitors,[]},
                     {monitored_by,[]},
                     {heap_size,233}]

Michael Klishin

Jan 25, 2018, 5:35:25 AM
to rabbitm...@googlegroups.com
OK, then you have figured it out. Because in 3.7 list_* commands contact all cluster
nodes in parallel and wait for streaming results, failure to reach any one of them makes the command hang until the timeout hits.

Gavin Williams

Jan 25, 2018, 7:21:58 AM
to rabbitmq-users
Michael

Figured out is a loose term. I'd still expect to be able to secure traffic between hosts to a subset of TCP ports, as was possible in previous versions. 
So I would consider this to be a bug in the updated implementation of the list_* commands. 

Regards

Michael Klishin

Jan 25, 2018, 7:59:17 AM
to rabbitm...@googlegroups.com
The only thing that has changed is that rabbitmqctl will try to contact all cluster
nodes in parallel instead of just one.

Unless you try to restrict the set of client ports I don't see
how this is meaningfully different in terms of security: there are no extra ports
that nodes open, TLS for CLI-to-node communication is still supported, and so on.

Multiple outgoing connections are a trade-off
we decided was worth it, because they can make list_* operations drastically more responsive
and avoid operator confusion ("Oh, this thing must be stuck, let me kill it with -9").


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Gavin Williams

Jan 25, 2018, 8:50:21 AM
to rabbitmq-users
Michael

Hmm, so the behaviour I'm seeing is that, with exactly the same security setup, 'list_queues' works fine on a 3.6.9 cluster, but doesn't work on a 3.7.2 cluster. 
So I don't see how making single or parallel requests will have any impact on that. 

The issue as far as I can see is that the remote server is now returning a random TCP port to attempt to continue the communications on, which is blocked because it's random. 
So it's either the changes made in https://github.com/rabbitmq/rabbitmq-server/pull/683, or there's been a change in Erlang 20.2 which has introduced this behaviour. 

Thoughts?

Luke Bakken

Jan 25, 2018, 12:31:34 PM
to rabbitmq-users
Hi Gavin,

This statement caught my attention: "kernel.inet_dist_listen_min/max to 25672 in rabbitmq.config"

RabbitMQ 3.7.2 supports the new, ini-style configuration file whose name must end in .conf, not .config. Since you mention the configuration key kernel.inet_dist_listen_min, I assume you intend to use the new-style configuration file with the .conf extension. Could you double-check the name of your configuration file, or, better yet, attach it here?

I would also be interested to see the output of epmd -names

Thanks,
Luke

Gavin Williams

Jan 25, 2018, 5:07:13 PM
to rabbitmq-users
Luke

Cheers for the response. 


Output of 'epmd -names':
$ epmd -names
epmd: up and running on port 4369 with data:
name rabbit at port 25672

Cheers
Gavin 

Michael Klishin

Jan 25, 2018, 9:59:33 PM
to rabbitm...@googlegroups.com
I recall a change that made CLI node name random in early 3.6.x releases but it's unrelated
to client TCP ports.

The setting in question configures *server's* inter-node (and CLI tool) communication port.
It has no effect on CLI tools. CLI tools do not read server's config file (in many cases they can't even in theory).

CLI tools discover what ports they should use when talking to a node by first connecting to epmd
on that node and asking it. You can see what an epmd instance returns by ssh-ing into the node
and running

epmd -names

None of that is new in 3.7 or has changed in a few years (since kernel.inet_dist_listen_min/max were introduced).
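The lookup step described above can be reproduced by hand. The sketch below is only an illustration: `epmd_ports` is a hypothetical helper name, and it assumes the `epmd -names` output format shown elsewhere in this thread (`name <node> at port <port>`).

```shell
#!/bin/sh
# epmd_ports: read `epmd -names` output on stdin and print one
# "<node name> <port>" pair per registered node.
# Assumes lines of the form "name rabbit at port 25672".
epmd_ports() {
    awk '$1 == "name" && $3 == "at" && $4 == "port" { print $2, $5 }'
}
```

Running `epmd -names | epmd_ports` on a node with the output quoted in this thread would print `rabbit 25672`. If a CLI tool also registers with epmd, as is found later in this thread for 3.7.2, a second short-lived entry may show up while a command is running.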

The ports used by CLI tools to then connect to the discovered port can be controlled via VM arguments.
CLI tools 3.7.0 through 3.7.2 have a bug where they ignore `RABBITMQ_CTL_ERL_ARGS` [1]. That's the only
relevant change (corrected in the upcoming 3.7.3) that I can think of.

There is a section on configuration verification in http://www.rabbitmq.com/configure.html. I'd start there
and proceed to using Wireshark or similar to see what ports are actually used by what tool.


Luke Bakken

Jan 26, 2018, 10:25:09 AM
to rabbitmq-users
Hi Gavin,

Like Michael said, I don't believe there are changes in RabbitMQ, Erlang or rabbitmqctl that can explain what you are seeing. Your configuration files look fine.

Are you using exactly the same firewall rules from your previous OpenStack environment? Can you export the firewall rules from all environments so I can try to reproduce this issue myself?

Thanks -
Luke

dfed...@pivotal.io

Jan 26, 2018, 10:36:16 AM
to rabbitmq-users
Hi,

Can you run `rabbitmqctl list_unresponsive_queues`?
You may actually have some unresponsive queues.

Michael Klishin

Jan 26, 2018, 10:59:21 AM
to rabbitmq-users
I thought we've established that the issue goes away when some firewall rules are removed.

So while it's a useful data point to collect, I'd say we only can make progress on this if we get `epmd -names` data
from all nodes, some info on what the firewall rules look like and perhaps a traffic capture.

Gavin Williams

Jan 26, 2018, 11:04:56 AM
to rabbitmq-users
Luke

Yeh, the Openstack rules are consistent as the infra is being built via Terraform. 

The security group rules look like:
(openstack) security group rule list core_secgroup_1a_mqb --long
+--------------------------------------+-------------+-----------+-------------+-----------+-----------+--------------------------------------+
| ID                                   | IP Protocol | IP Range  | Port Range  | Direction | Ethertype | Remote Security Group                |
+--------------------------------------+-------------+-----------+-------------+-----------+-----------+--------------------------------------+
| bf9f197a-03f1-4366-b11b-e8e9cf4e3acc | tcp         | 0.0.0.0/0 | 5672:5672   | ingress   | IPv4      | None                                 |
| 8addffb9-16d1-4b89-8732-4f86584d0402 | tcp         | None      | 4369:4369   | ingress   | IPv4      | 25368a8c-24df-4fe9-ad31-93dfe43f8ab4 |
| ab299e80-87f1-4ef1-9b16-793256884a2b | tcp         | None      | 25672:25672 | ingress   | IPv4      | 25368a8c-24df-4fe9-ad31-93dfe43f8ab4 |
| f6967dbb-16bd-4aa9-b6b3-f51014c7edbd | tcp         | 0.0.0.0/0 | 80:80       | ingress   | IPv4      | None                                 |
| 9fda5571-d855-4170-89b2-f921942a3cd5 | tcp         | None      | 4369:4369   | egress    | IPv4      | 25368a8c-24df-4fe9-ad31-93dfe43f8ab4 |
| d5a9fa69-ecad-4a53-bf15-25b6f7fe41e6 | tcp         | None      | 25672:25672 | egress    | IPv4      | 25368a8c-24df-4fe9-ad31-93dfe43f8ab4 |
+--------------------------------------+-------------+-----------+-------------+-----------+-----------+--------------------------------------+


'25368a8c-24df-4fe9-ad31-93dfe43f8ab4' is the UUID of the group, so those rules are effectively allowing traffic to/from the same group. 
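To make the effect of these rules concrete, here is a small sketch that checks whether a given TCP port falls inside any of the allowed `min:max` ranges. The helper name `port_in_ranges` and the example port 41234 are assumptions made for illustration; the ranges are the intra-group ones from the table above.

```shell
#!/bin/sh
# port_in_ranges PORT RANGE...
# Hypothetical helper: exits 0 if PORT lies within any "min:max" range.
port_in_ranges() {
    port=$1
    shift
    for range in "$@"; do
        min=${range%%:*}
        max=${range##*:}
        if [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]; then
            return 0
        fi
    done
    return 1
}

# Intra-group ranges taken from the rule list above.
allowed="4369:4369 25672:25672"

# 25672 is the fixed inter-node port; 41234 stands in for an arbitrary
# OS-assigned port (an assumed value, not taken from the capture).
for p in 25672 41234; do
    # $allowed is deliberately unquoted so it splits into separate ranges.
    if port_in_ranges "$p" $allowed; then
        echo "$p allowed"
    else
        echo "$p blocked"
    fi
done
```

Any connection within the group to a port outside these ranges, such as a randomly assigned one, would be dropped by the security group.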

HTH

Gav

Luke Bakken

Jan 26, 2018, 11:28:06 AM
to rabbitmq-users
Hi Gavin,

Thanks for providing your firewall rules. I did a packet trace on my workstation for port 4369 and have attached the file here. epmd always returns 25672 for RabbitMQ (3.7.2), so there must be something "up" in your environment.

You can check to see if the kernel.inet_dist_listen_min/max settings are actually being picked up by running these commands (of course, this may only work if you run rabbitmqctl on the same node as RabbitMQ):

rabbitmqctl eval 'application:get_env(kernel, inet_dist_listen_min).'
rabbitmqctl eval 'application:get_env(kernel, inet_dist_listen_max).'

Please run this on all nodes in your cluster, as well as epmd -names on all nodes. I know you provided output from that command but it's unclear if you ran it on every node.

Thanks -
Luke

epmd-4369-any.pcapng

Gavin Williams

Jan 26, 2018, 11:40:55 AM
to rabbitmq-users
Luke

Did that packet capture include running a 'rabbitmqctl list_queues'? 

Checking the eval on every host returns '25672':
[gavin.williams@mqb01:~] $ sudo rabbitmqctl eval 'application:get_env(kernel, inet_dist_listen_min).';sudo rabbitmqctl eval 'application:get_env(kernel, inet_dist_listen_max).'
{ok,25672}
{ok,25672}
---
[gavin.williams@mqb02:~] $ sudo rabbitmqctl eval 'application:get_env(kernel, inet_dist_listen_min).';sudo rabbitmqctl eval 'application:get_env(kernel, inet_dist_listen_max).'
{ok,25672}
{ok,25672}
---
[gavin.williams@mqb03:~] $ sudo rabbitmqctl eval 'application:get_env(kernel, inet_dist_listen_min).';sudo rabbitmqctl eval 'application:get_env(kernel, inet_dist_listen_max).'
{ok,25672}
{ok,25672}

'epmd -names' returns the following:
[gavin.williams@mqb01:~] 1 $ sudo epmd -names
epmd: up and running on port 4369 with data:
name rabbit at port 25672
---
[gavin.williams@mqb02:~] $ sudo epmd -names
epmd: up and running on port 4369 with data:
name rabbit at port 25672
---
[gavin.williams@mqb03:~] $ sudo epmd -names
epmd: up and running on port 4369 with data:
name rabbit at port 25672

Cheers
Gavin

Luke Bakken

Jan 26, 2018, 11:44:26 AM
to rabbitmq-users
Hi Gavin -

Yep, that's the command I ran. I was only monitoring the epmd port in Wireshark, just to confirm what the packet flow looks like. I really have no idea at this point why epmd would return a different port in your environment as shown in your packet capture. I suppose the next thing I would try is to shut down your RMQ cluster and kill all epmd processes, restart everything, and do another trace to see if epmd returns a port other than 25672.

Luke

Michael Klishin

Jan 26, 2018, 11:44:43 AM
to rabbitm...@googlegroups.com
The data that all nodes and epmd report is consistent. Are you sure that the epmd port (see https://rabbitmq.com/networking.html) wasn’t firewalled off on any of the nodes?

Gavin Williams

Jan 26, 2018, 12:15:04 PM
to rabbitmq-users
Luke

OK, so I did a complete stop on all 3 nodes, made sure all the processes were gone and then started up again, and still the same issue... 

One thing I've just spotted when comparing a trace I've just captured with yours: my 'EPMD_PORT2_REQ' packets have a different node name to yours.
Mine seem to be some kind of auto-generated 'rabbitmqcli + digits', whereas yours are requesting 'rabbit'.

Which I'm guessing is why the 'EPMD_PORT2_RESP' is random, as it's not matching 'rabbitmqcliXXX' with 'rabbit' as returned by 'epmd -names'.

So any ideas why my 'rabbitmqcli' might be requesting a random EPMD name?  

Cheers
Gavin

Gavin Williams

Jan 26, 2018, 12:19:18 PM
to rabbitmq-users
Packet capture attached for completeness... 

Gav
epmd.pcap

Luke Bakken

Jan 26, 2018, 1:28:59 PM
to rabbitmq-users
Gavin -

Good catch noticing that difference. That request is what you would see if rabbitmqctl were looking up its own name. When that command runs, it gives itself a "random" node name.

Looks like I need to do some tracing with an actual cluster next with some queues running on it.

Thanks!
Luke


Luke Bakken

Jan 26, 2018, 2:15:44 PM
to rabbitmq-users
Hi Gavin,

I set up 3-node clusters using 3.6.14 and 3.7.2. Sure enough, rabbitmqctl in 3.7.2 contacts epmd to "look itself up", which it did not do in 3.6.14.

I have created the following issue if you would like to watch it for resolution: https://github.com/rabbitmq/rabbitmq-cli/issues/237

Thanks a lot for providing all of the requested information!
Luke

Gavin Williams

Jan 26, 2018, 2:49:35 PM
to rabbitmq-users
Luke

Cheers for your persistence, and confirming that it is a behaviour change between 3.6 and 3.7 :) 

I'll keep an eye on the Github issue. 

Regards
Gavin 

Luke Bakken

Jan 26, 2018, 5:21:28 PM
to rabbitmq-users
Hi Gavin,

The behavior we're seeing is due to the code in these places:


Basically, they are changes to allow streaming result sets back to rabbitmqctl for efficiency's sake. That can't be changed, but what I can do is specify a single dist port for rabbitmqctl. I did that and specified port 55672 for the attached rabbitmqctl escript.

You should be able to copy this file to one of your nodes, mark it as executable, and run it (as long as escript is in your PATH). First, be sure to open port 55672 to that node, of course. If you do a packet trace, you will see EPMD_PORT2_REQ for rabbitmqctl that will return 55672, and your command should succeed.

Let me know how that works -
Luke
rabbitmqctl

Gavin Williams

Jan 26, 2018, 5:28:18 PM
to rabbitm...@googlegroups.com
Luke

Cheers for the quick turn-around. 

I’ll give the patched rabbitmqctl a go early next week when I’m back in the office, and let you know how it goes. 

Cheers
Gavin


Michael Klishin

Jan 26, 2018, 11:15:05 PM
to rabbitm...@googlegroups.com
There will be a 3.7.3-rc.3 shortly after it passes our pipeline and other types of QA.


Michael Klishin

Jan 29, 2018, 9:04:08 AM
to rabbitm...@googlegroups.com
We had to revert our initial attempt which used a single port:
https://github.com/rabbitmq/rabbitmq-server/pull/1487.

A single port prevents two instances of a CLI tool (or more than one tool) from running in parallel,
which is unacceptable.

A more accommodating solution would be to let the user configure a range of ports that CLI tools will
be allowed to use, much like we do for RabbitMQ nodes themselves.

We won't be delaying 3.7.3 because of this as the issue turned out to be more involved
than we originally thought and it doesn't affect a significant enough percentage of users.


Michael Klishin

Jan 30, 2018, 9:12:15 AM
to rabbitm...@googlegroups.com
Gavin,

With RabbitMQ 3.7.3 [1] released, there's a workaround for this:
it is possible to limit client port range for CLI tools using the `RABBITMQ_CTL_ERL_ARGS`
env variable the same way [2] does it for a future version.

RABBITMQ_CTL_ERL_ARGS="-kernel inet_dist_listen_min 35672 -kernel inet_dist_listen_max 35682" (this allows
the 11 ports from 35672 through 35682).

Note that using a single port is possible but you won't be able to run more than one CLI tool process concurrently
on a single host and for parallel list_* operations that means parallel connections will fail.
So I'd recommend a range of 10 or so to be on the safe side.

Note that if [2] gets merged for 3.7.4, there would be two sets of settings provided
so to avoid surprises I'd use the new env variables in [2] after upgrading from 3.7.3.
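As a minimal sketch of the workaround described above, the snippet below builds the argument string and exports it. The helper name `ctl_erl_args` is hypothetical; the variable name, the two kernel settings and the port values come from the message above.

```shell
#!/bin/sh
# ctl_erl_args MIN MAX
# Hypothetical helper: prints the VM arguments that restrict the
# distribution port range used by CLI tools to MIN..MAX.
ctl_erl_args() {
    min=$1
    max=$2
    if [ "$min" -gt "$max" ]; then
        echo "ctl_erl_args: min port must not exceed max port" >&2
        return 1
    fi
    printf '%s\n' "-kernel inet_dist_listen_min $min -kernel inet_dist_listen_max $max"
}

# Values from the message above.
RABBITMQ_CTL_ERL_ARGS=$(ctl_erl_args 35672 35682)
export RABBITMQ_CTL_ERL_ARGS
echo "$RABBITMQ_CTL_ERL_ARGS"
```

With the variable exported, a `rabbitmqctl list_queues` run from the same shell should pick it up on 3.7.3, per the message above. The same port range also has to be opened between the hosts in the firewall or security group. When the command is run through sudo, the variable may be dropped unless the environment is preserved.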

fatmcgav

Jan 30, 2018, 9:48:06 AM
to rabbitm...@googlegroups.com
Michael 

Cheers for the update. 

I'll give 3.7.3 a go and report back. 

Regards 
Gavin 


Gavin Williams

Feb 19, 2018, 11:06:34 AM
to rabbitmq-users
Michael

Sorry for the delay, just got round to testing 3.7.3, and can confirm that with the addition of the 'RABBITMQ_CTL_ERL_ARGS' env variable, I'm getting a consistent set of ports returned by EPMD :) 

Cheers
Gavin 

Michael Klishin

Feb 19, 2018, 3:48:37 PM
to rabbitm...@googlegroups.com
Woohoo! Thank you for reporting back!
