MQTT clients not able to connect


Adam Schaible

Dec 11, 2019, 2:33:55 PM
to rabbitmq-users
Hello,

I have a 2-node RabbitMQ cluster with the MQTT plugin installed on one of them.  Previously the MQTT plugin was installed on both, but I've removed it from one (not the one that is generating the logs below) to attempt to debug the issue I'm experiencing.

The error seems to center on mqtt_node logging that the leader is not known, followed by a connection timeout. Can anyone provide guidance based on the logs below?

Thanks in advance!

2019-12-11 11:31:16.192 [debug] <0.2169.0> Supervisor {<0.2169.0>,rabbit_mqtt_connection_sup} started rabbit_mqtt_connection_sup:start_keepalive_link() at pid <0.2170.0>
2019-12-11 11:31:16.192 [debug] <0.2169.0> Supervisor {<0.2169.0>,rabbit_mqtt_connection_sup} started rabbit_mqtt_reader:start_link(<0.2170.0>, {acceptor,{0,0,0,0},1884}) at pid <0.2171.0>
2019-12-11 11:31:16.192 [debug] <0.2171.0> MQTT accepting TCP connection <0.2171.0> (69.61.195.3:64905 -> 10.0.69.6:1884)
2019-12-11 11:31:16.192 [debug] <0.2171.0> Received a CONNECT, client ID: "mqtt-test" (expanded to "mqtt-test"), username: "mqtt-test", clean session: true, protocol version: 3, keepalive: 60
2019-12-11 11:31:16.192 [info] <0.2171.0> MQTT vhost picked using plugin configuration or default
2019-12-11 11:31:16.192 [debug] <0.2172.0> Supervisor {<0.2172.0>,amqp_connection_sup} started amqp_connection_type_sup:start_link() at pid <0.2173.0>
2019-12-11 11:31:16.192 [debug] <0.2172.0> Supervisor {<0.2172.0>,amqp_connection_sup} started amqp_gen_connection:start_link(<0.2173.0>, {amqp_params_direct,<<"mqtt-test">>,<<"rySv0czT6WKak2zfAYgea0Vobmr6j8T8OOpFVldrIOMqZXLls/ZP5MyxMmn...">>,...}) at pid <0.2174.0>
2019-12-11 11:31:16.207 [debug] <0.2174.0> User 'mqtt-test' authenticated successfully by backend rabbit_auth_backend_internal
2019-12-11 11:31:16.207 [debug] <0.2173.0> Supervisor {<0.2173.0>,amqp_connection_type_sup} started amqp_channel_sup_sup:start_link(direct, <0.2174.0>, <<"69.61.195.3:64905 -> 10.0.69.6:1884">>) at pid <0.2175.0>
2019-12-11 11:31:16.207 [info] <0.734.0> mqtt_node: leader call - leader not known. Command will be forwarded once leader is known.
2019-12-11 11:31:16.207 [debug] <0.2173.0> Supervisor {<0.2173.0>,amqp_connection_type_sup} started amqp_channels_manager:start_link(<0.2174.0>, <<"69.61.195.3:64905 -> 10.0.69.6:1884">>, <0.2175.0>) at pid <0.2176.0>
2019-12-11 11:31:16.207 [debug] <0.2173.0> Supervisor {<0.2173.0>,amqp_connection_type_sup} started rabbit_queue_collector:start_link(<<"69.61.195.3:64905 -> 10.0.69.6:1884">>) at pid <0.2177.0>
2019-12-11 11:31:21.224 [error] <0.2171.0> MQTT cannot accept a connection: client ID registration timed out
2019-12-11 11:31:21.224 [error] <0.2171.0> MQTT cannot accept connection 69.61.195.3:64905 -> 10.0.69.6:1884 due to an internal error or unavailable component


Luke Bakken

Dec 11, 2019, 2:46:23 PM
to rabbitmq-users
Hi Adam,

It's really important to tell us what version of RabbitMQ and Erlang you're using, and on what operating system. The MQTT client library and version you're using would also be good to know.

Based on the issue, I can guess that you are using RabbitMQ 3.8.x. This version includes cluster-aware MQTT client ID tracking (https://github.com/rabbitmq/rabbitmq-mqtt/pull/195).

The implementation of this client ID tracking requires that a quorum of nodes be available in your cluster. Since you have only two nodes, no quorum is possible.

The correct fix is to use at least three nodes, and ensure that the MQTT plugin is running on all of them.
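For reference, Raft's quorum is a strict majority, floor(n/2) + 1 nodes. A quick shell sketch of the arithmetic (the `majority` helper is just illustrative, not a RabbitMQ command) shows why a two-node cluster has no fault tolerance:

```shell
# A strict majority of n nodes is floor(n/2) + 1.
majority() { echo $(( $1 / 2 + 1 )); }

for n in 2 3 5; do
  m=$(majority "$n")
  echo "$n nodes: majority = $m, tolerates $(( n - m )) node failure(s)"
done
```

With two nodes the majority is both of them, so losing either node (or running the tracking component on only one) leaves no quorum.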

Thanks,
Luke

Adam Schaible

Dec 11, 2019, 2:54:42 PM
to rabbitmq-users
Hi Luke,

Wow, blazing fast response - I really appreciate it. Yes, I am using 3.8.x and the latest Erlang (fresh install). It does seem like the issue is with quorum - I wasn't aware of that feature; thanks for the heads-up.

Adam Schaible

Dec 12, 2019, 8:45:59 AM
to rabbitmq-users
Hi Luke,

I set up another server and added it to the cluster.  Now there are a total of 3 servers clustered, running RMQ 3.8.1 and Erlang 22.1.  This is a new environment, so I'm using desktop-based MQTT clients right now, specifically MQTT Explorer and MQTT.fx.

I'm getting the same errors - the leader call times out.  Do you know what ports these need?  This is a fairly locked-down environment.  Clustering works properly, so I know those ports are open, but I based the networking request on the clustering guide - perhaps I need to open more ports?

Thanks,
-Adam



Luke Bakken

Dec 12, 2019, 9:50:03 AM
to rabbitmq-users
Hi Adam,

Could you please attach the log files from your servers? A one-sentence description of the error isn't nearly as useful as what is in the logs.

Can you declare a new quorum queue in this cluster?

Thanks,
Luke
Message has been deleted

Adam Schaible

Dec 12, 2019, 10:00:05 AM
to rabbitmq-users
My apologies, Luke - the logs are essentially the same as they were before.

I was able to add a quorum queue in the management interface.

I have tried with the MQTT plugin installed only on the node that receives the MQTT messages, and also on all nodes, and experienced the same behavior.  Is there a recommendation on whether it should be installed on all nodes or not?

Thanks!

2019-12-12 06:58:13.576 [debug] <0.7315.0> Supervisor {<0.7315.0>,rabbit_mqtt_connection_sup} started rabbit_mqtt_connection_sup:start_keepalive_link() at pid <0.7316.0>
2019-12-12 06:58:13.576 [debug] <0.7315.0> Supervisor {<0.7315.0>,rabbit_mqtt_connection_sup} started rabbit_mqtt_reader:start_link(<0.7316.0>, {acceptor,{0,0,0,0},1884}) at pid <0.7317.0>
2019-12-12 06:58:13.576 [debug] <0.7317.0> MQTT accepting TCP connection <0.7317.0> (69.61.195.3:60758 -> 10.0.69.6:1884)
2019-12-12 06:58:13.576 [debug] <0.7317.0> Received a CONNECT, client ID: "mqtt-test" (expanded to "mqtt-test"), username: "mqtt-test", clean session: true, protocol version: 3, keepalive: 60
2019-12-12 06:58:13.576 [info] <0.7317.0> MQTT vhost picked using plugin configuration or default
2019-12-12 06:58:13.591 [debug] <0.7318.0> Supervisor {<0.7318.0>,amqp_connection_sup} started amqp_connection_type_sup:start_link() at pid <0.7319.0>
2019-12-12 06:58:13.591 [debug] <0.7318.0> Supervisor {<0.7318.0>,amqp_connection_sup} started amqp_gen_connection:start_link(<0.7319.0>, {amqp_params_direct,<<"mqtt-test">>,<<"njlU7krudlh9pzVcMgDrDMtT4ZFVqfbO4MX0/IUOTQXXc+6KzBlKWuiX+wb...">>,...}) at pid <0.7320.0>
2019-12-12 06:58:13.607 [debug] <0.7320.0> User 'mqtt-test' authenticated successfully by backend rabbit_auth_backend_internal
2019-12-12 06:58:13.607 [debug] <0.7319.0> Supervisor {<0.7319.0>,amqp_connection_type_sup} started amqp_channel_sup_sup:start_link(direct, <0.7320.0>, <<"69.61.195.3:60758 -> 10.0.69.6:1884">>) at pid <0.7321.0>
2019-12-12 06:58:13.607 [debug] <0.7319.0> Supervisor {<0.7319.0>,amqp_connection_type_sup} started amqp_channels_manager:start_link(<0.7320.0>, <<"69.61.195.3:60758 -> 10.0.69.6:1884">>, <0.7321.0>) at pid <0.7322.0>
2019-12-12 06:58:13.607 [info] <0.789.0> mqtt_node: leader call - leader not known. Command will be forwarded once leader is known.
2019-12-12 06:58:13.607 [debug] <0.7319.0> Supervisor {<0.7319.0>,amqp_connection_type_sup} started rabbit_queue_collector:start_link(<<"69.61.195.3:60758 -> 10.0.69.6:1884">>) at pid <0.7323.0>
2019-12-12 06:58:18.622 [error] <0.7317.0> MQTT cannot accept a connection: client ID registration timed out
2019-12-12 06:58:18.622 [error] <0.7317.0> MQTT cannot accept connection 69.61.195.3:60758 -> 10.0.69.6:1884 due to an internal error or unavailable component

Previously in the logs, I noticed these messages when I created the quorum queue:

2019-12-12 06:53:58.400 [info] <0.6354.0> queue 'TestQuorum' in vhost '/': leader call - leader not known. Command will be forwarded once leader is known.
2019-12-12 06:53:58.411 [info] <0.6354.0> queue 'TestQuorum' in vhost '/': granting vote for {'%2F_TestQuorum','rabbit@ABCDEFG'} with last indexterm {0,0} for term 1 previous term was 0
2019-12-12 06:53:58.415 [info] <0.6354.0> queue 'TestQuorum' in vhost '/': detected a new leader {'%2F_TestQuorum','rabbit@ABCDEFG'} in term 1




Luke Bakken

Dec 12, 2019, 10:11:28 AM
to rabbitmq-users
Hi Adam,

Really, I do need to see the entire log files. I see that you have configured debug logging, which is great. The complete files matter because there should be information logged at startup time about the part of RabbitMQ that tracks MQTT client IDs.

Please compress the logs and attach them. If they contain sensitive information, use the "reply privately to author" function in the Google Groups web UI.

Thanks,
Luke
Message has been deleted

Adam Schaible

Dec 12, 2019, 10:26:23 AM
to rabbitmq-users
Hi Luke,

Apologies for the deleted message.  I was configuring debug logging via rabbitmqctl, and when the service restarted I lost most of the debug logs.  I'm going to set it via config and then re-post the logs.

Thanks,
-Adam

Diana Corbacho

Dec 16, 2019, 10:35:33 AM
to rabbitmq-users
Hello Adam,

Your initial issue, when the MQTT plugin was removed from one of the two nodes, is likely due to the node not having been decommissioned from the `ra` cluster. You need to run `rabbitmqctl decommission_mqtt_node <node>` before disabling the plugin on that node. I'm going to document that on the MQTT page of our website.
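As a sketch of the order of operations (the node name `rabbit@node2` is a placeholder; substitute the node you are removing the plugin from):

```shell
# First, remove the node from the MQTT client ID tracking (ra) cluster...
rabbitmqctl decommission_mqtt_node rabbit@node2

# ...and only then disable the plugin on that node.
rabbitmq-plugins -n rabbit@node2 disable rabbitmq_mqtt
```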

As Luke said, we need the full logs, and also to know how you enabled/disabled the MQTT plugin and added those nodes to the cluster.

Thanks,
Diana

Adam Schaible

Dec 16, 2019, 11:20:28 AM
to rabbitmq-users
Hi Diana,

I was able to connect with Luke on Slack, and he really helped me out.  Good to know about that command!  Much appreciated.

Thanks,
-Adam

Denis Boulas

Feb 12, 2020, 4:09:49 PM
to rabbitmq-users
It would be nice to have this resolution documented here... We have an existing RabbitMQ cluster in Kubernetes. It was upgraded to a recent version a month ago (probably from version 3.7). Now we're trying to configure MQTT on it, and are having the same issue as described above.
Interestingly, we configured MQTT on our dev cluster with RMQ 3.7 and upgraded it just fine - MQTT works without any problems there. The dev and prod configurations are identical.
I have checked everything but am unable to find anything. :(

Luke Bakken

Feb 12, 2020, 4:21:05 PM
to rabbitmq-users
Hi Denis,

What version of RabbitMQ are you using?

Somehow Adam's cluster got into a state where there was no Raft leader for the MQTT client ID tracking. You can see if you have the same issue by running this command on every node in your cluster:

rabbitmqctl eval "ra:overview()."

Under servers, you will see mqtt_node. If every node is in the follower state, you need to manually trigger an election:

rabbitmqctl eval "ra:trigger_election('mqtt_node')."

You can run that command on any node in your cluster.

Please let me know if this is your issue and if the above command fixes it.
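Luke's check-then-trigger procedure can be sketched as a small script. The `ra:overview()` output fragments below are assumptions (the exact format varies between versions), so treat the pattern matching as an approximation to adapt:

```shell
# Pull the first leader/follower token out of an ra:overview() snippet.
extract_state() {
  printf '%s\n' "$1" | grep -E -o 'leader|follower' | head -n1
}

# Succeed (exit 0) only when none of the given snippets reports a leader.
needs_election() {
  for snippet in "$@"; do
    [ "$(extract_state "$snippet")" = "leader" ] && return 1
  done
  return 0
}

# In practice each snippet would come from:
#   rabbitmqctl -n <node> eval "ra:overview()."
# Here we use two hypothetical all-follower fragments:
if needs_election 'mqtt_node => #{state => follower}' \
                  'mqtt_node => #{state => follower}'; then
  echo "no leader - run: rabbitmqctl eval \"ra:trigger_election('mqtt_node').\""
fi
```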

Thanks,
Luke

Denis Boulas

Feb 12, 2020, 10:27:49 PM
to rabbitm...@googlegroups.com
Luke, you're just awesome! That was exactly my case.
That definitely should be documented somewhere; I've searched all of Google without any luck.

Since this thread is one of the Google results, others now at least have a chance.

Thank you so much!



--
Best regards,
Denis Boulas

Wesley Peng

Feb 12, 2020, 10:45:03 PM
to rabbitm...@googlegroups.com
You should understand Erlang expressions before using the "eval" command.

Regards.

Luke Bakken

Feb 13, 2020, 10:41:57 AM
to rabbitmq-users
Hi Denis,

We're trying to find the root cause of this issue. I'm glad it's fixed for you.

I'm guessing that you don't know what sequence of events led to your cluster being in this state? It seems to happen when most of the cluster nodes are repeatedly shut down or suddenly stopped.

Luke

Denis Boulas

Feb 13, 2020, 5:28:12 PM
to rabbitm...@googlegroups.com
Well, I would say that this cluster was running just fine for a few years (on Kubernetes). We recently updated to 3.8.x; some time later we enabled MQTT, and that was basically the only thing that wasn't working properly (regular durable queues were served just fine). I also scaled the cluster up from 3 nodes to 5 (that was my attempt to deal with quorum - I had found in the documentation that the quorum size defaults to 5). Yesterday I applied your fix and everything seemed to start working, but as we figured out today, only partially: the new nodes (4th and 5th) were not able to handle MQTT requests, even though they were present at the moment I forced the leader election. I purged those two nodes one by one (destroyed their storage and reinitialized them), and that made the cluster fully functional.

Hope this helps in your investigation.



--
Best regards,
Denis Boulas


Luke Bakken

Feb 13, 2020, 8:34:44 PM
to rabbitmq-users
Hi Denis,

Thanks for the information - that might help us out.

For what it's worth, a 3-node cluster is fine. You just need an odd number of nodes so that, in the event of a partition, a clear majority (quorum) can be established.

Luke

Michael Klishin

Feb 24, 2020, 10:14:22 AM
to rabbitmq-users
Those nodes likely formed a cluster of their own.

The following scenario is highly problematic for Raft and other consensus protocols when the list of nodes is not known ahead
of time (which is the case with the RabbitMQ MQTT plugin, which does not require the final expected set of nodes to be preconfigured in any way).

When a node boots, it checks for any existing MQTT client ID tracking cluster members and, if there are none, starts a cluster of its own.
This is exactly the chicken-and-egg problem described in [1], but in a different place.

We have introduced a "delay and check before proceeding" kind of step to the plugin for 3.8.3. It will help somewhat, but
unfortunately it is not a guarantee that a set of nodes booting in parallel for the first time will form a single cluster. It's a "consensus
before consensus" kind of scenario. Note that Raft itself faces the same issue and solves it using a randomized delay
(also used and covered in [2]), but the difference is that the Raft design/paper assumes the list of nodes is known
ahead of time.
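A toy illustration of the "delay and check" idea (this is an assumption about the approach, not the plugin's real code): each booting node waits its own randomized interval, then atomically checks for an existing cluster marker; only the first node through founds a new cluster, and the rest join it. Here a `mkdir` on a shared path stands in for the atomic membership check:

```shell
# Marker directory; mkdir on this path either succeeds atomically (founder)
# or fails because another node got there first (joiner).
MARKER=$(mktemp -d)/cluster

boot_node() {   # $1 = node name, $2 = startup delay in seconds
  sleep "$2"    # stand-in for the randomized startup delay
  if mkdir "$MARKER" 2>/dev/null; then
    echo "$1: no existing members found, starting a new cluster"
  else
    echo "$1: found an existing cluster, joining it"
  fi
}

# Simulate three nodes booting in parallel with pretend-random delays.
boot_node rabbit@node1 0.3 &
boot_node rabbit@node2 0.1 &
boot_node rabbit@node3 0.2 &
wait
```

The catch in the real scenario is that there is no shared atomic marker to check, which is why a randomized delay can only reduce, not eliminate, the chance of two founders.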




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Jimmy Carter

Apr 5, 2021, 9:15:19 AM
to rabbitmq-users
Very urgent.
Can you please let me know how to make this automatic, so that we don't need to trigger the election command manually?