RabbitMQ MQTT Raft Errors : "State machine mqtt_node terminating"

426 views
Skip to first unread message

abhayverma

unread,
Jul 24, 2020, 9:39:47 AM7/24/20
to rabbitmq-users
Hi Team,

OS - Ubuntu 18.04
Erlang - 23.0.2
RabbitMQ - 3.8.5

While performing some perf test on a single node RabbitMQ server, we see certain errors related to raft leader election and mqtt. Please provide some insights how these events affect the RabbitMQ service.


Thanks,
Abhay Verma

mqqt-raft.log

Karl Nilsson

unread,
Jul 24, 2020, 10:05:15 AM7/24/20
to rabbitm...@googlegroups.com
Hi,

This bug has been fixed and will be shipped in the next release (3.8.6)


Cheers
Karl

From: rabbitm...@googlegroups.com <rabbitm...@googlegroups.com> on behalf of abhayverma <abhayve...@gmail.com>
Sent: 24 July 2020 2:39 PM
To: rabbitmq-users <rabbitm...@googlegroups.com>
Subject: [rabbitmq-users] RabbitMQ MQTT Raft Errors : "State machine mqtt_node terminating"
 
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/2830f0e0-51df-4995-a71e-df40121b77b5n%40googlegroups.com.

abhayverma

unread,
Jul 24, 2020, 10:18:42 AM7/24/20
to rabbitmq-users
Thanks Karl.

If you can also tell how adversely this bug affect the service. Will the same client (that is making repeated connections) won't be able to make any connection at all ? Will restarting RabbitMQ service help in such cases ?
Also, new clients can connect successfully irrespective when this bug is hit in running service?

Karl Nilsson

unread,
Jul 24, 2020, 10:23:24 AM7/24/20
to rabbitm...@googlegroups.com
It is possible that this bug makes the connection tracking non-functional. You could try disabling and re-enabling the mqtt plugin on all nodes and see if that helps.

Sent: 24 July 2020 3:18 PM
To: rabbitmq-users <rabbitm...@googlegroups.com>
Subject: Re: [rabbitmq-users] RabbitMQ MQTT Raft Errors : "State machine mqtt_node terminating"
 

abhayverma

unread,
Jul 24, 2020, 10:38:18 AM7/24/20
to rabbitmq-users
I assume this will happen in scenarios when the connection churn is too high that mqtt cannot accept a new connection and thus client initiates a reconnect loop. Is my understanding correct here ?

Also, disabling mqtt plugin will shutdown its connector and as soon as it is enabled, the connection churn rate will again be very high from clients to re-establish connections.

abhayverma

unread,
Jul 24, 2020, 10:49:38 AM7/24/20
to rabbitmq-users
Hi Karl,
Please find the output of disabling and enabling plugin in running cluster. The plugin can't be enabled back.

I have no name!@node-0:/$ rabbitmq-plugins disable rabbitmq_mqtt ; rabbitmq-plugins enable rabbitmq_mqtt
Disabling plugins on node rabbit@node_name:
rabbitmq_mqtt
The following plugins have been configured:
  rabbitmq_management
  rabbitmq_management_agent
  rabbitmq_peer_discovery_common
  rabbitmq_peer_discovery_k8s
  rabbitmq_prometheus
  rabbitmq_web_dispatch
Applying plugin configuration to rabbit@node_name...
The following plugins have been disabled:
  rabbitmq_mqtt

stopped 1 plugins.
Enabling plugins on node rabbit@node_name:
rabbitmq_mqtt
The following plugins have been configured:
  rabbitmq_management
  rabbitmq_management_agent
  rabbitmq_mqtt
  rabbitmq_peer_discovery_common
  rabbitmq_peer_discovery_k8s
  rabbitmq_prometheus
  rabbitmq_web_dispatch
Applying plugin configuration to rabbit@node_name...
Stack trace:

** (CaseClauseError) no case clause matching: {:could_not_start, :rabbitmq_mqtt, {:rabbitmq_mqtt, {:bad_return, {{:rabbit_mqtt, :start, [:normal, []]}, {:EXIT, {:timeout, {:gen_statem, :call, [{:mqtt_node, :"rabbit@node_name"}, :trigger_election, 5000]}}}}}}}
{:case_clause, {:could_not_start, :rabbitmq_mqtt, {:rabbitmq_mqtt, {:bad_return, {{:rabbit_mqtt, :start, [:normal, []]}, {:EXIT, {:timeout, {:gen_statem, :call, [{:mqtt_node, :"rabbit@node_name"}, :trigger_election, 5000]}}}}}}}}

Karl Nilsson

unread,
Jul 24, 2020, 10:52:00 AM7/24/20
to rabbitm...@googlegroups.com
can you provide the logs covering your enable / disable attempt?

Did you try stopping and starting the broker?

How many nodes do you have?



Sent: 24 July 2020 3:49 PM

abhayverma

unread,
Jul 24, 2020, 1:00:27 PM7/24/20
to rabbitmq-users
Unfortunately, I didn't capture the logs while disabling/enabling plugin. I am running a single node of RabbitMQ.
Stop/Start of broker will work and this was established but it stuck at some point when this bug is hit. Though new connections from other clients do get establish. 


"I assume this will happen in scenarios when the connection churn is too high that mqtt cannot accept a new connection and thus client initiates a reconnect loop. Is my understanding correct here ?"


abhayverma

unread,
Jul 28, 2020, 5:17:42 AM7/28/20
to rabbitmq-users
Hi Karl,
Did you get a chance to check this ?

Karl Nilsson

unread,
Jul 28, 2020, 5:39:55 AM7/28/20
to rabbitm...@googlegroups.com
Hi,

I think we need to delete the raft machine and re-create it.

Can you try this rabbitmqctl command then restart your server?

rabbitmqctl eval "ra:force_delete_server({mqtt_node, node()})."

Sent: 28 July 2020 10:17 AM

abhayverma

unread,
Jul 28, 2020, 7:50:57 AM7/28/20
to rabbitmq-users
Restart actually solves this issue until this bug re-occurs.
I was trying to understand the circumstances/situations that can cause this bug. Nevertheless 3.8.6-rc is out, will wait for its stable version to run these tests again.

Karl Nilsson

unread,
Jul 28, 2020, 8:36:22 AM7/28/20
to rabbitm...@googlegroups.com
ok I see, yes any time you make a duplicate client id request whilst it is still active should trigger this error.

Sent: 28 July 2020 12:50 PM

Luke Bakken

unread,
Jul 30, 2020, 7:10:28 PM7/30/20
to rabbitmq-users
Hello,

If you have the time to test with this RC it would help us out a lot - https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.6-rc.1

Thank you,
Luke

ivan.e...@gmail.com

unread,
Aug 2, 2020, 7:27:45 AM8/2/20
to rabbitmq-users

It looks like in our case we might have the same situation. I mean that the MQTT node believes that the client is still there, but the client somehow recognizes the connection as broken, and re-connects without closing the previous connection gracefully. Some time ago I described our case here: https://groups.google.com/g/rabbitmq-users/c/z-jTwSHS-10/m/WacLtMLrBAAJ

Since then I applied the new version of MQTT plugin (provided by Michael Klishin) to all nodes and replaced the Windows node with the Linux node. Now everything works without constantly producing crashes in logs. Anyway, we still can see some anomalies:

All nodes might work well without any noticeable issues for a while but at some point in time, one of nodes starting go crazy - the memory consumption fast and constantly grows. According to "rabbitmqctl status" most of the memory consumed by connection_other - up to 1.5GiB (it takes ~ 0.36GiB on other two nodes) or quorum_ets - up to 2.2GiB (it takes ~ 0.21GiB on other two nodes). It reaches highmark in two-three days, and we have to restart the node. Stopping and starting MQTT plugin does not free the memory, so we restart the rabbitmq-server. It helps, but unfortunately not much long.

The start of the node after the stop takes a significant amount of time (see logs below). We also see, that about 14GiB of HDD used by files in /var/lib/rabbitmq/mnesia/rabbit@vs20031001/quorum/rabbit@vs20031001/MQTT_ZZZYYYXXX . There are thousands of files with names like 000NNNNN.segment .

So I have two questions:

(1) shall we try 3.8.6.RC, if we already applied the version of MQTT plugin provided by Michael? Was "any time you make a duplicate client id request whilst it is still active should trigger this error" addressed there?

(2) We aren't happy with Gigs of disk space used by segment files. How can I get rid of them?


The logs I wrote about above (warnings and the gap between 11:31:52 and 11:33:10 looks strange for me):

---------------------------------------------

2020-08-02 11:31:42.035 [info] <0.285.0> 

 Starting RabbitMQ 3.8.5 on Erlang 23.0.2

 Copyright (c) 2007-2020 VMware, Inc. or its affiliates.

 Licensed under the MPL 1.1. Website: https://rabbitmq.com

2020-08-02 11:31:42.037 [info] <0.285.0> 

 node : rabbit@node1001

 home dir : /var/lib/rabbitmq

 config file(s) : /etc/rabbitmq/rabbitmq.conf

 cookie hash : xxxxxxxxxxx

 log(s) : /var/log/rabbitmq/rab...@node1001.log

                : /var/log/rabbitmq/rabbit@node1001_upgrade.log

 database dir : /var/lib/rabbitmq/mnesia/rabbit@node1001

2020-08-02 11:31:42.052 [info] <0.285.0> Running boot step pre_boot defined by app rabbit

2020-08-02 11:31:42.052 [info] <0.285.0> Running boot step rabbit_core_metrics defined by app rabbit

2020-08-02 11:31:42.052 [info] <0.285.0> Running boot step rabbit_alarm defined by app rabbit

2020-08-02 11:31:42.058 [info] <0.422.0> Memory high watermark set to 4804 MiB (5037500006 bytes) of 12010 MiB (12593750016 bytes) total

2020-08-02 11:31:42.062 [info] <0.424.0> Enabling free disk space monitoring

2020-08-02 11:31:42.062 [info] <0.424.0> Disk free limit set to 50MB

2020-08-02 11:31:42.068 [info] <0.285.0> Running boot step code_server_cache defined by app rabbit

2020-08-02 11:31:42.068 [info] <0.285.0> Running boot step file_handle_cache defined by app rabbit

2020-08-02 11:31:42.068 [info] <0.427.0> Limiting to approx 32671 file handles (29401 sockets)

2020-08-02 11:31:42.068 [info] <0.428.0> FHC read buffering: OFF

2020-08-02 11:31:42.068 [info] <0.428.0> FHC write buffering: ON

2020-08-02 11:31:42.069 [info] <0.285.0> Running boot step worker_pool defined by app rabbit

2020-08-02 11:31:42.069 [info] <0.417.0> Will use 4 processes for default worker pool

2020-08-02 11:31:42.069 [info] <0.417.0> Starting worker pool 'worker_pool' with 4 processes in it

2020-08-02 11:31:42.069 [info] <0.285.0> Running boot step database defined by app rabbit

2020-08-02 11:31:42.262 [info] <0.285.0> Waiting for Mnesia tables for 30000 ms, 9 retries left

2020-08-02 11:31:42.486 [info] <0.285.0> Successfully synced tables from a peer

2020-08-02 11:31:42.487 [info] <0.285.0> Waiting for Mnesia tables for 30000 ms, 9 retries left

2020-08-02 11:31:42.487 [info] <0.285.0> Successfully synced tables from a peer

2020-08-02 11:31:42.487 [warning] <0.285.0> Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/var/lib/rabbitmq/mnesia/rabbit@node1001-feature_flags`:

2020-08-02 11:31:42.487 [warning] <0.285.0> Feature flags: - list of previously disabled feature flags now marked as such: [empty_basic_get_metric]

2020-08-02 11:31:42.498 [info] <0.285.0> Waiting for Mnesia tables for 30000 ms, 9 retries left

2020-08-02 11:31:42.498 [info] <0.285.0> Successfully synced tables from a peer

2020-08-02 11:31:42.577 [info] <0.285.0> Waiting for Mnesia tables for 30000 ms, 9 retries left

2020-08-02 11:31:42.577 [info] <0.285.0> Successfully synced tables from a peer

2020-08-02 11:31:42.577 [info] <0.285.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.

2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step database_sync defined by app rabbit

2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step feature_flags defined by app rabbit

2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step codec_correctness_check defined by app rabbit

2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step external_infrastructure defined by app rabbit

2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step rabbit_registry defined by app rabbit

2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit

2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_queue_location_random defined by app rabbit

2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_event defined by app rabbit

2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit

2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit

2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_exchange_type_direct defined by app rabbit

2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit

2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_exchange_type_headers defined by app rabbit

2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_exchange_type_topic defined by app rabbit

2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit

2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit

2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit

2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_priority_queue defined by app rabbit

2020-08-02 11:31:42.579 [info] <0.285.0> Priority queues enabled, real BQ is rabbit_variable_queue

2020-08-02 11:31:42.580 [info] <0.285.0> Running boot step rabbit_queue_location_client_local defined by app rabbit

2020-08-02 11:31:42.580 [info] <0.285.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit

2020-08-02 11:31:42.580 [info] <0.285.0> Running boot step kernel_ready defined by app rabbit

2020-08-02 11:31:42.580 [info] <0.285.0> Running boot step rabbit_sysmon_minder defined by app rabbit

2020-08-02 11:31:42.581 [info] <0.285.0> Running boot step rabbit_epmd_monitor defined by app rabbit

2020-08-02 11:31:42.583 [info] <0.527.0> epmd monitor knows us, inter-node communication (distribution) port: 25672

2020-08-02 11:31:42.583 [info] <0.285.0> Running boot step guid_generator defined by app rabbit

2020-08-02 11:31:42.602 [info] <0.285.0> Running boot step rabbit_node_monitor defined by app rabbit

2020-08-02 11:31:42.604 [info] <0.533.0> Starting rabbit_node_monitor

2020-08-02 11:31:42.604 [info] <0.285.0> Running boot step delegate_sup defined by app rabbit

2020-08-02 11:31:42.605 [info] <0.285.0> Running boot step rabbit_memory_monitor defined by app rabbit

2020-08-02 11:31:42.605 [info] <0.285.0> Running boot step core_initialized defined by app rabbit

2020-08-02 11:31:42.605 [info] <0.285.0> Running boot step upgrade_queues defined by app rabbit

2020-08-02 11:31:42.627 [info] <0.285.0> Running boot step rabbit_connection_tracking defined by app rabbit

2020-08-02 11:31:42.627 [info] <0.285.0> Running boot step rabbit_connection_tracking_handler defined by app rabbit

2020-08-02 11:31:42.627 [info] <0.285.0> Running boot step rabbit_exchange_parameters defined by app rabbit

2020-08-02 11:31:42.628 [info] <0.285.0> Running boot step rabbit_mirror_queue_misc defined by app rabbit

2020-08-02 11:31:42.628 [info] <0.285.0> Running boot step rabbit_policies defined by app rabbit

2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step rabbit_policy defined by app rabbit

2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step rabbit_queue_location_validator defined by app rabbit

2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step rabbit_quorum_memory_manager defined by app rabbit

2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step rabbit_vhost_limit defined by app rabbit

2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step recovery defined by app rabbit

2020-08-02 11:31:49.918 [info] <0.3741.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit@node1001/msg_stores/vhosts/XXXYYYZZZ' for vhost '/' exists

2020-08-02 11:31:49.934 [info] <0.3741.0> Starting message stores for vhost '/'

2020-08-02 11:31:49.935 [info] <0.3745.0> Message store "XXXYYYZZZ/msg_store_transient": using rabbit_msg_store_ets_index to provide index

2020-08-02 11:31:49.941 [info] <0.3741.0> Started message store of type transient for vhost '/'

2020-08-02 11:31:49.941 [info] <0.3749.0> Message store "XXXYYYZZZ/msg_store_persistent": using rabbit_msg_store_ets_index to provide index

2020-08-02 11:31:49.945 [info] <0.3741.0> Started message store of type persistent for vhost '/'

2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step definition_import_worker_pool defined by app rabbit

2020-08-02 11:31:50.005 [info] <0.417.0> Starting worker pool 'definition_import_pool' with 4 processes in it

2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step load_core_definitions defined by app rabbit

2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step empty_db_check defined by app rabbit

2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step rabbit_looking_glass defined by app rabbit

2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step rabbit_core_metrics_gc defined by app rabbit

2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step background_gc defined by app rabbit

2020-08-02 11:31:50.006 [info] <0.285.0> Running boot step connection_tracking defined by app rabbit

2020-08-02 11:31:50.007 [info] <0.285.0> Setting up a table for connection tracking on this node: tracked_connection_on_node_rabbit@node1001

2020-08-02 11:31:50.009 [info] <0.285.0> Setting up a table for per-vhost connection counting on this node: tracked_connection_per_vhost_on_node_rabbit@node1001

2020-08-02 11:31:50.009 [info] <0.285.0> Running boot step routing_ready defined by app rabbit

2020-08-02 11:31:50.009 [info] <0.285.0> Running boot step pre_flight defined by app rabbit

2020-08-02 11:31:50.009 [info] <0.285.0> Running boot step notify_cluster defined by app rabbit

2020-08-02 11:31:50.009 [info] <0.285.0> Running boot step networking defined by app rabbit

2020-08-02 11:31:50.010 [info] <0.533.0> rabbit on node rabbit@node2801 up

2020-08-02 11:31:50.011 [info] <0.3797.0> started TCP listener on [::]:5672

2020-08-02 11:31:50.093 [info] <0.533.0> rabbit on node rabbit@node1002 up

2020-08-02 11:31:50.136 [info] <0.3815.0> started TLS (SSL) listener on [::]:5671

2020-08-02 11:31:50.144 [info] <0.285.0> Running boot step cluster_name defined by app rabbit

2020-08-02 11:31:50.144 [info] <0.285.0> Running boot step direct_client defined by app rabbit

2020-08-02 11:31:52.117 [info] <0.3817.0> Feature flag `drop_unroutable_metric`: mark as enabled=true

2020-08-02 11:31:52.149 [info] <0.3817.0> Feature flag `empty_basic_get_metric`: mark as enabled=true

2020-08-02 11:31:52.493 [info] <0.3817.0> Running boot step rabbit_mgmt_db_handler defined by app rabbitmq_management_agent

2020-08-02 11:31:52.493 [info] <0.3817.0> Management plugin: using rates mode 'basic'

2020-08-02 11:31:52.691 [info] <0.3817.0> Running boot step rabbit_mgmt_reset_handler defined by app rabbitmq_management

2020-08-02 11:31:52.691 [info] <0.3817.0> Running boot step rabbit_management_load_definitions defined by app rabbitmq_management

2020-08-02 11:31:52.736 [info] <0.3886.0> Management plugin: HTTPS listener started on port 15671

2020-08-02 11:31:52.736 [info] <0.3993.0> Statistics database started.

2020-08-02 11:31:52.736 [info] <0.3992.0> Starting worker pool 'management_worker_pool' with 3 processes in it

2020-08-02 11:33:10.584 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.585 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.590 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.590 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.595 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.660 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.688 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.702 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.718 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.720 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.723 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.738 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.743 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.756 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.758 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.761 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.767 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.785 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.786 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.826 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.

2020-08-02 11:33:10.837 [info] <0.4184.0> MQTT retained message store: rabbit_mqtt_retained_msg_store_dets

2020-08-02 11:33:10.841 [info] <0.4202.0> started MQTT TCP listener on [::]:1883

2020-08-02 11:33:10.843 [info] <0.4218.0> started MQTT TLS listener on [::]:8883

2020-08-02 11:33:10.875 [info] <0.3817.0> Running boot step rabbit_event_exchange_decorator defined by app rabbitmq_event_exchange

2020-08-02 11:33:10.876 [info] <0.3817.0> Running boot step rabbit_event_exchange defined by app rabbitmq_event_exchange

2020-08-02 11:33:10.880 [notice] <0.295.0> Changed loghwm of /var/log/rabbitmq/rab...@node1001.log to 5000

2020-08-02 11:33:11.304 [info] <0.3817.0> Server startup complete; 5 plugins started.

 * rabbitmq_event_exchange

 * rabbitmq_mqtt

 * rabbitmq_management

 * rabbitmq_web_dispatch

 * rabbitmq_management_agent



вторник, 28 июля 2020 г. в 14:36:22 UTC+2, Karl Nilsson:

abhayverma

unread,
Aug 2, 2020, 8:57:01 AM8/2/20
to rabbitmq-users
Hi,

"It looks like in our case we might have the same situation. I mean that the MQTT node believes that the client is still there, but the client somehow recognizes the connection as broken, and re-connects without closing the previous connection gracefully."

I have faced this similar issue and in my case the clients keeps on reconnecting and eventually reach to a very high numbers (more than 2x of original client connections) as shown in management UI. This issue was even reproducible on v3.7.19. Interestingly, the number of (re-connect) connections shown in management UI is directly proportional to the mqtt tcp backlog value. If you keep a low backlog value the reconnect attempts tends to slow down and settle to actual numbers.

We were able to reproduce this whenever we reset the rabbitmq cluster while the connections are still live. Mostly when the churn rate is high, we observe this issue on clients trying to reconnect.

Ivan A Eryshov

unread,
Aug 2, 2020, 11:33:09 AM8/2/20
to rabbitm...@googlegroups.com
In our deployment, I cannot see any growth in connection numbers. We have as many connections as MQTT clients. The number of channels exceeds the number of clients by factor ~1.5. I'll try to play with backlog size. Now it is set to 32, which I believe quite low. 

Reply all
Reply to author
Forward
0 new messages