It looks like in our case we might have the same situation. I mean that the MQTT node believes that the client is still there, but the client somehow recognizes the connection as broken, and re-connects without closing the previous connection gracefully. Some time ago I described our case here: https://groups.google.com/g/rabbitmq-users/c/z-jTwSHS-10/m/WacLtMLrBAAJ
Since then I applied the new version of MQTT plugin (provided by Michael Klishin) to all nodes and replaced the Windows node with the Linux node. Now everything works without constantly producing crashes in logs. Anyway, we still can see some anomalies:
All nodes might work well without any noticeable issues for a while but at some point in time, one of nodes starting go crazy - the memory consumption fast and constantly grows. According to "rabbitmqctl status" most of the memory consumed by connection_other - up to 1.5GiB (it takes ~ 0.36GiB on other two nodes) or quorum_ets - up to 2.2GiB (it takes ~ 0.21GiB on other two nodes). It reaches highmark in two-three days, and we have to restart the node. Stopping and starting MQTT plugin does not free the memory, so we restart the rabbitmq-server. It helps, but unfortunately not much long.
The start of the node after the stop takes a significant amount of time (see logs below). We also see, that about 14GiB of HDD used by files in /var/lib/rabbitmq/mnesia/rabbit@vs20031001/quorum/rabbit@vs20031001/MQTT_ZZZYYYXXX . There are thousands of files with names like 000NNNNN.segment .
So I have two questions:
(1) shall we try 3.8.6.RC, if we already applied the version of MQTT plugin provided by Michael? Was "any time you make a duplicate client id request whilst it is still active should trigger this error" addressed there?
(2) We aren't happy with Gigs of disk space used by segment files. How can I get rid of them?
The logs I wrote about above (warnings and the gap between 11:31:52 and 11:33:10 looks strange for me):
---------------------------------------------
2020-08-02 11:31:42.035 [info] <0.285.0>
Starting RabbitMQ 3.8.5 on Erlang 23.0.2
Copyright (c) 2007-2020 VMware, Inc. or its affiliates.
Licensed under the MPL 1.1. Website: https://rabbitmq.com
2020-08-02 11:31:42.037 [info] <0.285.0>
node : rabbit@node1001
home dir : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.conf
cookie hash : xxxxxxxxxxx
log(s) : /var/log/rabbitmq/rab...@node1001.log
: /var/log/rabbitmq/rabbit@node1001_upgrade.log
database dir : /var/lib/rabbitmq/mnesia/rabbit@node1001
2020-08-02 11:31:42.052 [info] <0.285.0> Running boot step pre_boot defined by app rabbit
2020-08-02 11:31:42.052 [info] <0.285.0> Running boot step rabbit_core_metrics defined by app rabbit
2020-08-02 11:31:42.052 [info] <0.285.0> Running boot step rabbit_alarm defined by app rabbit
2020-08-02 11:31:42.058 [info] <0.422.0> Memory high watermark set to 4804 MiB (5037500006 bytes) of 12010 MiB (12593750016 bytes) total
2020-08-02 11:31:42.062 [info] <0.424.0> Enabling free disk space monitoring
2020-08-02 11:31:42.062 [info] <0.424.0> Disk free limit set to 50MB
2020-08-02 11:31:42.068 [info] <0.285.0> Running boot step code_server_cache defined by app rabbit
2020-08-02 11:31:42.068 [info] <0.285.0> Running boot step file_handle_cache defined by app rabbit
2020-08-02 11:31:42.068 [info] <0.427.0> Limiting to approx 32671 file handles (29401 sockets)
2020-08-02 11:31:42.068 [info] <0.428.0> FHC read buffering: OFF
2020-08-02 11:31:42.068 [info] <0.428.0> FHC write buffering: ON
2020-08-02 11:31:42.069 [info] <0.285.0> Running boot step worker_pool defined by app rabbit
2020-08-02 11:31:42.069 [info] <0.417.0> Will use 4 processes for default worker pool
2020-08-02 11:31:42.069 [info] <0.417.0> Starting worker pool 'worker_pool' with 4 processes in it
2020-08-02 11:31:42.069 [info] <0.285.0> Running boot step database defined by app rabbit
2020-08-02 11:31:42.262 [info] <0.285.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-08-02 11:31:42.486 [info] <0.285.0> Successfully synced tables from a peer
2020-08-02 11:31:42.487 [info] <0.285.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-08-02 11:31:42.487 [info] <0.285.0> Successfully synced tables from a peer
2020-08-02 11:31:42.487 [warning] <0.285.0> Feature flags: the previous instance of this node must have failed to write the `feature_flags` file at `/var/lib/rabbitmq/mnesia/rabbit@node1001-feature_flags`:
2020-08-02 11:31:42.487 [warning] <0.285.0> Feature flags: - list of previously disabled feature flags now marked as such: [empty_basic_get_metric]
2020-08-02 11:31:42.498 [info] <0.285.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-08-02 11:31:42.498 [info] <0.285.0> Successfully synced tables from a peer
2020-08-02 11:31:42.577 [info] <0.285.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2020-08-02 11:31:42.577 [info] <0.285.0> Successfully synced tables from a peer
2020-08-02 11:31:42.577 [info] <0.285.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.
2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step database_sync defined by app rabbit
2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step feature_flags defined by app rabbit
2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step codec_correctness_check defined by app rabbit
2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step external_infrastructure defined by app rabbit
2020-08-02 11:31:42.577 [info] <0.285.0> Running boot step rabbit_registry defined by app rabbit
2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_auth_mechanism_cr_demo defined by app rabbit
2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_queue_location_random defined by app rabbit
2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_event defined by app rabbit
2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_auth_mechanism_amqplain defined by app rabbit
2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_auth_mechanism_plain defined by app rabbit
2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_exchange_type_direct defined by app rabbit
2020-08-02 11:31:42.578 [info] <0.285.0> Running boot step rabbit_exchange_type_fanout defined by app rabbit
2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_exchange_type_headers defined by app rabbit
2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_exchange_type_topic defined by app rabbit
2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_mirror_queue_mode_all defined by app rabbit
2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_mirror_queue_mode_exactly defined by app rabbit
2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_mirror_queue_mode_nodes defined by app rabbit
2020-08-02 11:31:42.579 [info] <0.285.0> Running boot step rabbit_priority_queue defined by app rabbit
2020-08-02 11:31:42.579 [info] <0.285.0> Priority queues enabled, real BQ is rabbit_variable_queue
2020-08-02 11:31:42.580 [info] <0.285.0> Running boot step rabbit_queue_location_client_local defined by app rabbit
2020-08-02 11:31:42.580 [info] <0.285.0> Running boot step rabbit_queue_location_min_masters defined by app rabbit
2020-08-02 11:31:42.580 [info] <0.285.0> Running boot step kernel_ready defined by app rabbit
2020-08-02 11:31:42.580 [info] <0.285.0> Running boot step rabbit_sysmon_minder defined by app rabbit
2020-08-02 11:31:42.581 [info] <0.285.0> Running boot step rabbit_epmd_monitor defined by app rabbit
2020-08-02 11:31:42.583 [info] <0.527.0> epmd monitor knows us, inter-node communication (distribution) port: 25672
2020-08-02 11:31:42.583 [info] <0.285.0> Running boot step guid_generator defined by app rabbit
2020-08-02 11:31:42.602 [info] <0.285.0> Running boot step rabbit_node_monitor defined by app rabbit
2020-08-02 11:31:42.604 [info] <0.533.0> Starting rabbit_node_monitor
2020-08-02 11:31:42.604 [info] <0.285.0> Running boot step delegate_sup defined by app rabbit
2020-08-02 11:31:42.605 [info] <0.285.0> Running boot step rabbit_memory_monitor defined by app rabbit
2020-08-02 11:31:42.605 [info] <0.285.0> Running boot step core_initialized defined by app rabbit
2020-08-02 11:31:42.605 [info] <0.285.0> Running boot step upgrade_queues defined by app rabbit
2020-08-02 11:31:42.627 [info] <0.285.0> Running boot step rabbit_connection_tracking defined by app rabbit
2020-08-02 11:31:42.627 [info] <0.285.0> Running boot step rabbit_connection_tracking_handler defined by app rabbit
2020-08-02 11:31:42.627 [info] <0.285.0> Running boot step rabbit_exchange_parameters defined by app rabbit
2020-08-02 11:31:42.628 [info] <0.285.0> Running boot step rabbit_mirror_queue_misc defined by app rabbit
2020-08-02 11:31:42.628 [info] <0.285.0> Running boot step rabbit_policies defined by app rabbit
2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step rabbit_policy defined by app rabbit
2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step rabbit_queue_location_validator defined by app rabbit
2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step rabbit_quorum_memory_manager defined by app rabbit
2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step rabbit_vhost_limit defined by app rabbit
2020-08-02 11:31:42.630 [info] <0.285.0> Running boot step recovery defined by app rabbit
2020-08-02 11:31:49.918 [info] <0.3741.0> Making sure data directory '/var/lib/rabbitmq/mnesia/rabbit@node1001/msg_stores/vhosts/XXXYYYZZZ' for vhost '/' exists
2020-08-02 11:31:49.934 [info] <0.3741.0> Starting message stores for vhost '/'
2020-08-02 11:31:49.935 [info] <0.3745.0> Message store "XXXYYYZZZ/msg_store_transient": using rabbit_msg_store_ets_index to provide index
2020-08-02 11:31:49.941 [info] <0.3741.0> Started message store of type transient for vhost '/'
2020-08-02 11:31:49.941 [info] <0.3749.0> Message store "XXXYYYZZZ/msg_store_persistent": using rabbit_msg_store_ets_index to provide index
2020-08-02 11:31:49.945 [info] <0.3741.0> Started message store of type persistent for vhost '/'
2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step definition_import_worker_pool defined by app rabbit
2020-08-02 11:31:50.005 [info] <0.417.0> Starting worker pool 'definition_import_pool' with 4 processes in it
2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step load_core_definitions defined by app rabbit
2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step empty_db_check defined by app rabbit
2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step rabbit_looking_glass defined by app rabbit
2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step rabbit_core_metrics_gc defined by app rabbit
2020-08-02 11:31:50.005 [info] <0.285.0> Running boot step background_gc defined by app rabbit
2020-08-02 11:31:50.006 [info] <0.285.0> Running boot step connection_tracking defined by app rabbit
2020-08-02 11:31:50.007 [info] <0.285.0> Setting up a table for connection tracking on this node: tracked_connection_on_node_rabbit@node1001
2020-08-02 11:31:50.009 [info] <0.285.0> Setting up a table for per-vhost connection counting on this node: tracked_connection_per_vhost_on_node_rabbit@node1001
2020-08-02 11:31:50.009 [info] <0.285.0> Running boot step routing_ready defined by app rabbit
2020-08-02 11:31:50.009 [info] <0.285.0> Running boot step pre_flight defined by app rabbit
2020-08-02 11:31:50.009 [info] <0.285.0> Running boot step notify_cluster defined by app rabbit
2020-08-02 11:31:50.009 [info] <0.285.0> Running boot step networking defined by app rabbit
2020-08-02 11:31:50.010 [info] <0.533.0> rabbit on node rabbit@node2801 up
2020-08-02 11:31:50.011 [info] <0.3797.0> started TCP listener on [::]:5672
2020-08-02 11:31:50.093 [info] <0.533.0> rabbit on node rabbit@node1002 up
2020-08-02 11:31:50.136 [info] <0.3815.0> started TLS (SSL) listener on [::]:5671
2020-08-02 11:31:50.144 [info] <0.285.0> Running boot step cluster_name defined by app rabbit
2020-08-02 11:31:50.144 [info] <0.285.0> Running boot step direct_client defined by app rabbit
2020-08-02 11:31:52.117 [info] <0.3817.0> Feature flag `drop_unroutable_metric`: mark as enabled=true
2020-08-02 11:31:52.149 [info] <0.3817.0> Feature flag `empty_basic_get_metric`: mark as enabled=true
2020-08-02 11:31:52.493 [info] <0.3817.0> Running boot step rabbit_mgmt_db_handler defined by app rabbitmq_management_agent
2020-08-02 11:31:52.493 [info] <0.3817.0> Management plugin: using rates mode 'basic'
2020-08-02 11:31:52.691 [info] <0.3817.0> Running boot step rabbit_mgmt_reset_handler defined by app rabbitmq_management
2020-08-02 11:31:52.691 [info] <0.3817.0> Running boot step rabbit_management_load_definitions defined by app rabbitmq_management
2020-08-02 11:31:52.736 [info] <0.3886.0> Management plugin: HTTPS listener started on port 15671
2020-08-02 11:31:52.736 [info] <0.3993.0> Statistics database started.
2020-08-02 11:31:52.736 [info] <0.3992.0> Starting worker pool 'management_worker_pool' with 3 processes in it
2020-08-02 11:33:10.584 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.585 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.590 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.590 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.595 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.660 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.688 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.702 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.718 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.720 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.723 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.738 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.743 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.756 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.758 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.761 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.767 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.785 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.786 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.826 [warning] <0.4007.0> mqtt_node: leader cast - leader not known. Command is dropped.
2020-08-02 11:33:10.837 [info] <0.4184.0> MQTT retained message store: rabbit_mqtt_retained_msg_store_dets
2020-08-02 11:33:10.841 [info] <0.4202.0> started MQTT TCP listener on [::]:1883
2020-08-02 11:33:10.843 [info] <0.4218.0> started MQTT TLS listener on [::]:8883
2020-08-02 11:33:10.875 [info] <0.3817.0> Running boot step rabbit_event_exchange_decorator defined by app rabbitmq_event_exchange
2020-08-02 11:33:10.876 [info] <0.3817.0> Running boot step rabbit_event_exchange defined by app rabbitmq_event_exchange
2020-08-02 11:33:10.880 [notice] <0.295.0> Changed loghwm of /var/log/rabbitmq/rab...@node1001.log to 5000
2020-08-02 11:33:11.304 [info] <0.3817.0> Server startup complete; 5 plugins started.
* rabbitmq_event_exchange
* rabbitmq_mqtt
* rabbitmq_management
* rabbitmq_web_dispatch
* rabbitmq_management_agent
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/68efe9f8-87de-46aa-ad57-9f99aaa9fc9en%40googlegroups.com.