High memory usage with v3.8.14 3-node cluster

Damian TagDev

May 21, 2021, 9:02:50 AM
to rabbitmq-users
Hi,

I'm hoping someone can help me understand why we're seeing very high memory usage (> 4 GB) after a blue-green upgrade from v3.6.6 to v3.8.14 on Windows Server 2012 R2.

We have 3 nodes in a cluster, 1 primary/active, 2 passive mirrors. All our connections go to the primary node, and that's where the queues are homed. All our queues are classic and persisted.

Before the upgrade, memory usage on the primary node was a steady 1.5 GB. Since the upgrade it has climbed steadily (linearly) until this morning, when it triggered the memory alarm.

The memory usage on the 2 passive mirrors is steady at ~1.5GB.

According to the management UI, the primary node has 2.5 GB allocated to "Binaries", but when I look at binary references they account for only ~9.5 MB of memory.

I've tried a forced garbage collection, but it made no real difference to the amount of RAM consumed.

TIA
Damian

Partial output from diagnostics report:
Reporting server status of node rabbit@MQ01 ...

Status of node rabbit@MQ01 ...
Runtime

OS PID: 1020
OS: Windows
Uptime (seconds): 233333
Is under maintenance?: false
RabbitMQ version: 3.8.14
Node name: rabbit@MQ01
Erlang configuration: Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:6:6] [ds:6:6:10] [async-threads:1]
Erlang processes: 34159 used, 1048576 limit
Scheduler run queue: 1
Cluster heartbeat timeout (net_ticktime): 60

Plugins

Enabled plugin file: c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/enabled_plugins
Enabled plugins:

 * rabbitmq_top
 * rabbitmq_management
 * amqp_client
 * rabbitmq_web_dispatch
 * cowboy
 * cowlib
 * rabbitmq_management_agent

Data directory

Node data directory: c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-mnesia
Raft data directory: c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-mnesia/quorum/rabbit@MQ01

Config files

 * c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/advanced.config
 * c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/rabbitmq.conf

Log file(s)

 * c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log
 * c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log

Alarms

Memory alarm on node rabbit@MQ01

Memory

Total memory used: 5.1439 gb
Calculation strategy: rss
Memory high watermark setting: 0.4 of available memory, computed to: 5.1537 gb

binary: 2.8048 gb (54.53 %)
allocated_unused: 0.9473 gb (18.42 %)
queue_procs: 0.6793 gb (13.21 %)
mgmt_db: 0.2611 gb (5.08 %)
other_proc: 0.1222 gb (2.38 %)
plugins: 0.0922 gb (1.79 %)
mnesia: 0.0829 gb (1.61 %)
msg_index: 0.0495 gb (0.96 %)
code: 0.0329 gb (0.64 %)
other_system: 0.0311 gb (0.61 %)
metrics: 0.0222 gb (0.43 %)
other_ets: 0.0168 gb (0.33 %)
atom: 0.0015 gb (0.03 %)
quorum_ets: 0.0 gb (0.0 %)
connection_other: 0.0 gb (0.0 %)
connection_channels: 0.0 gb (0.0 %)
connection_readers: 0.0 gb (0.0 %)
connection_writers: 0.0 gb (0.0 %)
queue_slave_procs: 0.0 gb (0.0 %)
quorum_queue_procs: 0.0 gb (0.0 %)
reserved_unallocated: 0.0 gb (0.0 %)

File Descriptors

Total: 3360, limit: 65439
Sockets: 3, limit: 58893

Free Disk Space

Low free disk space watermark: 0.05 gb
Free disk space: 104.257 gb

Totals

Connection count: 7
Queue count: 6615
Virtual host count: 720

Listeners

Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 15672, protocol: http, purpose: HTTP API
Interface: 0.0.0.0, port: 15672, protocol: http, purpose: HTTP API

Cluster status of node rabbit@MQ01 ...
Basics

Cluster name: rabbit@MQ01

Disk Nodes

rabbit@MQ01
rabbit@MQ02
rabbit@MQ03

Running Nodes

rabbit@MQ01
rabbit@MQ02
rabbit@MQ03

Versions

rabbit@MQ01: RabbitMQ 3.8.14 on Erlang 23.3
rabbit@MQ02: RabbitMQ 3.8.14 on Erlang 23.3
rabbit@MQ03: RabbitMQ 3.8.14 on Erlang 23.3

Maintenance status

Node: rabbit@MQ01, status: not under maintenance
Node: rabbit@MQ02, status: not under maintenance
Node: rabbit@MQ03, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@MQ01, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@MQ01, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ01, interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ01, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ01, interface: 0.0.0.0, port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ02, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@MQ02, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ02, interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ02, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ02, interface: 0.0.0.0, port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ03, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@MQ03, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ03, interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ03, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ03, interface: 0.0.0.0, port: 15672, protocol: http, purpose: HTTP API

Feature flags

Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled

Application environment of node rabbit@MQ01 ...
[{amqp_client,
     [{gen_server_call_timeout,130000},
      {prefer_ipv6,false},
      {ssl_options,[]},
      {writer_gc_threshold,1000000000}]},
 {asn1,[]},
 {aten,
     [{detection_threshold,0.99},
      {heartbeat_interval,100},
      {poll_interval,5000},
      {scaling_factor,1.5}]},
 {compiler,[]},
 {cowboy,[]},
 {cowlib,[]},
 {credentials_obfuscation,[{enabled,true}]},
 {crypto,[{fips_mode,false},{rand_cache_size,896}]},
 {cuttlefish,[]},
 {gen_batch_server,[]},
 {goldrush,[]},
 {inets,[]},
 {jsx,[]},
 {kernel,
     [{inet_default_connect_options,[{nodelay,true}]},
      {inet_dist_listen_max,25672},
      {inet_dist_listen_min,25672},
      {logger,
          [{handler,default,logger_std_h,
               #{config => #{type => standard_io},
                 formatter =>
                     {logger_formatter,
                         #{legacy_header => true,single_line => false}}}}]},
      {logger_level,notice},
      {logger_sasl_compatible,false},
      {shell_docs_ansi,auto},
      {shutdown_func,{rabbit_prelaunch,shutdown_func}}]},
 {lager,
     [{async_threshold,20},
      {async_threshold_window,5},
      {colored,false},
      {colors,
          [{debug,"\e[0;38m"},
           {info,"\e[1;37m"},
           {notice,"\e[1;36m"},
           {warning,"\e[1;33m"},
           {error,"\e[1;31m"},
           {critical,"\e[1;35m"},
           {alert,"\e[1;44m"},
           {emergency,"\e[1;41m"}]},
      {crash_log,"log/crash.log"},
      {crash_log_count,5},
      {crash_log_date,"$D0"},
      {crash_log_msg_size,65536},
      {crash_log_rotator,lager_rotator_default},
      {crash_log_size,10485760},
      {error_logger_format_raw,true},
      {error_logger_hwm,5000},
      {error_logger_hwm_original,50},
      {error_logger_redirect,true},
      {extra_sinks,
          [{error_logger_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_channel_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,warning]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,warning]}]}]},
           {rabbit_log_connection_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,warning]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,warning]}]}]},
           {rabbit_log_feature_flags_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_federation_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_ldap_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_mirroring_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_prelaunch_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_queue_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_ra_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_shovel_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_upgrade_lager_event,
               [{handlers,
                    [{lager_file_backend,
                         [{count,100},
                          {date,"$D0"},
                          {file,
                              "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log"},
                          {formatter_config,
                              [date," ",time," ",color,"[",severity,"] ",
                               {pid,[]},
                               " ",message,"\n"]},
                          {level,info},
                          {size,10485760}]}]},
                {rabbit_handlers,
                    [{lager_file_backend,
                         [{count,100},
                          {date,"$D0"},
                          {file,
                              "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log"},
                          {formatter_config,
                              [date," ",time," ",color,"[",severity,"] ",
                               {pid,[]},
                               " ",message,"\n"]},
                          {level,info},
                          {size,10485760}]}]}]}]},
      {handlers,
          [{lager_file_backend,
               [{count,100},
                {date,"$D0"},
                {file,
                    "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log"},
                {formatter_config,
                    [date," ",time," ",color,"[",severity,"] ",
                     {pid,[]},
                     " ",message,"\n"]},
                {level,debug},
                {size,10485760}]}]},
      {log_root,"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log"},
      {rabbit_handlers,
          [{lager_file_backend,
               [{count,100},
                {date,"$D0"},
                {file,
                    "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log"},
                {formatter_config,
                    [date," ",time," ",color,"[",severity,"] ",
                     {pid,[]},
                     " ",message,"\n"]},
                {level,debug},
                {size,10485760}]}]}]},
 {mnesia,
     [{dir,
          "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-mnesia"}]},
 {observer_cli,[{plugins,[]},{scheduler_usage,disable}]},
 {os_mon,
     [{start_cpu_sup,false},
      {start_disksup,false},
      {start_memsup,false},
      {start_os_sup,false}]},
 {public_key,[]},
 {ra,[{data_dir,
          "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-mnesia/quorum"},
      {logger_module,rabbit_log_ra_shim},
      {wal_max_batch_size,4096},
      {wal_max_size_bytes,536870912}]},
 {rabbit,
     [{auth_backends,[rabbit_auth_backend_internal]},
      {auth_mechanisms,['PLAIN','AMQPLAIN']},
      {autocluster,
          [{peer_discovery_backend,rabbit_peer_discovery_classic_config}]},
      {autoheal_state_transition_timeout,60000},
      {background_gc_enabled,false},
      {background_gc_target_interval,60000},
      {backing_queue_module,rabbit_priority_queue},
      {channel_max,2047},
      {channel_operation_timeout,15000},
      {channel_tick_interval,60000},
      {cluster_keepalive_interval,10000},
      {cluster_nodes,{[],disc}},
      {cluster_partition_handling,autoheal},
      {collect_statistics,fine},
      {collect_statistics_interval,5000},
      {config_entry_decoder,[{passphrase,undefined}]},
      {connection_max,infinity},
      {credit_flow_default_credit,{400,200}},
      {default_consumer_prefetch,{false,0}},
      {default_permissions,[<<".*">>,<<".*">>,<<".*">>]},
      {default_user,<<"guest">>},
      {default_user_tags,[administrator]},
      {default_vhost,<<"/">>},
      {delegate_count,16},
      {disk_free_limit,50000000},
      {disk_monitor_failure_retries,10},
      {disk_monitor_failure_retry_interval,120000},
      {enabled_plugins_file,
          "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/enabled_plugins"},
      {feature_flags_file,
          "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-feature_flags"},
      {fhc_read_buffering,false},
      {fhc_write_buffering,true},
      {frame_max,131072},
      {halt_on_upgrade_failure,true},
      {handshake_timeout,10000},
      {heartbeat,60},
      {lager_default_file,
          "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log"},
      {lager_log_root,
          "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log"},
      {lager_upgrade_file,
          "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log"},
      {lazy_queue_explicit_gc_run_operation_threshold,1000},
      {log,
          [{categories,
               [{channel,[{level,warning}]},
                {connection,[{level,warning}]},
                {upgrade,
                    [{file,
                         "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log"}]}]},
           {file,
               [{count,100},
                {size,10485760},
                {date,"$D0"},
                {file,
                    "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log"}]}]},
      {loopback_users,[<<"guest">>]},
      {max_message_size,134217728},
      {memory_monitor_interval,2500},
      {mirroring_flow_control,true},
      {mirroring_sync_batch_size,4096},
      {mnesia_table_loading_retry_limit,10},
      {mnesia_table_loading_retry_timeout,30000},
      {msg_store_credit_disc_bound,{4000,800}},
      {msg_store_file_size_limit,16777216},
      {msg_store_index_module,rabbit_msg_store_ets_index},
      {msg_store_io_batch_size,4096},
      {msg_store_shutdown_timeout,600000},
      {num_ssl_acceptors,10},
      {num_tcp_acceptors,10},
      {password_hashing_module,rabbit_password_hashing_sha256},
      {plugins_dir,
          "c:/Program Files/RabbitMQ Server/rabbitmq_server-3.8.14/plugins"},
      {plugins_expand_dir,
          "c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-plugins-expand"},
      {proxy_protocol,false},
      {queue_explicit_gc_run_operation_threshold,1000},
      {queue_index_embed_msgs_below,4096},
      {queue_index_max_journal_entries,32768},
      {quorum_cluster_size,3},
      {quorum_commands_soft_limit,32},
      {reverse_dns_lookups,false},
      {server_properties,[]},
      {ssl_allow_poodle_attack,false},
      {ssl_apps,[asn1,crypto,public_key,ssl]},
      {ssl_cert_login_from,distinguished_name},
      {ssl_handshake_timeout,5000},
      {ssl_listeners,[]},
      {ssl_options,[]},
      {tcp_listen_options,
          [{backlog,128},
           {nodelay,true},
           {linger,{true,0}},
           {exit_on_close,false}]},
      {tcp_listeners,[{"auto",5672}]},
      {trace_vhosts,[]},
      {track_auth_attempt_source,false},
      {tracking_execution_timeout,15000},
      {vhost_restart_strategy,continue},
      {vm_memory_calculation_strategy,rss},
      {vm_memory_high_watermark,0.4},
      {vm_memory_high_watermark_paging_ratio,0.5},
      {writer_gc_threshold,1000000000}]},
 {rabbit_common,[]},
 {rabbitmq_management,
     [{content_security_policy,
          "script-src 'self' 'unsafe-eval' 'unsafe-inline'; object-src 'self'"},
      {cors_allow_origins,[]},
      {cors_max_age,1800},
      {http_log_dir,none},
      {load_definitions,none},
      {management_db_cache_multiplier,5},
      {process_stats_gc_timeout,300000},
      {stats_event_max_backlog,250}]},
 {rabbitmq_management_agent,
     [{rates_mode,basic},
      {sample_retention_policies,
          [{global,[{605,5},{3660,60},{29400,600},{86400,1800}]},
           {basic,[{605,5},{3600,60}]},
           {detailed,[{605,5}]}]}]},
 {rabbitmq_prelaunch,[]},
 {rabbitmq_top,[]},
 {rabbitmq_web_dispatch,[]},
 {ranch,[]},
 {recon,[]},
 {sasl,[{errlog_type,error},{sasl_error_logger,false}]},
 {ssl,[]},
 {stdlib,[]},
 {stdout_formatter,[]},
 {syntax_tools,[]},
 {sysmon_handler,
     [{busy_dist_port,true},
      {busy_port,false},
      {gc_ms_limit,0},
      {heap_word_limit,0},
      {port_limit,100},
      {process_limit,100},
      {schedule_ms_limit,0}]},
 {tools,[{file_util_search_methods,[{[],[]},{"ebin","esrc"},{"ebin","src"}]}]},
 {xmerl,[]}]

Listing connections ...
pid name port host peer_port peer_host ssl ssl_protocol ssl_key_exchange ssl_cipher ssl_hash peer_cert_subject peer_cert_issuer peer_cert_validity state channels protocol auth_mechanism user vhost timeout frame_max channel_max client_properties recv_oct recv_cnt send_oct send_cnt send_pend connected_at
<rab...@MQ01.1621365939.9920.7127> __.__.21.85:60695 -> __.__.21.77:5672 5672 __.__.21.77 60695 __.__.21.85 false running 1 {0,9,1} PLAIN _app _9cb8abb0-a2f1-4c3b-b400-b9a672e29e93 60 131072 2047 [{"product","RabbitMQ"},{"version","5.2.0+dad9cee150674d4c70f275c32a39d7b95a1fb0f1"},{"platform",".NET"},{"copyright","Copyright (c) 2007-2020 VMware, Inc."},{"information","Licensed under the MPL.  See http://www.rabbitmq.com/"},{"capabilities",[{"publisher_confirms",true},{"exchange_exchange_bindings",true},{"basic.nack",true},{"consumer_cancel_notify",true},{"connection.blocked",true},{"authentication_failure_close",true}]},{"connection_name",undefined}] 664 5 693246 5 0 1621599272184
<rab...@MQ01.1621365939.10867.7127> __.__.19.101:59024 -> __.__.21.77:5672 5672 __.__.21.77 59024 __.__.19.101 false running 0 {0,9,1} PLAIN _app _18bb788a-74dc-4b6e-8150-ae15c0a00c0b 60 131072 2047 [{"product","RabbitMQ"},{"version","5.2.0+dad9cee150674d4c70f275c32a39d7b95a1fb0f1"},{"platform",".NET"},{"copyright","Copyright (c) 2007-2020 VMware, Inc."},{"information","Licensed under the MPL.  See http://www.rabbitmq.com/"},{"capabilities",[{"publisher_confirms",true},{"exchange_exchange_bindings",true},{"basic.nack",true},{"consumer_cancel_notify",true},{"connection.blocked",true},{"authentication_failure_close",true}]},{"connection_name",undefined}] 547 3 561 3 0 1621599272266

Listing channels ...
pid connection name number user vhost transactional confirm consumer_count messages_unacknowledged messages_uncommitted acks_uncommitted messages_unconfirmed prefetch_count global_prefetch_count
<rab...@MQ01.1621365939.9458.7127> <rab...@MQ01.1621365939.9920.7127> __.__.21.85:60695 -> __.__.21.77:5672 (1) 1 _app _9cb8abb0-a2f1-4c3b-b400-b9a672e29e93 false false 0 1 0 0 0 0 0

Command line arguments of node rabbit@MQ01 ...
[{root,["C:\\Program Files\\erl-23.3"]},
 {progname,["erl"]},
 {home,["C:\\windows\\system32\\config\\systemprofile"]},
 {sname,["rabbit@MQ01"]},
 {kernel,["inet_dist_listen_min","25672"]},
 {kernel,["inet_dist_listen_max","25672"]},
 {lager,["crash_log","false"]},
 {lager,["handlers","[]"]},
 {boot,["start_sasl"]}]

Listing RabbitMQ-specific environment variables defined on node rabbit@MQ01...
RABBITMQ_BASE=C:\Users\SVC_RabbitMQ\AppData\Roaming\RabbitMQ
RABBITMQ_NODENAME=rabbit@MQ01

Timeout: 60.0 seconds ...
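
For anyone who wants to sanity-check the "Memory" section of the report above, here is a rough Python sketch of the arithmetic. It only re-uses the figures printed in the report; the function names are made up for illustration, and the zero-valued categories are left out.

```python
# Figures copied from the "Memory" section of the report (all in the report's "gb").
TOTAL_USED_GB = 5.1439
WATERMARK_FRACTION = 0.4      # "Memory high watermark setting: 0.4 of available memory"
WATERMARK_GB = 5.1537         # "... computed to: 5.1537 gb"

# Non-zero categories of the breakdown, as printed.
BREAKDOWN_GB = {
    "binary": 2.8048,
    "allocated_unused": 0.9473,
    "queue_procs": 0.6793,
    "mgmt_db": 0.2611,
    "other_proc": 0.1222,
    "plugins": 0.0922,
    "mnesia": 0.0829,
    "msg_index": 0.0495,
    "code": 0.0329,
    "other_system": 0.0311,
    "metrics": 0.0222,
    "other_ets": 0.0168,
    "atom": 0.0015,
}


def share_percent(part_gb, total_gb=TOTAL_USED_GB):
    """Share of the total taken by one category, in percent."""
    return round(100.0 * part_gb / total_gb, 2)


def implied_available_gb(watermark_gb=WATERMARK_GB, fraction=WATERMARK_FRACTION):
    """Memory the node considers available, derived from the watermark line."""
    return watermark_gb / fraction


def headroom_gb(total_gb=TOTAL_USED_GB, watermark_gb=WATERMARK_GB):
    """Distance between current usage and the high watermark."""
    return watermark_gb - total_gb


if __name__ == "__main__":
    for name, gb in BREAKDOWN_GB.items():
        print(f"{name}: {share_percent(gb)} %")
    print("sum of categories (gb):", round(sum(BREAKDOWN_GB.values()), 4))
    print("implied available memory (gb):", round(implied_available_gb(), 3))
    print("headroom below watermark (gb):", round(headroom_gb(), 4))
```

The categories add up to the reported total to within rounding, "binary" alone is a little over half of it, and the node is sitting about 0.01 gb below the watermark, which fits with the alarm having fired.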

Damian TagDev

May 21, 2021, 9:29:20 AM
to rabbitmq-users
[Attachment: MQ01_MemoryDetails.JPG]

Damian TagDev

May 21, 2021, 10:34:32 AM
to rabbitmq-users
[Attachment: RMQ Node RAM consumption.jpg]
The lines on the left show the cluster RAM usage on v3.6.6, and the lines on the right show v3.8.14 RAM usage. MQ02 was taking connections on the left and MQ01 on the right. The machines are all identical in terms of spec and we regularly rotate which is the primary node.

kjnilsson

May 26, 2021, 5:08:21 AM
to rabbitmq-users
Hi,

Can you tell us a bit more about how your client applications are using RabbitMQ? Are they long-lived or short-lived connections?
Do you regularly poll the management HTTP API for queue metrics?
Is there any activity in the logs?

Cheers
Karl

Damian TagDev

May 26, 2021, 7:21:16 AM
to rabbitm...@googlegroups.com
Hi Karl,

We have a .NET Framework 4.6.2 C# Web API, which uses the RabbitMQ .NET client v5.2.0 for publishing and subscribing to messages, and the management API for creating vhosts, exchanges, bindings, and queues. As it stands, connections are opened and closed per messaging-related web request to our API. On average we're opening and closing about 60 connections per second (according to the management overview page).

We don't poll the management API for queue metrics. We do poll the queues from the .NET client to see if there are any messages waiting, but that's usually only about once every 60-90 seconds.

For each vhost we have a single "inbound" exchange which routes messages to an appropriate "outbound_topic" exchange, which in turn routes the messages to the relevant outbound queues.

I've set connection logging to "warning" so we don't get spammed with the connection churn. In the logs we mostly just see these unless we've hit the memory limit:
2021-05-26 11:53:56.325 [info] <0.13549.5404> supervisor: {<0.13549.5404>,rabbit_channel_sup_sup}, errorContext: shutdown_error, reason: noproc, offender: [{nb_children,1},{id,channel_sup},{mfargs,{rabbit_channel_sup,start_link,[]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]
2021-05-26 11:53:56.325 [error] <0.13549.5404> Supervisor {<0.13549.5404>,rabbit_channel_sup_sup} had child channel_sup started with rabbit_channel_sup:start_link() at undefined exit with reason noproc in context shutdown_error


I'm happy to send logs but would prefer not to do it on a public mailing list if that's possible? We're currently writing some tests to see if we can reproduce the scenario outside of our application logic - we'd be happy to send that over if we succeed in reproducing the problem.

We've also tried running our staging environment on 1 node instead of a cluster, but it exhibits the same memory usage behaviour.

Kind Regards
Damian

M K

May 26, 2021, 12:25:53 PM
to rabbitmq-users
The above error is a red herring.

According to the logs provided to us on Slack, there is high connection churn in your environment [1][2][3].
If there is a way for you to demonstrate what your applications do, please share it. Our best guess is that it is a subtle
metric leak or something like that which can only be reproduced with a high enough connection churn and a long enough testing period.
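
To give a feel for why such a leak could be "subtle", here is a purely illustrative back-of-envelope calculation in Python. It combines the uptime and the "binary" figure from the May 21 report with the churn of about 60 connections per second quoted earlier in the thread. The assumptions are loud ones: the churn is taken as constant over the whole uptime, all of the binary memory is attributed to connection churn (so the result is an upper bound at best), and the report's "gb" is taken to mean 10**9 bytes. None of this is a measurement.

```python
UPTIME_SECONDS = 233_333        # "Uptime (seconds)" from the May 21 report
CONNECTIONS_PER_SECOND = 60     # churn quoted in the thread; ASSUMED constant
BINARY_GB = 2.8048              # "binary" line of the memory breakdown
BYTES_PER_GB = 10 ** 9          # ASSUMPTION about what the report's "gb" means


def connections_over(uptime_seconds, per_second):
    """Total connections opened over the uptime at a constant rate."""
    return uptime_seconds * per_second


def bytes_per_connection(retained_gb, connections, bytes_per_gb=BYTES_PER_GB):
    """Retained bytes per connection if ALL retained memory came from churn."""
    return retained_gb * bytes_per_gb / connections


if __name__ == "__main__":
    total = connections_over(UPTIME_SECONDS, CONNECTIONS_PER_SECOND)
    print("connections since boot (estimate):", total)
    print("upper bound, bytes retained per connection:",
          round(bytes_per_connection(BINARY_GB, total)))
```

Under these assumptions the figure comes out at a few hundred bytes per connection, which would be invisible in a short test and only show up with sustained churn over days.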

With the amount of information we have right now, there are two fundamental recommendations:

 * Don't churn through connections and channels. That's not how the protocols RabbitMQ supports are meant to be used.
 * You have a fair number of classic mirrored queues. Use quorum queues; they are a better replicated queue type in almost every way.
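
The first recommendation above boils down to opening a connection once and sharing it across requests instead of opening one per request. Below is a minimal, library-agnostic sketch of that pattern in Python. `SharedConnection`, `FakeConnection` and `handle_requests` are hypothetical names used only for illustration; they are not part of any RabbitMQ client, and a real application would pass in its own client's connection factory and health check.

```python
import threading
from typing import Callable, Generic, List, Optional, TypeVar

C = TypeVar("C")


class SharedConnection(Generic[C]):
    """Lazily opens one connection and hands the same object to every caller."""

    def __init__(self, open_connection: Callable[[], C],
                 is_open: Callable[[C], bool]) -> None:
        self._open_connection = open_connection
        self._is_open = is_open
        self._lock = threading.Lock()
        self._conn: Optional[C] = None

    def get(self) -> C:
        # Re-open only when there is no connection yet or the old one has died.
        with self._lock:
            if self._conn is None or not self._is_open(self._conn):
                self._conn = self._open_connection()
            return self._conn


class FakeConnection:
    """Stand-in for a real client connection; it only counts how often it is opened."""

    opened = 0

    def __init__(self) -> None:
        FakeConnection.opened += 1
        self.is_open = True


def handle_requests(shared: SharedConnection, count: int) -> List[object]:
    """Simulates `count` web requests that all need a broker connection."""
    used = []
    for _ in range(count):
        conn = shared.get()   # a real handler would publish or consume here
        used.append(conn)
    return used


if __name__ == "__main__":
    shared = SharedConnection(FakeConnection, lambda c: c.is_open)
    handle_requests(shared, 60)
    print("connections opened for 60 requests:", FakeConnection.opened)
```

With this shape, a burst of requests opens a single connection instead of one each, and a new one is opened only after the old one has closed. How channels may be shared between threads depends on the client library, so that part is deliberately left out of the sketch.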