I'm hoping someone can help me understand why we're seeing very high memory usage (> 4 GB) after a blue-green upgrade from v3.6.6 to v3.8.14 on Windows Server 2012 R2.
We have 3 nodes in a cluster: 1 primary/active and 2 passive mirrors. All our connections go to the primary node, and that's where the queues are homed. All our queues are classic and persisted.
Before the upgrade, memory usage on the primary node was a steady 1.5 GB. Since the upgrade it has climbed linearly, until this morning, when it triggered the memory alarm.
The memory usage on the 2 passive mirrors is steady at ~1.5GB.
According to the management UI, the primary node has 2.5 GB allocated to "Binaries", but when I look at the binary references they only account for ~9.5 MB of memory.
I've tried a forced garbage collection, but it made no real difference to the amount of RAM consumed.
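As a rough sanity check on the numbers, here is a small Python sketch of the arithmetic on the memory breakdown from the status output below. The figures are copied from that output (categories reported as 0.0 are left out); the variable names are my own and the script is only illustrative, not anything RabbitMQ provides.

```python
# Figures copied from the "Memory" section of the status output, in the
# units the CLI prints ("gb"). Zero-valued categories are omitted.
breakdown_gb = {
    "binary": 2.8048,
    "allocated_unused": 0.9473,
    "queue_procs": 0.6793,
    "mgmt_db": 0.2611,
    "other_proc": 0.1222,
    "plugins": 0.0922,
    "mnesia": 0.0829,
    "msg_index": 0.0495,
    "code": 0.0329,
    "other_system": 0.0311,
    "metrics": 0.0222,
    "other_ets": 0.0168,
    "atom": 0.0015,
}
total_reported_gb = 5.1439   # "Total memory used"
watermark_gb = 5.1537        # computed high watermark
watermark_fraction = 0.4     # vm_memory_high_watermark

# Do the categories add up to the reported total?
total_summed_gb = sum(breakdown_gb.values())

# Share of the total taken by the two largest categories.
top_two_gb = breakdown_gb["binary"] + breakdown_gb["allocated_unused"]
top_two_share = top_two_gb / total_reported_gb

# Memory the node believes is available, implied by the watermark.
implied_ram_gb = watermark_gb / watermark_fraction

print(f"sum of categories:         {total_summed_gb:.4f} gb")
print(f"binary + allocated_unused: {top_two_gb:.4f} gb ({top_two_share:.1%})")
print(f"implied available memory:  {implied_ram_gb:.2f} gb")
```

So the categories do add up to the reported total, and binary plus allocated_unused together make up roughly three quarters of it.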
Reporting server status of node rabbit@MQ01 ...
Status of node rabbit@MQ01 ...
Runtime
OS PID: 1020
OS: Windows
Uptime (seconds): 233333
Is under maintenance?: false
RabbitMQ version: 3.8.14
Node name: rabbit@MQ01
Erlang configuration: Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:6:6] [ds:6:6:10] [async-threads:1]
Erlang processes: 34159 used, 1048576 limit
Scheduler run queue: 1
Cluster heartbeat timeout (net_ticktime): 60
Plugins
Enabled plugin file: c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/enabled_plugins
Enabled plugins:
* rabbitmq_top
* rabbitmq_management
* amqp_client
* rabbitmq_web_dispatch
* cowboy
* cowlib
* rabbitmq_management_agent
Data directory
Node data directory: c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-mnesia
Raft data directory: c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-mnesia/quorum/rabbit@MQ01
Config files
* c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/advanced.config
* c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/rabbitmq.conf
Log file(s)
* c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log
* c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log
Alarms
Memory alarm on node rabbit@MQ01
Memory
Total memory used: 5.1439 gb
Calculation strategy: rss
Memory high watermark setting: 0.4 of available memory, computed to: 5.1537 gb
binary: 2.8048 gb (54.53 %)
allocated_unused: 0.9473 gb (18.42 %)
queue_procs: 0.6793 gb (13.21 %)
mgmt_db: 0.2611 gb (5.08 %)
other_proc: 0.1222 gb (2.38 %)
plugins: 0.0922 gb (1.79 %)
mnesia: 0.0829 gb (1.61 %)
msg_index: 0.0495 gb (0.96 %)
code: 0.0329 gb (0.64 %)
other_system: 0.0311 gb (0.61 %)
metrics: 0.0222 gb (0.43 %)
other_ets: 0.0168 gb (0.33 %)
atom: 0.0015 gb (0.03 %)
quorum_ets: 0.0 gb (0.0 %)
connection_other: 0.0 gb (0.0 %)
connection_channels: 0.0 gb (0.0 %)
connection_readers: 0.0 gb (0.0 %)
connection_writers: 0.0 gb (0.0 %)
queue_slave_procs: 0.0 gb (0.0 %)
quorum_queue_procs: 0.0 gb (0.0 %)
reserved_unallocated: 0.0 gb (0.0 %)
File Descriptors
Total: 3360, limit: 65439
Sockets: 3, limit: 58893
Free Disk Space
Low free disk space watermark: 0.05 gb
Free disk space: 104.257 gb
Totals
Connection count: 7
Queue count: 6615
Virtual host count: 720
Listeners
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Interface: [::], port: 15672, protocol: http, purpose: HTTP API
Interface: 0.0.0.0, port: 15672, protocol: http, purpose: HTTP API
Cluster status of node rabbit@MQ01 ...
Basics
Cluster name: rabbit@MQ01
Disk Nodes
rabbit@MQ01
rabbit@MQ02
rabbit@MQ03
Running Nodes
rabbit@MQ01
rabbit@MQ02
rabbit@MQ03
Versions
rabbit@MQ01: RabbitMQ 3.8.14 on Erlang 23.3
rabbit@MQ02: RabbitMQ 3.8.14 on Erlang 23.3
rabbit@MQ03: RabbitMQ 3.8.14 on Erlang 23.3
Maintenance status
Node: rabbit@MQ01, status: not under maintenance
Node: rabbit@MQ02, status: not under maintenance
Node: rabbit@MQ03, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@MQ01, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@MQ01, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ01, interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ01, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ01, interface: 0.0.0.0, port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ02, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@MQ02, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ02, interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ02, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ02, interface: 0.0.0.0, port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ03, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@MQ03, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ03, interface: 0.0.0.0, port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@MQ03, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@MQ03, interface: 0.0.0.0, port: 15672, protocol: http, purpose: HTTP API
Feature flags
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
Application environment of node rabbit@MQ01 ...
[{amqp_client,
[{gen_server_call_timeout,130000},
{prefer_ipv6,false},
{ssl_options,[]},
{writer_gc_threshold,1000000000}]},
{asn1,[]},
{aten,
[{detection_threshold,0.99},
{heartbeat_interval,100},
{poll_interval,5000},
{scaling_factor,1.5}]},
{compiler,[]},
{cowboy,[]},
{cowlib,[]},
{credentials_obfuscation,[{enabled,true}]},
{crypto,[{fips_mode,false},{rand_cache_size,896}]},
{cuttlefish,[]},
{gen_batch_server,[]},
{goldrush,[]},
{inets,[]},
{jsx,[]},
{kernel,
[{inet_default_connect_options,[{nodelay,true}]},
{inet_dist_listen_max,25672},
{inet_dist_listen_min,25672},
{logger,
[{handler,default,logger_std_h,
#{config => #{type => standard_io},
formatter =>
{logger_formatter,
#{legacy_header => true,single_line => false}}}}]},
{logger_level,notice},
{logger_sasl_compatible,false},
{shell_docs_ansi,auto},
{shutdown_func,{rabbit_prelaunch,shutdown_func}}]},
{lager,
[{async_threshold,20},
{async_threshold_window,5},
{colored,false},
{colors,
[{debug,"\e[0;38m"},
{info,"\e[1;37m"},
{notice,"\e[1;36m"},
{warning,"\e[1;33m"},
{error,"\e[1;31m"},
{critical,"\e[1;35m"},
{alert,"\e[1;44m"},
{emergency,"\e[1;41m"}]},
{crash_log,"log/crash.log"},
{crash_log_count,5},
{crash_log_date,"$D0"},
{crash_log_msg_size,65536},
{crash_log_rotator,lager_rotator_default},
{crash_log_size,10485760},
{error_logger_format_raw,true},
{error_logger_hwm,5000},
{error_logger_hwm_original,50},
{error_logger_redirect,true},
{extra_sinks,
[{error_logger_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_channel_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,warning]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,warning]}]}]},
{rabbit_log_connection_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,warning]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,warning]}]}]},
{rabbit_log_feature_flags_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_federation_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_ldap_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_mirroring_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_prelaunch_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_queue_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_ra_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_shovel_lager_event,
[{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
{rabbit_handlers,
[{lager_forwarder_backend,[lager_event,info]}]}]},
{rabbit_log_upgrade_lager_event,
[{handlers,
[{lager_file_backend,
[{count,100},
{date,"$D0"},
{file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log"},
{formatter_config,
[date," ",time," ",color,"[",severity,"] ",
{pid,[]},
" ",message,"\n"]},
{level,info},
{size,10485760}]}]},
{rabbit_handlers,
[{lager_file_backend,
[{count,100},
{date,"$D0"},
{file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log"},
{formatter_config,
[date," ",time," ",color,"[",severity,"] ",
{pid,[]},
" ",message,"\n"]},
{level,info},
{size,10485760}]}]}]}]},
{handlers,
[{lager_file_backend,
[{count,100},
{date,"$D0"},
{file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log"},
{formatter_config,
[date," ",time," ",color,"[",severity,"] ",
{pid,[]},
" ",message,"\n"]},
{level,debug},
{size,10485760}]}]},
{log_root,"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log"},
{rabbit_handlers,
[{lager_file_backend,
[{count,100},
{date,"$D0"},
{file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log"},
{formatter_config,
[date," ",time," ",color,"[",severity,"] ",
{pid,[]},
" ",message,"\n"]},
{level,debug},
{size,10485760}]}]}]},
{mnesia,
[{dir,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-mnesia"}]},
{observer_cli,[{plugins,[]},{scheduler_usage,disable}]},
{os_mon,
[{start_cpu_sup,false},
{start_disksup,false},
{start_memsup,false},
{start_os_sup,false}]},
{public_key,[]},
{ra,[{data_dir,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-mnesia/quorum"},
{logger_module,rabbit_log_ra_shim},
{wal_max_batch_size,4096},
{wal_max_size_bytes,536870912}]},
{rabbit,
[{auth_backends,[rabbit_auth_backend_internal]},
{auth_mechanisms,['PLAIN','AMQPLAIN']},
{autocluster,
[{peer_discovery_backend,rabbit_peer_discovery_classic_config}]},
{autoheal_state_transition_timeout,60000},
{background_gc_enabled,false},
{background_gc_target_interval,60000},
{backing_queue_module,rabbit_priority_queue},
{channel_max,2047},
{channel_operation_timeout,15000},
{channel_tick_interval,60000},
{cluster_keepalive_interval,10000},
{cluster_nodes,{[],disc}},
{cluster_partition_handling,autoheal},
{collect_statistics,fine},
{collect_statistics_interval,5000},
{config_entry_decoder,[{passphrase,undefined}]},
{connection_max,infinity},
{credit_flow_default_credit,{400,200}},
{default_consumer_prefetch,{false,0}},
{default_permissions,[<<".*">>,<<".*">>,<<".*">>]},
{default_user,<<"guest">>},
{default_user_tags,[administrator]},
{default_vhost,<<"/">>},
{delegate_count,16},
{disk_free_limit,50000000},
{disk_monitor_failure_retries,10},
{disk_monitor_failure_retry_interval,120000},
{enabled_plugins_file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/enabled_plugins"},
{feature_flags_file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-feature_flags"},
{fhc_read_buffering,false},
{fhc_write_buffering,true},
{frame_max,131072},
{halt_on_upgrade_failure,true},
{handshake_timeout,10000},
{heartbeat,60},
{lager_default_file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log"},
{lager_log_root,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log"},
{lager_upgrade_file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log"},
{lazy_queue_explicit_gc_run_operation_threshold,1000},
{log,
[{categories,
[{channel,[{level,warning}]},
{connection,[{level,warning}]},
{upgrade,
[{file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rabbit@MQ01_upgrade.log"}]}]},
{file,
[{count,100},
{size,10485760},
{date,"$D0"},
{file,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/log/rab...@MQ01.log"}]}]},
{loopback_users,[<<"guest">>]},
{max_message_size,134217728},
{memory_monitor_interval,2500},
{mirroring_flow_control,true},
{mirroring_sync_batch_size,4096},
{mnesia_table_loading_retry_limit,10},
{mnesia_table_loading_retry_timeout,30000},
{msg_store_credit_disc_bound,{4000,800}},
{msg_store_file_size_limit,16777216},
{msg_store_index_module,rabbit_msg_store_ets_index},
{msg_store_io_batch_size,4096},
{msg_store_shutdown_timeout,600000},
{num_ssl_acceptors,10},
{num_tcp_acceptors,10},
{password_hashing_module,rabbit_password_hashing_sha256},
{plugins_dir,
"c:/Program Files/RabbitMQ Server/rabbitmq_server-3.8.14/plugins"},
{plugins_expand_dir,
"c:/Users/SVC_RabbitMQ/AppData/Roaming/RabbitMQ/db/rabbit@MQ01-plugins-expand"},
{proxy_protocol,false},
{queue_explicit_gc_run_operation_threshold,1000},
{queue_index_embed_msgs_below,4096},
{queue_index_max_journal_entries,32768},
{quorum_cluster_size,3},
{quorum_commands_soft_limit,32},
{reverse_dns_lookups,false},
{server_properties,[]},
{ssl_allow_poodle_attack,false},
{ssl_apps,[asn1,crypto,public_key,ssl]},
{ssl_cert_login_from,distinguished_name},
{ssl_handshake_timeout,5000},
{ssl_listeners,[]},
{ssl_options,[]},
{tcp_listen_options,
[{backlog,128},
{nodelay,true},
{linger,{true,0}},
{exit_on_close,false}]},
{tcp_listeners,[{"auto",5672}]},
{trace_vhosts,[]},
{track_auth_attempt_source,false},
{tracking_execution_timeout,15000},
{vhost_restart_strategy,continue},
{vm_memory_calculation_strategy,rss},
{vm_memory_high_watermark,0.4},
{vm_memory_high_watermark_paging_ratio,0.5},
{writer_gc_threshold,1000000000}]},
{rabbit_common,[]},
{rabbitmq_management,
[{content_security_policy,
"script-src 'self' 'unsafe-eval' 'unsafe-inline'; object-src 'self'"},
{cors_allow_origins,[]},
{cors_max_age,1800},
{http_log_dir,none},
{load_definitions,none},
{management_db_cache_multiplier,5},
{process_stats_gc_timeout,300000},
{stats_event_max_backlog,250}]},
{rabbitmq_management_agent,
[{rates_mode,basic},
{sample_retention_policies,
[{global,[{605,5},{3660,60},{29400,600},{86400,1800}]},
{basic,[{605,5},{3600,60}]},
{detailed,[{605,5}]}]}]},
{rabbitmq_prelaunch,[]},
{rabbitmq_top,[]},
{rabbitmq_web_dispatch,[]},
{ranch,[]},
{recon,[]},
{sasl,[{errlog_type,error},{sasl_error_logger,false}]},
{ssl,[]},
{stdlib,[]},
{stdout_formatter,[]},
{syntax_tools,[]},
{sysmon_handler,
[{busy_dist_port,true},
{busy_port,false},
{gc_ms_limit,0},
{heap_word_limit,0},
{port_limit,100},
{process_limit,100},
{schedule_ms_limit,0}]},
{tools,[{file_util_search_methods,[{[],[]},{"ebin","esrc"},{"ebin","src"}]}]},
{xmerl,[]}]
Listing connections ...
pid name port host peer_port peer_host ssl ssl_protocol ssl_key_exchange ssl_cipher ssl_hash peer_cert_subject peer_cert_issuer peer_cert_validity state channels protocol auth_mechanism user vhost timeout frame_max channel_max client_properties recv_oct recv_cnt send_oct send_cnt send_pend connected_at
<rab...@MQ01.1621365939.9920.7127> __.__.21.85:60695 -> __.__.21.77:5672 5672 __.__.21.77 60695 __.__.21.85 false running 1 {0,9,1} PLAIN _app _9cb8abb0-a2f1-4c3b-b400-b9a672e29e93 60 131072 2047 [{"product","RabbitMQ"},{"version","5.2.0+dad9cee150674d4c70f275c32a39d7b95a1fb0f1"},{"platform",".NET"},{"copyright","Copyright (c) 2007-2020 VMware, Inc."},{"information","Licensed under the MPL. See http://www.rabbitmq.com/"},{"capabilities",[{"publisher_confirms",true},{"exchange_exchange_bindings",true},{"basic.nack",true},{"consumer_cancel_notify",true},{"connection.blocked",true},{"authentication_failure_close",true}]},{"connection_name",undefined}] 664 5 693246 5 0 1621599272184
<rab...@MQ01.1621365939.10867.7127> __.__.19.101:59024 -> __.__.21.77:5672 5672 __.__.21.77 59024 __.__.19.101 false running 0 {0,9,1} PLAIN _app _18bb788a-74dc-4b6e-8150-ae15c0a00c0b 60 131072 2047 [{"product","RabbitMQ"},{"version","5.2.0+dad9cee150674d4c70f275c32a39d7b95a1fb0f1"},{"platform",".NET"},{"copyright","Copyright (c) 2007-2020 VMware, Inc."},{"information","Licensed under the MPL. See http://www.rabbitmq.com/"},{"capabilities",[{"publisher_confirms",true},{"exchange_exchange_bindings",true},{"basic.nack",true},{"consumer_cancel_notify",true},{"connection.blocked",true},{"authentication_failure_close",true}]},{"connection_name",undefined}] 547 3 561 3 0 1621599272266
Listing channels ...
pid connection name number user vhost transactional confirm consumer_count messages_unacknowledged messages_uncommitted acks_uncommitted messages_unconfirmed prefetch_count global_prefetch_count
<rab...@MQ01.1621365939.9458.7127> <rab...@MQ01.1621365939.9920.7127> __.__.21.85:60695 -> __.__.21.77:5672 (1) 1 _app _9cb8abb0-a2f1-4c3b-b400-b9a672e29e93 false false 0 1 0 0 0 0 0
Command line arguments of node rabbit@MQ01 ...
[{root,["C:\\Program Files\\erl-23.3"]},
{progname,["erl"]},
{home,["C:\\windows\\system32\\config\\systemprofile"]},
{sname,["rabbit@MQ01"]},
{kernel,["inet_dist_listen_min","25672"]},
{kernel,["inet_dist_listen_max","25672"]},
{lager,["crash_log","false"]},
{lager,["handlers","[]"]},
{boot,["start_sasl"]}]
Listing RabbitMQ-specific environment variables defined on node rabbit@MQ01...
RABBITMQ_BASE=C:\Users\SVC_RabbitMQ\AppData\Roaming\RabbitMQ
RABBITMQ_NODENAME=rabbit@MQ01
Timeout: 60.0 seconds ...