Help, RabbitMQ 3.7.9 Mirror Cluster High CPU usage and node crash problem

124 views
Skip to first unread message

Vincent

unread,
Mar 31, 2023, 2:42:08 AM3/31/23
to rabbitmq-users
We used RabbitMQ 3.7.9 Mirror cluster in our production environment.
Cluster has 3 nodes. T-BJCA2-3RD-01 T-BJCA2-3RD-02
T-BJCA2-3RD-03The RabbitMQ host experienced a surge in CPU usage and memory usage at 2023-03-30 23:00:00
CPU.jpg
We monitor RabbitMQ memory usage also.
memory.jpg
mem2.jpg
All clients cannot connect to RabbitMQ Broker.
Here is environment info:
Application environment of node rabbit@T-BJCA2-3RD-01 ...
[{amqp_client,[{prefer_ipv6,false},{ssl_options,[]}]},
 {asn1,[]},
 {compiler,[]},
 {cowboy,[]},
 {cowlib,[]},
 {crypto,[{fips_mode,false},{rand_cache_size,896}]},
 {goldrush,[]},
 {inets,[]},
 {jsx,[]},
 {kernel,
     [{inet_default_connect_options,[{nodelay,true}]},
      {inet_dist_listen_max,25672},
      {inet_dist_listen_min,25672},
      {logger,
          [{handler,default,logger_std_h,
               #{config => #{type => standard_io},
                 formatter =>
                     {logger_formatter,
                         #{legacy_header => true,single_line => false}}}}]},
      {logger_level,notice},
      {logger_sasl_compatible,false}]},
 {lager,
     [{async_threshold,20},
      {async_threshold_window,5},
      {colored,false},
      {colors,
          [{debug,"\e[0;38m"},
           {info,"\e[1;37m"},
           {notice,"\e[1;36m"},
           {warning,"\e[1;33m"},
           {error,"\e[1;31m"},
           {critical,"\e[1;35m"},
           {alert,"\e[1;44m"},
           {emergency,"\e[1;41m"}]},
      {crash_log,"log/crash.log"},
      {crash_log_count,5},
      {crash_log_date,"$D0"},
      {crash_log_msg_size,65536},
      {crash_log_rotator,lager_rotator_default},
      {crash_log_size,10485760},
      {error_logger_format_raw,true},
      {error_logger_hwm,50},
      {error_logger_hwm_original,50},
      {error_logger_redirect,true},
      {extra_sinks,
          [{error_logger_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_channel_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_connection_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_mirroring_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_queue_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_federation_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_upgrade_lager_event,
               [{handlers,
                    [{lager_file_backend,
                         [{date,[]},
                          {file,
                              "/mnt/data02/rabbitmq/log/rabbit@T-BJCA2-3RD-01_upgrade.log"},
                          {formatter_config,
                              [date," ",time," ",color,"[",severity,"] ",
                               {pid,[]},
                               " ",message,"\n"]},
                          {level,error},
                          {size,0}]}]},
                {rabbit_handlers,
                    [{lager_file_backend,
                         [{date,[]},
                          {file,
                              "/mnt/data02/rabbitmq/log/rabbit@T-BJCA2-3RD-01_upgrade.log"},
                          {formatter_config,
                              [date," ",time," ",color,"[",severity,"] ",
                               {pid,[]},
                               " ",message,"\n"]},
                          {level,error},
                          {size,0}]}]}]}]},
      {handlers,
          [{lager_file_backend,
               [{date,[]},
                {file,"/mnt/data02/rabbitmq/log/rab...@T-BJCA2-3RD-01.log"},
                {formatter_config,
                    [date," ",time," ",color,"[",severity,"] ",
                     {pid,[]},
                     " ",message,"\n"]},
                {level,error},
                {size,0}]}]},
      {log_root,"/mnt/data02/rabbitmq/log"},
      {rabbit_handlers,
          [{lager_file_backend,
               [{date,[]},
                {file,"/mnt/data02/rabbitmq/log/rab...@T-BJCA2-3RD-01.log"},
                {formatter_config,
                    [date," ",time," ",color,"[",severity,"] ",
                     {pid,[]},
                     " ",message,"\n"]},
                {level,error},
                {size,0}]}]}]},
 {mnesia,[{dir,"/mnt/data02/rabbitmq/data/rabbit@T-BJCA2-3RD-01"}]},
 {os_mon,
     [{start_cpu_sup,false},
      {start_disksup,false},
      {start_memsup,false},
      {start_os_sup,false}]},
 {public_key,[]},
 {rabbit,
     [{auth_backends,[rabbit_auth_backend_internal]},
      {auth_mechanisms,['PLAIN','AMQPLAIN']},
      {autocluster,
          [{peer_discovery_backend,rabbit_peer_discovery_classic_config}]},
      {background_gc_enabled,false},
      {background_gc_target_interval,60000},
      {backing_queue_module,rabbit_priority_queue},
      {channel_max,2047},
      {channel_operation_timeout,15000},
      {cluster_keepalive_interval,10000},
      {cluster_nodes,{[],disc}},
      {cluster_partition_handling,autoheal},
      {collect_statistics,fine},
      {collect_statistics_interval,5000},
      {config_entry_decoder,
          [{cipher,aes_cbc256},
           {hash,sha512},
           {iterations,1000},
           {passphrase,undefined}]},
      {connection_max,infinity},
      {credit_flow_default_credit,{400,200}},
      {default_consumer_prefetch,{false,0}},
      {default_permissions,[<<".*">>,<<".*">>,<<".*">>]},
      {default_user,<<"guest">>},
      {default_user_tags,[administrator]},
      {default_vhost,<<"/">>},
      {delegate_count,16},
      {disk_free_limit,50000000},
      {disk_monitor_failure_retries,10},
      {disk_monitor_failure_retry_interval,120000},
      {enabled_plugins_file,"/etc/rabbitmq/enabled_plugins"},
      {fhc_read_buffering,false},
      {fhc_write_buffering,true},
      {frame_max,131072},
      {halt_on_upgrade_failure,true},
      {handshake_timeout,15000},
      {heartbeat,60},
      {hipe_compile,false},
      {hipe_modules,
          [rabbit_reader,rabbit_channel,gen_server2,rabbit_exchange,
           rabbit_command_assembler,rabbit_framing_amqp_0_9_1,rabbit_basic,
           rabbit_event,lists,queue,priority_queue,rabbit_router,rabbit_trace,
           rabbit_misc,rabbit_binary_parser,rabbit_exchange_type_direct,
           rabbit_guid,rabbit_net,rabbit_amqqueue_process,
           rabbit_variable_queue,rabbit_binary_generator,rabbit_writer,
           delegate,gb_sets,lqueue,sets,orddict,rabbit_amqqueue,
           rabbit_limiter,gb_trees,rabbit_queue_index,
           rabbit_exchange_decorator,gen,dict,ordsets,file_handle_cache,
           rabbit_msg_store,array,rabbit_msg_store_ets_index,rabbit_msg_file,
           rabbit_exchange_type_fanout,rabbit_exchange_type_topic,mnesia,
           mnesia_lib,rpc,mnesia_tm,qlc,sofs,proplists,credit_flow,pmon,
           ssl_connection,tls_connection,ssl_record,tls_record,gen_fsm,ssl]},
      {lager_default_file,
          "/mnt/data02/rabbitmq/log/rab...@T-BJCA2-3RD-01.log"},
      {lager_extra_sinks,
          [rabbit_log_lager_event,rabbit_log_channel_lager_event,
           rabbit_log_connection_lager_event,rabbit_log_mirroring_lager_event,
           rabbit_log_queue_lager_event,rabbit_log_federation_lager_event,
           rabbit_log_upgrade_lager_event]},
      {lager_log_root,"/mnt/data02/rabbitmq/log"},
      {lager_upgrade_file,
          "/mnt/data02/rabbitmq/log/rabbit@T-BJCA2-3RD-01_upgrade.log"},
      {lazy_queue_explicit_gc_run_operation_threshold,1000},
      {log,
          [{file,
               [{level,error},
                {file,"/mnt/data02/rabbitmq/log/rab...@T-BJCA2-3RD-01.log"}]},
           {categories,
               [{upgrade,
                    [{file,
                         "/mnt/data02/rabbitmq/log/rabbit@T-BJCA2-3RD-01_upgrade.log"}]}]}]},
      {loopback_users,[<<"guest">>]},
      {memory_monitor_interval,2500},
      {mirroring_flow_control,true},
      {mirroring_sync_batch_size,4096},
      {mnesia_table_loading_retry_limit,10},
      {mnesia_table_loading_retry_timeout,30000},
      {msg_store_credit_disc_bound,{4000,800}},
      {msg_store_file_size_limit,16777216},
      {msg_store_index_module,rabbit_msg_store_ets_index},
      {msg_store_io_batch_size,4096},
      {num_ssl_acceptors,10},
      {num_tcp_acceptors,10},
      {password_hashing_module,rabbit_password_hashing_sha256},
      {plugins_dir,
          "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.7.9/plugins"},
      {plugins_expand_dir,
          "/mnt/data02/rabbitmq/data/rabbit@T-BJCA2-3RD-01-plugins-expand"},
      {proxy_protocol,false},
      {queue_explicit_gc_run_operation_threshold,1000},
      {queue_index_embed_msgs_below,4096},
      {queue_index_max_journal_entries,32768},
      {reverse_dns_lookups,false},
      {server_properties,[]},
      {ssl_allow_poodle_attack,false},
      {ssl_apps,[asn1,crypto,public_key,ssl]},
      {ssl_cert_login_from,distinguished_name},
      {ssl_handshake_timeout,5000},
      {ssl_listeners,[]},
      {ssl_options,[]},
      {tcp_listen_options,
          [{backlog,128},
           {nodelay,true},
           {linger,{true,0}},
           {exit_on_close,false}]},
      {tcp_listeners,[5672]},
      {trace_vhosts,[]},
      {vhost_restart_strategy,continue},
      {vm_memory_calculation_strategy,rss},
      {vm_memory_high_watermark,0.4},
      {vm_memory_high_watermark_paging_ratio,0.5}]},
 {rabbit_common,[]},
 {rabbitmq_management,
     [{content_security_policy,"default-src 'self'"},
      {cors_allow_origins,[]},
      {cors_max_age,1800},
      {http_log_dir,none},
      {load_definitions,none},
      {management_db_cache_multiplier,5},
      {process_stats_gc_timeout,300000},
      {stats_event_max_backlog,250}]},
 {rabbitmq_management_agent,
     [{rates_mode,basic},
      {sample_retention_policies,
          [{global,[{605,5},{3660,60},{29400,600},{86400,1800}]},
           {basic,[{605,5},{3600,60}]},
           {detailed,[{605,5}]}]}]},
 {rabbitmq_web_dispatch,[]},
 {ranch,[]},
 {ranch_proxy_protocol,[{proxy_protocol_timeout,55000},{ssl_accept_opts,[]}]},
 {recon,[]},
 {sasl,[{errlog_type,error},{sasl_error_logger,false}]},
 {ssl,[]},
 {stdlib,[]},
 {syntax_tools,[]},
 {xmerl,[]}]

Attachment is RabbitMQ log file.
Please help. 
Thanks.

Eric G.


3rd-01-messages
rabbit@T-BJCA2-3RD-01.log

Michal Kuratczyk

unread,
Mar 31, 2023, 3:19:09 AM3/31/23
to rabbitm...@googlegroups.com
3.7 has been out of support for a few years now.
Mirrored queues have been deprecated for a few years as well. One of the main reasons being their unpredictability in situations like this.

There have been many improvements since 3.7.9, so:
1. upgrade to a supported version (we have fixed at least 1 memory spike in mirroring in the last few months)
2. when 3.12 is released, upgrade to it and try classic queues v2 (you can apply a policy with `x-queue-version=2`)
3. migrate to quorum queues

The only potential quick fix I can think of is to try lazy queues if you are not currently.

Best,

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/b560b36a-ea20-4048-b475-27dd8a5cacdan%40googlegroups.com.


--
Michał
RabbitMQ team
Message has been deleted

Michal Kuratczyk

unread,
Apr 3, 2023, 3:13:40 AM4/3/23
to rabbitm...@googlegroups.com
Hi,

You are asking for support for an unsupported version. :)

Mirroring is known to have memory and CPU spikes in some situations. This is the most likely culprit.
This issue talks about dead-lettering but it can happen in other situations where a lot of messages need to be
synchronized between mirrors or published to a mirrored queue.

If you care about your system, keep it up to date.

Best,

On Mon, Apr 3, 2023 at 9:00 AM Vincent <zhouguoq...@gmail.com> wrote:
OK,Thanks.
From the RabbitMQ log, Can you give me some reason for CPU usage and node crash problem.


--
Michał
RabbitMQ team
Reply all
Reply to author
Forward
0 new messages