scheduler threads creating a high CPU load during "general code execution" (emulator)


Fritz Schieber

Sep 21, 2021, 5:33:30 AM
to rabbitmq-users
Hello,
I am facing an issue in our RabbitMQ environment where the beam.smp process is causing a permanent 300+% CPU load.
There have been similar issues with schedulers "busy waiting" (Runtime Tuning — RabbitMQ), but my problem differs: it is not the aux/other threads causing the high CPU load; the time is reported under the "emulator" column:
rabbitmqctl eval 'msacc:start(30000), msacc:print(msacc:stats(),#{system => true}).'
        Thread      aux      check_io      emulator            gc         other          port         sleep
Stats per thread:
...
 scheduler( 1)  1.98%( 0.0%)  0.76%( 0.0%) 36.94%( 0.2%)  4.56%( 0.0%) 13.20%( 0.1%)  4.04%( 0.0%) 38.53%( 0.2%)
 scheduler( 2)  1.73%( 0.0%)  0.67%( 0.0%) 32.18%( 0.1%)  3.95%( 0.0%) 12.42%( 0.1%)  3.57%( 0.0%) 45.47%( 0.2%)
 scheduler( 3)  1.71%( 0.0%)  0.68%( 0.0%) 31.77%( 0.1%)  3.89%( 0.0%) 13.06%( 0.1%)  3.56%( 0.0%) 45.34%( 0.2%)
 scheduler( 4)  1.76%( 0.0%)  0.70%( 0.0%) 33.23%( 0.1%)  4.04%( 0.0%) 12.64%( 0.1%)  3.77%( 0.0%) 43.86%( 0.2%)
 scheduler( 5)  1.22%( 0.0%)  0.46%( 0.0%) 21.76%( 0.1%)  2.62%( 0.0%) 10.05%( 0.0%)  2.29%( 0.0%) 61.59%( 0.3%)
 scheduler( 6)  0.34%( 0.0%)  0.14%( 0.0%)  5.39%( 0.0%)  0.66%( 0.0%)  3.75%( 0.0%)  0.56%( 0.0%) 89.15%( 0.4%)
 scheduler( 7)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.15%( 0.0%)  0.00%( 0.0%) 99.85%( 0.4%)
 scheduler( 8)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.18%( 0.0%)  0.00%( 0.0%) 99.82%( 0.4%)
 scheduler( 9)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.14%( 0.0%)  0.00%( 0.0%) 99.85%( 0.4%)
 scheduler(10)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.20%( 0.0%)  0.00%( 0.0%) 99.79%( 0.4%)
 scheduler(11)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.15%( 0.0%)  0.00%( 0.0%) 99.84%( 0.4%)
 scheduler(12)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.00%( 0.0%)  0.22%( 0.0%)  0.00%( 0.0%) 99.78%( 0.4%)

Our RabbitMQ/Erlang versions:
RabbitMQ 3.8.3 on Erlang 21.3.8.8

Top output: 
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
16862 rabbitmq  20   0 8066736 428220   4944 R 69.0  0.9   4732:58 1_scheduler
16864 rabbitmq  20   0 8066736 428220   4944 R 56.3  0.9   4308:59 3_scheduler
16865 rabbitmq  20   0 8066736 428220   4944 R 42.5  0.9   3981:52 4_scheduler
16866 rabbitmq  20   0 8066736 428220   4944 S 36.8  0.9   2024:02 5_scheduler
16863 rabbitmq  20   0 8066736 428220   4944 S 29.9  0.9   4336:34 2_scheduler
16868 rabbitmq  20   0 8066736 428220   4944 R 25.3  0.9 279:17.03 7_scheduler
16867 rabbitmq  20   0 8066736 428220   4944 S  5.7  0.9 684:51.68 6_scheduler

pstack on one of the affected thread IDs:
pstack 16862
Thread 1 (process 16862):
#0  0x00007f7f9151ac89 in syscall () from /lib64/libc.so.6
#1  0x000000000068b6d0 in ethr_event_twait ()
#2  0x0000000000460a06 in erts_schedule ()
#3  0x000000000044f2fd in process_main ()
#4  0x000000000046615b in sched_thread_func ()
#5  0x000000000068ac4f in thr_wrapper ()
#6  0x00007f7f919ffea5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f7f9152096d in clone () from /lib64/libc.so.6

Rabbitmqctl diagnostics (environment): 
[{amqp_client,
     [{prefer_ipv6,false},{ssl_options,[]},{writer_gc_threshold,1000000000}]},
 {asn1,[]},
 {aten,
     [{detection_threshold,0.99},
      {heartbeat_interval,100},
      {poll_interval,1000}]},
 {compiler,[]},
 {cowboy,[]},
 {cowlib,[]},
 {credentials_obfuscation,
     [{enabled,true},{ets_table_name,credentials_obfuscation}]},
 {crypto,[{fips_mode,false},{rand_cache_size,896}]},
 {gen_batch_server,[]},
 {goldrush,[]},
 {inets,[]},
 {jsx,[]},
 {kernel,
     [{inet_default_connect_options,[{nodelay,true}]},
      {inet_dist_listen_max,5674},
      {inet_dist_listen_min,5674},
      {logger,
          [{handler,default,logger_std_h,
               #{config => #{type => standard_io},
                 formatter =>
                     {logger_formatter,
                         #{legacy_header => true,single_line => false}}}}]},
      {logger_level,notice},
      {logger_sasl_compatible,false}]},
 {lager,
     [{async_threshold,20},
      {async_threshold_window,5},
      {colored,false},
      {colors,
          [{debug,"\e[0;38m"},
           {info,"\e[1;37m"},
           {notice,"\e[1;36m"},
           {warning,"\e[1;33m"},
           {error,"\e[1;31m"},
           {critical,"\e[1;35m"},
           {alert,"\e[1;44m"},
           {emergency,"\e[1;41m"}]},
      {crash_log,"log/crash.log"},
      {crash_log_count,5},
      {crash_log_date,"$D0"},
      {crash_log_msg_size,65536},
      {crash_log_rotator,lager_rotator_default},
      {crash_log_size,10485760},
      {error_logger_format_raw,true},
      {error_logger_hwm,50},
      {error_logger_hwm_original,50},
      {error_logger_redirect,true},
      {extra_sinks,
          [{error_logger_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_channel_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_connection_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_ldap_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_mirroring_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_queue_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_ra_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_federation_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,info]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,info]}]}]},
           {rabbit_log_shovel_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]},
           {rabbit_log_upgrade_lager_event,
               [{handlers,[{lager_forwarder_backend,[lager_event,inherit]}]},
                {rabbit_handlers,
                    [{lager_forwarder_backend,[lager_event,inherit]}]}]}]},
      {handlers,
          [{syslog_lager_backend,
               [info,{},
                {lager_default_formatter,
                    [color,"[",severity,"] ",{pid,[]}," ",message,"\n"]}]},
           {lager_console_backend,
               [{formatter_config,
                    [date," ",time," ",color,"[",severity,"] ",
                     {pid,[]},
                     " ",message,"\n"]},
                {level,info}]}]},
      {log_root,"/var/log/rabbitmq"},
      {rabbit_handlers,
          [{syslog_lager_backend,
               [info,{},
                {lager_default_formatter,
                    [color,"[",severity,"] ",{pid,[]}," ",message,"\n"]}]},
           {lager_console_backend,
               [{formatter_config,
                    [date," ",time," ",color,"[",severity,"] ",
                     {pid,[]},
                     " ",message,"\n"]},
                {level,info}]}]}]},
 {mnesia,[{dir,"/var/lib/rabbitmq/mnesia/rabbit@sredfrqappa"}]},
 {observer_cli,[{plugins,[]}]},
 {os_mon,
     [{start_cpu_sup,false},
      {start_disksup,false},
      {start_memsup,false},
      {start_os_sup,false}]},
 {public_key,[]},
 {ra,[{data_dir,"/var/lib/rabbitmq/mnesia/rabbit@sredfrqappa/quorum"},
      {logger_module,rabbit_log_ra_shim},
      {segment_max_entries,32768},
      {wal_max_size_bytes,536870912}]},
 {rabbit,
     [{auth_backends,[rabbit_auth_backend_internal,rabbit_auth_backend_http]},
      {auth_mechanisms,['PLAIN','AMQPLAIN']},
      {autocluster,
          [{peer_discovery_backend,rabbit_peer_discovery_classic_config}]},
      {autoheal_state_transition_timeout,60000},
      {background_gc_enabled,false},
      {background_gc_target_interval,60000},
      {backing_queue_module,rabbit_priority_queue},
      {channel_max,2047},
      {channel_operation_timeout,15000},
      {channel_tick_interval,60000},
      {cluster_keepalive_interval,10000},
      {cluster_nodes,{[],disc}},
      {cluster_partition_handling,ignore},
      {collect_statistics,fine},
      {collect_statistics_interval,5000},
      {config_entry_decoder,
          [{passphrase,{file,"/var/lib/rabbitmq/.rmqpass"}}]},
      {connection_max,infinity},
      {credit_flow_default_credit,{400,200}},
      {default_consumer_prefetch,{false,0}},
      {default_permissions,[<<".*">>,<<".*">>,<<".*">>]},
      {default_user,<<"rabbitmq">>},
      {default_user_tags,[administrator]},
      {default_vhost,<<"/">>},
      {delegate_count,16},
      {disk_free_limit,"500MB"},
      {disk_monitor_failure_retries,10},
      {disk_monitor_failure_retry_interval,120000},
      {enabled_plugins_file,"/etc/rabbitmq/enabled_plugins"},
      {feature_flags_file,
          "/var/lib/rabbitmq/mnesia/rabbit@sredfrqappa-feature_flags"},
      {fhc_read_buffering,false},
      {fhc_write_buffering,true},
      {frame_max,131072},
      {halt_on_upgrade_failure,true},
      {handshake_timeout,10000},
      {heartbeat,60},
      {hipe_compile,false},
      {hipe_modules,
          [rabbit_reader,rabbit_channel,gen_server2,rabbit_exchange,
           rabbit_command_assembler,rabbit_framing_amqp_0_9_1,rabbit_basic,
           rabbit_event,lists,queue,priority_queue,rabbit_router,rabbit_trace,
           rabbit_misc,rabbit_binary_parser,rabbit_exchange_type_direct,
           rabbit_guid,rabbit_net,rabbit_amqqueue_process,
           rabbit_variable_queue,rabbit_binary_generator,rabbit_writer,
           delegate,gb_sets,lqueue,sets,orddict,rabbit_amqqueue,
           rabbit_limiter,gb_trees,rabbit_queue_index,
           rabbit_exchange_decorator,gen,dict,ordsets,file_handle_cache,
           rabbit_msg_store,array,rabbit_msg_store_ets_index,rabbit_msg_file,
           rabbit_exchange_type_fanout,rabbit_exchange_type_topic,mnesia,
           mnesia_lib,rpc,mnesia_tm,qlc,sofs,proplists,credit_flow,pmon,
           ssl_connection,tls_connection,ssl_record,tls_record,gen_fsm,ssl]},
      {lager_default_file,tty},
      {lager_extra_sinks,
          [rabbit_log_lager_event,rabbit_log_channel_lager_event,
           rabbit_log_connection_lager_event,rabbit_log_ldap_lager_event,
           rabbit_log_mirroring_lager_event,rabbit_log_queue_lager_event,
           rabbit_log_ra_lager_event,rabbit_log_federation_lager_event,
           rabbit_log_shovel_lager_event,rabbit_log_upgrade_lager_event]},
      {lager_log_root,"/var/log/rabbitmq"},
      {lager_upgrade_file,tty},
      {lazy_queue_explicit_gc_run_operation_threshold,1000},
      {log,
          [{syslog,[{enabled,true}]},
           {categories,
               [{connection,[{level,info}]},
                {channel,[{level,info}]},
                {federation,[{level,info}]},
                {mirroring,[{level,info}]}]},
           {console,[{enabled,true}]}]},
      {loopback_users,[<<"guest">>]},
      {max_message_size,134217728},
      {memory_monitor_interval,2500},
      {mirroring_flow_control,true},
      {mirroring_sync_batch_size,4096},
      {mnesia_table_loading_retry_limit,10},
      {mnesia_table_loading_retry_timeout,30000},
      {msg_store_credit_disc_bound,{4000,800}},
      {msg_store_file_size_limit,16777216},
      {msg_store_index_module,rabbit_msg_store_ets_index},
      {msg_store_io_batch_size,4096},
      {num_ssl_acceptors,10},
      {num_tcp_acceptors,10},
      {password_hashing_module,rabbit_password_hashing_sha256},
      {plugins_dir,
          "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/plugins"},
      {plugins_expand_dir,
          "/var/lib/rabbitmq/mnesia/rabbit@sredfrqappa-plugins-expand"},
      {proxy_protocol,false},
      {queue_explicit_gc_run_operation_threshold,1000},
      {queue_index_embed_msgs_below,4096},
      {queue_index_max_journal_entries,32768},
      {quorum_cluster_size,5},
      {quorum_commands_soft_limit,256},
      {reverse_dns_lookups,false},
      {server_properties,[]},
      {ssl_allow_poodle_attack,false},
      {ssl_apps,[asn1,crypto,public_key,ssl]},
      {ssl_cert_login_from,common_name},
      {ssl_handshake_timeout,5000},
      {ssl_listeners,[5671]},
      {ssl_options,
          [{cacertfile,"/etc/pki/CA/cacert.pem"},
           {certfile,"/etc/pki/CA/private/localhost.pem"},
           {keyfile,"/etc/pki/CA/private/localhost.rsa"},
           {depth,2},
           {verify,verify_peer},
           {fail_if_no_peer_cert,false}]},
      {tcp_listen_options,
          [{backlog,128},
           {nodelay,true},
           {linger,{true,0}},
           {exit_on_close,false}]},
      {tcp_listeners,[{"auto",5672}]},
      {trace_vhosts,[]},
      {vhost_restart_strategy,continue},
      {vm_memory_calculation_strategy,rss},
      {vm_memory_high_watermark,0.4},
      {vm_memory_high_watermark_paging_ratio,0.5},
      {writer_gc_threshold,1000000000}]},
 {rabbit_common,[]},
 {rabbitmq_auth_backend_http,
     [{http_method,post},
      {ssl_options,
          [{cacertfile,"/etc/pki/CA/cacert.pem"},
           {certfile,"/etc/pki/CA/private/localhost.pem"},
           {keyfile,"/etc/pki/CA/private/localhost.rsa"},
           {verify,verify_peer},
           {fail_if_no_peer_cert,true}]},
       {topic_path,"http://localhost:8000/auth/topic"}]},
 {rabbitmq_auth_mechanism_ssl,[{name_from,distinguished_name}]},
 {rabbitmq_consistent_hash_exchange,[]},
 {rabbitmq_event_exchange,[]},
 {rabbitmq_federation,
     [{internal_exchange_check_interval,30000},
      {pgroup_name_cluster_id,false}]},
 {rabbitmq_federation_management,[]},
 {rabbitmq_management,
     [{content_security_policy,
          "script-src 'self' 'unsafe-eval' 'unsafe-inline'; object-src 'self'"},
      {cors_allow_origins,[]},
      {cors_max_age,1800},
      {http_log_dir,none},
      {listener,[{ip,"127.0.0.1"},{port,5675}]},
      {load_definitions,none},
      {management_db_cache_multiplier,5},
      {process_stats_gc_timeout,300000},
      {stats_event_max_backlog,250}]},
 {rabbitmq_management_agent,
     [{rates_mode,basic},
      {sample_retention_policies,
          [{global,[{605,5},{3660,60},{29400,600},{86400,1800}]},
           {basic,[{605,5},{3600,60}]},
           {detailed,[{605,5}]}]}]},
 {rabbitmq_web_dispatch,[]},
 {ranch,[]},
 {recon,[]},
 {sasl,[{errlog_type,error},{sasl_error_logger,tty}]},
 {ssl,
     [{dtls_protocol_version,['dtlsv1.2',dtlsv1]},
      {protocol_version,['tlsv1.2','tlsv1.1',tlsv1]}]},
 {stdlib,[]},
 {stdout_formatter,[]},
 {syntax_tools,[]},
 {syslog,
     [{app_name,"rabbit"},
      {dest_host,"sredfrqappa"},
      {dest_port,516},
      {syslog_error_logger,false}]},
 {sysmon_handler,
     [{busy_dist_port,true},
      {busy_port,false},
      {gc_ms_limit,0},
      {heap_word_limit,0},
      {port_limit,100},
      {process_limit,100},
      {schedule_ms_limit,0}]},
 {tools,[{file_util_search_methods,[{[],[]},{"ebin","esrc"},{"ebin","src"}]}]},
 {xmerl,[]}]

Any idea what could be causing this issue?

Many thanks,

Friedrich

Michal Kuratczyk

Sep 21, 2021, 6:09:35 AM
to rabbitm...@googlegroups.com
Hi,

Run `rabbitmq-diagnostics observer` and then press "rr" to sort Erlang processes by the number of reductions (which roughly translates to CPU usage).
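The observer's "rr" view is driven by the recon library, which ships with RabbitMQ (it appears in the environment dump above), so the same ranking can be fetched non-interactively. A sketch, assuming rabbitmqctl can reach the node:

```shell
# Top 10 processes by reductions over a 5 s sampling window
# (recon:proc_window/3 takes an attribute, a count, and a window in ms):
rabbitmqctl eval 'recon:proc_window(reductions, 10, 5000).'
```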

Also, you are quite far behind on both RabbitMQ and especially Erlang. Erlang 24 contains a just-in-time compiler which provides a pretty nice performance improvement.

Best,



--
Michał
RabbitMQ team

Fritz Schieber

Sep 22, 2021, 6:09:07 AM
to rabbitmq-users
Hi,

thanks for the quick response. I do indeed see a very high number of reductions:
|Home(H)|Network(N)|System(S)|Ets(E)/Mnesia(M)|App(A)|Doc(D)|Plugin(P)recon:proc_window(reductions, 15, 5000) Interval | 5Days 3:40:56    |
|Erlang/OTP 21 [erts-10.3.5.6] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:192] [hipe]                                     |
|System     | Count/Limit           | System Switch             | Status                | Memory Info          | Size                     |
|Proc Count | 1943/1048576          | Version                   | 21.3.8.8              | Allocted Mem         | 494.0234 MB     | 100.0% |
|Port Count | 90/65536              | Multi Scheduling          | enabled               | Use Mem              | 279.4012 MB     | 56.56% |
|Atom Count | 75395/5000000         | Logical Processors        | 12                    | Unuse Mem            | 214.6223 MB     | 43.44% |
|Mem Type   | Size                  | Mem Type                  | Size                  | IO/GC                | Interval: 5000ms         |
|Total      | 284.4832 MB  | 100.0% | Binary                    | 109.0659 MB  | 38.34% | IO Output            | 6.7192 MB                |
|Process    | 113.9583 MB  | 40.06% | Code                      | 24.9186 MB   | 08.76% | IO Input             | 7.5674 MB                |
|Atom       | 2.7702 MB    | 00.97% | Reductions                | 64411609              | Gc Count             | 261606                   |
|Ets        | 12.8199 MB   | 04.51% | RunQueue/ErrorLoggerQueue | 4/0                   | Gc Words Reclaimed   | 250074121                |
||1  |||||||||||||          59.67% |7  |                      08.87% |13                        00.00% |19                         00.00% |
||2  ||||||||||||           59.05% |8  |                      05.87% |14                        00.00% |20                         00.00% |
||3  |||||||||||            51.80% |9                         02.11% |15                        00.00% |21                         00.00% |
||4  |||||||||              44.14% |10                        00.01% |16                        00.00% |22                         00.00% |
||5  ||||                   21.50% |11                        00.01% |17                        00.00% |23                         01.09% |
||6  |||                    14.78% |12                        00.01% |18                        00.00% |24                         00.00% |
|No | Pid        |     Reductions      |Name or Initial Call                  |      Memory | MsgQueue |Current Function                  |
|1  |<0.4318.0>  |50321655             |rabbit_event                          |    2.8398 KB| 0        |gen_event:fetch_msg/6             |
|2  |<0.5507.0>  |18290861             |rabbit_prequeue:init/1                |  174.1641 KB| 0        |code_server:call/1                |
|3  |<0.5496.0>  |8234920              |rabbit_reader:init/3                  |   26.4258 KB| 1        |gen:do_call/4                     |
|4  |<0.5504.0>  |7479697              |rabbit_channel:init/1                 |    2.4478 MB| 0        |gen:do_call/4                     |
|5  |<0.5493.0>  |6977675              |tls_connection:init/1                 |   66.7305 KB| 0        |gen_statem:loop_receive/3         |
|6  |<0.5514.0>  |5727849              |rabbit_federation_queue_link:init/1   |   54.5664 KB| 2        |gen_server2:drain/1               |
|7  |<0.5530.0>  |5515161              |tls_sender:init/1                     |   11.6992 KB| 0        |prim_inet:send_recv_reply/2       |
|8  |<0.5541.0>  |3979285              |rabbit_writer:enter_mainloop/2        |  172.2617 KB| 0        |gen:do_call/4                     |
|9  |<0.5540.0>  |3398158              |amqp_channel:init/1                   |  119.5000 KB| 1        |gen_server:loop/7                 |
|10 |<0.5539.0>  |3252608              |amqp_gen_consumer:init/1              |   29.0508 KB| 1        |gen_server2:process_next_msg/1    |
|11 |<0.17169.51>|2600068              |rabbit_mgmt_db_cache_channels         |   15.6072 MB| 0        |gen_server:loop/7                 |
|12 |<0.4757.0>  |1067482              |consumer_deleted_metrics_gc           |   41.3398 KB| 0        |gen_server:loop/7                 |
|13 |<0.49.0>    |828193               |code_server                           |  673.3633 KB| 0        |code_server:loop/1                |
|14 |<0.122.0>   |74421                |error_logger                          |  675.3555 KB| 0        |gen_event:fetch_msg/6             |
|15 |<0.22938.284|50216                |cowboy_clear:connection_process/4     |   16.3984 KB| 0        |cowboy_http:loop/2                |
|q(quit) p(pause) r/rr(reduction) m/mm(mem) b/bb(binary mem) t/tt(total heap size) mq/mmq(msg queue) 9(proc 9 info) F/B(page forward/back |

Do you also have any hint on how to interpret or debug these values?

The reason we can't upgrade to a newer version of Erlang/RabbitMQ yet is an incompatibility with the resource agent for our HA cluster:
the resource agent uses node_health_check, a command that is no longer supported as of RabbitMQ 3.8.5.
Do you know if an update of the RA is planned, or whether another RA should be used?

Many thanks,

Friedrich

Loïc Hoguin

Sep 23, 2021, 5:55:46 AM
to rabbitm...@googlegroups.com

Hello,

You have one queue doing a lot of work, and rabbit_event is busy processing events, which is a little odd but could be explained by a great number of things. You can look into the process itself to see what it is doing. The current_stacktrace is already a good indication; the process state is another.
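A concrete way to do both, sketched with rabbitmqctl eval (the pid <0.4318.0> is taken from the observer output earlier in this thread; substitute the pid you see on your node):

```shell
# Where is the busy process spending its time right now?
rabbitmqctl eval 'erlang:process_info(list_to_pid("<0.4318.0>"), [current_function, current_stacktrace, message_queue_len, reductions]).'

# Its state; sys:get_state/1 works for OTP behaviours such as gen_event/gen_server:
rabbitmqctl eval 'sys:get_state(list_to_pid("<0.4318.0>")).'
```

Running the first command a few times in a row gives a crude sampling profile of where the process is spending its time.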

Note that before doing any of that, you should rule out common problems like high connection churn, and have a look in the logs for anything unusual.

Chances are, if this is due to a known issue, it has been noticed and fixed already, so trying a more recent version should help.

> The resource agent uses node_health_check - a command that is no longer supported starting from Rabbitmq 3.8.5. 

While node_health_check has been deprecated, the command is still available. I do not think this will cause any problems. Have you tried upgrading RabbitMQ to see if it works with the script?
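If the concern is only the health check itself, the narrower checks that RabbitMQ 3.8 introduced as replacements can also be tried by hand to see what an updated agent would rely on; a sketch:

```shell
# Newer, more focused checks that supersede the catch-all node_health_check:
rabbitmq-diagnostics check_running            # is the node booted and running?
rabbitmq-diagnostics check_local_alarms       # any memory/disk alarms in effect?
rabbitmq-diagnostics check_port_connectivity  # are the listeners reachable?
```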

Reading the script, it seems that node_health_check can also be disabled: https://github.com/rabbitmq/rabbitmq-server/blob/master/scripts/rabbitmq-server-ha.ocf#L336-L344

Also note that the script now lives in the main RabbitMQ repository.

> Do you know, if there is any upgrade of the RA planned or is another RA to be used? 

I don't know anything about it personally, but I have seen that we have merged a few patches to make it work with recent Erlang versions, suggesting it is still used and compatible with recent releases.

Sorry I can’t be of more help.

-- 

Loïc Hoguin

Fritz Schieber

Sep 28, 2021, 4:45:30 AM
to rabbitmq-users
Hi again,

what's a possible way to look into the current_stacktrace? I did not find much information on how to debug this, and I am quite new to RabbitMQ/Erlang myself.

Thank you very much for the information about the resource agent; in the meantime I will try to perform an upgrade on another system.

Best regards,

Friedrich Schieber