We run RabbitMQ on two nodes (Node0 and Node1) with ha-mode=all, and have created 25K queues and 25K TCP connections (25K channels) on the broker.
All 25K queues were created on Node0 and mirrored to Node1. When Node0 was restarted, Node1 accepted the 25K TCP connections and then closed them with "Missed heartbeats from client, timeout: 60s".
1. Node0 went down and Node1 received the node-down message, but Node1 did not log the message "Slave <rab...@rabbitmqNode1.1.3227.0> saw deaths of mirrors <rab...@rabbitmqNode0.1.11859.0> and promoting slave to master".
Normally, when Node0 goes down, Node1 logs these messages. This time they were absent from the MQ log; for 15 minutes it showed only TCP connections being accepted and closed.
2. Why does RabbitMQ miss so many heartbeats and close the client connections? In the end, the connections reached the system limit.
3. After Node0 restarted, it tried to rejoin the Node1 cluster via autocluster, but it got {badrpc, nodedown} messages and auto-clustering failed.
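
For anyone who wants to check how close a node is to the system limit mentioned in question 2, here is a rough Python sketch. It assumes `rabbitmqctl` is on the PATH and that `rabbitmqctl eval` prints the integer result as the last line of its output (an assumption, not verified against every version). The helper names `parse_int_term`, `erlang_system_info` and `process_headroom` are made up for illustration. `erlang:system_info(process_count)` and `erlang:system_info(process_limit)` are standard Erlang calls.

```python
import subprocess


def parse_int_term(output):
    """Parse an integer term as printed by `rabbitmqctl eval`.

    Assumption: the integer is on the last line of the output.
    """
    return int(output.strip().splitlines()[-1])


def erlang_system_info(item, node=None):
    """Ask a running node for erlang:system_info(Item) via rabbitmqctl.

    `item` is e.g. "process_count" or "process_limit".
    `node` is an optional node name passed to `rabbitmqctl -n`.
    """
    cmd = ["rabbitmqctl"]
    if node:
        cmd += ["-n", node]
    cmd += ["eval", f"erlang:system_info({item})."]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_int_term(result.stdout)


def process_headroom(count, limit):
    """Fraction of the Erlang process table that is still free."""
    return 1.0 - count / limit


if __name__ == "__main__":
    count = erlang_system_info("process_count")
    limit = erlang_system_info("process_limit")
    print(f"processes: {count}/{limit}, headroom: {process_headroom(count, limit):.1%}")
```

Running this on Node1 while the clients reconnect would show whether the process count is climbing toward the limit before the errors start.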
..................
=ERROR REPORT==== 28-Apr-2018::01:57:14 ===
Missed heartbeats from client, timeout: 60s
=ERROR REPORT==== 28-Apr-2018::01:57:14 ===
Too many processes
=ERROR REPORT==== 28-Apr-2018::01:57:14 ===
Too many processes
=ERROR REPORT==== 28-Apr-2018::01:57:14 ===
** Generic server <0.14434.6> terminating
** Last message in was {inet_async,#Port<0.69843>,14461,{ok,#Port<0.485076>}}
** When Server state == {state,
{rabbit_networking,start_ssl_client,
[[{versions,['tlsv1.1','tlsv1.2']},
{certfile,
"/etc/rabbitmq/rabbitmq-server.crt"},
{keyfile,
"/etc/rabbitmq/rabbitmq-server.key"},
{ciphers,
[{dhe_rsa,aes_256_cbc,sha256},
{dhe_dss,aes_256_cbc,sha256},
{dhe_rsa,aes_128_cbc,sha256},
{dhe_dss,aes_128_cbc,sha256},
{dhe_dss,aes_256_cbc,sha},
{dhe_dss,aes_128_cbc,sha},
{ecdhe_ecdsa,aes_256_gcm,null,sha384},
{ecdhe_rsa,aes_256_gcm,null,sha384},
{ecdh_ecdsa,aes_256_gcm,null,sha384},
{ecdh_rsa,aes_256_gcm,null,sha384},
{dhe_rsa,aes_256_gcm,null,sha384},
{dhe_dss,aes_256_gcm,null,sha384},
{ecdhe_ecdsa,aes_128_gcm,null,sha256},
{ecdhe_rsa,aes_128_gcm,null,sha256},
{ecdh_ecdsa,aes_128_gcm,null,sha256},
{ecdh_rsa,aes_128_gcm,null,sha256},
{dhe_rsa,aes_128_gcm,null,sha256},
{dhe_dss,aes_128_gcm,null,sha256}]}]]},
#Port<0.69843>,14461}
** Reason for termination ==
** {{badmatch,
{error,
{'EXIT',
{{badmatch,
{error,
{{'EXIT',
{system_limit,
[{erlang,spawn_opt,
[proc_lib,init_p,
[<0.31152.144>,
[rabbit_tcp_client_sup,rabbit_sup,<0.88.0>],
gen,init_it,
[gen_server,<0.31152.144>,<0.31152.144>,supervisor2,
{self,rabbit_connection_helper_sup,[]},
[]]],
[link]],
[]},
{proc_lib,start_link,5,[{file,"proc_lib.erl"},{line,330}]},
{supervisor2,do_start_child,2,[]},
{supervisor2,handle_start_child,2,[]},
{supervisor2,handle_call,3,[]},
{gen_server,try_handle_call,4,
[{file,"gen_server.erl"},{line,629}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,661}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,240}]}]}},
{child,undefined,helper_sup,
{rabbit_connection_helper_sup,start_link,[]},
intrinsic,infinity,supervisor,
[rabbit_connection_helper_sup]}}}},
[{rabbit_connection_sup,start_link,0,[]},
{supervisor2,do_start_child_i,3,[]},
{supervisor2,handle_call,3,[]},
{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,629}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,661}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}}}},
[{rabbit_networking,start_client,2,[]},
{tcp_acceptor,handle_info,2,[]},
{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,615}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,681}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}
=ERROR REPORT==== 28-Apr-2018::01:57:14 ===
Too many processes
=SUPERVISOR REPORT==== 28-Apr-2018::01:57:14 ===
Supervisor: {<0.31553.124>,rabbit_channel_sup}
Context: start_error
Reason: {'EXIT',
{system_limit,
[{erlang,spawn_opt,
[proc_lib,init_p,
[<0.31553.124>,
[<0.10384.160>,<0.14218.175>,<0.7228.105>,
rabbit_tcp_client_sup,rabbit_sup,<0.88.0>],
gen,init_it,
[gen_server2,<0.31553.124>,<0.31553.124>,
rabbit_limiter,
1}],
[]]],
[link]],
[]},
{proc_lib,start_link,5,
[{file,"proc_lib.erl"},{line,330}]},
{supervisor2,do_start_child,2,[]},
{supervisor2,start_children,3,[]},
{supervisor2,init_children,2,[]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,328}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,240}]}]}}
Offender: [{pid,undefined},
{name,limiter},
{mfargs,
{rabbit_limiter,start_link,
{restart_type,transient},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 28-Apr-2018::01:57:14 ===
crasher:
initial call: rabbit_reader:init/2
pid: <0.29614.92>
registered_name: []
exception error: a system limit has been reached
in function erlang:spawn_opt/1
called as erlang:spawn_opt({erlang,apply,
[#Fun<rabbit_net.1.86161000>,[]],
[monitor]})
in call from spawn_monitor/1
in call from rabbit_net:fast_close/1
in call from rabbit_reader:'-start_connection/5-after$^0/0-0-'/1 (src/rabbit_reader.erl, line 286)
in call from rabbit_reader:start_connection/5 (src/rabbit_reader.erl, line 269)
ancestors: [<0.30344.91>,rabbit_tcp_client_sup,rabbit_sup,<0.88.0>]
messages: []
links: [<0.30344.91>]
dictionary: [{{channel,1},
{<0.32356.92>,{method,rabbit_framing_amqp_0_9_1}}},
{process_name,
{rabbit_reader,
{{ch_pid,<0.32356.92>},{1,#Ref<0.0.262188.104540>}}]
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 8756
neighbours:
=SUPERVISOR REPORT==== 28-Apr-2018::01:57:14 ===
Supervisor: {<0.30344.91>,rabbit_connection_sup}
Context: child_terminated
Reason: system_limit
Offender: [{pid,<0.29614.92>},
{name,reader},
{mfargs,{rabbit_reader,start_link,[<0.29579.92>]}},
{restart_type,intrinsic},
{shutdown,4294967295},
{child_type,worker}]
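
To see which failure dominates in a log like the one above, the reports can be tallied by type. This is only a sketch: it assumes every report starts with a header of the form `=ERROR REPORT==== 28-Apr-2018::01:57:14 ===` as in the excerpt, and it uses the first non-empty line after the header as the summary. The function name `summarize_reports` is hypothetical.

```python
import re
from collections import Counter

# Header format taken from the excerpt, e.g.
# "=ERROR REPORT==== 28-Apr-2018::01:57:14 ==="
HEADER = re.compile(r"^=(?P<kind>[A-Z ]+) REPORT==== (?P<ts>\S+) ===$")


def summarize_reports(log_text):
    """Count reports by (kind, first line of the report body)."""
    counts = Counter()
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = HEADER.match(line.strip())
        if not match:
            continue
        first_body_line = ""
        for following in lines[i + 1:]:
            stripped = following.strip()
            if HEADER.match(stripped):
                break  # next report started; this one had no body
            if stripped:
                first_body_line = stripped
                break
        counts[(match.group("kind"), first_body_line)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < rabbit.log
    for (kind, summary), n in summarize_reports(sys.stdin.read()).most_common():
        print(f"{n:6d}  {kind:10s}  {summary}")
```

Comparing the counts of "Missed heartbeats from client" and "Too many processes" over time may help tell whether the heartbeat timeouts come before or after the process limit is hit.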