[RabbitMQ 3.8.2 Erlang 22.2.7] Generic server aten_detector terminating


moutia el amrani

Apr 29, 2020, 10:28:24 AM
to rabbitmq-users
We have been seeing crash reports for a while after starting the RabbitMQ service. The crash report text and status output are below.

2020-04-28 12:31:16 =ERROR REPORT====
** Generic server aten_detector terminating
** Last message in was poll
** When Server state == {state,#Ref<0.1411409538.2422996994.222353>,1000,0.99,#{},#{}}
** Reason for termination ==
** {{timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,215}]},{aten_detector,handle_infon_server,try_dispatch,4,[{file,"gen_server.erl"},{line,637}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,711}]},{proc_lib,init_p_do_apply,3,[
2020-04-28 12:31:17 =CRASH REPORT====
  crasher:
    initial call: aten_detector:init/1
    pid: <0.305.0>
    registered_name: aten_detector
    exception exit: {{timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,215}]},{aten_det},{line,97}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,637}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,711}]},{proc_lib,in}]}
    ancestors: [aten_sup,<0.300.0>]
    message_queue_len: 1
    messages: [poll]
    links: [<0.301.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 35918218
  neighbours:
2020-04-28 12:31:17 =SUPERVISOR REPORT====
     Supervisor: {local,aten_sup}
     Context:    child_terminated
     Reason:     {timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}}
     Offender:   [{pid,<0.305.0>},{id,aten_detector},{mfargs,{aten_detector,start_link,[]}},{restart_type,permanent},{shutdown,5000},{child_type,worker}]

2020-04-28 12:31:56 =SUPERVISOR REPORT====
     Supervisor: {<0.1988.0>,rabbit_connection_helper_sup}
     Context:    shutdown_error
     Reason:     shutdown
     Offender:   [{pid,<0.2296.0>},{name,collector},{mfargs,{rabbit_queue_collector,start_link,[<<"10.11.22.33:5187 -> 1.2.3.4:5672">>]}},{restart_ty]

2020-04-28 12:32:02 =SUPERVISOR REPORT====
     Supervisor: {<0.2117.0>,rabbit_connection_helper_sup}
     Context:    shutdown_error
     Reason:     shutdown
     Offender:   [{pid,<0.2571.0>},{name,collector},{mfargs,{rabbit_queue_collector,start_link,[<<"10.11.22.33:2394 -> 1.2.3.4:5672">>]}},{restart_ty]

2020-04-28 12:32:03 =ERROR REPORT====
** Generic server aten_detector terminating
** Last message in was poll
** When Server state == {state,#Ref<0.1411409538.2422996994.225059>,1000,0.99,#{},#{}}
** Reason for termination ==
** {{timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,215}]},{aten_detector,handle_infon_server,try_dispatch,4,[{file,"gen_server.erl"},{line,637}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,711}]},{proc_lib,init_p_do_apply,3,[
2020-04-28 12:32:03 =CRASH REPORT====
  crasher:
    initial call: aten_detector:init/1
    pid: <0.21758.136>
    registered_name: aten_detector
    exception exit: {{timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}},[{gen_server,call,2,[{file,"gen_server.erl"},{line,215}]},{aten_det},{line,97}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,637}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,711}]},{proc_lib,in}]}
    ancestors: [aten_sup,<0.300.0>]
    message_queue_len: 1
    messages: [poll]
    links: [<0.301.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 794
  neighbours:
2020-04-28 12:32:03 =SUPERVISOR REPORT====
     Supervisor: {local,aten_sup}
     Context:    child_terminated
     Reason:     {timeout,{gen_server,call,[aten_sink,get_failure_probabilities]}}
     Offender:   [{pid,<0.21758.136>},{id,aten_detector},{mfargs,{aten_detector,start_link,[]}},{restart_type,permanent},{shutdown,5000},{child_type,worke

2020-04-28 18:01:43 =SUPERVISOR REPORT====
     Supervisor: {<0.26919.141>,rabbit_channel_sup}
     Context:    shutdown_error
     Reason:     noproc
     Offender:   [{pid,<0.26886.141>},{name,channel},{mfargs,{rabbit_channel,start_link,[1,<0.26939.141>,<0.26923.141>,<0.26939.141>,<<"10.11.22.33:57367{user,<<"TTAcquisition">>,[],[{rabbit_auth_backend_internal,none}]},<<"Vhost">>,[{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>mer_cancel_notify">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true}],<0.26936.141>,<0.26920.141>]}},{restarker}]

2020-04-28 22:12:52 =SUPERVISOR REPORT====
     Supervisor: {<0.2345.145>,amqp_channel_sup_sup}
     Context:    shutdown_error
     Reason:     shutdown
     Offender:   [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[direct,<0.2341.145>,<<"<rab...@server.2.2341.145>">>]}},{reype,supervisor}]

Any ideas on what might be the issue?

Thank you in advance,

Best regards,



Luke Bakken

Apr 29, 2020, 10:39:45 AM
to rabbitmq-users
Hi Moutia,

Please provide this information:
  • RabbitMQ version, Erlang version
  • Operating system running RabbitMQ and version of operating system
  • Attach your complete RabbitMQ configuration files
  • Compress and attach your full RabbitMQ log files
Thank you -
Luke

moutia el amrani

Apr 29, 2020, 1:05:03 PM
to rabbitm...@googlegroups.com
Hi Luke,

Please find the requested information below:

RabbitMQ version, Erlang version: RabbitMQ 3.8.2, Erlang 22.2.7
Operating system running RabbitMQ and version of operating system:

- Virtualization: vmware
- Operating System: Red Hat Enterprise Linux Server 7.7 (Maipo)
- CPE OS Name: cpe:/o:redhat:enterprise_linux:7.7:GA:server
- Kernel: Linux 3.10.0-1062.el7.x86_64
- Architecture: x86-64

I have attached the files you requested.

Thank you 

Moutia


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/7f876e14-9769-4f67-946f-1be21b08f7eb%40googlegroups.com.
crash.log.0
rabbitmq.config

Luke Bakken

Apr 29, 2020, 1:12:16 PM
to rabbitmq-users
Hello,

There should be more log files, perhaps located in /var/log/rabbitmq

The crash.log file does not contain enough information to help.

Did you notice anything unusual at the time of this crash log entry?

Thanks,
Luke

moutia el amrani

Apr 29, 2020, 3:37:13 PM
to rabbitm...@googlegroups.com
  Hello,

Please find attached the RMQ log file.
We have a peak load at 12:30 p.m.

image.png
Thank you,

Moutia

rabbit@vasrv00647.7z

Luke Bakken

Apr 29, 2020, 4:25:13 PM
to rabbitmq-users
Hello,

You have many short-lived connections in your log file:

2020-04-28 00:01:10.119 [info] <0.13842.126> accepting AMQP connection <0.13842.126> (10.11.22.33:41470 -> 1.2.3.4:5672)
2020-04-28 00:01:10.122 [info] <0.13842.126> connection <0.13842.126> (10.11.22.33:41470 -> 1.2.3.4:5672): user 'user2' authenticated and granted access to vhost 'VHOST'
2020-04-28 00:01:10.197 [info] <0.13842.126> closing AMQP connection <0.13842.126> (10.11.22.33:41470 -> 1.2.3.4:5672, vhost: 'VHOST', user: 'user2')

This is extremely inefficient.
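
To put a number on how short-lived these connections are, the "accepting" and "closing" entries can be paired by connection pid. The following is only a minimal sketch: it assumes the log line layout shown in the excerpt above, and the names `LINE_RE`, `connection_lifetimes` and `SAMPLE` are made up for illustration.

```python
import re
from datetime import datetime

# Matches the timestamp, the event ("accepting" or "closing") and the
# connection pid. The layout is assumed from the log excerpt in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[\w+\] <[\d.]+> "
    r"(?P<event>accepting|closing) AMQP connection (?P<pid><[\d.]+>)"
)


def connection_lifetimes(lines):
    """Return {pid: lifetime in seconds} for connections seen opening and closing."""
    opened = {}
    lifetimes = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an accept/close entry
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        pid = m.group("pid")
        if m.group("event") == "accepting":
            opened[pid] = ts
        elif pid in opened:
            lifetimes[pid] = (ts - opened.pop(pid)).total_seconds()
    return lifetimes


# The three log lines quoted in this thread.
SAMPLE = [
    "2020-04-28 00:01:10.119 [info] <0.13842.126> accepting AMQP connection <0.13842.126> (10.11.22.33:41470 -> 1.2.3.4:5672)",
    "2020-04-28 00:01:10.122 [info] <0.13842.126> connection <0.13842.126> (10.11.22.33:41470 -> 1.2.3.4:5672): user 'user2' authenticated and granted access to vhost 'VHOST'",
    "2020-04-28 00:01:10.197 [info] <0.13842.126> closing AMQP connection <0.13842.126> (10.11.22.33:41470 -> 1.2.3.4:5672, vhost: 'VHOST', user: 'user2')",
]

if __name__ == "__main__":
    for pid, seconds in connection_lifetimes(SAMPLE).items():
        print(f"{pid} lived {seconds * 1000:.0f} ms")
```

For the quoted connection, the accept and close timestamps are 78 ms apart. Running the same function over a full log file (for example `connection_lifetimes(open(path))`, with `path` pointing at your own log) would show how common such lifetimes are.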

Please bring this up with your application developers. Connections must be long-lived for anything resembling decent RabbitMQ performance. In addition, connection churn like this adds load to your server (https://www.rabbitmq.com/connections.html#high-connection-churn)
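
As a rough illustration of what "long-lived" means on the application side, the sketch below opens a connection once and reuses it for every publish instead of connecting per message. It is not tied to any particular client library: `SharedConnection`, `FakeConnection` and `publish` are hypothetical names, and the factory stands in for whatever call your AMQP client uses to open a connection.

```python
import threading


class SharedConnection:
    """Opens one connection lazily and hands the same object to every caller."""

    def __init__(self, factory):
        self._factory = factory  # hypothetical callable that opens a connection
        self._conn = None
        self._lock = threading.Lock()
        self.opened = 0  # how many times the factory was invoked

    def get(self):
        with self._lock:
            if self._conn is None:
                self._conn = self._factory()
                self.opened += 1
            return self._conn

    def reset(self):
        """Drop the cached connection, e.g. after the peer closed it."""
        with self._lock:
            self._conn = None


class FakeConnection:
    """Stand-in for a real AMQP connection; it only records published bodies."""

    def __init__(self):
        self.published = []

    def publish(self, body):
        self.published.append(body)


shared = SharedConnection(FakeConnection)


def publish(body):
    # Every publish goes through the same connection object.
    shared.get().publish(body)


if __name__ == "__main__":
    for i in range(1000):
        publish(f"message {i}")
    print(f"messages: {len(shared.get().published)}, connections opened: {shared.opened}")
```

The point of the design is that the cost of opening and authenticating a connection is paid once, not once per message. A real application would call `reset()` and reconnect when the connection is lost.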

Prior to the event at 12:30, you see log messages like these:

2020-04-28 12:26:44.100 [error] <0.10875.136> closing AMQP connection <0.10875.136> (10.11.22.33:2919 -> 1.2.3.4:5672):
{writer,send_failed,{error,timeout}}
2020-04-28 12:26:44.100 [error] <0.13652.136> closing AMQP connection <0.13652.136> (10.11.22.33:56084 -> 1.2.3.4:5672):
{writer,send_failed,{error,timeout}}

2020-04-28 12:27:21.215 [error] <0.20961.135> closing AMQP connection <0.20961.135> (10.11.22.33:15778 -> 1.2.3.4:5672):
{writer,send_failed,{error,enotconn}}

2020-04-28 12:27:31.858 [error] <0.10292.135> closing AMQP connection <0.10292.135> (10.11.22.33:60009 -> 1.2.3.4:5672):
{writer,send_failed,{error,timeout}}
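
A quick way to see how often each failure reason occurs is to count the `send_failed` terms in the log. This is a small sketch that assumes the term format shown in the entries above; `SEND_FAILED_RE`, `send_failure_reasons` and `SAMPLE_LOG` are illustrative names.

```python
import re
from collections import Counter

# Matches the Erlang term logged after a "closing AMQP connection" error line.
SEND_FAILED_RE = re.compile(r"\{writer,send_failed,\{error,(\w+)\}\}")


def send_failure_reasons(log_text):
    """Count writer send failures by reason (e.g. timeout, enotconn)."""
    return Counter(SEND_FAILED_RE.findall(log_text))


# The four error entries quoted in this thread.
SAMPLE_LOG = """\
2020-04-28 12:26:44.100 [error] <0.10875.136> closing AMQP connection <0.10875.136> (10.11.22.33:2919 -> 1.2.3.4:5672):
{writer,send_failed,{error,timeout}}
2020-04-28 12:26:44.100 [error] <0.13652.136> closing AMQP connection <0.13652.136> (10.11.22.33:56084 -> 1.2.3.4:5672):
{writer,send_failed,{error,timeout}}
2020-04-28 12:27:21.215 [error] <0.20961.135> closing AMQP connection <0.20961.135> (10.11.22.33:15778 -> 1.2.3.4:5672):
{writer,send_failed,{error,enotconn}}
2020-04-28 12:27:31.858 [error] <0.10292.135> closing AMQP connection <0.10292.135> (10.11.22.33:60009 -> 1.2.3.4:5672):
{writer,send_failed,{error,timeout}}
"""

if __name__ == "__main__":
    for reason, count in send_failure_reasons(SAMPLE_LOG).most_common():
        print(f"{reason}: {count}")
```

On the four quoted entries this gives three `timeout` failures and one `enotconn`.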

Basically, you're overloading your RabbitMQ server. The screenshot you shared from your monitoring system is too small to read. Please attach it rather than inserting it into your message.

Thanks -
Luke

moutia el amrani

Apr 29, 2020, 5:21:16 PM
to rabbitm...@googlegroups.com
Hello,

At 12:31:07 p.m. our load balancer VIP switched to the other node, which cut the connections.

Thanks
Moutia,

LoadServerRMQ.PNG

Luke Bakken

Apr 29, 2020, 5:39:30 PM
to rabbitmq-users
Hi Moutia,

I can't effectively help you when you give me important bits of information, one at a time. This is an important piece of information that you should have shared in your first message!

What connections go through your load balancer? Only connections from client applications?

I also requested that you provide a readable screenshot from your monitoring system. Please carefully read my messages and provide information when I request it.

Luke

moutia el amrani

Apr 30, 2020, 9:57:34 AM
to rabbitm...@googlegroups.com
Hello Luke,


Please find below our RabbitMQ architecture and the screenshot from our monitoring system:
image.png
image.png

Thanks,
Moutia

