RabbitMQ suddenly stops working with error: Handshake_timeout, handshake error

3,878 views
Skip to first unread message

Zoran Jeremic

unread,
Sep 18, 2017, 2:52:20 PM9/18/17
to rabbitmq-users
Hi,

I'm using RabbitMQ to share messages between Moodle (PHP application) and Java backend web application. Both applications and RabbitMQ are hosted on one RHEL server. I'm in the middle of pilot with group of students, and everything was running fine for a weeks, until recently when RabbitMQ stopped sharing messages between applications and caused Moodle to be un-responsive when PHP client is trying to send a message. Restarting both Apache and Tomcat didn't solve anything, and only after RabbitMQ restart everything was back to normal. I've had similar experience some time ago, but I ignored it then. However, I would like to further investigate what is the problem, and to try to find solution for it. Piece of log when the problem happened is given bellow. It doesn't give me any idea what could be the problem, so I would appreciate if someone have an idea how to investigate this.

Thanks,
Zoran




=INFO REPORT==== 16-Sep-2017::20:21:34 ===
accepting AMQP connection <0.6370.248> (127.0.0.1:51442 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:21:35 ===
closing AMQP connection <0.6432.248> (127.0.0.1:51428 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:21:35 ===
accepting AMQP connection <0.6457.248> (127.0.0.1:51452 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:21:36 ===
accepting AMQP connection <0.6427.248> (127.0.0.1:51456 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:21:36 ===
closing AMQP connection <0.6427.248> (127.0.0.1:51456 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:22:54 ===
closing AMQP connection <0.6392.248> (127.0.0.1:51436 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:24:34 ===
accepting AMQP connection <0.6473.248> (127.0.0.1:51832 -> 127.0.0.1:5672)

=ERROR REPORT==== 16-Sep-2017::20:24:44 ===
closing AMQP connection <0.6473.248> (127.0.0.1:51832 -> 127.0.0.1:5672):
{handshake_timeout,frame_header}

=INFO REPORT==== 16-Sep-2017::20:24:59 ===
accepting AMQP connection <0.6578.248> (127.0.0.1:51888 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:25:05 ===
closing AMQP connection <0.6345.248> (127.0.0.1:51430 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:25:09 ===
accepting AMQP connection <0.6516.248> (127.0.0.1:51918 -> 127.0.0.1:5672)

=ERROR REPORT==== 16-Sep-2017::20:25:09 ===
closing AMQP connection <0.6578.248> (127.0.0.1:51888 -> 127.0.0.1:5672):
{handshake_timeout,frame_header}

=ERROR REPORT==== 16-Sep-2017::20:25:19 ===
closing AMQP connection <0.6516.248> (127.0.0.1:51918 -> 127.0.0.1:5672):
{handshake_timeout,frame_header}

=INFO REPORT==== 16-Sep-2017::20:32:42 ===
accepting AMQP connection <0.6493.248> (127.0.0.1:52832 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:32:42 ===
closing AMQP connection <0.6493.248> (127.0.0.1:52832 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:32:43 ===
accepting AMQP connection <0.6501.248> (127.0.0.1:52836 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:32:43 ===
closing AMQP connection <0.6501.248> (127.0.0.1:52836 -> 127.0.0.1:5672)

=INFO REPORT==== 16-Sep-2017::20:34:20 ===
accepting AMQP connection <0.6579.248> (127.0.0.1:53036 -> 127.0.0.1:5672)

=ERROR REPORT==== 16-Sep-2017::20:34:30 ===
closing AMQP connection <0.6579.248> (127.0.0.1:53036 -> 127.0.0.1:5672):
{handshake_timeout,handshake}

=INFO REPORT==== 16-Sep-2017::20:36:09 ===
accepting AMQP connection <0.6582.248> (127.0.0.1:53254 -> 127.0.0.1:5672)

=ERROR REPORT==== 16-Sep-2017::20:36:19 ===
closing AMQP connection <0.6582.248> (127.0.0.1:53254 -> 127.0.0.1:5672):
{handshake_timeout,handshake}

=INFO REPORT==== 16-Sep-2017::20:38:57 ===
accepting AMQP connection <0.6497.248> (127.0.0.1:53512 -> 127.0.0.1:5672)

=ERROR REPORT==== 16-Sep-2017::20:39:07 ===
closing AMQP connection <0.6497.248> (127.0.0.1:53512 -> 127.0.0.1:5672):
{handshake_timeout,handshake}

Michael Klishin

unread,
Sep 18, 2017, 2:58:58 PM9/18/17
to rabbitm...@googlegroups.com
What version of RabbitMQ and Erlang is used? What does `rabbitmqctl environment` output?
Is there anything in the -sasl.log next to the regular one?
This is not something commonly reported.

{handshake_timeout,handshake} means that according to RabbitMQ connection, the
client did not proceed with protocol handshake negotiation. Only Wireshark [1]
and pretty involved node state inspection can explain why.



--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Zoran Jeremic

unread,
Sep 18, 2017, 4:22:11 PM9/18/17
to rabbitmq-users
Hi Michael,

Thank you for your quick response.
Versions are: RabbitMQ 3.3.5 on Erlang R16B03-1. You can also find other settings in the log bellow from service restart.

I've forgot to mention that in logs I could find the error handshake_timeout for about 2 hours, but after that log reports accepting and closing AMQP connection without error in regular 5 seconds intervals. However, RabbitMQ was not able to exchange messages.

-sasl.log is empty.

It is difficult to investigate this using Wireshark as I don't know how to reproduce the problem on the local PC. It happened on the production server two times, and I can't leave it too long in that state while I'm investigating. I have to solve the problem as soon as possible, as users are affected.


rabbitmqctl environment output
Application environment of node rabbit@morph_acs ...
[{auth_backends,[rabbit_auth_backend_internal]},
 {auth_mechanisms,['PLAIN','AMQPLAIN']},
 {backing_queue_module,rabbit_variable_queue},
 {channel_max,0},
 {channel_operation_timeout,70000},
 {cluster_nodes,{[],disc}},
 {cluster_partition_handling,ignore},
 {collect_statistics,fine},
 {collect_statistics_interval,5000},
 {default_permissions,[<<".*">>,<<".*">>,<<".*">>]},
 {default_user,<<"guest">>},
 {default_user_tags,[administrator]},
 {default_vhost,<<"/">>},
 {delegate_count,16},
 {disk_free_limit,50000000},
 {enabled_plugins_file,"/etc/rabbitmq/enabled_plugins"},
 {error_logger,{file,"/var/log/rabbitmq/rabbit@morph_acs.log"}},
 {frame_max,131072},
 {halt_on_upgrade_failure,true},
 {heartbeat,580},
 {hipe_compile,false},
 {hipe_modules,[rabbit_reader,rabbit_channel,gen_server2,rabbit_exchange,
                rabbit_command_assembler,rabbit_framing_amqp_0_9_1,
                rabbit_basic,rabbit_event,lists,queue,priority_queue,
                rabbit_router,rabbit_trace,rabbit_misc,rabbit_binary_parser,
                rabbit_exchange_type_direct,rabbit_guid,rabbit_net,
                rabbit_amqqueue_process,rabbit_variable_queue,
                rabbit_binary_generator,rabbit_writer,delegate,gb_sets,lqueue,
                sets,orddict,rabbit_amqqueue,rabbit_limiter,gb_trees,
                rabbit_queue_index,rabbit_exchange_decorator,gen,dict,ordsets,
                file_handle_cache,rabbit_msg_store,array,
                rabbit_msg_store_ets_index,rabbit_msg_file,
                rabbit_exchange_type_fanout,rabbit_exchange_type_topic,mnesia,
                mnesia_lib,rpc,mnesia_tm,qlc,sofs,proplists,credit_flow,pmon,
                ssl_connection,tls_connection,ssl_record,tls_record,gen_fsm,
                ssl]},
 {included_applications,[]},
 {log_levels,[{connection,info}]},
 {loopback_users,[]},
 {msg_store_file_size_limit,16777216},
 {msg_store_index_module,rabbit_msg_store_ets_index},
 {plugins_dir,"/usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../plugins"},
 {plugins_expand_dir,"/var/lib/rabbitmq/mnesia/rabbit@morph_acs-plugins-expand"},
 {queue_index_max_journal_entries,65536},
 {reverse_dns_lookups,false},
 {sasl_error_logger,{file,"/var/log/rabbitmq/rabbit@morph_acs-sasl.log"}},
 {server_properties,[]},
 {ssl_apps,[asn1,crypto,public_key,ssl]},
 {ssl_cert_login_from,distinguished_name},
 {ssl_listeners,[]},
 {ssl_options,[]},
 {tcp_listen_options,[binary,
                      {packet,raw},
                      {reuseaddr,true},
                      {backlog,128},
                      {nodelay,true},
                      {linger,{true,0}},
                      {exit_on_close,false}]},
 {tcp_listeners,[5672]},
 {trace_vhosts,[]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_high_watermark_paging_ratio,0.5}]
...done.





=INFO REPORT==== 17-Sep-2017::09:13:42 ===
accepting AMQP connection <0.4163.251> (127.0.0.1:54724 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:13:48 ===
closing AMQP connection <0.4163.251> (127.0.0.1:54724 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:13:48 ===
accepting AMQP connection <0.4033.251> (127.0.0.1:54734 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:13:51 ===
Stopping RabbitMQ

=INFO REPORT==== 17-Sep-2017::09:13:51 ===
stopped TCP Listener on [::]:5672

=ERROR REPORT==== 17-Sep-2017::09:13:51 ===
AMQP connection <0.1705.251> (running), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=ERROR REPORT==== 17-Sep-2017::09:13:51 ===
AMQP connection <0.4033.251> (running), channel 0 - error:
{amqp_error,connection_forced,
            "broker forced connection closure with reason 'shutdown'",none}

=INFO REPORT==== 17-Sep-2017::09:13:51 ===
Halting Erlang VM

=INFO REPORT==== 17-Sep-2017::09:13:55 ===
Starting RabbitMQ 3.3.5 on Erlang R16B03-1
Copyright (C) 2007-2014 GoPivotal, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 17-Sep-2017::09:13:55 ===
node           : rabbit@morph_acs
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : 6OiE4w689naJLoK+RqXDSQ==
log            : /var/log/rabbitmq/rabbit@morph_acs.log
sasl log       : /var/log/rabbitmq/rabbit@morph_acs-sasl.log
database dir   : /var/lib/rabbitmq/mnesia/rabbit@morph_acs

=INFO REPORT==== 17-Sep-2017::09:13:55 ===
Limiting to approx 924 file handles (829 sockets)

=INFO REPORT==== 17-Sep-2017::09:13:55 ===
Memory limit set to 6203MB of 15508MB total.

=INFO REPORT==== 17-Sep-2017::09:13:55 ===
Disk free limit set to 50MB

=INFO REPORT==== 17-Sep-2017::09:13:55 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 17-Sep-2017::09:13:55 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 17-Sep-2017::09:13:55 ===
started TCP Listener on [::]:5672

=INFO REPORT==== 17-Sep-2017::09:13:56 ===
Management plugin started. Port: 15672

=INFO REPORT==== 17-Sep-2017::09:13:56 ===
Statistics database started.

=INFO REPORT==== 17-Sep-2017::09:13:56 ===
Server startup complete; 6 plugins started.
 * amqp_client
 * mochiweb
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_web_dispatch
 * webmachine

=INFO REPORT==== 17-Sep-2017::09:13:57 ===
accepting AMQP connection <0.293.0> (127.0.0.1:54740 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:03 ===
closing AMQP connection <0.293.0> (127.0.0.1:54740 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:03 ===
accepting AMQP connection <0.313.0> (127.0.0.1:54750 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:10 ===
closing AMQP connection <0.313.0> (127.0.0.1:54750 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:10 ===
accepting AMQP connection <0.329.0> (127.0.0.1:54752 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:16 ===
closing AMQP connection <0.329.0> (127.0.0.1:54752 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:16 ===
accepting AMQP connection <0.344.0> (127.0.0.1:54754 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:21 ===
closing AMQP connection <0.344.0> (127.0.0.1:54754 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:21 ===
accepting AMQP connection <0.359.0> (127.0.0.1:54768 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:28 ===
closing AMQP connection <0.359.0> (127.0.0.1:54768 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:28 ===
accepting AMQP connection <0.374.0> (127.0.0.1:54774 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:31 ===
accepting AMQP connection <0.386.0> (127.0.0.1:54788 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:31 ===
closing AMQP connection <0.386.0> (127.0.0.1:54788 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:31 ===
accepting AMQP connection <0.401.0> (127.0.0.1:54790 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:32 ===
closing AMQP connection <0.401.0> (127.0.0.1:54790 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:32 ===
accepting AMQP connection <0.416.0> (127.0.0.1:54792 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:32 ===
closing AMQP connection <0.416.0> (127.0.0.1:54792 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:32 ===
accepting AMQP connection <0.428.0> (127.0.0.1:54794 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:32 ===
closing AMQP connection <0.428.0> (127.0.0.1:54794 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:32 ===
accepting AMQP connection <0.440.0> (127.0.0.1:54796 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:32 ===
closing AMQP connection <0.440.0> (127.0.0.1:54796 -> 127.0.0.1:5672)

=INFO REPORT==== 17-Sep-2017::09:14:32 ===
accepting AMQP connection <0.452.0> (127.0.0.1:54798 -> 127.0.0.1:5672)


Thanks,
Zoran

Michael Klishin

unread,
Sep 18, 2017, 4:27:57 PM9/18/17
to rabbitm...@googlegroups.com
3.3.5 is 26 releases behind [1]. Consider upgrading.

Your node is limited to 1024 file descriptors (which is OS default on Linux, not a RabbitMQ default). With a surge in connections
and queue counts, it is quite easy to run into that limit after which RabbitMQ won't be able to accept
new client connections [2]. I'd expect no connection entries at all if that's the case but 3.3.5 and 3.6.x versions
log connection events slightly differently.

This is one variable factor (the effective file descriptor usage) which can go up and down over time, therefore connections
begin to succeed at some point. With PHP clients, which connect and disconnect per operation in many cases, this theory sounds
particularly plausible. See [2] for notes on how to lift the limit.

Without SASL logs and monitoring information I'm afraid that's my best guess.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages