It looks like the old RabbitMQ server (version 3.3) had heartbeat=600. We recently upgraded to v3.7, which has heartbeat=60.
We are now seeing very frequent connection drops initiated by the server, because it does not receive timely heartbeats from the client.
Is this merely due to the sudden decrease in the default from 600s to 60s, or are there known issues at scale with rabbitmq-3.7?
I should also mention that rabbitmq-3.7.7 seems to work slightly better than rabbitmq-3.7.8 in this respect. With rabbitmq-3.7.8 we also see {handshake_timeout, frame_header} errors.
Even after setting the below in /etc/rabbitmq/rabbitmq.conf, we still see frequent 'missed heartbeats from client' errors.
# rabbitmqctl status
Status of node rabbit@localhost ...
[{pid,20537},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","3.7.7"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.7.7"},
{cowboy,"Small, fast, modern HTTP server.","2.2.2"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.7.7"},
{rabbit,"RabbitMQ","3.7.7"},
{amqp_client,"RabbitMQ AMQP Client","3.7.7"},
{rabbit_common,
"Modules shared by rabbitmq-server and rabbitmq-erlang-client",
"3.7.7"},
{ranch_proxy_protocol,"Ranch Proxy Protocol Transport","1.5.0"},
{ranch,"Socket acceptor pool for TCP protocols.","1.5.0"},
{ssl,"Erlang/OTP SSL application","8.2.4"},
{public_key,"Public key infrastructure","1.5.2"},
{asn1,"The Erlang ASN1 compiler version 5.0.5","5.0.5"},
{cowlib,"Support library for manipulating Web protocols.","2.1.0"},
{inets,"INETS CXC 138 49","6.5"},
{xmerl,"XML parser","1.3.16"},
{os_mon,"CPO CXC 138 46","2.4.4"},
{jsx,"a streaming, evented json parsing toolkit","2.8.2"},
{recon,"Diagnostic tools for production use","2.3.2"},
{crypto,"CRYPTO","4.2.1"},
{mnesia,"MNESIA CXC 138 12","4.15.3"},
{lager,"Erlang logging framework","3.6.3"},
{goldrush,"Erlang event stream processor","0.1.9"},
{compiler,"ERTS CXC 138 10","7.1.5"},
{syntax_tools,"Syntax tools","2.1.4"},
{syslog,"An RFC 3164 and RFC 5424 compliant logging framework.","3.4.2"},
{sasl,"SASL CXC 138 11","3.1.1"},
{stdlib,"ERTS CXC 138 10","3.4.4"},
{kernel,"ERTS CXC 138 10","5.4.3"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:128] [hipe] [kernel-poll:true]\n"},........
We use RabbitMQ in our OpenStack deployment and have been seeing these errors very frequently since upgrading from rabbitmq-3.3 to rabbitmq-3.7.
Also, some services seem to be using the new 150s heartbeat (default was 60s), but there are still some errors with 'timeout: 10s'. How do the client-side and server-side heartbeat settings interact?
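For reference, the RabbitMQ heartbeat guide describes the negotiation used by the Java, .NET, and Erlang clients maintained by the RabbitMQ team roughly as follows: if either side proposes 0, the larger value wins; otherwise the smaller value wins. Heartbeat frames are then sent about every timeout / 2 seconds. The Python sketch below only restates that rule. The function names are hypothetical, and whether the client libraries used by our OpenStack services follow the same rule is an assumption we have not verified.

```python
def negotiate_heartbeat(server_value: int, client_value: int) -> int:
    """Return the heartbeat timeout (seconds) both peers would agree on.

    Hypothetical helper restating the rule from the RabbitMQ heartbeat
    guide; it is not taken from any client library's source code.
    """
    if server_value == 0 or client_value == 0:
        # A zero means "no preference / disabled" on that side,
        # so the larger (possibly also zero) value is used.
        return max(server_value, client_value)
    # Both sides asked for heartbeats: the smaller timeout is used.
    return min(server_value, client_value)


def send_interval(timeout: int) -> float:
    """Approximate interval between heartbeat frames for a given timeout."""
    return timeout / 2


if __name__ == "__main__":
    # Server suggests 60s, client requests 150s.
    agreed = negotiate_heartbeat(60, 150)
    print(agreed, send_interval(agreed))
```

If this rule applies to our clients, a service that requests 150s against a server suggesting 60s would still end up with a 60s timeout, which might explain why raising the client-side value alone does not help.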
I would appreciate any pointers. Please let me know if you need packet captures or any other information.