We have long-running services that sometimes get no traffic for hours (or days). We are seeing the server forcibly close connections after some period of time (apparently anywhere from 6 to 12 hours; the timing is not consistent). This does not happen on our older production server (RabbitMQ 3.0.2), so it does not appear to be environmental. Rather, this is new behavior in some version of the RabbitMQ server and/or .NET client DLL after 3.0.2 (we are currently running 3.5.0).
In the RabbitMQ log we see information like:
    =ERROR REPORT==== 18-Apr-2015::05:50:35 ===
    closing AMQP connection <0.348.0> ([::1]:42670 -> [::1]:5672):
    {heartbeat_timeout,running}
We have explicitly NOT requested a heartbeat (RequestedHeartbeat = 0 in the connection factory).
We were hoping that a workaround would be to request a 60-second heartbeat on each connection, but even with the heartbeat enabled, connections are still periodically terminated: at least one (but by no means all) of our connections is closed overnight with a timeout. This makes it hard to produce a nice reproducible test case.
In our application we have dozens of connections, each with multiple channels (most have 3-5, but some have as many as 50). So far we have been unable to determine anything about which connection will be terminated, only that one or more connections are always terminated at some point.
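Because the server log identifies a closed connection only by its client and server address/port, one way to narrow down which connection is being dropped might be to log the same details on the client side. This is a minimal sketch written against the 3.5.x .NET client API; `ConnectionDiagnostics.Attach` is a hypothetical helper of our own, not part of RabbitMQ.Client, and we are assuming `IConnection.LocalPort`, `IConnection.Heartbeat` and the `ConnectionShutdown` event behave as in that version.

```csharp
using System;
using RabbitMQ.Client;

// Hypothetical helper (not part of RabbitMQ.Client): logs enough about each
// connection to match it against server log lines such as
//   closing AMQP connection <0.348.0> ([::1]:42670 -> [::1]:5672)
public static class ConnectionDiagnostics
{
    public static void Attach(IConnection connection, string label)
    {
        // LocalPort is the client-side port, i.e. 42670 in the log line above.
        // Heartbeat is the interval (in seconds) actually negotiated with the
        // server, which is not necessarily the value that was requested.
        Console.WriteLine("{0:u} [{1}] open: local port {2}, heartbeat {3}s",
            DateTime.UtcNow, label, connection.LocalPort, connection.Heartbeat);

        // Logs who initiated the shutdown and the reply code/text reported by
        // the client library when the connection goes away.
        connection.ConnectionShutdown += (sender, reason) =>
            Console.WriteLine("{0:u} [{1}] shutdown: initiator={2}, code={3}, text={4}",
                DateTime.UtcNow, label, reason.Initiator, reason.ReplyCode, reason.ReplyText);
    }
}
```

Calling `ConnectionDiagnostics.Attach(connection, "some-label")` right after `factory.CreateConnection()` would tag each connection with a name of your choosing. If automatic recovery reconnects, the local port changes, so the "open" line only describes the original connection.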
We have also tried setting AutomaticRecoveryEnabled, but as far as we can tell it has no effect: we still get a System.IO.EndOfStreamException in the client when the server closes the connection with "heartbeat_timeout".
Ideally we would like to use a 60-second heartbeat (to avoid connections being lost due to network configuration, etc.) together with automatic recovery, so that if the server closes a connection, the client re-establishes it automatically. So far we have not seen recovery succeed. Perhaps we have misunderstood what the client configuration options mean.
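A sketch of the configuration we are aiming for, written against the 3.5.x .NET client, where `RequestedHeartbeat` is a `ushort` number of seconds. `TopologyRecoveryEnabled` and `NetworkRecoveryInterval` do not appear in our sample program; they are included as assumptions about what recovery needs, and the 10-second interval is an arbitrary choice.

```csharp
using System;
using RabbitMQ.Client;

class RecoveryConfigSketch
{
    static void Main(string[] args)
    {
        var factory = new ConnectionFactory
        {
            HostName = "localhost",
            UserName = args[0],
            Password = args[1],
            // Request a 60-second heartbeat; the effective value is negotiated
            // with the server during connection setup.
            RequestedHeartbeat = 60,
            // Let the client reconnect after the connection is lost...
            AutomaticRecoveryEnabled = true,
            // ...and re-declare exchanges, queues, bindings and consumers.
            TopologyRecoveryEnabled = true,
            // Delay between reconnection attempts (arbitrary value for this sketch).
            NetworkRecoveryInterval = TimeSpan.FromSeconds(10)
        };

        using (var connection = factory.CreateConnection())
        {
            // Log each shutdown together with its reason (ShutdownEventArgs.ToString()).
            connection.ConnectionShutdown += (sender, reason) =>
                Console.WriteLine("{0:u} connection shut down: {1}", DateTime.UtcNow, reason);

            Console.WriteLine("Negotiated heartbeat: {0}s", connection.Heartbeat);
            Console.WriteLine("Press Enter to exit");
            Console.ReadLine();
        }
    }
}
```

This only covers the connection itself. We have not verified how a QueueingBasicConsumer behaves across a recovery; if its SharedQueue is closed when the channel shuts down, Dequeue would presumably throw EndOfStreamException even if the connection is later re-established.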
Here's a sample program that, when left running overnight, reproduces the timeout. This one has RequestedHeartbeat set to 0:
    using System;
    using RabbitMQ.Client;
    using RabbitMQ.Client.Events;
    using RabbitMQ.Util;

    class Program
    {
        static void Main(string[] args)
        {
            // ... (code omitted)
            var userName = args[0];
            var password = args[1];
            var exchangeName = "TestExchange";
            var queueName = "test";

            var factory = new ConnectionFactory
            {
                HostName = "localhost",
                UserName = userName,
                Password = password,
                //AutomaticRecoveryEnabled = true,
                RequestedHeartbeat = 0
                //RequestedHeartbeat = 60
            };

            // Connection and channel are disposed if an exception such as
            // EndOfStreamException escapes the loop below.
            using (var connection = factory.CreateConnection())
            using (var channel = connection.CreateModel())
            {
                channel.ExchangeDeclare(exchangeName, ExchangeType.Topic, true);
                // durable: false, exclusive: false, autoDelete: true
                channel.QueueDeclare(queueName, false, false, true, null);
                channel.QueueBind(queueName, exchangeName, queueName + ".#");

                // Prefetch only applies to consumers started after this call
                // (and is ignored for auto-ack consumers such as this one).
                channel.BasicQos(0, 100, false);

                var messageQueue = new SharedQueue<BasicDeliverEventArgs>();
                var consumer = new QueueingBasicConsumer(channel, messageQueue);
                channel.BasicConsume(queueName, true, consumer); // autoAck: true

                Console.WriteLine("Waiting for a message, heartbeat disabled");
                while (true)
                {
                    BasicDeliverEventArgs eventArguments;
                    // Wait up to 100 ms for a delivery; returns false on timeout.
                    if (messageQueue.Dequeue(100, out eventArguments))
                    {
                        // do something
                        Console.WriteLine("got a message");
                    }
                }
            }
        }
    }