reconnect keepalive

Ján Kianička

Aug 15, 2014, 10:26:04 AM

to rabbitm...@googlegroups.com

Dear RabbitMQ developers,
We are implementing a RabbitMQ-centric system in C using the librabbitmq-c library.
I need to handle cluster failover and reconnection. I know that this feature is supported by the official client implementations: Java, .NET and Erlang.
But we are limited to C. We currently keep one connection open permanently, and I have experimented with reconnecting when 'amqp_consume_message' returns a library exception response.
This works only in the 'graceful' shutdown scenario, where the client side of the TCP connection is notified of the connection failure. But the cluster has a VIP mechanism that moves the virtual IP to another node, and the open connections just hang without any error.
The connection is reported as broken only after the TCP keepalive on both sides expires (the one configured in the host system's kernel).
Has anyone solved this client robustness problem in C?

I am thinking about setting socket keepalive per connection, but amqp_socket is somehow different from a standard C socket and did not work with getsockopt() and setsockopt().
I am also thinking about implementing AMQP keepalive, but I did not find any examples in C using librabbitmq-c. Does anyone have experience with this?

Thank you very much for any kind of help.
Kind regards
Jan

Michael Klishin

Aug 15, 2014, 11:09:46 AM

to Ján Kianička, rabbitm...@googlegroups.com

On 15 August 2014 at 18:26:10, Ján Kianička (jan.ki...@gmail.com) wrote:
> I know that this feature is supported by the official client implementations:
> Java, .NET and Erlang.

Of the official ones, only Java.

> But we are limited to C. We currently keep one connection open permanently,
> and I have experimented with reconnecting when 'amqp_consume_message'
> returns a library exception response.
> This works only in the 'graceful' shutdown scenario, where the client
> side of the TCP connection is notified of the connection failure.
> But the cluster has a VIP mechanism that moves the virtual IP to another
> node, and the open connections just hang without
> any error.
> The connection is reported as broken only after the TCP keepalive
> on both sides expires (the one configured in the host system's
> kernel).
> Has anyone solved this client robustness problem
> in C?

This is why the protocol has heartbeats: to detect broken TCP connections
quicker. Keepalive timeouts with default settings on most OSes are so
high they are useless for messaging (or virtually anything, really).

> I am thinking about setting socket keepalive per connection,
> but amqp_socket is somehow different from a standard C socket
> and did not work with getsockopt() and setsockopt().
> I am also thinking about implementing AMQP keepalive, but
> I did not find any examples in C using librabbitmq-c. Does anyone
> have experience with this?

Implementing heartbeats correctly and reliably assumes the library
can use a separate thread (or similar). In case of the C client, there
is no consensus about how that should be done, from what I see.

But it's good someone is starting this conversation.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Alan Antonuk

Aug 15, 2014, 2:27:28 PM

to Michael Klishin, Ján Kianička, rabbitm...@googlegroups.com

On Fri, Aug 15, 2014 at 8:09 AM, Michael Klishin <mkli...@pivotal.io> wrote:

On 15 August 2014 at 18:26:10, Ján Kianička (jan.ki...@gmail.com) wrote:

> I am thinking about setting socket keepalive per connection,
> but amqp_socket is somehow different from a standard C socket
> and did not work with getsockopt() and setsockopt().

amqp_socket abstracts away the differences between a raw TCP socket and one that has SSL layered on top of it. If you want to manipulate the underlying socket, use amqp_get_sockfd() to get its file descriptor, then use the usual socket functions on it. See: https://github.com/alanxz/rabbitmq-c/blob/master/librabbitmq/amqp.h#L992

Note that incorrect use of socket functions such as setsockopt() on sockets that rabbitmq-c owns can have adverse effects, so test what you do carefully.
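
As a rough illustration of the setsockopt() route, here is a minimal sketch that turns on TCP keepalive with explicit timings on a plain file descriptor. The helper name enable_tcp_keepalive and its parameters are made up for this example and are not part of rabbitmq-c. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are assumed to be available as on Linux; on other platforms the sketch only sets SO_KEEPALIVE and leaves the kernel defaults in place.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not a rabbitmq-c function): enable TCP keepalive
 * on fd and, where the platform supports it, override the kernel timings.
 *   idle_s  - seconds of silence before the first probe is sent
 *   intvl_s - seconds between probes
 *   count   - unanswered probes before the kernel drops the connection
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_tcp_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    /* Per-socket timings; these option names are Linux-style. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) != 0)
        return -1;
#else
    /* No per-socket timing options here: kernel defaults stay in effect. */
    (void)idle_s;
    (void)intvl_s;
    (void)count;
#endif
    return 0;
}
```

With a rabbitmq-c connection this would be called once after the socket is open, for example enable_tcp_keepalive(amqp_get_sockfd(conn), 10, 5, 3), where conn is the application's connection state. With those values an idle, silently broken connection should be reported after roughly 10 + 5 * 3 = 25 seconds instead of the much longer kernel default.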

> I am also thinking about implementing AMQP keepalive, but
> I did not find any examples in C using librabbitmq-c. Does anyone
> have experience with this?

> Implementing heartbeats correctly and reliably assumes the library
> can use a separate thread (or similar). In case of the C client, there
> is no consensus about how that should be done, from what I see.

rabbitmq-c has partial support for AMQP heartbeats. By partial I mean that heartbeats are only serviced while blocking in amqp_basic_publish() or amqp_simple_wait_frame() (or anything that depends on it, such as amqp_consume_message()). This tends to cover the use case where the program waits for messages to be delivered to a consumer. Enabling heartbeats in rabbitmq-c involves passing a value greater than 0 for the heartbeat parameter of amqp_login().

As mentioned by Michael: to get complete support for heartbeats in a client, there needs to be some kind of concurrency to handle sending and receiving heartbeats (such as a background thread or an event loop), which doesn't exist currently in rabbitmq-c (though I have been making some slow progress on how this might be done).
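
To make the bookkeeping concrete, here is a minimal sketch of the decision such a background thread or event loop would have to make on every tick. It is not how rabbitmq-c does it; the names hb_state, hb_action and hb_check are hypothetical. The thresholds are also an assumption for illustration: send a heartbeat when nothing has been written for one interval, and treat the peer as dead when nothing has been read for two intervals.

```c
/* Hypothetical heartbeat bookkeeping; not part of rabbitmq-c. */
typedef enum {
    HB_IDLE, /* nothing to do yet */
    HB_SEND, /* nothing sent for one interval: send a heartbeat frame */
    HB_DEAD  /* nothing received for two intervals: treat peer as gone */
} hb_action;

typedef struct {
    long interval_s; /* negotiated heartbeat interval; 0 disables heartbeats */
    long last_sent;  /* time (seconds) the last frame was written */
    long last_recv;  /* time (seconds) the last frame was read */
} hb_state;

/* Decide what the caller should do at time `now` (seconds, same clock
 * as last_sent/last_recv). The caller updates last_sent/last_recv
 * whenever any frame, heartbeat or not, is written or read. */
hb_action hb_check(const hb_state *hb, long now)
{
    if (hb->interval_s <= 0)
        return HB_IDLE; /* heartbeats disabled */
    if (now - hb->last_recv >= 2 * hb->interval_s)
        return HB_DEAD; /* assumed policy: two silent intervals */
    if (now - hb->last_sent >= hb->interval_s)
        return HB_SEND;
    return HB_IDLE;
}
```

The point of the sketch is that hb_check has to run on a timer whether or not the application is inside a library call. That is the part that needs a thread or an event loop, and it is why the current support only works while blocked in the functions listed above.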