RabbitMQ-C: Function amqp_basic_publish blocks without network link. How to avoid this behavior?

Rodrigo Pimenta Carvalho

Dec 19, 2014, 12:51:39 PM
to rabbitm...@googlegroups.com


Hi.

In my client (which is both a producer and a consumer) I'm using the functions 'amqp_consume_message' and 'amqp_basic_publish'.
The client consumes messages from the broker and publishes messages to it without problems.

Now, if I unplug the Ethernet cable that links the client to the broker (simulating a network failure), I get:

A. For the consumer part (amqp_consume_message): a heartbeat timeout. The function returns and the consumer has a chance to perform other tasks. Perfect!
B. For the producer part (amqp_basic_publish): the call stays blocked until I plug the cable in again, so the publisher cannot go on to do other tasks.

When I plug the cable in again, the producer returns with the error AMQP_STATUS_SOCKET_ERROR.

How can I avoid this blocking? I use the same code to create the connection for the producer and the consumer, so both use the same configured heartbeat timeout value.

Any hint will be very helpful!

Best regards.


RODRIGO PIMENTA CARVALHO
Inatel Competence Center
Software
Ph: +55 35 3471 9300 (Brasil)

Alan Antonuk

Dec 22, 2014, 4:27:44 PM
to Rodrigo Pimenta Carvalho, rabbitm...@googlegroups.com
You've hit one of the limitations of the heartbeat implementation in rabbitmq-c. Currently rabbitmq-c doesn't have a way to detect that a socket will block before calling send() or recv(), thus the behavior you're seeing. If you wait long enough there is a long-ish timeout that the OS has that will mark the connection as broken and send() will return with an error.

Fixing this would require switching rabbitmq-c to use non-blocking sockets, which is difficult to do in a portable manner (different OSs have different levels of support for non-blocking sockets). I do have it on a list of features I'd like to add to rabbitmq-c, but I don't have a timeline for implementing this.

-Alan

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send an email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rodrigo Pimenta Carvalho

Jan 6, 2015, 6:13:44 AM
to Alan Antonuk, rabbitm...@googlegroups.com
Hi Alan,

My system will run only on Linux, so I would like to develop a solution specific to my case.

Could you give me some hints, please?

1 - How can I change the long-ish OS timeout that marks the connection as broken? Is there some configuration file in the OS that holds the timeout value?
2 - Which part of the RabbitMQ-C library code should be modified to switch rabbitmq-c to non-blocking sockets? (In my case I don't need a portable solution.)

Do you have any suggestions about what to change in the RabbitMQ-C code?

Any hint will be very helpful.
Thank you very much.

RODRIGO PIMENTA CARVALHO
Inatel Competence Center
Software
Ph: +55 35 3471 9979 (Brasil)

Alan Antonuk

Jan 6, 2015, 1:05:34 PM
to Rodrigo Pimenta Carvalho, rabbitm...@googlegroups.com
On Tue, Jan 6, 2015 at 3:11 AM, Rodrigo Pimenta Carvalho <pim...@inatel.br> wrote:
> Hi Alan,
>
> My system will run only on Linux, so I would like to develop a solution specific to my case.
>
> Could you give me some hints, please?
>
> 1 - How can I change the long-ish OS timeout that marks the connection as broken? Is there some configuration file in the OS that holds the timeout value?
 
If you want to go down this route, I would set the TCP_USER_TIMEOUT socket option with setsockopt(), as described here: http://stackoverflow.com/a/5907951/786714. It's also possible to change the tcp_retries1 and tcp_retries2 knobs described here: http://man7.org/linux/man-pages/man7/tcp.7.html. Those are global to the system the client runs on, so changing them may not be the best option.
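
As an illustration of the TCP_USER_TIMEOUT route on Linux, a minimal sketch might look like this. The helper name set_tcp_user_timeout is hypothetical and not part of rabbitmq-c; the descriptor would presumably come from amqp_socket_get_sockfd() once the socket is open. The option takes a value in milliseconds and needs Linux 2.6.37 or later.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux only): ask the kernel to drop the connection
 * when transmitted data stays unacknowledged for more than timeout_ms.
 * Returns 0 on success, -1 on error with errno set by setsockopt(). */
int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

For example, set_tcp_user_timeout(fd, 10000) asks the kernel to give up after roughly 10 seconds of unacknowledged data, at which point a blocked send() fails with ETIMEDOUT. Unlike tcp_retries2, this affects only the one socket.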
 
> 2 - Which part of the RabbitMQ-C library code should be modified to switch rabbitmq-c to non-blocking sockets? (In my case I don't need a portable solution.)

There's already a function you can use to set the socket to be non-blocking: https://github.com/alanxz/rabbitmq-c/blob/bfd8cc9129da46136691a88c06d505b78f0043ae/librabbitmq/amqp_socket.c#L143

You'll need to look at amqp_send_frame and wait_frame_inner as starting points to investigate how it currently works and how socket timeouts interact with heartbeat timeouts. For non-SSL cases the code will likely become simpler; for SSL cases it becomes more difficult, because sending on a socket may result in both a send and a receive. (You'll need to check the OpenSSL docs for details on how this works.)
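
To give an idea of what a non-blocking send path involves, here is a rough POSIX sketch that is independent of rabbitmq-c. It is not how amqp_send_frame works; the helper names set_nonblocking and send_all_with_timeout are invented for illustration, and a real change would also have to deal with SSL and heartbeats as noted above.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: send len bytes on a non-blocking socket, waiting at
 * most timeout_ms each time the kernel send buffer is full (the limit is
 * per wait, not for the whole call).
 * Returns 0 when everything was sent, -1 on error; on timeout errno is
 * set to ETIMEDOUT. */
int send_all_with_timeout(int fd, const void *buf, size_t len, int timeout_ms)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1; /* real socket error */

        /* Send buffer is full: wait until the socket is writable again. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (rc < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

The caller regains control after at most timeout_ms of no progress, so it can check heartbeats or do other work instead of sitting inside send().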

A correction to what I said earlier: non-blocking sockets are well supported by multiple OSs. I've added an issue with some of my notes so far: https://github.com/alanxz/rabbitmq-c/issues/227

Good Luck

-Alan

Rodrigo Pimenta Carvalho

Jan 8, 2015, 3:26:47 PM
to Alan Antonuk, rabbitm...@googlegroups.com


Hi Alan,

I would like to share the solution I have found for the case discussed below.

I decided to take control of the timeout on my own, instead of relying on the long-ish OS timeout that marks the connection as broken, and set it to 10 seconds.

See the code I have used just after opening the socket:

........................................................................................................................................

/* Needs <sys/socket.h>, <sys/time.h>, <syslog.h> and <amqp.h>.
   'sock' stands for the amqp_socket_t * of the opened connection. */
struct timeval timeout;
timeout.tv_sec = SOCKET_TIME_OUT; /* equal to 10 seconds */
timeout.tv_usec = 0;

if (setsockopt(amqp_socket_get_sockfd(sock), SOL_SOCKET, SO_SNDTIMEO,
               &timeout, sizeof(timeout)) < 0)
{
    syslog(LOG_WARNING, "ERROR. Could not execute setsockopt");
}
else
{
    syslog(LOG_INFO, "Socket SO_SNDTIMEO configured to %d seconds: OK.", SOCKET_TIME_OUT);
}

...................................................................................................................................

This works fine for the socket that links my client to the broker. It doesn't affect other sockets.

Any comments?

Regards.


RODRIGO PIMENTA CARVALHO
Inatel Competence Center
Software
Ph: +55 35 3471 9979 (Brasil)

Alan Antonuk

Jan 8, 2015, 3:41:34 PM
to Rodrigo Pimenta Carvalho, rabbitm...@googlegroups.com
I wanted this to go to the whole list....

On Thu, Jan 8, 2015 at 12:41 PM, Alan Antonuk <alan.a...@gmail.com> wrote:
It's a bit of a hack, but yes that will likely do what you intend. Test it thoroughly with your use case.

Another thing to keep in mind: when the broker becomes overloaded (e.g., hits a high water mark with memory usage) the broker will apply back pressure to clients by stopping reading from the network socket. From the rabbitmq-c side this will look like amqp_basic_publish()/send() is stuck. So a send timeout will not work in the general case: it can also fire on a connection that is still healthy.
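
This caveat can be reproduced without a broker. In the sketch below (plain POSIX, with a hypothetical helper name send_until_stalled), the peer of the socket is alive but simply never reads, much like a broker applying back pressure. Once the kernel buffers fill up, send() on a socket with SO_SNDTIMEO gives up with EAGAIN/EWOULDBLOCK, which is the same result the sender gets on a stalled link.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: set SO_SNDTIMEO on a blocking socket, then keep
 * sending 4 KiB chunks until one send() fails.
 * Returns the errno value of the failing call. */
int send_until_stalled(int fd, int timeout_sec)
{
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    char chunk[4096];

    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
        return errno;

    memset(chunk, 'x', sizeof(chunk));
    for (;;) {
        /* Blocks for at most timeout_sec once the peer stops draining data. */
        if (send(fd, chunk, sizeof(chunk), MSG_NOSIGNAL) < 0)
            return errno;
    }
}
```

Because the error alone does not say whether the link is down or the peer is just slow, a publisher using this approach probably needs its own policy (retry, reconnect, or drop) for timed-out sends.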

-Alan

Rodrigo Pimenta Carvalho

Jan 9, 2015, 7:26:13 AM
to Alan Antonuk, rabbitm...@googlegroups.com
Ok Alan.

I understood your point.
Thank you for your hint!

Regards.

RODRIGO PIMENTA CARVALHO
Inatel Competence Center
Software
Ph: +55 35 3471 9979 (Brasil)
