Connection error handling with the rabbitmq-c library


l rus

Nov 19, 2014, 1:31:49 PM
to rabbitm...@googlegroups.com


For the most part, rabbitmq-c (master) works with a third-party broker/server.
However, we occasionally fail to detect and distinguish failure conditions correctly.
While calling amqp_consume_message(), we get several different errors from the library.

Question: What are the differences among these error conditions, so that we can
handle each one properly?

AMQP_STATUS_HEARTBEAT_TIMEOUT
AMQP_STATUS_CONNECTION_CLOSED
AMQP_STATUS_SSL_CONNECTION_FAILED
AMQP_STATUS_TIMER_FAILURE

======

What would SSL_CONNECTION_FAILED indicate after a sustained period of successfully
receiving messages from the server? Does it require destroying the entire connection?

If HEARTBEAT_TIMEOUT or TIMER_FAILURE does not recur after an initial error, does that
imply 'all is good now', or must the connection still be destroyed and reestablished?

I have received HEARTBEAT_TIMEOUT followed by CONNECTION_CLOSED.
In that case, I destroy the connection and attempt to reestablish it.

Other errors from the library code seem possible only during initialization, e.g.:

AMQP_STATUS_HOSTNAME_RESOLUTION_FAILED
AMQP_STATUS_SSL_HOSTNAME_VERIFY_FAILED
AMQP_STATUS_SSL_PEER_VERIFY_FAILED


thanks



l rus

Nov 19, 2014, 11:44:22 PM
to rabbitm...@googlegroups.com


We also get a stream of AMQP_STATUS_SSL_ERROR after continuous
successful message delivery from the server.

What is the recovery from here?

Alan Antonuk

Nov 20, 2014, 11:29:59 AM
to l rus, rabbitm...@googlegroups.com
On Wed, Nov 19, 2014 at 8:44 PM, l rus <zxto...@gmail.com> wrote:

> We also get a stream of AMQP_STATUS_SSL_ERROR after continuous
> successful message delivery from the server.
>
> What is the recovery from here?

OpenSSL has reported that there has been an error with the SSL stream. You should destroy the connection and start again.



On Wednesday, November 19, 2014 1:31:49 PM UTC-5, l rus wrote:

> For the most part, rabbitmq-c (master) works with a third-party broker/server.
> However, we occasionally fail to detect and distinguish failure conditions correctly.
> While calling amqp_consume_message(), we get several different errors from the library.
>
> Question: What are the differences among these error conditions, so that we can
> handle each one properly?
>
> AMQP_STATUS_HEARTBEAT_TIMEOUT

Timed out waiting for a heartbeat frame from the broker. This means the connection is dead. You need to destroy the connection and start a new one.

> AMQP_STATUS_CONNECTION_CLOSED

The underlying transport has been closed. This means the connection is dead. You need to destroy the connection and start a new one.

> AMQP_STATUS_SSL_CONNECTION_FAILED

The SSL handshake failed. The connection was never open. You may try opening the socket again.

> AMQP_STATUS_TIMER_FAILURE

The OS returned an error when trying to call a timer function. This usually means something is seriously wrong with the state of the program. Likely you'll want to exit your program.
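
The per-status guidance in these replies can be summarized as a small decision table. The following is a minimal sketch in C, not the library's API: the `sketch_status` and `recovery_action` enums and the `classify_status` function are hypothetical stand-ins so the example compiles on its own. A real program would include `<amqp.h>` and switch on the library's `AMQP_STATUS_*` values instead.

```c
/* Hypothetical stand-ins for the library's AMQP_STATUS_* codes, so this
   sketch compiles without rabbitmq-c. The numeric values are arbitrary. */
typedef enum {
    ST_OK,
    ST_HEARTBEAT_TIMEOUT,
    ST_CONNECTION_CLOSED,
    ST_SSL_ERROR,
    ST_SSL_CONNECTION_FAILED,
    ST_TIMER_FAILURE
} sketch_status;

/* What the application does next (illustrative names). */
typedef enum {
    ACT_CONTINUE,      /* no error: keep consuming */
    ACT_RECONNECT,     /* connection is dead: destroy it, build a new one */
    ACT_REOPEN_SOCKET, /* handshake failed, connection never opened: retry */
    ACT_EXIT           /* OS timer call failed: program state is suspect */
} recovery_action;

/* Map a status to the recovery step suggested in the thread. */
recovery_action classify_status(sketch_status s) {
    switch (s) {
    case ST_OK:
        return ACT_CONTINUE;
    case ST_HEARTBEAT_TIMEOUT:
    case ST_CONNECTION_CLOSED:
    case ST_SSL_ERROR:
        return ACT_RECONNECT;
    case ST_SSL_CONNECTION_FAILED:
        return ACT_REOPEN_SOCKET;
    case ST_TIMER_FAILURE:
        return ACT_EXIT;
    }
    /* Unknown status: treat the connection as unusable to be safe. */
    return ACT_RECONNECT;
}
```

Whether `ACT_EXIT` really means terminating the process is an application decision; the sketch only records the default suggested here.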

======

> What would SSL_CONNECTION_FAILED indicate after a sustained period of successfully
> receiving messages from the server? Does it require destroying the entire connection?

You won't get this while consuming messages, only while making the initial connection to the broker.

> If HEARTBEAT_TIMEOUT or TIMER_FAILURE does not recur after an initial error, does that
> imply 'all is good now', or must the connection still be destroyed and reestablished?

That depends on what the initial error is. Using the connection after getting HEARTBEAT_TIMEOUT or TIMER_FAILURE, other than to destroy it, will result in undefined behavior.


Vipul Patel

Nov 20, 2014, 10:15:35 PM
to Alan Antonuk, rabbitm...@googlegroups.com

This is VERY helpful.

In short, most of these errors require calling amqp_destroy_connection()
and starting over with a new connection, socket, channels, and queues.

BTW, I did receive AMQP_STATUS_SSL_CONNECTION_FAILED
after a connection had been active for several hours. That seems to be
out of line with what you described.

I plan to treat it the same way: destroy the entire connection and retry
a couple of times before stopping.

Should I terminate on AMQP_STATUS_TIMER_FAILURE without
attempting to reconnect?
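
The plan described here (destroy everything, then retry a couple of times before stopping) amounts to a bounded reconnect loop. A minimal sketch in C follows. All names are hypothetical: `connect_fn` stands in for application code that would build a fresh connection, socket, channels, and queues, and `flaky_connect` merely simulates a broker that becomes reachable after a number of failed attempts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: builds a brand-new connection from scratch and
   returns 0 on success, nonzero on failure. */
typedef int (*connect_fn)(void *ctx);

/* Try to connect up to max_attempts times.
   Returns the 1-based attempt number that succeeded, or 0 if all failed. */
int reconnect_with_limit(connect_fn try_connect, void *ctx, int max_attempts) {
    assert(try_connect != NULL);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_connect(ctx) == 0) {
            return attempt;
        }
        /* A real program would likely sleep or back off here. */
    }
    return 0;
}

/* Simulated broker for illustration: fails until the counter that ctx
   points to reaches zero, then succeeds. */
int flaky_connect(void *ctx) {
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        --*remaining_failures;
        return -1;
    }
    return 0;
}
```

A return value of 0 is the point at which the application would stop retrying and report the failure.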




Alan Antonuk

Nov 21, 2014, 12:01:00 PM
to Vipul Patel, rabbitm...@googlegroups.com
AMQP_STATUS_TIMER_FAILURE means that the OS returned an error when trying to query a timer. You'll have to decide what that means for your application; I've found that in practice this is rare and when it does happen the system is shutting down or has something seriously wrong with it.

HTH
-Alan

Vipul Patel

Dec 28, 2014, 11:11:52 AM
to Alan Antonuk, rabbitm...@googlegroups.com

Hello,

Occasionally, our application, which uses the rabbitmq-c client library, crashes
while attempting SSL_write. This usually happens when the broker server
has been brought down (self->ssl = 0).

The library may want to protect against such conditions.


#0  0x00007f7af2e4bd04 in SSL_write () from libssl.so.1.0.0
#1  0x00007f7af30828bd in amqp_ssl_socket_send (base=0x7f7ad8010c40, buf=0x7f7af0177010, len=8)
    at amqp_openssl.c:89
#2  0x00007f7af307cee0 in amqp_socket_send (self=0x7f7ad8010c40, buf=0x7f7af0177010, len=8)
    at amqp_socket.c:208
#3  0x00007f7af307c4f3 in amqp_send_frame (state=0x7f7ad8000aa0, frame=0x7f7ad2bfc7d0)
    at amqp_connection.c:517
#4  0x00007f7af307df84 in wait_frame_inner (state=0x7f7ad8000aa0, decoded_frame=0x7f7ad2bfc8f0, timeout=0x7f7ad2bfccf0)
    at amqp_socket.c:733
#5  0x00007f7af307e5f7 in amqp_simple_wait_frame_noblock (state=0x7f7ad8000aa0, decoded_frame=0x7f7ad2bfc8f0, timeout=0x7f7ad2bfccf0)
    at amqp_socket.c:925
#6  0x00007f7af308229a in amqp_consume_message (state=0x7f7ad8000aa0, envelope=0x7f7ad2bfc9c0, timeout=0x7f7ad2bfccf0, flags=0)
    at amqp_consumer.c:154


(gdb) p self
$1 = (struct amqp_ssl_socket_t *) 0x7f7ad8010c40
(gdb) p self->ssl
$2 = (SSL *) 0x0
(gdb) p buf
$3 = (const void *) 0x7f7af0177010



Alan Antonuk

Dec 29, 2014, 12:59:26 PM
to Vipul Patel, rabbitm...@googlegroups.com
That looks like a use-after-free error. Likely what's happened is that a heartbeat timed out, which caused amqp_socket_close() to be called, which does an SSL_free(self->ssl). The problem is that there's a subsequent call to amqp_socket_write/amqp_socket_read which tries to use this member.

Ideally this shouldn't cause a segfault. I've opened an issue: https://github.com/alanxz/rabbitmq-c/issues/228 to track this.

A workaround is: if you get an AMQP_STATUS_HEARTBEAT_TIMEOUT, consider the connection terminated, and destroy the connection immediately with amqp_destroy_connection.
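
One way to apply this workaround defensively is to route teardown through a small wrapper that destroys the connection once and then refuses further use, so that no later send or receive can touch freed state. The sketch below is an illustration in C under stated assumptions, not rabbitmq-c code: `guarded_conn`, `guarded_usable`, `guarded_teardown`, `on_consume_result`, and `counting_destroy` are hypothetical names, and in a real program the `destroy` callback would wrap amqp_destroy_connection().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper around a connection handle. In a real program,
   conn would hold the library's connection state and destroy would call
   amqp_destroy_connection(). */
typedef struct {
    void *conn;
    void (*destroy)(void *conn);
} guarded_conn;

/* True while the handle may still be used for I/O. */
bool guarded_usable(const guarded_conn *g) {
    return g != NULL && g->conn != NULL;
}

/* Destroy the connection exactly once; later calls are no-ops. */
void guarded_teardown(guarded_conn *g) {
    assert(g != NULL);
    if (g->conn != NULL) {
        g->destroy(g->conn);
        g->conn = NULL; /* prevents any later use of freed state */
    }
}

/* React to a consume result: on heartbeat timeout, tear down immediately. */
void on_consume_result(guarded_conn *g, bool heartbeat_timed_out) {
    if (heartbeat_timed_out) {
        guarded_teardown(g);
    }
}

/* Illustration only: counts how many times destroy was invoked. */
static int destroy_calls = 0;
void counting_destroy(void *conn) {
    (void)conn;
    ++destroy_calls;
}
```

Callers would check `guarded_usable()` before every send or receive and reconnect when it returns false.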

-Alan

Alan Antonuk

Dec 30, 2014, 2:01:41 AM
to l rus, rabbitm...@googlegroups.com
This is fixed in the master branch. 

-Alan

On Mon Dec 29 2014 at 7:47:19 PM Vipul Patel <zxto...@gmail.com> wrote:
> Thanks for the follow-up.

