Network connectivity error handling

Desmond Bowe

Jul 16, 2013, 1:19:24 PM
to ruby...@googlegroups.com
Hi, I have a Rails application that publishes messages to Rabbit.  I'd like to make the Rails app tolerate a lost connection, but if I kill the broker, the app promptly crashes with the following:

E, [2013-07-16T13:06:24.210953 #6897] ERROR -- #<Bunny::Session:70100018889260 user@host:5672, vhost=my_host>: Exception in the reader loop: Bunny::InternalError: Connection-level error: INTERNAL_ERROR
E, [2013-07-16T13:06:24.211192 #6897] ERROR -- #<Bunny::Session:70100018889260 user@host:5672, vhost=my_host>: Backtrace:
E, [2013-07-16T13:06:24.211278 #6897] ERROR -- #<Bunny::Session:70100018889260 user@host:5672, vhost=my_host>: /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/session.rb:391:in `handle_frame'
E, [2013-07-16T13:06:24.211365 #6897] ERROR -- #<Bunny::Session:70100018889260 user@host:5672, vhost=my_host>: /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/reader_loop.rb:75:in `run_once'
E, [2013-07-16T13:06:24.211441 #6897] ERROR -- #<Bunny::Session:70100018889260 user@host:5672, vhost=my_host>: /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/reader_loop.rb:34:in `block in run_loop'
E, [2013-07-16T13:06:24.211519 #6897] ERROR -- #<Bunny::Session:70100018889260 user@host:5672, vhost=my_host>: /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/reader_loop.rb:31:in `loop'
E, [2013-07-16T13:06:24.211596 #6897] ERROR -- #<Bunny::Session:70100018889260 user@host:5672, vhost=my_host>: /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/reader_loop.rb:31:in `run_loop'
/Users/desmond/.rvm/rubies/ruby-2.0.0-p0/lib/ruby/2.0.0/irb/input-method.rb:152:in `readline': caught an unexpected exception in the network loop: Connection-level error: INTERNAL_ERROR (Bunny::NetworkFailure)

My current strategy is to connect to Rabbit when the app starts and check for a proper connection each time I publish a message, logging any errors and gracefully moving on.  I haven't found any documentation for managing connectivity issues once the app's connected.  Is there a better way to tolerate the broker (or its server) going down?
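For readers following along, the "log any errors and move on" strategy described above might look roughly like the sketch below. TolerantPublisher is a hypothetical name, not part of Bunny; it assumes only that the wrapped object responds to publish(payload, options), as a Bunny exchange does. It covers errors raised during the publish call itself, not exceptions that the reader loop raises elsewhere.

```ruby
require "logger"

# Hypothetical wrapper (not part of Bunny). Assumes the wrapped object
# responds to #publish(payload, options), as a Bunny exchange does.
class TolerantPublisher
  def initialize(exchange, logger = Logger.new($stderr))
    @exchange = exchange
    @logger = logger
  end

  # Returns true if the message was handed to the exchange, false if the
  # attempt raised. The error is logged and never reaches the caller.
  def publish(payload, options = {})
    @exchange.publish(payload, options)
    true
  rescue StandardError => e
    @logger.error("publish failed: #{e.class}: #{e.message}")
    false
  end
end
```

The boolean return value lets the calling code decide whether to drop the message, queue it locally, or retry later.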

thanks
Desmond

Michael Klishin

Jul 16, 2013, 1:44:19 PM
to ruby...@googlegroups.com
Desmond Bowe:

> /Users/desmond/.rvm/rubies/ruby-2.0.0-p0/lib/ruby/2.0.0/irb/input-method.rb:152:in `readline': caught an unexpected exception in the network loop: Connection-level error: INTERNAL_ERROR (Bunny::NetworkFailure)

This is not a network failure. RabbitMQ reports INTERNAL_ERROR. You need to investigate why.
See the RabbitMQ log; it should have at least some pointers.

> My current strategy is to connect to Rabbit when the app starts and check for a proper connection each time I publish a message, logging any errors and gracefully moving on. I haven't found any documentation for managing connectivity issues once the app's connected. Is there a better way to tolerate the broker (or its server) going down?

http://rubybunny.info/articles/error_handling.html
--
MK

Desmond Bowe

Jul 16, 2013, 5:50:07 PM
to ruby...@googlegroups.com
Whenever I run rabbitmqctl stop_app while the client is connected, the broker sends a frame with the message INTERNAL_ERROR and code 541, hence the original error I was seeing.  The Rabbit logs have this to say:

=ERROR REPORT==== 16-Jul-2013::17:28:59 ===
AMQP connection <0.1181.0> (running), channel 1 - error:
shutdown

=WARNING REPORT==== 16-Jul-2013::17:28:59 ===
Non-AMQP exit reason 'shutdown'

Does Rabbit consider it an error to be shut down while clients are connected?  It seems rather painful if I have to stop all my services to modify Rabbit.

Also, the docs you referenced describe how to handle errors when I'm doing specific actions, like connecting or declaring a queue.  How would I trap generic frame exceptions like this 541 in Bunny?  Is there an analog to the AMQP gem's callbacks on connections, channels, etc?

thanks

Michael Klishin

Jul 16, 2013, 6:06:59 PM
to ruby...@googlegroups.com
Desmond Bowe:

> Does Rabbit consider it an error to be shut down while clients are connected? It seems rather painful if I have to stop all my services to modify Rabbit.

stop_app is not considered to be a clean shutdown (you force the Erlang app to stop), so RabbitMQ cannot report anything more specific than 541.

> Also, the docs you referenced describe how to handle errors when I'm doing specific actions, like connecting or declaring a queue. How would I trap generic frame exceptions like this 541 in Bunny? Is there an analog to the AMQP gem's callbacks on connections, channels, etc?

If you don't handle anything, automatic connection recovery will try to reconnect every few
seconds (currently forever).

Passing :automatically_recover => false to Bunny.new will result in an exception being
raised in the thread where the connection was instantiated. All exceptions subclass Bunny::Exception;
Bunny::NetworkFailure carries a cause (in this case, a Bunny::ConnectionLevelException
subclass).

Why do you need to use stop_app as opposed to simply shutting down the process?
The end result is the same for Bunny but you'd get a less vague error message.
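Since Bunny::NetworkFailure carries a cause, a rescue handler can log the whole chain instead of only the outermost error. The helper below is a minimal sketch: failure_chain is a hypothetical name, and it assumes only that each error responds to cause (as Bunny::NetworkFailure does, and as plain Ruby exceptions do since Ruby 2.1).

```ruby
# Hypothetical helper (not part of Bunny): walks the #cause chain of an
# error and returns one "Class: message" string per link, outermost first.
def failure_chain(error)
  chain = []
  seen = {}
  # Stop at the end of the chain, or if a cause ever points back to an
  # error that was already visited.
  while error && !seen[error.object_id]
    seen[error.object_id] = true
    chain << "#{error.class}: #{error.message}"
    error = error.respond_to?(:cause) ? error.cause : nil
  end
  chain
end

# Possible usage with automatic recovery disabled (not executed here;
# `logger` is assumed to be the application's own logger):
#
#   begin
#     connection = Bunny.new(:automatically_recover => false)
#     connection.start
#   rescue Bunny::Exception => e
#     failure_chain(e).each { |line| logger.error(line) }
#   end
```

Rescuing Bunny::Exception catches every Bunny error class in one place; the chain then shows whether the underlying problem was a connection-level error from the broker or something lower in the stack.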
--
MK

Desmond Bowe

Jul 17, 2013, 12:28:18 PM
to ruby...@googlegroups.com
I've been using stop_app since that's what I've found everywhere in the RabbitMQ documentation for stopping the process.  Sending it a SIGTERM(15) seems to make it behave better, thanks.  Now when I quit the broker, Bunny logs an Errno::ECONNRESET: Connection reset by peer and attempts to reconnect.  That sounds promising, but moments later the app crashes with the following:

/Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/socket.rb:41:in `read_nonblock': closed stream (IOError)
from /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/socket.rb:41:in `block in read_fully'
from /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/socket.rb:40:in `loop'
from /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/socket.rb:40:in `read_fully'
from /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/transport.rb:199:in `read_next_frame'
from /Users/desmond/.rvm/gems/ruby-2.0.0-p0@threering/gems/bunny-0.9.0/lib/bunny/session.rb:706:in `init_connection'

If I force the socket to close and re-initialize during a network recovery, everything works: Bunny loops through its automatic recovery and eventually restores the connection once the broker is restarted.  Is there a reason Bunny attempts to reuse its existing socket after a network error?  Seems like you'd want to reset the whole connection stack.

Michael Klishin

Jul 17, 2013, 1:32:13 PM
to ruby...@googlegroups.com
Desmond Bowe:

> Is there a reason Bunny attempts to reuse its existing socket after a network error? Seems like you'd want to reset the whole connection stack.

It does not try to reuse the socket. The transport is recreated in the network loop thread,
so if something else tries to use it at the same time, it will fail with an exception.

Do you have a reliable way to reproduce this with a script?
--
MK

Desmond Bowe

Jul 17, 2013, 1:57:29 PM
to ruby...@googlegroups.com
Where does it recreate the transport?  Bunny::Session#recover_from_network_failure just calls #start.

I'll get back to you with a script soon.

Michael Klishin

Jul 17, 2013, 2:10:04 PM
to ruby...@googlegroups.com
Desmond Bowe:

> Bunny::Session#recover_from_network_failure just calls #start.

It should happen in one of the methods invoked by #start.

There were some changes around rc1 in that area to make it possible
to access the socket and SSL context before Bunny::Session#start is invoked.
So it may be a regression.
--
MK

Michael Klishin

Jul 17, 2013, 2:27:13 PM
to ruby...@googlegroups.com
Desmond Bowe:

> I'll get back to you with a script soon.

OK, and please try 0.9.x-stable with it first:

https://github.com/ruby-amqp/bunny/tree/0.9.x-stable
--
MK

Desmond Bowe

Jul 20, 2013, 4:18:43 PM
to ruby...@googlegroups.com
I just tried 0.9.3-pre and everything works fine: the connection is restored automatically unless I run stop_app (then it's up to me to reinitialize the connection, but at least the app doesn't crash!).  Incidentally, what is the preferred way to stop Rabbit if it's running as a daemon?

0.9.0 still has the issues I described, but only when connected to a broker on a remote server (i.e., not localhost).  I set up a bare-bones Rails app at https://github.com/desmondmonster/bunny-test.  Start a remote broker, update the connection string in config/initializers/rabbitmq.rb, and run rails c.  I tried killing the broker three different ways and got the following stack traces: https://gist.github.com/desmondmonster/6046274.

Meanwhile I'm going to move along with 0.9.3-pre since it seems good to go.
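For the stop_app case, where the application has to reinitialize the connection itself, one possible shape is a small holder that rebuilds the connection on demand. This is only a sketch: ConnectionHolder is a hypothetical name, the factory block is supplied by the caller, and the only thing assumed of the connection is that it responds to open? (as Bunny::Session does). It is not thread-safe as written.

```ruby
# Hypothetical helper (not part of Bunny): hands out a connection and
# rebuilds it through the caller-supplied block once it is no longer open.
class ConnectionHolder
  def initialize(&factory)
    raise ArgumentError, "a factory block is required" unless factory
    @factory = factory
    @connection = nil
  end

  # Returns the current connection, creating a fresh one when none exists
  # yet or the previous one reports that it is closed.
  def connection
    @connection = @factory.call unless @connection && @connection.open?
    @connection
  end
end

# Possible usage in a Rails initializer (not executed here):
#
#   RABBIT = ConnectionHolder.new do
#     Bunny.new.tap { |session| session.start }
#   end
#   RABBIT.connection.create_channel
```

If the broker is still down, the factory block raises, so callers would still want to rescue around connection.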

thanks
Desmond

Michael Klishin

Jul 21, 2013, 1:07:06 AM
to ruby...@googlegroups.com
Desmond Bowe:

> 0.9.0 still has the issues I described

Obviously we don't re-publish existing gems.

> Meanwhile I'm going to move along with 0.9.3-pre since it seems good to go.

0.9.2 is out, so there is no need to use git dependencies.
--
MK
