Bunny loses connection to RabbitMQ, never recovers

429 views
Skip to first unread message

matob...@gmail.com

unread,
Dec 1, 2020, 11:22:16 AM12/1/20
to Ruby RabbitMQ libraries

First background on what I am using.
Clients:
  •   Ruby 2.6.5
  •   bunny 2.17.0
Server:
  • RabbitMQ 3.8.9 (Running in K8S behind an ELB 3 "Pods" in cluster
There are times, when we have some issue with our RabbitMQ clusters(or they get upgraded and "go away" for short periods of time(~30 seconds) that my client code(Ruby + Bunny) never get back their connection to RabbitMQ.

I have maybe 50 clients running, most of the time they all reconnect just fine. However, in some cases they never come back. I think I have reproduced the problem. Buy my internal knowledge is not as good as it could be.

On a "good" client that reconnects we got message going to the logger I pass bunny basically saying `Will recover from a network failure (no retry limit)..." Some timeout message and then all reconnects or so.

However on the ones that do NOT reconnect, I get the follow stack traces to STDERR(not the logger) I'm thinking this is a thread crashing in the background, as I really don't see anything in the stack that is "my" code.

Once I see this(STDERR), it never seems to reconnect my client.

#<Thread:0x0000000001e187b8@/home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:35 run> terminated with exception (report_on_exception is true):
Traceback (most recent call last):
    10: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:36:in `run_loop'
     9: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:36:in `loop'
     8: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:39:in `block in run_loop'
     7: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:74:in `run_once'
     6: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/transport.rb:261:in `read_next_frame'
     5: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/transport.rb:239:in `read_fully'
     4: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/cruby/ssl_socket.rb:44:in `read_fully'
     3: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/cruby/ssl_socket.rb:44:in `loop'
     2: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/cruby/ssl_socket.rb:45:in `block in read_fully'
     1: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/2.6.0/openssl/buffering.rb:182:in `read_nonblock'
/home/myhome/.rbenv/versions/2.6.5/lib/ruby/2.6.0/openssl/buffering.rb:182:in `sysread_nonblock': read would block (OpenSSL::SSL::SSLErrorWaitReadable)
    7: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:36:in `run_loop'
    6: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:36:in `loop'
    5: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:39:in `block in run_loop'
    4: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:74:in `run_once'
    3: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/transport.rb:261:in `read_next_frame'
    2: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/transport.rb:239:in `read_fully'
    1: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/cruby/ssl_socket.rb:43:in `read_fully'
/home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/cruby/ssl_socket.rb:55:in `rescue in read_fully': IO timeout when reading 7 bytes (Timeout::Error)
    8: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:36:in `run_loop'
    7: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:36:in `loop'
    6: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:37:in `block in run_loop'
    5: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:47:in `rescue in block in run_loop'
    4: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/session.rb:718:in `handle_network_failure'
    3: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/session.rb:757:in `recover_from_network_failure'
    2: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/session.rb:322:in `start'
    1: from /home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/transport.rb:87:in `connect'
/home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/transport.rb:87:in `connect': SSL_connect returned=1 errno=32 state=error: unknown alert type (OpenSSL::SSL::SSLError)

Once that happens every time I try to add a job I just get this message

Bunny::ConnectionClosedError message Trying to send frame through a closed connection. Frame is ...

Any pointers on what I can do, to help correct this? Basically when it gets to this state I just have to bounce the service.

matob...@gmail.com

unread,
Dec 1, 2020, 12:31:05 PM12/1/20
to Ruby RabbitMQ libraries
A few things I left out.

The way I reproduced this was to force kill the RabbitMQ pods in K8S

The code I am looking at now, is just adding jobs to queue in a loop.  I've created a single bunny object for this.

We I see the STDERR exceptions above. I also notice that two "background" threads from bunny go away. so when I am normally running I see thow threads like this

#<Thread:0x000000000252c6d0@/home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/heartbeat_sender.rb:46 sleep
#<Thread:0x000000000252c4a0@/home/myhome/.rbenv/versions/2.6.5/lib/ruby/gems/2.6.0/gems/bunny-2.17.0/lib/bunny/reader_loop.rb:35 sleep>

After I that STDERR exception, those threads are no longer there.

Oddly enough I do have "Thread.abort_on_exception = true" set, so I would off thought it would have exited my app out when those threads "dissappered"

Michael Klishin

unread,
Dec 4, 2020, 8:57:36 AM12/4/20
to Ruby RabbitMQ libraries
My only guess is that some EWOULDBLOCK kind of errors returned by the socket are not retried specifically for TLS.

See reader_loop.rb and transport.rb. There should be a list of constants that lead to retries.

--
Bunny: http://rubybunny.info
March Hare: http://rubymarchhare.info
 
IRC: #rabbitmq on irc.freenode.net
 
Post to the group: ruby...@googlegroups.com | unsubscribe: ruby-amqp+...@googlegroups.com
---
You received this message because you are subscribed to the Google Groups "Ruby RabbitMQ libraries" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruby-amqp+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ruby-amqp/32f1424e-0055-4102-aca4-8b16fde00f1bn%40googlegroups.com.

Michael Klishin

unread,
Dec 4, 2020, 9:01:37 AM12/4/20
to Ruby RabbitMQ libraries
We do not guess in this community. Either post your code or we can only assume that no specifics are known.

Bunny uses one reader loop thread per connection, and another for sending heartbeats. When the connection
fails the reader loop thread initiates a recovery and the heartbeat sender is stopped until recovery succeeds.

My best guess remains the same: that OpenSSL::SSL::SSLErrorWaitReadable is not retried and causes the reader loop thread to exit after the first recovery attempt when a similar non-TLS connection error would be handled and retried on.

-- 
Bunny: http://rubybunny.info
March Hare: http://rubymarchhare.info
 
IRC: #rabbitmq on irc.freenode.net
 
Post to the group: ruby...@googlegroups.com | unsubscribe: ruby-amqp+...@googlegroups.com
--- 
You received this message because you are subscribed to the Google Groups "Ruby RabbitMQ libraries" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruby-amqp+...@googlegroups.com.

Michael Klishin

unread,
Jan 27, 2021, 7:07:33 AM1/27/21
to Ruby RabbitMQ libraries
It was fairly trivial so I went ahead and added a callback that will be called for every connection recovery attempt [1].

Adam Lett

unread,
Jan 29, 2021, 6:17:55 AM1/29/21
to ruby...@googlegroups.com
This is much appreciated. Thank you!

matob...@gmail.com

unread,
Feb 3, 2021, 1:51:41 PM2/3/21
to Ruby RabbitMQ libraries
Ah cool, so with this change I can give a block of code to get called each time this happens.

Sorry I fell off the face of the chain here. I had been given a few days to dig into this problem, and learned alot, but didn't get it fully understood.  The problem doesn't cause us a huge headache right now, so it keeps getting pushed back on the stack of work
Reply all
Reply to author
Forward
0 new messages