RabbitMQ restart - worker recovery


Jon Snow

Jan 18, 2018, 11:07:10 AM
to rabbitmq-users
Hello,
I wonder how to handle a RabbitMQ restart. My project is a GenStateMachine server with a lot of workers, written in Elixir. The problem appears when I restart the RabbitMQ server. It logs:
-------------------------
17:15:12.723 [warn]  Connection (#PID<0.177.0>) closing: received hard error {:"connection.close",
 320,
 "CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'",
 0, 0} from server
-------------------------
I am not sure how to handle it. It is not a :DOWN message, so my handle_event callback doesn't trigger. I tried something like:

handle_event(:info, {:"connection.close",_,_, _, _}, _state, _data)

but no luck. I need to know when Rabbit is dead (for whatever reason) so that my workers can wait for a new one to come up and connect again.
Any advice will be most welcome. Thanks.

Michael Klishin

Jan 18, 2018, 11:45:01 AM
to rabbitm...@googlegroups.com
See http://www.rabbitmq.com/upgrade.html#rabbitmq-restart-handling.

When a node shuts down it will forcefully close all connections by sending a connection.close frame.
I'd need to take a look at the docs to tell what's the best way to handle those in the Erlang client
but the recovery procedure is the same for all clients.

The first thing that comes to mind is that the process(es) that open RabbitMQ connections should monitor them or link to them
and, should a connection go down, terminate and let their supervisor restart them. Handling outstanding
operations and outstanding publisher confirms is then the hard part, but if you only consume, using
manual acknowledgements should be all you need: http://rabbitmq.com/confirms.html.
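
A minimal sketch of the monitor-and-terminate approach in Elixir. To keep it runnable without a broker, it takes a function that returns the pid of the connection process. `ConnectionWatcher` and `open_fun` are hypothetical names, not part of any client library. In a real application `open_fun` would open the connection through whichever client is in use and return the pid of the connection process.

```elixir
defmodule ConnectionWatcher do
  # Hypothetical GenServer that owns one connection process.
  # `open_fun` is an assumption made for illustration: a zero-arity
  # function returning {:ok, pid} for the connection process.
  use GenServer

  def start_link(open_fun) when is_function(open_fun, 0) do
    GenServer.start_link(__MODULE__, open_fun)
  end

  @impl true
  def init(open_fun) do
    {:ok, conn_pid} = open_fun.()
    # Monitor the connection: when it dies we receive a :DOWN message.
    ref = Process.monitor(conn_pid)
    {:ok, %{conn: conn_pid, ref: ref}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, %{ref: ref} = state) do
    # Terminate and let the supervisor start a fresh process that reconnects.
    {:stop, {:connection_lost, reason}, state}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```

Started under a supervisor as `{ConnectionWatcher, open_fun}`, the process would be restarted after each `:DOWN`. If the broker stays down for a while, `init/1` keeps failing and the supervisor can reach its restart limit, so some delay or backoff before reconnecting is probably needed.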
> --
> You received this message because you are subscribed to the Google Groups "rabbitmq-users"
> group.
> To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
> To post to this group, send an email to rabbitm...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Michael Klishin

Jan 18, 2018, 1:50:09 PM
to rabbitm...@googlegroups.com
Long story short, RabbitMQ connection processes should be linked to (or at least monitored by) the processes that spawn them or depend on them.
It works largely the same way for channels, except that since re-opening a channel is easier, monitoring should be enough.

Your code should get a

{shutdown,
  {connection_closing,
     {server_initiated_close, _Code, _}}}

for server-initiated connection closure and a {shutdown, …} for other similar events. The recovery procedure
described in http://www.rabbitmq.com/upgrade.html#rabbitmq-restart-handling applies to the Erlang client
just as much.

I'm not sure how much this changes if you use an Elixir wrapper, but hopefully not much.
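
For reference, a small sketch of how the Erlang term quoted above could be matched in Elixir, where the same atoms are written with a leading colon. `DownReason` is a hypothetical helper module. The tuple shape is taken from the term quoted in this message and has not been checked against any particular client version.

```elixir
defmodule DownReason do
  # Hypothetical helper: classifies the exit reason of a connection process.

  # Server-initiated close, in the shape quoted in this message.
  def classify({:shutdown, {:connection_closing, {:server_initiated_close, code, text}}}) do
    {:server_closed, code, text}
  end

  # Any other {:shutdown, _} reason.
  def classify({:shutdown, other}), do: {:shutdown, other}

  # Everything else (crashes, kills, and so on).
  def classify(other), do: {:unexpected, other}
end
```

For example, `DownReason.classify({:shutdown, {:connection_closing, {:server_initiated_close, 320, "CONNECTION_FORCED"}}})` returns `{:server_closed, 320, "CONNECTION_FORCED"}`. The reason passed in would typically be the last element of a `:DOWN` message.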




Jon Snow

Jan 19, 2018, 8:34:05 AM
to rabbitmq-users
Hello,

According to the AMQP documentation, the message received from Rabbit is:
Indeed, in the tuple it says :"connection.close" atom. 320 stands for reply-code 320 which is
"An operator intervened to close the connection for some reason. The client may retry at some later date.".

So my callback handler is

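  def handle_event(
        :info,
        {:"connection.close", 320, _reply_text, _class_id, _method_id},
        _state,
        _data
      ) do
    IO.puts("I need to see this")
    Process.send_after(self(), :connect, 3_000)
    # A handle_event callback has to return a state machine result,
    # not the timer reference returned by Process.send_after/3.
    :keep_state_and_data
  end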

And it is not working. I am starting to think it's not the worker receiving that message. But who is it then?


Jon Snow

Jan 19, 2018, 10:02:59 AM
to rabbitmq-users
I think the Elixir library has no functions for that. I'll try to find a workaround. Thanks.

Michael Klishin

Jan 19, 2018, 12:55:58 PM
to rabbitm...@googlegroups.com
Please see https://github.com/rabbitmq/rabbitmq-erlang-client/blob/b48faff43ea530a50e8d438e6d0dc110545ca746/test/system_SUITE.erl#L1174,
which is a test that explicitly triggers a server-sent connection.close (with a very different reason but still).

As mentioned above, connection processes must be monitored or linked to by the processes that spawn (open) them. In that case you
will either get a ‘DOWN’ message from the connection process, or the “spawner” and the connection can die together
and let a supervisor take care of things (which is what I’d do if the goal is to only consume on the connection).
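
A rough Elixir sketch of the second option, where the process links to its connection and both die together under a supervisor. A plain spawned process stands in for the connection so the example runs without a broker. `LinkedConsumer` and `open_fun` are hypothetical names chosen for illustration.

```elixir
defmodule LinkedConsumer do
  # Hypothetical consumer process that links to its connection process.
  use GenServer

  def start_link(open_fun) do
    GenServer.start_link(__MODULE__, open_fun, name: __MODULE__)
  end

  # Returns the pid of the current (stand-in) connection.
  def connection, do: GenServer.call(__MODULE__, :connection)

  @impl true
  def init(open_fun) do
    {:ok, conn_pid} = open_fun.()
    # Exits are not trapped here, so a non-normal exit of the connection
    # also terminates this process, and the supervisor restarts it.
    Process.link(conn_pid)
    {:ok, conn_pid}
  end

  @impl true
  def handle_call(:connection, _from, conn_pid), do: {:reply, conn_pid, conn_pid}
end

# Stand-in for a real connection: exits with a shutdown reason when told to.
open_fun = fn ->
  pid =
    spawn(fn ->
      receive do
        :close -> exit({:shutdown, :connection_closed})
      end
    end)

  {:ok, pid}
end

{:ok, _sup} = Supervisor.start_link([{LinkedConsumer, open_fun}], strategy: :one_for_one)
```

Sending `:close` to `LinkedConsumer.connection()` takes the consumer down with it, and the supervisor starts a new one with a new stand-in connection. The child keeps the default `:permanent` restart type on purpose: a `:transient` child that exits with `{:shutdown, _}` is treated as having stopped normally and is not restarted.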

We do not know what “a worker” is in your case, what your process tree looks like or even whether you use
the Erlang client directly or via one of the Elixir wrappers.

Luke Bakken

Jan 19, 2018, 1:19:55 PM
to rabbitmq-users
Hi Jon,

You may not have seen this article, which discusses using supervision trees to restart AMQP connections and consumers when failure happens:


As always, if you could provide a set of code I can run to reproduce this, it would greatly assist in diagnosis.

Thanks,
Luke
--
Staff Software Engineer
Pivotal / RabbitMQ

Jon Snow

Jan 22, 2018, 7:39:02 AM
to rabbitmq-users
This looks awesome. I was thinking of using Process.monitor/:erlang.monitor per process, but this looks much better. In this case I guess I can keep
1 heartbeat per N+ consumers, which is really powerful. I'm trying to figure out how to fit this logic into my project.

Jon Snow

Jan 22, 2018, 10:07:48 AM
to rabbitmq-users
Hello,

I changed the code a bit, playing with GenServer. It looks cool so far and it is working. I'll add the workers' logic later.
Thanks
amqp_connector.ex

Luke Bakken

Jan 22, 2018, 10:39:24 AM
to rabbitmq-users
Hi Jon -

Thanks for reporting back and for sharing the code you're using.