Shovels end up terminated forever when the network gets laggy


Boris Staal

Jun 14, 2018, 8:08:42 AM
to rabbitmq-users
Good day!

I have a machine that is connected to a server using a dynamic shovel, with the AMQP 0.9.1 protocol on both sides. The machine connects through a 3G modem, and sometimes the network becomes nearly dead for long periods. When that happens, various errors start occurring in the shovels, but they keep reconnecting for a while. I know there's a throttling limit, but I have only 2 shovels and reconnect-delay is set to 10 seconds. Yet at some point they get terminated completely, and the only way I can revive them is to restart the RabbitMQ service entirely (I'm on Windows). I'd love to provide additional details, but I'm not sure which details would help or how to acquire them. Could you please guide me on how to approach solving this?

Thanks a lot in advance!

Michael Klishin

Jun 14, 2018, 8:21:27 AM
to rabbitm...@googlegroups.com
Please help others help you by providing more information, specifically logs that demonstrate the failures.

Shovels will try to restart, with a delay, up to a certain restart intensity. There are known cases where, in very volatile network environments (satellite links), Shovels reconnect so often that they are not functional even if the plugin never "gives up".
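To illustrate how a restart intensity can stop a Shovel for good even with a non-zero restart delay, here is a simplified Python model of an OTP-style limit: at most `max_restarts` restarts within any `period`-second window, after which the worker is not restarted again. The class, the function and the numbers are hypothetical; this is not the Shovel plugin's code and the values are not its actual limits.

```python
from collections import deque


class RestartIntensity:
    """Hypothetical OTP-style restart limit: allow at most `max_restarts`
    restarts within any sliding window of `period` seconds."""

    def __init__(self, max_restarts, period):
        self.max_restarts = max_restarts
        self.period = period
        self._times = deque()  # timestamps of recent restarts, oldest first

    def allow_restart(self, now):
        # Forget restarts that are older than the sliding window.
        while self._times and now - self._times[0] > self.period:
            self._times.popleft()
        self._times.append(now)
        # More than max_restarts inside the window means "give up".
        return len(self._times) <= self.max_restarts


def simulate(crash_times, max_restarts, period):
    """Return the crash time at which the worker is terminated for good,
    or None if every crash is followed by a restart."""
    limit = RestartIntensity(max_restarts, period)
    for t in crash_times:
        if not limit.allow_restart(t):
            return t
    return None
```

In this model, crashes 12 seconds apart exceed a limit of 5 restarts per 60 seconds (`simulate([0, 12, 24, 36, 48, 60], 5, 60)` returns `60`), while crashes 20 seconds apart never do, so a longer delay between attempts keeps the worker under the limit.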

Restart delay is the only tunable knob available.
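Since the restart delay is the knob to turn, here is a minimal Python sketch that builds a dynamic Shovel definition with `reconnect-delay` set explicitly, plus the matching `rabbitmqctl set_parameter shovel` command line. The helper names, shovel name, queue names and URIs are hypothetical placeholders; the keys follow the dynamic Shovel parameter format for AMQP 0-9-1 on both sides, but check them against the Shovel documentation for your RabbitMQ version.

```python
import json


def shovel_definition(src_uri, src_queue, dest_uri, dest_queue, reconnect_delay=10):
    """Build the value of a dynamic Shovel parameter (AMQP 0-9-1 on both sides)."""
    return {
        "src-protocol": "amqp091",
        "src-uri": src_uri,
        "src-queue": src_queue,
        "dest-protocol": "amqp091",
        "dest-uri": dest_uri,
        "dest-queue": dest_queue,
        # Seconds to wait before reconnecting after either side disconnects.
        "reconnect-delay": reconnect_delay,
    }


def set_parameter_argv(name, definition):
    """Argument list for: rabbitmqctl set_parameter shovel <name> '<json>'."""
    return ["rabbitmqctl", "set_parameter", "shovel", name, json.dumps(definition)]
```

For example, `set_parameter_argv("uplink", shovel_definition("amqp://", "outbox", "amqp://central.example.com", "inbox", reconnect_delay=30))` can be passed to `subprocess.run(argv, check=True)`; on Windows the CLI script is `rabbitmqctl.bat`, so adjust the first element accordingly.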

Nick Barham

Jan 15, 2019, 4:51:29 AM
to rabbitmq-users
Have you any more information about these failures in "very volatile network environments"? I am planning on deploying Shovels working over an intermittent satellite connection, and am interested in any known issues with that kind of configuration.

Michael Klishin

Jan 15, 2019, 3:23:38 PM
to rabbitm...@googlegroups.com
The scenario I referred to was an inter-continental connection (I don't remember whether it was satellite or not) where Shovel often could not reconnect, since the new TCP connection would sometimes fail within a couple of seconds of the previous one failing.

I don't remember whether we had any packet loss stats, but it was obvious that, nearly all the time, a TCP connection could not stay stable for more than 5 seconds.
The user switched to a different network route and Shovel started behaving as expected.
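A back-of-the-envelope way to see why such a link leaves Shovel practically non-functional is to estimate the share of wall-clock time spent connected and past the handshake. The formula and the example values (a 5-second connection lifetime, the 10-second reconnect delay from the original question, and an assumed 2 seconds of connection setup) are illustrative assumptions, not measured data.

```python
def useful_fraction(connection_lifetime, reconnect_delay, setup_time):
    """Rough upper bound on the share of time a link can move messages,
    assuming every connection dies after `connection_lifetime` seconds
    (setup included) and is retried after `reconnect_delay` seconds."""
    cycle = connection_lifetime + reconnect_delay
    # Time spent on TCP/AMQP setup inside each lifetime is not usable.
    usable = max(connection_lifetime - setup_time, 0.0)
    return usable / cycle
```

With those numbers, `useful_fraction(5, 10, 2)` is 0.2, so at most a fifth of the time is usable; if connections die before setup completes, the result drops to zero.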

I'm afraid the only real way to know whether your environment is good enough is to provision it, measure packet loss and give it a try,
while capturing traffic with tcpdump to collect more metrics to work with.
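Alongside packet loss measurements and tcpdump captures, a simple probe that repeatedly opens TCP connections to the broker port can show how often connection establishment fails over the link. This Python sketch uses only the standard library; the function names are hypothetical, and 5672 is the default AMQP 0-9-1 port, so adjust host and port to your setup. It measures connection setup only, not how long an established connection survives.

```python
import socket
import time


def probe_tcp(host, port, attempts=10, timeout=5.0, interval=1.0):
    """Open a TCP connection `attempts` times and record each outcome.

    Returns a list with the connect time in seconds for each successful
    attempt, or None for attempts that failed or timed out.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # The connection is closed immediately; only setup is measured.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:  # refused, unreachable or timed out
            results.append(None)
        if i < attempts - 1:
            time.sleep(interval)
    return results


def failure_rate(results):
    """Fraction of probe attempts that failed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r is None for r in results) / len(results)
```

For example, `failure_rate(probe_tcp("broker.example.com", 5672, attempts=60))` gives the fraction of failed connection attempts out of 60, which can be compared across network routes.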

HTH.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Jan 15, 2019, 3:25:23 PM
to rabbitm...@googlegroups.com
Note that we are aware of deployments where ships with onboard systems using Shovel go out to sea, come back in a week
or so, and Shovel then moves messages to a centralized onshore cluster.

So, in theory, a stable connection once a week is better than a very volatile one available at all times :)