Help! RabbitMQ plugin shovels always in "starting" status


Shen John

Sep 22, 2014, 6:24:19 AM
to rabbitm...@googlegroups.com

Hi all,

I am using RabbitMQ shovels to transfer messages from queueA (on nodeA) to queueB (on nodeB). The network is not reliable.

I have run into this problem: the shovel's status is always "starting", and queueA has no shovel consumer.

I don't know what has happened. Could you help me? I am very worried about it!


Michael Klishin

Sep 22, 2014, 6:28:05 AM
to rabbitm...@googlegroups.com, Shen John
On 22 September 2014 at 14:24:25, Shen John (john.sh...@gmail.com) wrote:
> I have run into this problem: the shovel's status is always "starting",
> and queueA has no shovel consumer.

How are you configuring the shovel? What is in the RabbitMQ log? What version of
RabbitMQ do you run?
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Shen John

Sep 22, 2014, 7:04:50 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com

Sorry, I nearly forgot.

The RabbitMQ version is 3.3.4, on Erlang 17.

The dynamic shovel is configured like this:

./rabbitmqctl set_parameter shovel pg.sync.idc0-1.cache.usercenter '{"src-uri": "amqps://myworker1:95...@127.0.0.1", "src-queue": "pg.sync.upstream.cache.usercenter", "dest-uri": "amqps://myworker1:952d@54.191.110.233", "dest-queue": "pg.sync.downstream.cache.usercenter"}'
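
For readers who script this step, the JSON value can be generated rather than typed by hand, which avoids shell-quoting mistakes. The following is a minimal Python sketch, not part of the original post. The parameter keys are the ones used in the command above; the function name `shovel_command`, the shovel name, the credentials and the host names are placeholders chosen for illustration.

```python
import json
import shlex


def shovel_command(name, src_uri, src_queue, dest_uri, dest_queue):
    """Build a `rabbitmqctl set_parameter shovel ...` command line."""
    definition = {
        "src-uri": src_uri,
        "src-queue": src_queue,
        "dest-uri": dest_uri,
        "dest-queue": dest_queue,
    }
    # shlex.quote protects the name and the JSON document from the shell.
    return "rabbitmqctl set_parameter shovel {} {}".format(
        shlex.quote(name), shlex.quote(json.dumps(definition))
    )


if __name__ == "__main__":
    # All values below are placeholders, not the poster's real settings.
    print(shovel_command(
        "my-shovel",
        "amqps://USER:PASSWORD@127.0.0.1",
        "upstream.queue",
        "amqps://USER:PASSWORD@remote.example.com",
        "downstream.queue",
    ))
```

Printing the command instead of running it makes it easy to review (and to strip credentials) before it is pasted anywhere public.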

I am not sure which log to look at, so I have attached all the logs that might be relevant. The picture shows the problem.
I hope it is useful.
Thanks.

On Monday, September 22, 2014 at 6:28:05 PM UTC+8, Michael Klishin wrote:
similar.txt

Michael Klishin

Sep 22, 2014, 7:13:13 AM
to rabbitm...@googlegroups.com, Shen John
On 22 September 2014 at 15:04:56, Shen John (john.sh...@gmail.com) wrote:
> I am not sure which log to look at, so I have attached all the logs that
> might be relevant. The picture shows the problem.

The log says that the peer keeps closing TCP connections (or rejects connections,
or becomes unreachable). Judging from the timestamps, this happens several times
a minute, sometimes every few seconds.

Please make sure that 54.191.110.233 is reachable over the network. It'd also
be worth investigating if 54.191.110.233's RabbitMQ log contains any info
about inbound connections, and if a firewall in between may be constantly
dropping them. 

Also, I'd edit passwords and other sensitive information (such as complete
IP addresses) out of public list emails in the future ;)

Shen John

Sep 22, 2014, 7:24:30 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
Yes. The network is unreliable, or even worse.
I tried "telnet 54.191.110.233 5372" and it connected.
Why is the consumer of "pg.sync.upstream.cache.usercenter" missing?
How can I keep the shovel running even when the network fails?

On Monday, September 22, 2014 at 7:13:13 PM UTC+8, Michael Klishin wrote:

Simon MacMullen

Sep 22, 2014, 7:28:46 AM
to Shen John, rabbitm...@googlegroups.com
On 22/09/14 12:24, Shen John wrote:
> Yes. The network is unreliable, or even worse.
> I tried "telnet 54.191.110.233 5372" and it connected.

You mean 5672 right?

> Why is the consumer of "pg.sync.upstream.cache.usercenter"
> missing?
> How can I keep the shovel running even when the network fails?

It will automatically restart connections on network failure.

But the logs you posted show the shovel repeatedly failing to connect,
generally having its connection closed in the early stages of
establishing a connection. That looks like extreme unreliability.
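
A successful `telnet` only shows that the TCP handshake completes; it does not show whether the connection survives the first protocol exchange. The following minimal Python sketch goes one step further, assuming a plain (non-TLS) AMQP 0-9-1 listener such as the one on port 5672: it sends the 8-byte protocol header and waits for the broker's first reply. The function name `probe_amqp` and the timeout are illustrative choices. For an `amqps://` URI the socket would also have to be wrapped in TLS, which this sketch does not do.

```python
import socket

# Protocol header a client sends first when opening an AMQP 0-9-1 connection.
AMQP_0_9_1_HEADER = b"AMQP\x00\x00\x09\x01"


def probe_amqp(host, port=5672, timeout=5.0):
    """Return a short description of how far a plain AMQP connection gets."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(AMQP_0_9_1_HEADER)
            first = sock.recv(8)
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Refused, reset, unreachable and similar failures end up here.
        return "failed: {}".format(exc)
    if not first:
        return "closed by peer before any reply"
    if first[:1] == b"\x01":
        # Frame type 1 is a method frame (the broker's Connection.Start).
        return "ok: broker sent a method frame"
    if first.startswith(b"AMQP"):
        return "broker rejected the protocol version"
    return "unexpected reply"
```

Running this repeatedly from the source node, and noting which result appears, helps tell a host that is unreachable apart from a connection that is accepted and then cut off.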

Is there some firewall or piece of networking equipment in the way which
could be forcibly closing network connections shortly after they're opened?

Also: check the logs on the remote broker - does it say anything about
these failures?

Cheers, Simon


Michael Klishin

Sep 22, 2014, 7:29:21 AM
to rabbitm...@googlegroups.com, Shen John
 On 22 September 2014 at 15:24:38, Shen John (john.sh...@gmail.com) wrote:
> Why is the consumer of "pg.sync.upstream.cache.usercenter"
> missing?

My guess is that until Shovel connects to the destination, it doesn't consume
anything.

> How can I keep the shovel running even when the network fails?

It will reconnect. I wonder if our restart strategy gives up too easily
(basically, doesn't try to reconnect indefinitely). In which case it can
be considered a bug.

Shen John

Sep 22, 2014, 7:29:29 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
Here is the snapshot.

I want the shovel to always work. But sometimes it does not work until I restart it manually.

On Monday, September 22, 2014 at 7:24:30 PM UTC+8, Shen John wrote:

Shen John

Sep 22, 2014, 7:30:23 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
Yes, 5672.

On Monday, September 22, 2014 at 7:28:46 PM UTC+8, Simon MacMullen wrote:

Shen John

Sep 22, 2014, 7:39:28 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
Could you tell me how to check whether it will try to restart?
It has been disconnected for nearly 5 hours.

On Monday, September 22, 2014 at 7:29:21 PM UTC+8, Michael Klishin wrote:

Michael Klishin

Sep 22, 2014, 7:43:36 AM
to rabbitm...@googlegroups.com, Shen John
On 22 September 2014 at 15:39:33, Shen John (john.sh...@gmail.com) wrote:
> Could you tell me how to check whether it will try to restart?
> It has been disconnected for nearly 5 hours.

You will see messages about the supervisor restarting its children in the log.

However, there's a limit to how many restarts are considered normal.
So the issue is that the thing that restarts the shovel gives up too early
because your network is really unreliable (or something kills off TCP
connections frequently for no good reason).

We've suggested investigating if you see any connections get through to the
other RabbitMQ node, and if there can be a firewall or proxy of some kind
in between.

Changing restart strategy to "infinity" is a possibility, we just haven't run
into this particular issue to date.

Michael Klishin

Sep 22, 2014, 7:46:17 AM
to rabbitm...@googlegroups.com, Shen John
 On 22 September 2014 at 15:43:41, Michael Klishin (mic...@rabbitmq.com) wrote:
> However, there's a limit to how many restarts are considered
> normal.
> So the issue is that the thing that restarts the shovel gives up
> too early
> because your network is really unreliable (or something kills
> off TCP
> connections frequently for no good reason).

Simon suggests it should not be the case. Well, something prevents Shovel
from reconnecting when it repeatedly fails to set up a TCP connection.

To tell what exactly, we need to know more about why your network is
so unreliable. There may be an artificial cause.

Shen John

Sep 22, 2014, 7:50:29 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
Can I increase the retry count? Or how else can this be resolved?

On Monday, September 22, 2014 at 7:43:36 PM UTC+8, Michael Klishin wrote:

Shen John

Sep 22, 2014, 7:51:08 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
By the way, what is the retry strategy?

On Monday, September 22, 2014 at 7:50:29 PM UTC+8, Shen John wrote:

Michael Klishin

Sep 22, 2014, 7:52:47 AM
to rabbitm...@googlegroups.com, Shen John
 On 22 September 2014 at 15:50:34, Shen John (john.sh...@gmail.com) wrote:
> Can I increase the retry count? Or how else can this be resolved?

It is hardcoded. You can investigate what's going on with your network and
report to this list; we'll see if there's a way to make Shovel more robust
in environments similar to yours.

Shen John

Sep 22, 2014, 7:54:07 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
There is the Great Firewall in China, so this is important.

On Monday, September 22, 2014 at 7:46:17 PM UTC+8, Michael Klishin wrote:

Michael Klishin

Sep 22, 2014, 7:55:11 AM
to rabbitm...@googlegroups.com, Shen John
On 22 September 2014 at 15:51:14, Shen John (john.sh...@gmail.com) wrote:
> By the way, what is the retry strategy?

Roughly this:

 * Throw away an Erlang process that was connecting
 * Spawn a new one

The new process will try to connect using the configured URI basically
as soon as it is spawned.

See [1] and [2] if you want to learn more.

1. http://learnyousomeerlang.com/errors-and-processes
2. http://learnyousomeerlang.com/supervisors 
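
The restart behaviour described above, together with the restart limit mentioned earlier in the thread, can be illustrated with a small Python model. This is a sketch of the general Erlang/OTP supervisor idea (give up once more than `max_restarts` restarts happen within `period` seconds), not Shovel's actual code. The limits, the function name `supervise` and the injectable `clock` are assumptions made purely for illustration.

```python
import time
from collections import deque


def supervise(start_child, max_restarts=3, period=10.0, clock=time.monotonic):
    """Run start_child(); on failure discard it and start a fresh one.

    Gives up (re-raises the last error) once more than max_restarts
    restarts were needed within the last `period` seconds. Returns the
    number of restarts performed when start_child finally succeeds.
    """
    recent = deque()  # timestamps of recent failures
    restarts = 0
    while True:
        try:
            start_child()  # stands in for "spawn a process that connects"
            return restarts
        except Exception:
            now = clock()
            recent.append(now)
            # Forget failures that fall outside the sliding window.
            while recent and now - recent[0] > period:
                recent.popleft()
            if len(recent) > max_restarts:
                raise  # too many restarts in the window: give up
            restarts += 1  # throw the failed attempt away, try again at once
```

In this model a network that fails several times within a few seconds exhausts the window quickly, while the same number of failures spread over a longer time never does.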

Shen John

Sep 22, 2014, 7:59:01 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
Could I adjust the "reconnect delay" or add a TCP proxy (like HAProxy) to work around it temporarily?

On Monday, September 22, 2014 at 7:55:11 PM UTC+8, Michael Klishin wrote:

Michael Klishin

Sep 22, 2014, 7:59:51 AM
to rabbitm...@googlegroups.com, Shen John
On 22 September 2014 at 15:54:12, Shen John (john.sh...@gmail.com) wrote:
> There is the Great Firewall in China, so this is important.

We need *technical details* about what causes TCP connections to fail. I'd expect
firewalls that block access to resources for political reasons usually
reject connections outright, not every few seconds, though. Not that I know
anything about the specifics, of course.

So far you have given us *zero* technical details about that.
If you are crossing the country boundaries, please at least state that clearly.

Also, if it's the Great Firewall that gets you, the RabbitMQ team likely cannot do much about your issue.

Michael Klishin

Sep 22, 2014, 8:00:29 AM
to rabbitm...@googlegroups.com, Shen John
On 22 September 2014 at 15:59:07, Shen John (john.sh...@gmail.com) wrote:
> Could I adjust the "reconnect delay" or add a TCP proxy (like HAProxy)
> to work around it temporarily?

You can't. No, a TCP proxy in between will not help, at best it will delay
the inevitable.

Shen John

Sep 22, 2014, 8:11:18 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
Why can't a TCP proxy help me?
Should I do it like this:
./rabbitmqctl set_parameter shovel pg.sync.idc0-1.cache.usercenter '{"src-uri": "amqps://myworker1:952d@127.0.0.1", "src-queue": "pg.sync.upstream.cache.usercenter", "dest-uri": "amqps://myworker1:952d@54.191.110.233", "dest-queue": "pg.sync.downstream.cache.usercenter"}'

54.191.110.233 is a proxy address; it redirects the TCP traffic to the US.
Could you tell me why we cannot do this?

On Monday, September 22, 2014 at 8:00:29 PM UTC+8, Michael Klishin wrote:

Michael Klishin

Sep 22, 2014, 8:20:08 AM
to rabbitm...@googlegroups.com, Shen John
On 22 September 2014 at 16:11:24, Shen John (john.sh...@gmail.com) wrote:
> 54.191.110.233 is a proxy address; it redirects the TCP traffic to the US.
> Could you tell me why we cannot do this?

It depends on where your proxy will be located, actually. My thinking
was about having a proxy inside the country.

In that case, even if your proxy reconnects indefinitely, Shovel will notice
that the peer went down. HAProxy is not entirely transparent and the "HA" in
its name should not mislead you.

Once Shovel notices a connection issue, it will cause an exception, which will
cause a restart, which takes us back to square one.

If you are sure you are crossing the national boundary,
consider hosting your proxy in a different location. EU and UK come to mind
since they are roughly between East Asia and North America, but I see that
54.191.110.233 is in Oregon, so a proxy in South-East Asia or the Pacific may work better.
Then see if TCP connections to it are more reliable.
The proxy will relay your traffic to 54.191.110.233.

Feel free to try this. Your goal is to find a location that has better
connectivity from your local node.
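
One way to compare candidate proxy locations is to measure how often a plain TCP connection to each one succeeds and how long the connection setup takes. The following is a minimal Python sketch under stated assumptions: the candidate hosts are whatever machines are available to you, and the function names, attempt count, timeout and pause are arbitrary illustrative choices. It only tests TCP connection setup, not whether an AMQP session stays up afterwards.

```python
import socket
import time


def connect_stats(host, port, attempts=10, timeout=5.0, pause=1.0):
    """Try `attempts` TCP connections; return (success_rate, avg_seconds)."""
    durations = []
    for i in range(attempts):
        started = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                durations.append(time.monotonic() - started)
        except OSError:
            pass  # refused, reset, unreachable or timed out
        if i < attempts - 1:
            time.sleep(pause)
    rate = len(durations) / attempts
    avg = sum(durations) / len(durations) if durations else None
    return rate, avg


def rank_candidates(candidates, **kwargs):
    """Return (host, port, success_rate, avg_seconds) rows, best first."""
    scored = []
    for host, port in candidates:
        rate, avg = connect_stats(host, port, **kwargs)
        scored.append((host, port, rate, avg))
    # Highest success rate first; ties broken by lower average connect time.
    scored.sort(key=lambda r: (-r[2], r[3] if r[3] is not None else float("inf")))
    return scored
```

Calling `rank_candidates` with a list of `(host, port)` pairs from the node that runs the shovel, at several times of day, gives a rough picture of which location is worth trying first.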

Shen John

Sep 22, 2014, 8:32:09 AM
to rabbitm...@googlegroups.com, john.sh...@gmail.com
Oh, thank you very much. Could I ask one more question?
I have a question that nobody has answered, and I cannot find any detailed documentation. :(

On Monday, September 22, 2014 at 8:20:08 PM UTC+8, Michael Klishin wrote:

Michael Klishin

Sep 22, 2014, 8:33:34 AM
to rabbitm...@googlegroups.com, Shen John
On 22 September 2014 at 16:32:16, Shen John (john.sh...@gmail.com) wrote:
> Could I ask one more question?

Sure, just start a separate thread (discussion).