rabbitmq-shovel: does it process messages one-by-one in a single actor?


Sokolovskiy Roman

Feb 9, 2017, 5:34:08 AM
to rabbitmq-users
Good day,


We observed rather strange behavior of a RabbitMQ shovel replicating messages between datacenters (over a WAN) during network latency issues.
The performance of rabbitmq-shovel decreased drastically, while an ordinary send over a TCP socket at the same rate (~ 300 Kb/sec, 30 msgs - 10 msgs/sec) worked just fine.

While digging deeper into the rabbitmq-shovel source code, it appeared to me that a single rabbit_shovel_worker sends messages over
amqp_channel, which, under the covers, uses the ordinary bang operator (!) to send messages. As far as I understand, in that case multiplying
the network latency by a factor of 2 will increase message processing time by a factor of at least 4 (TCP send + TCP ack).

Are my reverse-engineered deductions correct? If so, is it possible to organize some kind of batching on the sender in order to increase
throughput on unstable networks as it is done under the covers of ZeroMQ sockets?


Thank you in advance,
Roma

Michael Klishin

Feb 9, 2017, 5:52:34 AM
to rabbitm...@googlegroups.com
Shovels consume and re-publish messages in a single process
but the "one by one" argument is more involved and depends on
the prefetch and ack mode configured: http://www.rabbitmq.com/shovel-dynamic.html.

Those values can be tweaked to trade off safety (both as far as republishing goes
and as far as how many messages a Shovel can keep around in memory at a time)
for throughput.

Shovel uses a regular Erlang client consumer and publishes
much like any other client would.
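
For illustration, here is a minimal Python sketch that builds two dynamic shovel definitions at opposite ends of that trade-off. The key names "prefetch-count" and "ack-mode" and the three ack-mode values are taken from the dynamic shovel page linked above as I understand it; the URIs, the queue name, and the helper shovel_definition are made up for the example, and key names can differ between RabbitMQ versions, so check the docs for yours.

```python
import json

# Ack modes described on the dynamic shovel page, from safest to fastest.
ACK_MODES = ("on-confirm", "on-publish", "no-ack")


def shovel_definition(prefetch_count, ack_mode):
    """Build a dynamic shovel definition (hypothetical helper, not a RabbitMQ API)."""
    if ack_mode not in ACK_MODES:
        raise ValueError(f"unknown ack-mode: {ack_mode!r}")
    if prefetch_count < 1:
        raise ValueError("prefetch-count must be at least 1")
    return {
        "src-uri": "amqp://localhost",             # assumed local broker
        "src-queue": "replicate-me",               # assumed queue name
        "dest-uri": "amqp://remote-dc.example.com",  # assumed remote broker
        "dest-queue": "replicate-me",
        "prefetch-count": prefetch_count,
        "ack-mode": ack_mode,
    }


# Most conservative: one message in flight, acked only after the remote confirms.
safest = shovel_definition(1, "on-confirm")

# Throughput-oriented: many messages in flight, acked as soon as they are published.
faster = shovel_definition(1000, "on-publish")

print(json.dumps(safest, indent=2))
print(json.dumps(faster, indent=2))
```

The printed JSON is the kind of value that would be passed to rabbitmqctl set_parameter shovel <name> '<json>'. The values 1 and 1000 are only examples of a small and a large prefetch, not recommendations.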

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Feb 9, 2017, 5:54:27 AM
to rabbitm...@googlegroups.com
Your analysis has some merit but ignores prefetch, publisher
confirms, and the fact that TCP sockets perform their own buffering.
Also, neither Shovel (unless you use QoS = 1 and the most conservative ack mode) nor TCP
works synchronously.

So I don't think the math is quite that simple in practice.
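
A rough back-of-the-envelope model of why prefetch matters: if at most N messages can be unacknowledged at once, and each one needs a round trip before its slot frees up, throughput is bounded by roughly N / RTT. The Python sketch below implements only that simplified model. The function name and the RTT values are assumptions, not measurements from this thread, and the model ignores bandwidth, TCP buffering, and broker-side work.

```python
def max_msgs_per_sec(in_flight_limit, rtt_seconds):
    """Upper bound on messages/sec when at most in_flight_limit
    messages may be outstanding per round trip (simplified model)."""
    if in_flight_limit < 1 or rtt_seconds <= 0:
        raise ValueError("need in_flight_limit >= 1 and rtt_seconds > 0")
    return in_flight_limit / rtt_seconds


# Hypothetical WAN round-trip times, in milliseconds.
for rtt_ms in (50, 100, 200):
    rtt = rtt_ms / 1000
    one_at_a_time = max_msgs_per_sec(1, rtt)    # QoS = 1, fully synchronous
    pipelined = max_msgs_per_sec(1000, rtt)     # assumed prefetch of 1000
    print(f"RTT {rtt_ms} ms: QoS=1 <= {one_at_a_time:.0f} msg/s, "
          f"prefetch=1000 <= {pipelined:.0f} msg/s")
```

In this model, doubling the RTT halves the bound rather than quartering it, and raising the in-flight limit raises the bound proportionally.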


Sokolovskiy Roman

Feb 9, 2017, 6:16:27 AM
to rabbitmq-users
Thank you for your answer, Michael,


I understand that RabbitMQ-level acks are delivered asynchronously, or can be configured to be.

Nonetheless, an ordinary Erlang bang operator, executed sequentially for a number of messages,
will wait for the TCP ACK of each message to arrive before moving on to the next one.
In that case, no benefit is gained from TCP's asynchronicity.

Is that correct?


Roma


Sokolovskiy Roman

Feb 9, 2017, 9:58:47 AM
to rabbitmq-users
Michael,

My previous suggestion that the problem lies in the Erlang bang operation was wrong.
The bang operation returns immediately, even if the network is down.

However, I found another place where synchronicity over high-latency networks could be a problem:

rabbit_shovel_worker:handle_info({#'basic.deliver' ...)
-> rabbit_shovel_worker:publish(... #'basic.publish' ...)
-> ok = amqp_channel:call(OutboundChan, ...) // which is remote, as far as I understand (?)
-> gen_server:call(...)

The docs for amqp_channel:call indicate:
%% Note that for asynchronous methods, the synchronicity implied by
%% 'call' only means that the client has transmitted the method to
%% the broker. It does not necessarily imply that the broker has
%% accepted responsibility for the message.

That means, if I understand it correctly, that the client (a single rabbit_shovel_worker in our case) waits for the broker (in another DC)
to receive the message (not to process it, that's true) and reply with 'ok'. That involves a full RTT over the WAN and, consequently,
on saturated networks where the WAN RTT may dominate overall processing latency, drastically decreases the shovel's throughput.

If my deductions are correct, then that 'call' guarantees that only one RabbitMQ message is on the wire at any given moment.

Is that correct?


Roman



Karl Nilsson

Feb 9, 2017, 11:10:22 AM
to rabbitm...@googlegroups.com
No, that is not correct. Messages are delivered to the shovel worker process, which re-publishes them to the remote. The 'call' only refers to the local channel process, not to the channel "on the other side". No network roundtrip is implicit in the call. Once the worker process has handed the message off to the channel process, it is free to process other messages in its mailbox, such as further deliveries or publisher confirms/acks.
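
The hand-off can be pictured with a toy Python model. This is not how amqp_channel is implemented; LocalChannel, its call method, and the network_ready event are invented for the sketch, with a thread and a queue standing in for the local channel process and its mailbox. The point is only that "call" returns once the local side has accepted the message, before anything has crossed the network.

```python
import queue
import threading


class LocalChannel:
    """Toy stand-in for a local channel process (hypothetical, for illustration)."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self.network_ready = threading.Event()  # unset = the WAN link is stalled
        self.sent = []                           # messages actually "on the wire"
        self._thread = threading.Thread(target=self._writer, daemon=True)
        self._thread.start()

    def call(self, message):
        # Returns as soon as the local channel has accepted the message;
        # no network round trip is involved.
        self._mailbox.put(message)
        return "ok"

    def _writer(self):
        # The slow part happens here, on the channel's own thread.
        self.network_ready.wait()
        while True:
            message = self._mailbox.get()
            if message is None:  # shutdown marker
                return
            self.sent.append(message)

    def close(self):
        self._mailbox.put(None)
        self._thread.join()


channel = LocalChannel()

# The "worker" hands off five messages while the network is still stalled.
replies = [channel.call(f"msg-{i}") for i in range(5)]
sent_while_stalled = len(channel.sent)

# Only now does the network become available and the writer drain its mailbox.
channel.network_ready.set()
channel.close()

print(len(replies), sent_while_stalled, len(channel.sent))  # prints: 5 0 5
```

All five calls return "ok" while nothing has been sent yet, which is the behavior described above: the caller is free to keep working while the channel deals with the network.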

Regarding your performance issues I am less sure. What shovel configuration have you got in place?

Cheers
Karl




--
Karl Nilsson

Staff Software Engineer, Pivotal/RabbitMQ

Sokolovskiy Roman

Feb 19, 2017, 2:55:42 PM
to rabbitmq-users
Hi Karl,


We've figured out that the root cause of performance degradation wasn't hidden in shovel's internals.

The real cause was that we had configured the TCP buffer size to 192KB, exactly as in the example on the documentation
page (http://www.rabbitmq.com/networking.html). This disabled the OS kernel's auto-tuning of
the buffer size (which could otherwise grow up to 128 MB).
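
To see why a fixed buffer hurts on a high-latency link: a TCP connection can have at most about one window of data in flight per round trip, so throughput is capped at roughly window / RTT, and filling a link needs a window of about bandwidth x RTT (the bandwidth-delay product). The Python sketch below just does that arithmetic. The RTT values and the 1 Gbit/s link speed are assumptions for illustration, not numbers from our setup, and the usable window is in practice somewhat smaller than the configured buffer.

```python
def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound on bytes/sec when at most window_bytes can be in flight."""
    return window_bytes / rtt_seconds


def bandwidth_delay_product(link_bits_per_sec, rtt_seconds):
    """Bytes that must be in flight to keep a link of this speed busy."""
    return link_bits_per_sec / 8 * rtt_seconds


FIXED_BUFFER_BYTES = 192 * 1024   # the 192KB setting mentioned above
ASSUMED_LINK_BPS = 1_000_000_000  # hypothetical 1 Gbit/s link

for rtt_ms in (10, 100, 500):     # hypothetical WAN round-trip times
    rtt = rtt_ms / 1000
    cap = window_limited_throughput(FIXED_BUFFER_BYTES, rtt)
    needed = bandwidth_delay_product(ASSUMED_LINK_BPS, rtt)
    print(f"RTT {rtt_ms:>3} ms: cap ~{cap / 1024:,.0f} KiB/s per connection, "
          f"~{needed / 1024:,.0f} KiB of buffer needed to fill the link")
```

With the buffer pinned, the cap shrinks in proportion as latency grows, whereas an auto-tuned buffer can grow toward the bandwidth-delay product.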

Thank you very much for your help!


Roman

Michael Klishin

Feb 19, 2017, 3:20:22 PM
to rabbitm...@googlegroups.com
Roman,

Thank you for reporting back.

May I ask what kind of throughput the links over which you use Shovel offer?
Is it 100 MBit/s, 1 GBit/s or more?
