Messages requeued on high load

Adam Cigánek

Oct 29, 2009, 6:08:23 AM
to AMQP
Hello there,

We have a system that processes some tasks in the background using
RabbitMQ and the AMQP gem. It has a simple mechanism for requeueing
messages whose processing failed. This is the relevant part of the code:

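queue = @amq.queue('foo')
queue.subscribe(:ack => true) do |headers, payload|
  data = Marshal.load(payload)

  begin
    process(data)
    headers.ack    # ack only after processing succeeded
  rescue Exception => exception
    # no ack here: the message stays unacknowledged on the broker
    log_exception(exception, data)
  end
end

# every 30 seconds, ask the broker to redeliver unacknowledged messages
EM.add_periodic_timer(30) { queue.recover(:requeue => true) }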

So a message is ack'd only if its processing succeeds; otherwise the
exception is logged and, if I understand it correctly, the message
should be requeued. Then every 30 seconds the queue is recovered. The
recover method is monkey-patched into the MQ::Queue class like this:

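class MQ::Queue
  # Sends basic.recover on the channel this queue belongs to; requeue defaults to false.
  def recover(options = {})
    @mq.callback do
      @mq.send(AMQP::Protocol::Basic::Recover.new({ :requeue => false }.merge(options)))
    end

    self
  end
end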
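To make the requeue behaviour concrete, here is a toy, in-memory model of what the broker tracks for one channel. It is only a sketch: the `ToyChannel` class and its methods are hypothetical and not part of the AMQP gem. It assumes the AMQP rule that `basic.recover` with requeue set puts every delivery the broker has not yet seen an ack for back on the queue, even if the client has already finished processing it.

```ruby
# Hypothetical in-memory model of one AMQP channel, seen from the broker's side.
# Not part of the AMQP gem; it only mimics the basic.recover (requeue) semantics.
class ToyChannel
  attr_reader :queue, :unacked

  def initialize(messages)
    @queue = messages.dup  # messages waiting on the broker's queue
    @unacked = {}          # delivery_tag => message: delivered, no ack received yet
    @next_tag = 0
  end

  # The broker pushes the next message and starts tracking it as unacked.
  def deliver
    return nil if @queue.empty?
    @next_tag += 1
    @unacked[@next_tag] = @queue.shift
    @next_tag
  end

  # An ack only counts once it has actually reached the broker.
  def ack(tag)
    @unacked.delete(tag)
  end

  # basic.recover with requeue: every delivery still unacked goes back on the
  # queue, whether or not the client already finished processing it.
  # (This toy appends them to the tail of the queue.)
  def recover_requeue
    @queue.concat(@unacked.values)
    @unacked.clear
  end
end

ch = ToyChannel.new(%w[a b c])
first = ch.deliver
ch.deliver            # "b" is delivered and processed...
ch.ack(first)         # ...but only the ack for "a" has reached the broker
ch.recover_requeue
p ch.queue            # prints ["c", "b"]: "b" will be processed a second time
```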

Now the problem is this: the system was working fine for some time,
but after the load exceeded a certain level*, it went crazy: messages
were never removed from the queue and were processed over and over
again. And this happened to messages whose processing did NOT raise
any exception.

* I'm not sure the high load was the real cause, though.

Then, after some experimenting, I added

@amq.prefetch(1)

to the top of the script, and this seems to solve the problem:
messages are no longer requeued when their processing succeeds, no
matter how high the load is.
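A rough sketch of why the prefetch setting matters, assuming that `prefetch` sends `basic.qos` with a `prefetch_count`, after which the broker stops pushing once that many deliveries on the channel are unacknowledged. `push_until_blocked` is a made-up name for this illustration, and the unlimited case is idealised: in reality socket buffers also cap how much is in flight.

```ruby
# Hypothetical sketch of the broker-side basic.qos check; not RabbitMQ's code.
# prefetch_count = 0 means "no limit", as in the AMQP basic.qos method.
def push_until_blocked(backlog, prefetch_count)
  queue = backlog.dup
  pushed = []  # delivered to the consumer, none of them acked yet
  until queue.empty?
    break if prefetch_count > 0 && pushed.size >= prefetch_count
    pushed << queue.shift
  end
  pushed
end

backlog = (1..500).to_a
p push_until_blocked(backlog, 0).size  # prints 500: the whole backlog is in flight
p push_until_blocked(backlog, 1).size  # prints 1
```

Everything that is pushed but not yet acked is exactly what a `recover(:requeue => true)` would put back on the queue.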

So my questions (basically I just want to understand things):

1. Why were the messages requeued in the first place?
2. How come the prefetch(1) thing solved it?
3. Did it really solve it? :)

Daniel DeLeo

Nov 6, 2009, 10:15:29 AM
to ruby...@googlegroups.com
Hi Adam,
I'm not exactly an EventMachine expert, but this is what's happening as best I can understand and explain:

Without the prefetch option, RabbitMQ is sending messages to subscribers as quickly as the subscribers can fetch them from the buffer. Under lighter loads, this is fine, because you'll clear all the messages from the queue before any trouble strikes. When the load increases, however, you reach a state where EventMachine is constantly reading messages from the buffer, and the event loop never gets the opportunity to fire the callbacks that send acks back to RabbitMQ.

So, to answer your questions: the messages were re-queued because, from the broker's point of view, they were never acked. Prefetch solves this because it prevents the situation I described above from occurring, so your acks get sent and RabbitMQ knows not to re-queue the messages. And yes, this does solve the problem.

HTH, and anyone feel free to jump in and correct me on any details I might've butchered.

Dan DeLeo

Joseph Palermo

Nov 7, 2009, 1:58:01 AM
to AMQP
I spent a few days stuck on a similar problem. It helps to understand
what EventMachine is doing under the covers.

EventMachine has a single thread of execution, and it simply runs a
loop. Inside that loop it checks whether it should send any data out,
whether it should process any incoming data, and whether it should
execute any periodic timers (there may be another step or two that it
does; it has been a while).
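The loop described above can be sketched as a toy reactor. This is a hypothetical model that follows the phase order given here (write, then read, then timers), not EventMachine's actual implementation; `ToyReactor`, its simulated clock, and the one-second cost per message are illustrative assumptions. It shows that an ack produced while draining the incoming buffer is only written on the next pass, after any timer that came due in the meantime has already fired.

```ruby
# Hypothetical toy reactor (not EventMachine's code). Each tick runs three
# phases in this order: write buffered output, read and handle everything that
# is readable, then fire any timers that came due.
class ToyReactor
  attr_reader :log

  def initialize(cost_per_message)
    @cost = cost_per_message  # simulated seconds spent handling one message
    @now = 0.0                # simulated clock
    @inbound = []             # messages sitting in the incoming socket buffer
    @outbound = []            # acks handed to "send" but not yet written
    @timers = []              # [due_at, label] pairs
    @log = []
  end

  def arrive(*msgs)
    @inbound.concat(msgs)
  end

  def add_timer(due_at, label)
    @timers << [due_at, label]
  end

  def tick
    # Phase 1: write whatever was buffered on the previous pass.
    @outbound.each { |ack| @log << "t=#{@now} wrote #{ack}" }
    @outbound.clear

    # Phase 2: drain the incoming buffer; handlers only *buffer* their acks.
    until @inbound.empty?
      msg = @inbound.shift
      @now += @cost
      @outbound << "ack(#{msg})"
    end

    # Phase 3: fire timers that came due while we were busy reading.
    due, @timers = @timers.partition { |at, _| at <= @now }
    due.each { |_, label| @log << "t=#{@now} fired #{label} (#{@outbound.size} acks unsent)" }
  end
end

r = ToyReactor.new(1.0)
r.arrive("m1", "m2", "m3")
r.add_timer(2.0, "recover")
r.tick
r.tick
puts r.log
# t=3.0 fired recover (3 acks unsent)
# t=3.0 wrote ack(m1)
# t=3.0 wrote ack(m2)
# t=3.0 wrote ack(m3)
```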

The problem is that RabbitMQ will basically try to keep your socket
full of data at all times, so EventMachine gets stalled in the loop
pulling data off the incoming socket. It won't do this forever, but
for us RabbitMQ was sending about 500 messages at a time, so
EventMachine would process all 500 incoming messages before it had a
chance to continue the loop. By then RabbitMQ had already filled the
socket back up, so the next time through the loop it would get stalled
for just as long.

Depending on how long it takes to process each of your messages, this
may not be a problem, but if your EventMachine loop is getting stalled
for more than a couple of seconds, it probably will be. In Adam's case
I would expect that by the time it got done processing the incoming
socket, it was time to fire the periodic timer that told RabbitMQ to
do the Recover, and only THEN did it get a chance to start the loop
over again and (probably) send all the ACKs out before getting stalled
on the incoming socket again. Since the Recover reached RabbitMQ
before those ACKs did, the messages from that batch were still unacked
from the broker's point of view and got requeued.
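The scenario can also be simulated end to end. Everything here is a hypothetical model, not EventMachine, gem, or broker code: the 500-message batch comes from the description above, the 30-second recover interval from Adam's code, and the 100 ms processing cost per message is an assumed number. The toy counts how many times messages get processed in total, duplicates included.

```ruby
# Hypothetical simulation of the scenario above; not EventMachine, gem, or
# broker code. Assumptions: each loop pass runs write -> read -> timers, every
# message costs cost_ms of simulated time, the broker refills the socket
# before each read phase, prefetch 0 means "no limit", and the recover timer
# is rescheduled recover_every_ms after it fires.
def simulate(messages:, prefetch:, cost_ms:, recover_every_ms:, max_passes: 1_000)
  queue = (1..messages).to_a  # message ids waiting on the broker
  unacked = {}                # delivery_tag => message id
  outbound = []               # tags whose acks are buffered but not yet written
  tag = 0
  now = 0
  next_recover = recover_every_ms
  handled = 0                 # total processing runs, duplicates included

  max_passes.times do
    # Write phase: buffered acks reach the broker. Acks for deliveries that a
    # recover already requeued are simply ignored in this toy.
    outbound.each { |t| unacked.delete(t) }
    outbound.clear
    return [handled, true] if queue.empty? && unacked.empty?

    # The broker pushes deliveries while the prefetch window allows.
    socket = []
    while !queue.empty? && (prefetch.zero? || unacked.size < prefetch)
      tag += 1
      unacked[tag] = queue.shift
      socket << tag
    end

    # Read phase: handle everything in the socket; acks are only buffered.
    socket.each do |t|
      now += cost_ms
      handled += 1
      outbound << t
    end

    # Timer phase: a due recover requeues every delivery still unacked.
    next if now < next_recover
    queue.concat(unacked.values)
    unacked.clear
    next_recover = now + recover_every_ms
  end
  [handled, false]  # gave up: still redelivering after max_passes
end

p simulate(messages: 100, prefetch: 0, cost_ms: 100, recover_every_ms: 30_000)
# prints [100, true]: light load, everything acked before the first recover
p simulate(messages: 500, prefetch: 0, cost_ms: 100, recover_every_ms: 30_000, max_passes: 20)
# prints [10000, false]: every 50 s batch is requeued by a recover
p simulate(messages: 500, prefetch: 1, cost_ms: 100, recover_every_ms: 30_000)
# prints [501, true]: the one in-flight message is redelivered once
```

Under these assumptions a light load finishes cleanly, the 500-message batch is requeued on every pass and never drains, and prefetch 1 drains the queue with one extra run of whichever message was in flight when the recover fired. That last result comes from this toy's strict write/read/timer ordering; it only suggests that a small redelivery window may remain.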

Setting the prefetch to 1 solves this problem because RabbitMQ will
only keep a single message in your incoming socket, so your loop
doesn't get stalled and it can send the outgoing ACKs quickly. The
trade-off is that you may spend time waiting for RabbitMQ to send
data to your incoming socket, because it is not always full. You can
probably tune the prefetch value to something more optimal, where it
won't stall the loop but will still keep a healthy number of messages
queued up in the socket.
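One way to reason about that tuning, offered as a hypothetical rule of thumb rather than anything from RabbitMQ or the gem: a full read phase costs roughly prefetch × per-message time, so pick the largest prefetch that keeps this under the stall you can tolerate. The 2-second budget echoes the "couple of seconds" above; the per-message costs are assumed.

```ruby
# Hypothetical rule of thumb, not taken from RabbitMQ or the AMQP gem: the worst
# read-phase stall is roughly prefetch * cost per message, so pick the largest
# prefetch count that keeps that product within a chosen stall budget.
def prefetch_for(stall_budget_ms, cost_per_message_ms)
  raise ArgumentError, "cost per message must be positive" unless cost_per_message_ms.positive?
  [(stall_budget_ms / cost_per_message_ms).floor, 1].max  # never go below 1
end

p prefetch_for(2_000, 100)    # prints 20
p prefetch_for(2_000, 5_000)  # prints 1: one slow message already exceeds the budget
```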