[rabbitmq-discuss] Memory Management Concerns / Questions


AndyB

Dec 22, 2011, 11:12:55 AM
to rabbitmq...@lists.rabbitmq.com
http://www.rabbitmq.com/memory.html

I'm running into some problems that I'm hoping someone here can help
me with. Let me first state my setup. I'm running the latest release
of RabbitMQ on CentOS (64-bit) in a clustered configuration with 2
nodes in my Production environment. In my development environment I
only have 1 node though. My code is written in C# and uses the SDK
downloaded from the website. The memory configuration value is left
at the default of 40%.

I ran into a problem in my Production environment a week ago where
work was building up on my queue faster than my consumers were
processing it. Unfortunately I wasn't able to perform any debugging or
metrics gathering before the system was recycled or the queue was
purged; I'm still not sure exactly which of those happened. But what
I can tell you is that it looked to me like exceptions were occurring
for both the publishers and consumers and nothing was really
happening. I have since added more consumers and have not had the
problem occur again, but this obviously has me concerned.

I am currently in the process of trying to reproduce this problem in
my development environment, but I'm running into some confusing
results. My test case is to run multiple publishers sending messages
over and over as fast as possible while also having a single consumer
processing those messages with a delay. The goal is obviously to
force messages to pile up in memory on the node to trigger the memory
alert. Since I have both publishers and consumers connected, I'm
expecting to see the publishers begin to get some sort of exception
saying that they can't submit any more work while the consumer
continues to process. But what I'm actually seeing is that the
publishers continue piling on and the consumer continues to process,
but the machine eventually runs out of disk space and crashes.
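
For reference, the publisher side of the test is essentially just a
tight loop like the sketch below (heavily simplified -- the host,
queue name and payload are placeholders rather than my real ones),
with several instances of it running in parallel against the node:

using System.Text;
using RabbitMQ.Client;

class FloodPublisher
{
    static void Main()
    {
        var factory = new ConnectionFactory { HostName = "dev-node" };
        using (IConnection conn = factory.CreateConnection())
        using (IModel channel = conn.CreateModel())
        {
            // Transient (non-durable) queue, matching my real setup.
            channel.QueueDeclare("loadtest", false, false, false, null);
            byte[] body = Encoding.UTF8.GetBytes(new string('x', 1024)); // ~1 KB message

            while (true)
            {
                // Publish via the default exchange, routed to the queue by name.
                channel.BasicPublish("", "loadtest", null, body);
            }
        }
    }
}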

Is there anyone that can advise me on what I'm doing wrong or help me
figure out what changes I can make?

Thanks
Andy

Steve Powell

Dec 22, 2011, 1:31:33 PM
to AndyB, rabbitmq...@lists.rabbitmq.com
Andy,

By the 'latest release' I presume you mean 2.7.0 (a week ago that would
have been true, but we have now released 2.7.1).

Please can you show us your rabbitmq log after a crash? The test environment
case would be interesting, though the production system is probably
experiencing issues which are of an application nature.

If the consumers were failing (getting exceptions) for some application
reason, and they were responsible for acknowledging the messages, then it
is entirely likely that the messages they failed to process are being
re-queued, and the queue is building up without being drained. The
application exceptions are therefore very interesting, and you should
take care that a consumer acknowledges a message WHEN IT HAS BEEN DEALT
WITH -- even if that means the error was logged, passed on, or whatever.
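
As an illustration only (this is a sketch, not your code: the queue
name is made up, the error handling is reduced to logging, and it
assumes the .NET client's QueueingBasicConsumer), a consumer loop that
acknowledges each message once it has been dealt with, whether or not
processing succeeded, looks roughly like this:

using System;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

class Worker
{
    static void Consume(IModel channel)
    {
        var consumer = new QueueingBasicConsumer(channel);
        channel.BasicConsume("work", false, consumer); // noAck = false: we ack explicitly

        while (true)
        {
            var delivery = (BasicDeliverEventArgs) consumer.Queue.Dequeue();
            try
            {
                Process(delivery.Body); // application-specific work
            }
            catch (Exception ex)
            {
                // The message has still been "dealt with": log it (or pass it on) ...
                Console.Error.WriteLine("Message failed: " + ex);
            }
            // ... and acknowledge it either way, so it is not re-queued and
            // retried indefinitely.
            channel.BasicAck(delivery.DeliveryTag, false);
        }
    }

    static void Process(byte[] body) { /* application work that may throw */ }
}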

In the latest release message re-queuing preserves the order (for a
single consumer), so a failing message might reappear at the head of the
queue -- this would cause it to be re-processed more-or-less straight
away, and if the failure is caused by a logical error in the message
payload, it is likely to fail again -- and so on. Previous releases did
not try to preserve message order, so failing messages could be
overtaken by non-failing ones and would not show up as a bottleneck
under high load.

I'm interested in your RAM configuration. It is entirely possible for rabbitmq
to run out of memory even if there is a threshold set. Continual high
publication rates, especially with new publishers all the time, will not be
blocked entirely even then. This might mean that the test you ran is giving
you misleading information.

When the memory piles up you could also issue a rabbitmqctl report, which
should tell us the general situation.

Steve Powell (a perplexed bunny)
----------some more definitions from the SPD----------
avoirdupois (phr.) 'Would you like peas with that?'
distribute (v.) To denigrate an award ceremony.
definite (phr.) 'It's hard of hearing, I think.'
modest (n.) The most mod.

Matthias Radestock

Dec 22, 2011, 2:30:28 PM
to AndyB, rabbitmq...@lists.rabbitmq.com
On 22/12/11 16:12, AndyB wrote:
> My test case is to run multiple publishers sending messages
> over and over as fast as possible while also having a single consumer
> processing those messages with a delay. The goal is obviously to
> force messages to pile up in memory on the node to trigger the memory
> alert. Since I have both publishers and consumers connected, I'm
> expecting to see the publishers begin to get some sort of exception
> saying that they can't submit any more work while the consumer
> continues to process. But what I'm actually seeing is that the
> publishers continue piling on and the consumer continues to process,
> but the machine eventually runs out of disk space and crashes.

Rabbit does not enforce any disk space limits. So if producers keep
publishing messages, and consumers do not keep up, rabbit will page more
and more messages to disk and eventually disk space runs out.

Limiting disk space usage, similar to the way rabbit limits RAM usage,
is on our todo list, though in practice most systems have ample disk
space and operational monitoring usually detects situations that fill up
the disk with plenty of time to take remedial action.

Matthias.

AndyB

Dec 22, 2011, 2:35:44 PM
to rabbitmq...@lists.rabbitmq.com
Thanks for the quick reply, guys. With the help of one of my systems
guys, I think we have made some progress on this issue. After
performing further tests with my application and doing some close
monitoring on the server node itself, we noticed some unexpected
messages being written to disk for the queue, which is transient. That
immediately confused us, and after some creative googling we came
across the following page, which helped us a great deal:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2011-October/015793.html
Given that information and some tweaks to the tests, I was finally
able to reproduce the memory throttling of publishers that I was
expecting to see. And I think I now have a better understanding of the
RAM usage and statistics and how they tie to the watermark value.

But this gets me to my next concern ... The throttling of the
publishers appears to be a blocking operation from within the
"BasicPublish" method and as best I can tell, I'm not seeing any sort
of timeout. This indefinite blocking would be pretty bad if it were
to occur in my production environment. Is there a way that I can
specify some sort of timeout for the blocking operation? I see an
overload for "BasicPublish" which has boolean params for "immediate"
and "mandatory". Are either of those meant to help in this situation?

Thanks
Andy


Matthias Radestock

Dec 22, 2011, 2:48:06 PM
to AndyB, rabbitmq...@lists.rabbitmq.com
On 22/12/11 19:35, AndyB wrote:
> But this gets me to my next concern ... The throttling of the
> publishers appears to be a blocking operation from within the
> "BasicPublish" method and as best I can tell, I'm not seeing any
> sort of timeout. This indefinite blocking would be pretty bad if it
> were to occur in my production environment.

It's not going to block indefinitely, since the paging to disk, or the
consumption of messages, will free up space, at which point the
producers are unblocked.

Just think of this situation as being the same as a slow network /
server; it's indistinguishable from that.

> Is there a way that I can specify some sort of timeout for the
> blocking operation?

No.

Matthias.

AndyB

Dec 22, 2011, 2:53:59 PM
to rabbitmq...@lists.rabbitmq.com
In my test case, I have intentionally coded the consumer to never
catch up. During the process, I saw the publishers get blocked after
the alert, messages were streamed to disk enough to get below the
watermark, and the publishers were unblocked. Of course they hit the
watermark soon after and the same process happened again. I'd say
this happened maybe 4 or 5 times and then they just remained in a
blocked state. I let the test continue running for almost 10 minutes
and they never became unblocked and the watermark alert never seemed
to clear on the server. So I guess that means that it stopped
streaming the messages to disk or something? Either way, I'm going to
have to come up with a way to implement something in my code to try to
avoid tying up a thread for an unknown amount of time. Any ideas? Is
there a way to subscribe to the watermark event or something?

Andy


Jason J. W. Williams

Dec 22, 2011, 3:02:54 PM
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com, AndyB
>> Is there a way that I can specify some sort of timeout for the
>> blocking operation?
>
>
> No.

I think a timeout would be a useful setting on the client side. If
Rabbit backs up enough, you don't want your frontend just appearing to
hang to the frontend's client. At some point it should raise a timeout
exception you can handle and give feedback to the app's client.

-J

Matthias Radestock

Dec 22, 2011, 3:21:36 PM
to AndyB, rabbitmq...@lists.rabbitmq.com
Andy,

On 22/12/11 19:53, AndyB wrote:
> In my test case, I have intentionally coded the consumer to never
> catch up. During the process, I saw the publishers get blocked
> after the alert, messages were streamed to disk enough to get below
> the watermark, and the publishers were unblocked. Of course they hit
> the watermark soon after and the same process happened again. I'd
> say this happened maybe 4 or 5 times and then they just remained in
> a blocked state. I let the test continue running for almost 10
> minutes and they never became unblocked and the watermark alert never
> seemed to clear on the server. So I guess that means that it
> stopped streaming the messages to disk or something?

It's possible that you ran into another limit...

Each message has a small memory footprint, even when it has been paged
to disk. So there is an upper bound to how many messages rabbit can hold
on to. When that bound is reached producers will remain blocked until
some messages have been consumed.

There is a way around that - changing the message store index module to
one that operates entirely on disk. See
https://github.com/rabbitmq/rabbitmq-toke. However, I don't know of any
production rabbits that have actually run into this limitation.

> Either way, I'm going to have to come up with a way to implement
> something in my code to try to avoid tying up a thread for an unknown
> amount of time. Any ideas?

You could perform all the invocations of the AMQP client's publish
methods from a single, separate thread. It would sit in a loop, pulling
messages off a bounded buffer / queue (e.g. an ArrayBlockingQueue if
this was Java; there are presumably similar data structures in C#, and
worst case you could roll your own) and invoking the publish methods in
the AMQP client.

The "real" publishing threads simply deposit messages into the buffer /
queue using an operation with a timeout, e.g. BlockingQueue.offer(E o,
long timeout, TimeUnit unit).
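
In C# that might look something like the sketch below (untested, and
the names are invented); .NET 4's BlockingCollection<T> plays the role
of the ArrayBlockingQueue, and its TryAdd overload with a timeout
plays the role of offer:

using System;
using System.Collections.Concurrent;
using System.Threading;
using RabbitMQ.Client;

class BoundedPublisher
{
    // Bounded buffer: holds at most 1000 pending messages.
    private readonly BlockingCollection<byte[]> buffer =
        new BlockingCollection<byte[]>(1000);
    private readonly IModel channel;

    public BoundedPublisher(IModel channel)
    {
        this.channel = channel;
        // A single dedicated thread owns the channel and does all the
        // publishing, so only this thread ever blocks when the broker
        // throttles publishers.
        new Thread(PublishLoop) { IsBackground = true }.Start();
    }

    // Called by the "real" publishing threads; returns false after the
    // timeout instead of blocking indefinitely.
    public bool TryPublish(byte[] body, TimeSpan timeout)
    {
        return buffer.TryAdd(body, timeout);
    }

    private void PublishLoop()
    {
        foreach (byte[] body in buffer.GetConsumingEnumerable())
        {
            // This call may block while the memory alarm is in effect.
            channel.BasicPublish("", "work", null, body);
        }
    }
}

The application threads can then treat a false return from TryPublish
the same way they would treat a publish timeout.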


Matthias.

Matthias Radestock

Dec 22, 2011, 3:25:42 PM
to Jason J. W. Williams, rabbitmq...@lists.rabbitmq.com, AndyB
On 22/12/11 20:02, Jason J. W. Williams wrote:
> I think a timeout would be a useful setting on the client side. If
> Rabbit backs up enough, you don't want your frontend just appearing to
> hang to the frontend's client. At some point it should raise a timeout
> exception you can handle and give feedback to the app's client.

There's a bug open to address that. Alas it's been open for >1 year and
is a major piece of work across all our clients. So I doubt we will work
on it any time soon.

Matthias.

AndyB

Dec 22, 2011, 3:31:55 PM
to rabbitmq...@lists.rabbitmq.com
This sounds brutal and exactly what I was afraid of.

Andy

