Getting size of message body before consumption


James Gardner

Dec 31, 2014, 11:22:41 AM
to rabbitm...@googlegroups.com

Given that clients have restrictions on max memory usage, and that large
message bodies can potentially challenge these, how is it possible to
determine the size of a message (body) before 'retrieving' it (consuming
it)?
'prefetch-size' wouldn't seem to help since the broker won't pre-send
the message at all if it's larger than that size, so it's not as if you
can get a 'peek' at the headers first :). Not that that would apparently
help either, since there is no standard attribute indicating message
body size.
I had Java foremost in mind, as a client.
Thanks,
James Gardner

Michael Klishin

Dec 31, 2014, 12:46:00 PM
to James Gardner, rabbitm...@googlegroups.com
It's not possible.

MK
> --
> You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
> To post to this group, send an email to rabbitm...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

James Gardner

Jan 5, 2015, 11:01:07 AM
to Michael Klishin, Simon MacMullen, rabbitm...@googlegroups.com

And in trying to make it possible, am I correct in saying that one would
be quite fundamentally limited by the AMQP spec, or is there any room
for a workaround, theoretically? For instance, is it possible to have a
handleDelivery method with an input stream instead of a byte[] for the
body argument, or is such an incremental transfer simply not supportable
by the underlying protocol?

Also, if it is not possible to deal with it at the consumer level, would
it be possible to deal with it by filtering higher up? The queue would
seem to be the appropriate place to do this, e.g. by not routing messages
to the queue if the body size exceeds a given limit - a queue argument
like 'x-max-msg-body-bytes'...

It just seems to me that being able to crash N clients at once, just by
publishing a large message, is a liability.

- James



Laing, Michael

Jan 5, 2015, 11:06:45 AM
to James Gardner, Michael Klishin, Simon MacMullen, rabbitm...@googlegroups.com
Hmmm. We test for large message bodies prior to publish, put them in S3 (if too large), and substitute a URL header.

One could probably do something similar with a custom 'ingest' exchange.

ml
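
A minimal sketch of the publisher-side check described above, in Python for brevity (the same logic would apply to a Java publisher). The threshold `MAX_INLINE_BYTES`, the header names `x-body-url` and `x-body-size`, and the `store` callable (standing in for an S3 upload) are all assumptions made for illustration; none of them are part of RabbitMQ or any client library.

```python
from typing import Callable, Dict, Tuple

# Hypothetical threshold; pick a value your consumers can safely hold in memory.
MAX_INLINE_BYTES = 1024 * 1024


def prepare_message(
    body: bytes,
    store: Callable[[bytes], str],
    max_inline: int = MAX_INLINE_BYTES,
) -> Tuple[bytes, Dict[str, object]]:
    """Return (body, headers) to publish.

    Small bodies go through unchanged. Large bodies are handed to `store`
    (a stand-in for an S3 upload that returns a URL) and replaced by an
    empty body plus headers pointing at the stored payload.
    """
    if len(body) <= max_inline:
        return body, {}
    url = store(body)
    # "x-body-url" and "x-body-size" are made-up header names for this sketch.
    return b"", {"x-body-url": url, "x-body-size": len(body)}


def fake_store(payload: bytes) -> str:
    """In-memory stand-in for external storage, used only for the demo."""
    fake_store.blobs.append(payload)
    return "mem://blob/%d" % (len(fake_store.blobs) - 1)


fake_store.blobs = []
```

With something like this in place, a consumer can read the size from the headers of a small message and decide whether, and how, to fetch the real payload.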


Michael Klishin

Jan 5, 2015, 11:35:55 AM
to James Gardner, Simon MacMullen, rabbitm...@googlegroups.com
It would take developing a pretty low level client.

MK
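
For context on what such a low-level client would look at: in AMQP 0-9-1 a delivery arrives as a method frame, then a content header frame, then zero or more body frames, and the content header carries the total body size as a 64-bit field. The sketch below (Python, standard library only) reads that field from raw frame bytes. It illustrates the wire format only and is not an API of the RabbitMQ Java client; the function name and the hand-built sample frame are made up for the example.

```python
import struct

FRAME_HEADER = 2      # AMQP 0-9-1 frame type for a content header
FRAME_END = 0xCE      # every frame ends with this octet


def body_size_from_header_frame(frame: bytes) -> int:
    """Return the announced body size from one raw content header frame."""
    # General frame layout: type (1 byte), channel (2), payload size (4).
    frame_type, _channel, payload_size = struct.unpack_from(">BHI", frame, 0)
    if frame_type != FRAME_HEADER:
        raise ValueError("not a content header frame")
    if len(frame) < 7 + payload_size + 1 or frame[7 + payload_size] != FRAME_END:
        raise ValueError("truncated or malformed frame")
    # Header payload: class-id (2), weight (2), body-size (8), then properties.
    _class_id, _weight, body_size = struct.unpack_from(">HHQ", frame, 7)
    return body_size


# Hand-built sample: class 60 (basic), weight 0, body size 5_000_000,
# property flags 0 (no properties), on channel 1.
payload = struct.pack(">HHQH", 60, 0, 5_000_000, 0)
sample = struct.pack(">BHI", FRAME_HEADER, 1, len(payload)) + payload + bytes([FRAME_END])
```

Note that the body frames still arrive on the same connection after the header, so a client that inspects the size this way could spool or discard the body as it comes in, but could not avoid the transfer itself.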

Alvaro Videla

Jan 5, 2015, 11:54:08 AM
to Michael Klishin, James Gardner, Simon MacMullen, rabbitm...@googlegroups.com
I think a custom exchange could do the filtering by message size, but the exchange would need to size() the binaries while routing.
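
The routing decision such an exchange would make can be sketched as follows. A real custom exchange would be a broker plugin written in Erlang; this Python model only illustrates the logic. The per-queue limit mirrors the hypothetical 'x-max-msg-body-bytes' argument proposed earlier in the thread, which is not an existing RabbitMQ feature, and `route_by_size` is a made-up name.

```python
from typing import Dict, List, Optional


def route_by_size(body: bytes, queue_limits: Dict[str, Optional[int]]) -> List[str]:
    """Return the names of queues that should receive `body`.

    `queue_limits` maps a bound queue name to its maximum body size in
    bytes, or None for no limit.
    """
    size = len(body)  # the size() step mentioned above
    return [
        name
        for name, limit in queue_limits.items()
        if limit is None or size <= limit
    ]


# Example bindings: one size-limited queue and one unlimited queue.
bindings = {"small-workers": 1024, "bulk-workers": None}
```

In this model an oversized message simply never reaches the limited queue, so its consumers are not exposed to it.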

gatesvp

Jan 5, 2015, 4:13:07 PM
to rabbitm...@googlegroups.com, james....@noaa.gov, mkli...@pivotal.io, si...@rabbitmq.com
I'm going to echo Michael Laing here: this really sounds like a job that belongs at the Publisher level (i.e., with whoever is populating the queue).

If you are worried that workers receiving a message of size N will crash, then they're also going to crash when trying to read data of size N from a DB, right? Unless you're storing this large piece of data somewhere else (file system, DB, etc.) and then streaming it. If your consumers need to stream the data in chunks, then it probably belongs in a real engine for storing the data (like S3 or a DB) and you need to pass the ID in the message.
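
A minimal sketch of the consumer side of that approach: the message carries only an ID, and the consumer streams the referenced payload in fixed-size chunks so memory use stays bounded. `open_blob` is a hypothetical stand-in for whatever opens the stored object (an S3 or DB client in practice); here an in-memory dict plays the role of the store, and the names and chunk size are illustrative.

```python
import io
from typing import BinaryIO, Callable, Iterator

CHUNK_BYTES = 64 * 1024  # illustrative chunk size


def stream_payload(
    blob_id: str,
    open_blob: Callable[[str], BinaryIO],
    chunk_bytes: int = CHUNK_BYTES,
) -> Iterator[bytes]:
    """Yield the payload referenced by `blob_id` one chunk at a time."""
    with open_blob(blob_id) as stream:
        while True:
            chunk = stream.read(chunk_bytes)
            if not chunk:
                break
            yield chunk


# Demo store: an in-memory dict instead of S3 or a database.
STORE = {"job-42": b"a" * 10 + b"b" * 10 + b"c" * 5}


def open_from_store(blob_id: str) -> BinaryIO:
    return io.BytesIO(STORE[blob_id])
```

The consumer never holds more than one chunk at a time, regardless of how large the stored payload is.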

Are you in some odd situation where you only control the Consumer but not the Publisher?



James Gardner

Jan 6, 2015, 3:24:48 PM
to gatesvp, rabbitm...@googlegroups.com, mkli...@pivotal.io, si...@rabbitmq.com

No, we do control the publishers in our environment at the current time. If we did decide to allow publishing from other areas of the organization, that might introduce less predictability.
I think you're right in drawing a comparison with a DB, in that developers wouldn't expect to have to check data size before reading from there either. The only difference with a DB or file system, of course, is that I *can* easily get a data size before I ingest it into my app, if there were a need for it. I would also (typically) have the option of streaming it, whereas neither option exists within RabbitMQ.

Actually, my original intent was to confirm that there was no mechanism to protect consumers against potential memory exhaustion from unexpectedly large messages, and to get a feel for whether there was an appetite for remedying that in some way. I get the feeling this might be a non-issue within the community, and I can understand that perspective. Ultimately I suppose there's nothing you can't mitigate at another level, whether it be by filtering 'foreign' publishes or doing sanity checks on trusted publishing processes.

Cheers,
James

Gaëtan Voyer-Perrault

Jan 7, 2015, 2:24:49 AM
to James Gardner, rabbitm...@googlegroups.com, mkli...@pivotal.io, si...@rabbitmq.com
A quick look at Amazon's SQS would indicate that it actually implements a Max Payload Size of 256kB, which it calls "Large":

MSMQ seems to have an actual incoming limit of 4MB for messages.

It seems like ActiveMQ has a 2GB limit for messages:

ZeroMQ does not seem to have a hard limit, but possibly a configurable one?

As just a random community member, I kind of like the idea of "front-door" blocking of over-sized messages. If RabbitMQ had a configuration setting to turn away any message greater than a given size, then that sounds like a feature to me :)

However, once a Message is on the Queue, I don't know that "selective reading" is a killer feature. It's just such a weird scenario that involves lots of data living in a volatile state for a very long time while possibly eating dangerous amounts of RAM and requiring multiple round-trips. And it's all pretty easy to work around with existing technology.

Regards;
Gates


Michael Klishin

Jan 7, 2015, 2:34:44 AM
to Gaëtan Voyer-Perrault, James Gardner, rabbitm...@googlegroups.com, si...@rabbitmq.com
Modern RabbitMQ versions have a limit of 2GB; larger publishes will result in an error.

MK

Simon MacMullen

Jan 7, 2015, 7:06:30 AM
to Gaëtan Voyer-Perrault, James Gardner, rabbitm...@googlegroups.com, mkli...@pivotal.io
On 07/01/15 07:24, Gaëtan Voyer-Perrault wrote:
> As just a random community member, I kind of like the idea of
> "front-door" blocking of over-sized messages. If RabbitMQ had a
> configuration setting to turn away any message greater than a given
> size, then that sounds like a feature to me :)

That sounds like a reasonable idea; I've filed a bug to look at it.

> However, once a Message is on the Queue, I don't know that "selective
> reading" is a killer feature. It's just such a weird scenario that
> involves lots of data living in a volatile state for a very long time
> while possibly eating dangerous amounts of RAM and requiring multiple
> round-trips. And it's all pretty easy to work around with existing
> technology.

Yeah, "selective anything" doesn't really describe how RabbitMQ's queues
are designed to work. We very much assume selection happens at the
exchange level.

Cheers, Simon
