[rabbitmq-discuss] Exactly Once Delivery


Mike Petrusis

Aug 3, 2010, 4:43:56 AM
to rabbitmq...@lists.rabbitmq.com
Greetings,

In reviewing the mailing list archives, I see various threads which state that ensuring "exactly once" delivery requires deduplication by the consumer. For example the following:

"Exactly-once requires coordination between consumers, or idempotency,
even when there is just a single queue. The consumer, broker or network
may die during the transmission of the ack for a message, thus causing
retransmission of the message (which the consumer has already seen and
processed) at a later point." http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-July/004237.html

In the case of competing consumers which pull messages from the same queue, this will require some sort of shared state between consumers to de-duplicate messages (assuming the consumers are not idempotent).

Our application is using RabbitMQ to distribute tasks across multiple workers residing on different servers, which adds to the cost of sharing state between the workers.

Another message in the email archive mentions that "You can guarantee exactly-once delivery if you use transactions, durable queues and exchanges, and persistent messages, but only as long as any failing node eventually recovers."

As I understand it, the transaction only affects the publishing of the message into RabbitMQ, preventing the message from being queued until the transaction is committed. If this is correct, I don't see how a transaction prevents a duplicate message in the retransmission scenarios mentioned above. Can anybody clarify?

On a more practical level:

What's the recommended way to deal with the potential of duplicate messages?
What do people generally do?
Is this a rare enough edge case that most people just ignore it?


Thanks,

Mike

Matthew Sackman

Aug 5, 2010, 7:22:52 AM
to Mike Petrusis, rabbitmq...@lists.rabbitmq.com
Hi Mike,

On Tue, Aug 03, 2010 at 04:43:56AM -0400, Mike Petrusis wrote:
> In reviewing the mailing list archives, I see various threads which state that ensuring "exactly once" delivery requires deduplication by the consumer. For example the following:
>
> "Exactly-once requires coordination between consumers, or idempotency,
> even when there is just a single queue. The consumer, broker or network
> may die during the transmission of the ack for a message, thus causing
> retransmission of the message (which the consumer has already seen and
> processed) at a later point." http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-July/004237.html
>
> In the case of competing consumers which pull messages from the same queue, this will require some sort of shared state between consumers to de-duplicate messages (assuming the consumers are not idempotent).
>
> Our application is using RabbitMQ to distribute tasks across multiple workers residing on different servers, this adds to the cost of sharing state between the workers.
>
> Another message in the email archive mentions that "You can guarantee exactly-once delivery if you use transactions, durable queues and exchanges, and persistent messages, but only as long as any failing node eventually recovers."

All of the above is sort of wrong. You can never *guarantee* exactly
once. (There's always some argument about whether receiving message
duplicates but relying on idempotency counts as achieving exactly once;
I don't feel it does, and why should become clearer further on...)

The problem is publishers. If the server on which RabbitMQ is running
crashes after committing a transaction containing publishes, it's
possible the commit-ok message may get lost. The publishers then still
think they need to republish, so they wait until the broker comes back
up and republish. This can happen an unbounded number of times: the
publishers connect, start a transaction, publish messages, commit the
transaction, the commit-ok gets lost, and the publishers repeat the
process.
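
To make this concrete, here is a minimal sketch of that publisher-side
retry loop in Python with the pika client (the queue name and retry
policy are illustrative, not something prescribed by RabbitMQ):

    import pika
    from pika.exceptions import AMQPError

    def publish_with_retry(body, max_attempts=10):
        # Keep republishing until we see the commit-ok. If the commit-ok
        # is lost *after* the broker has committed, this loop publishes
        # a duplicate - exactly the failure mode described above.
        for _ in range(max_attempts):
            try:
                conn = pika.BlockingConnection(
                    pika.ConnectionParameters('localhost'))
                ch = conn.channel()
                ch.queue_declare(queue='tasks', durable=True)
                ch.tx_select()             # make the channel transactional
                ch.basic_publish(
                    exchange='', routing_key='tasks', body=body,
                    properties=pika.BasicProperties(delivery_mode=2))
                ch.tx_commit()             # the commit-ok can be lost here
                conn.close()
                return
            except AMQPError:
                continue                   # can't tell; assume uncommitted
        raise RuntimeError('gave up publishing')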

As a result, on the clients, you need to detect duplicates. Now this is
really a barrier to making all operations idempotent. The problem is
that you never know how many copies of a message there will be, so you
never know when it's safe to remove messages from your dedup cache.
Things like Redis do have the means to delete entries after an amount
of time, which would at least allow you to avoid the database eating up
all the RAM in the universe, but there's still the possibility that
after an entry has been deleted, another duplicate will come along
which you now won't detect as a duplicate.
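
For illustration, a consumer-side check along those lines with the
redis-py client (the key prefix, TTL and process() handler are all
hypothetical); note the window after expiry in which a late duplicate
slips through undetected:

    import redis

    r = redis.Redis()
    DEDUP_TTL = 24 * 60 * 60   # seconds; beyond this, late dups go unseen

    def first_time_seen(msg_id):
        # SET with nx+ex is atomic: True only for the first writer, and
        # the entry expires, so the cache cannot grow without bound.
        return bool(r.set('dedup:' + msg_id, 1, nx=True, ex=DEDUP_TTL))

    def handle(msg_id, body):
        if first_time_seen(msg_id):
            process(body)          # hypothetical application handler
        # else: a duplicate arrived within the TTL window; drop it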

This isn't just a problem with RabbitMQ - in any messaging system, if
any message can be lost, you cannot achieve exactly-once semantics. The
best you can hope for is a probability with a large number of 9s that
you will be able to detect all the duplicates. But that's the best you
can achieve.

Scaling horizontally is thus more tricky because, as you say, you may
now have multiple consumers which each receive one copy of a message.
Thus the dedup database would have to be distributed. With high message
rates, this might well become prohibitive because of the amount of
network traffic due to transactions between the consumers.

> What's the recommended way to deal with the potential of duplicate messages?

Currently, there is no "recommended" way. If you have a single consumer,
it's quite easy - something like Tokyo Cabinet should be more than
sufficiently performant. For multiple consumers, you're currently going
to have to look at some sort of distributed database.

> Is this a rare enough edge case that most people just ignore it?

No idea. But one way of making your life easier is for the producer to
send slightly different messages on every republish (they would
obviously still need to carry the same msg id). That way, if you see a
msg with "republish count" == 0, you know it's the first copy, so you
can insert asynchronously into your shared database and then act on the
message. You only need to query the database when you receive a msg
with "republish count" > 0 - thus you can tune your database for
inserts and hopefully save some work: the common case will be the first
copy, and lookups will be exceedingly rare.
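
A sketch of that consumer-side logic (the message fields and the db
interface are hypothetical):

    def on_message(msg, db):
        # msg.id is assigned by the producer and identical on every
        # republish; msg.republish_count is bumped on each resend.
        if msg.republish_count == 0:
            # Common case: first copy. Insert asynchronously and act on
            # the message; no lookup, so the db can be tuned for inserts.
            db.insert_async(msg.id)
            act_on(msg)              # hypothetical task handler
        elif not db.seen(msg.id):
            # Rare case: a republish with no record of the first copy.
            # Dubious - see the synchronisation caveats below.
            db.insert_async(msg.id)
            act_on(msg)
        # else: a known duplicate; drop it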

The question then is: if you receive a msg with republish count > 0 but
there is no entry in the database, what do you do? It shouldn't have
overtaken the first publish (though if consumers disconnected without
acking, or requeued messages, it could have), but you need some sort of
synchronisation across all the consumers to ensure none is in the
process of adding to the database - it all gets a bit hairy at this
point.

Thus if your message rate is low, you're much safer doing the insert and
select on every message. If that's too expensive, you're going to have
to think very hard indeed about how to avoid races between different
consumers thinking they're both/all responsible for acting on the same
message.

This stuff isn't easy.

Matthew

John Apps

Aug 5, 2010, 9:00:11 AM
to rabbitmq...@lists.rabbitmq.com
Matthew,
  an excellent response and thank you for it! Yes, difficult it is!

It raises a somewhat philosophical discussion around where the onus for guaranteeing such things as 'guaranteed once' is placed: on the client side or on the server side. The JMS standard offers guaranteed once, whereby the onus is on the server (the JMS implementation) and not on the client.

What I am trying to say is that, in my opinion, client programs should be as 'simple' as possible, with the servers doing all the hard work. This is what the JMS standard forces on implementors and, perhaps to a lesser extent today, so does AMQP.

Note: the word 'server' is horribly overloaded these days. It is used here to indicate the software with which clients, producers and consumers, communicate.

Oh well, off to librabbitMQ and some example programs written in COBOL...

Cheers, John
--
---
John Apps
(49) 171 869 1813

Tony Garnock-Jones

Aug 5, 2010, 9:48:40 AM
to John Apps, rabbitmq...@lists.rabbitmq.com
John Apps wrote:
> The JMS standard offers guaranteed once

What exactly do they mean by that? In particular, how do they deal with
duplicates? Do they report failure, or silently let a dup through in certain
situations? If you could point me to the part of the spec that sets out the JMS
resolution of these issues, that'd be really useful.

Tony

Tony Garnock-Jones

Aug 5, 2010, 9:50:46 AM
to Mike Petrusis, rabbitmq...@lists.rabbitmq.com
Matthew Sackman wrote:
> As a result, on the clients, you need to detect duplicates. Now this is
> really a barrier to making all operations idempotent. The problem is
> that you never know how many copies of a message there will be. Thus you
> never know when it's safe to remove messages from your dedup cache.

The other piece of this is time-to-live (TTL). Given a finite-length dedup
cache and message TTL, you can detect and report failure. (And if the ack
travels upstream to the publisher, you can report failures at the send end,
too.) Without the TTL, you have silent dups on rare occasions.

Tony

David Wragg

Aug 5, 2010, 10:16:50 AM
to Tony Garnock-Jones, rabbitmq...@lists.rabbitmq.com
Tony Garnock-Jones <to...@rabbitmq.com> writes:
> John Apps wrote:
>> The JMS standard offers guaranteed once
>
> What exactly do they mean by that? In particular, how do they deal
> with duplicates? Do they report failure, or silently let a dup through
> in certain situations? If you could point me to the part of the spec
> that sets out the JMS resolution of these issues, that's be really
> useful.

As an API spec, it's quite easy for JMS to mandate something apparently
impossible, without hinting at how it might actually be implemented.

Most of the spec says that the PERSISTENT delivery mode gives
"once-and-only-once" delivery. But section 4.4.13 (of JMS 1.1) admits
that there are a number of caveats to this. So it's really
"once-and-only-once-except-in-some-corner-cases".

I think the wrinkle that might prevent us saying that RabbitMQ gives the
same guarantees is on the publishing side. The caveats in JMS all seem
to apply only to the consuming side. But what happens with an AMQP
producer if the connection gets dropped before a tx.commit-ok gets back
to the client? In that case the client has to re-publish, leading to a
potential dup. This can be avoided by a de-dup filter on published
messages in the broker. I don't know if JMS brokers really go to such
lengths.

David

--
David Wragg
Staff Engineer, RabbitMQ
SpringSource, a division of VMware

Michael Bridgen

Aug 5, 2010, 10:17:28 AM
to Tony Garnock-Jones, rabbitmq...@lists.rabbitmq.com
> John Apps wrote:
>> The JMS standard offers guaranteed once
>
> What exactly do they mean by that? In particular, how do they deal with
> duplicates? Do they report failure, or silently let a dup through in certain
> situations? If you could point me to the part of the spec that sets out the JMS
> resolution of these issues, that's be really useful.

For consumers, JMS has client ack mode; the application acknowledges
messages, and the server must not resend a message that has been
acknowledged.

A failure in the connection may result in the server resending a message
which the application thinks it has acknowledged. The spec suggests
"Since such clients cannot know for certain if a particular message has
been acknowledged, they must be prepared for redelivery of the last
consumed message.". I.e., the client application has to have an
idempotency barrier.

For producers, duplicate publishing is simply prohibited. As for
failure modes -- "A message that is redelivered due to session recovery
is not considered a duplicate message."

So JMS cannot magically do "exactly once" any more than anything else.


--Michael

Matthew Sackman

Aug 5, 2010, 10:25:18 AM
to rabbitmq...@lists.rabbitmq.com
On Thu, Aug 05, 2010 at 03:17:28PM +0100, Michael Bridgen wrote:
> For producers, duplicate publishing is simply prohibited.

So that seems to suggest that every message is universally unique?

If this is correct, whose responsibility is it to add GUIDs (or some
such) to every message? Does the client library do that automatically?

Matthew

Tony Menges

Aug 5, 2010, 12:16:14 PM
to Matthew Sackman, rabbitmq...@lists.rabbitmq.com

The JMS provider sets the message id. It is supposed to be unique enough to be used for a "historical repository", but the scope of uniqueness is left to the provider. It is recommended that it be at least unique for a given "installation". I don't think this helps on the publisher side since, as you pointed out, the notification of the completion of the publish might not make it back to the producer.

JMS requires the provider to set the redelivered flag (and optionally the delivery count) field if it thinks the message has been given to the application before. The application may or may not have seen it but this flag can be used to trigger the check for a duplicate by the application. The use of unique message ids helps on this end.

Tony Menges
VMware, Inc.

John Apps

Aug 5, 2010, 12:20:32 PM
to Tony Garnock-Jones, rabbitmq...@lists.rabbitmq.com
From my possibly naive understanding of the spec, it means quite simply that a message will be delivered guaranteed once and only once; but I somehow do not think that that is quite what you were asking?

The nice part about JMS is that it is only an API spec and says nothing about implementation.
I would have to look into the spec to see what the answer is to the question: "...how do they deal with duplicates..." etc. If I find the time, I shall be happy to look at the odd JMS implementation and see what the various vendors do in cases such as that in question.
What I do know is that one can specify notification for when a message with "guaranteed delivery" simply cannot be delivered, for whatever reason. This can be to the client or, more likely, as a message from the 'server' to those that want to know.

A relatively unknown product called Reliable Transaction Router (RTR), architected and developed long ago by DEC and still maintained and developed by HP, warns when it considers that a message *may* be a duplicate, i.e., may have been delivered previously. This is also the case when messages are being 'replayed' after a server has been brought down and is now receiving messages which flowed through the network whilst it was down.

There is much discussion around the word "guaranteed", the objection being that nothing can be "guaranteed". Of course it cannot, but if we take things to that extent, we may as well give up right away!

On Thu, Aug 5, 2010 at 15:48, Tony Garnock-Jones <to...@rabbitmq.com> wrote:
John Apps wrote:
> The JMS standard offers guaranteed once

What exactly do they mean by that? In particular, how do they deal with
duplicates? Do they report failure, or silently let a dup through in certain
situations? If you could point me to the part of the spec that sets out the JMS
resolution of these issues, that's be really useful.

Tony




Matthew Sackman

Aug 5, 2010, 12:25:36 PM
to rabbitmq...@lists.rabbitmq.com
On Thu, Aug 05, 2010 at 09:16:14AM -0700, Tony Menges wrote:
> The JMS provider sets the message id. It is supposed to be unique enough to be used for a "historical repository" but the scope of uniqueness is left to the provider. It is recommended that it should be at least unique for a given "installation". I don't think this helps on the publisher side since as you pointed out the notification of the completion of the publish might not make it back to the producer.
>
> JMS requires the provider to set the redelivered flag (and optionally the delivery count) field if it thinks the message has been given to the application before. The application may or may not have seen it but this flag can be used to trigger the check for a duplicate by the application. The use of unique message ids helps on this end.

Ahh interesting. It would thus seem that JMS requires slightly more of
the producer when publishing messages (more logic is required in the
client library there) and AMQP possibly requires more at the consumer
side.

Mike Petrusis

Aug 5, 2010, 10:28:17 PM
to rabbitmq...@lists.rabbitmq.com
Thanks all for the input. I've got a better understanding of the issues now and it sounds like the issue is the same regardless of the use of transactions.

Matthew's idea of having producers add a "republish count" to the messages is a good suggestion for optimizing the de-duplication of messages, but it only helps for messages resent by a producer.

Can messages get duplicated while they are propagating through the broker? If duplicates are produced in the broker, they will have the same "republish count" and this method won't work.

Matthew Sackman

Aug 6, 2010, 5:40:11 AM
to Mike Petrusis, rabbitmq...@lists.rabbitmq.com
On Thu, Aug 05, 2010 at 10:28:17PM -0400, Mike Petrusis wrote:
> Can messages get duplicated while they are propagating in the broker? If duplicates are produced in the broker they will have the same "republish count" and this method won't work.

Well, a message that is sent to an exchange which then results in the
message going to several queues will obviously be duplicated. But
presumably in that case, your consumers consuming from the different
queues would be doing different tasks with the messages, hence the need
for the different queues in the first place.

That aside, no, within a queue, Rabbit does not arbitrarily duplicate
messages.

Tim Fox

Aug 6, 2010, 5:43:56 PM
to rabbitmq...@lists.rabbitmq.com
On 05/08/10 15:16, David Wragg wrote:
> Tony Garnock-Jones<to...@rabbitmq.com> writes:
>
>> John Apps wrote:
>>
>>> The JMS standard offers guaranteed once
>>>
>> What exactly do they mean by that? In particular, how do they deal
>> with duplicates? Do they report failure, or silently let a dup through
>> in certain situations? If you could point me to the part of the spec
>> that sets out the JMS resolution of these issues, that's be really
>> useful.
>>
> As an API spec, it's quite easy for JMS to mandate something apparently
> impossible, without hinting at how it might actually be implemented.
>
> Most of the spec says that the PERSISTENT delivery mode gives
> "once-and-only-once" delivery. But section 4.4.13 (of JMS 1.1) admits
> that there are a number of caveats to this. So it's really
> "once-and-only-once-except-in-some-corner-cases".
>
> I think the wrinkle that might prevent us saying that RabbitMQ gives the
> same guarantees is on the publishing side. The caveats in JMS all seems
> to apply only to the consuming side. But what happens with an AMQP
> producer if the connection gets dropped before a tx.commit-ok gets back
> to the client? In that case the client has to re-publish, leading to a
> potential dup. This can be avoided by a de-dup filter on published
> messages in the broker. I don't know if JMS brokers really go to such
> lengths.
>
Some do. It's fairly common for JMS brokers to implement duplicate
detection on the server side, to get around the "lost commit-ok"
problem and give as near as possible once-and-only-once, from the
publisher to the server at least.

The way we do it in HornetQ is we have a well-defined header key,
"_HQ_DUP_ID". The client can set this to some unique value of its
choice before sending (e.g. a GUID). When the server receives the
message, if the _HQ_DUP_ID header is set, it looks up the value in its
cache, and if it has seen it before it ignores the message. The cache
can optionally be persisted.

On the client side, the producer can resend the message/transaction if
it does not receive a confirmation-ok, which effectively makes
sends/commits idempotent.
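
In a generic AMQP client, this pattern might look roughly as follows
(a Python/pika sketch; "_HQ_DUP_ID" is HornetQ's header name, but the
queue and the reconnect handling here are illustrative):

    import uuid
    import pika
    from pika.exceptions import AMQPError

    def send_once(ch, body):
        # The dup id is chosen once and reused on every resend, so the
        # broker can recognise the extra copies and drop them.
        props = pika.BasicProperties(
            headers={'_HQ_DUP_ID': str(uuid.uuid4())}, delivery_mode=2)
        while True:
            try:
                ch.tx_select()
                ch.basic_publish(exchange='', routing_key='tasks',
                                 body=body, properties=props)
                ch.tx_commit()   # if the commit-ok is lost, resend below
                return
            except AMQPError:
                # connection is likely gone; reopen it and recreate the
                # channel here (elided), then resend with the same dup id
                continue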


--
Sent from my BBC Micro Model B

Tim Fox
JBoss

HornetQ - putting the buzz in messaging http://hornetq.org
http://hornetq.blogspot.com/
http://twitter.com/hornetq
irc://irc.freenode.net:6667#hornetq
f...@redhat.com

Matthew Sackman

Aug 7, 2010, 7:50:17 AM
to rabbitmq...@lists.rabbitmq.com
On Fri, Aug 06, 2010 at 10:43:56PM +0100, Tim Fox wrote:
> The way we do it in HornetQ is we have a well defined header key
> "_HQ_DUP_ID". The client can set this with some unique value of it's
> choice before sending (e.g. a GUID). When the server receives the
> message if the _HQ_DUP_ID header is set, it looks up the value in
> it's cache, and if it's seen it before it ignores it. The cache can
> optionally be persisted.

How do you prevent the cache from growing without bound?

Matthew

John Apps

Aug 7, 2010, 11:09:45 AM
to rabbitmq...@lists.rabbitmq.com
On Sat, Aug 7, 2010 at 13:50, Matthew Sackman <mat...@rabbitmq.com> wrote:
On Fri, Aug 06, 2010 at 10:43:56PM +0100, Tim Fox wrote:
> The way we do it in HornetQ is we have a well defined header key
> "_HQ_DUP_ID". The client can set this with some unique value of it's
> choice before sending (e.g. a GUID). When the server receives the
> message if the _HQ_DUP_ID header is set, it looks up the value in
> it's cache, and if it's seen it before it ignores it. The cache can
> optionally be persisted.

How do you prevent the cache from growing without bound?

Matthew

That's really like the piece of string question, no? Of course it can fill up, as can the DB where things are persisted for those cases where messages cannot be delivered.
Having a unique ID in every message is not something new, and not restricted to messaging, of course. It is simply a very good idea!
TCP/IP claims to be a 'reliable' transport... The problem with that is that packets get 'lost' or 'dropped' or simply die of 'old age'. Similar, but more complex, problems exist with queuing.
What has not been touched on in this little discussion so far is the question of transactions - and I do not mean those in the 0.9.1 spec, but those described in the 1.0 spec. Here again, JMS is leading the way with something which to my mind is as necessary as guaranteed once (or at least once). Updating DBs from queues and posting the results of those updates to queues should be atomic; and if I want my debit/credit to happen once rather than many times or not at all, then a combination of transactions and guaranteed delivery becomes very attractive to both the designer and the developers. Yes, ACID comes to mind here... and it is indeed what I am referring to.

It is great to participate in conversations of this nature - thank you for putting up with my sometimes oblique ramblings:-)

Matthias Radestock

Aug 7, 2010, 3:46:50 PM
to John Apps, rabbitmq...@lists.rabbitmq.com
John,

John Apps wrote:
> That's really like the piece of string question, no? Of course it can
> fill up, as can the DB where things are persisted for those cases where
> messages cannot be delivered.
> Having an unique ID in every message is not something new and not
> restricted to messaging, of course. It is simply a very good idea!

I believe Matthew was simply trying to point out that many of the
supposed guarantees of messaging systems are a lot softer than most
people think. In reality a "guarantee" is little more than an increase
in the probability that the right thing will happen. Coming clean about
that is going to be important for cloud computing to succeed - improving
the probabilities does come at a price, and for systems at massive
scales the cost/benefit calculations look quite different.

So, for example, using publisher-supplied message ids for de-duping
simply does not scale. Think what a genuine cloud messaging system would
have to do to handle the case where a producer injects the same message
first in a node in Australia and then in New York.

> What has not been touched on in this little discussion so far is the
> question of transactions

Similar considerations apply here. XA in the cloud? Hmmm.


Regards,

Matthias.

Tony Garnock-Jones

Aug 7, 2010, 4:13:07 PM
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com
Matthias Radestock wrote:
> So, for example, using publisher-supplied message ids for de-duping
> simply does not scale. Think what a genuine cloud messaging system would
> have to do to handle the case where a producer injects the same message
> first in a node in Australia and then in New York.

What is the problem you're thinking of? Would a setup like the following cope?

- publishers choose a message ID
- publishers choose a TTL
- receivers dedup based on message ID
- receiver's dedup buffer is expired by (some factor of) TTL
- each delivery contains an address to which the ACK should be routed
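
In code, the receiver side of that setup might look roughly like this
(a local, single-process sketch; the expiry factor and data structures
are illustrative):

    import time

    class DedupBuffer:
        """Receiver-side dedup buffer, expired by a factor of msg TTL."""

        def __init__(self, ttl_factor=2.0):
            self.ttl_factor = ttl_factor
            self.deadlines = {}   # message id -> time we may forget it

        def is_duplicate(self, msg_id, ttl):
            now = time.monotonic()
            # forget entries whose retention window has passed
            self.deadlines = {m: t for m, t in self.deadlines.items()
                              if t > now}
            if msg_id in self.deadlines:
                return True
            self.deadlines[msg_id] = now + ttl * self.ttl_factor
            return False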

Tony

Matthias Radestock

Aug 7, 2010, 4:17:51 PM
to Tony Garnock-Jones, rabbitmq...@lists.rabbitmq.com
Tony,

Tony Garnock-Jones wrote:
> Would a setup like the following cope?
>
> - publishers choose a message ID
> - publishers choose a TTL
> - receivers dedup based on message ID
> - receiver's dedup buffer is expired by (some factor of) TTL
> - each delivery contains an address to which the ACK should be routed

That's end-to-end dedup you are thinking of. Nothing wrong with that,
and it doesn't require the broker to do/know anything. The context of
the discussion here was a "broker dedups publishes" feature.

Matthias.

Alexis Richardson

Aug 7, 2010, 4:22:04 PM
to rabbitmq...@lists.rabbitmq.com
On Sat, Aug 7, 2010 at 12:50 PM, Matthew Sackman <mat...@rabbitmq.com> wrote:
> On Fri, Aug 06, 2010 at 10:43:56PM +0100, Tim Fox wrote:
>> The way we do it in HornetQ is we have a well defined header key
>> "_HQ_DUP_ID". The client can set this with some unique value of it's
>> choice before sending (e.g. a GUID). When the server receives the
>> message if the _HQ_DUP_ID header is set, it looks up the value in
>> it's cache, and if it's seen it before it ignores it. The cache can
>> optionally be persisted.
>
> How do you prevent the cache from growing without bound?

AFAIK the normal approach with this system is to bound it arbitrarily.

Tony Garnock-Jones

Aug 7, 2010, 4:26:11 PM
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com
Matthias Radestock wrote:
> That's end-to-end dedup you are thinking of. Nothing wrong with that,
> and it doesn't require the broker to do/know anything. The context of
> the discussion here was a "broker dedups publishes" feature.

Well couldn't the broker take responsibility for the delivery itself by acking?
And use the same protocol (including perhaps a fresh message ID) to relay a
message out to a receiver on the outbound leg?

Perhaps it's a distraction: I guess I was really wondering what the New York vs
Australia part had to do with it.

Tony

Matthias Radestock

Aug 7, 2010, 4:33:06 PM
to Tony Garnock-Jones, rabbitmq...@lists.rabbitmq.com
Tony,

Tony Garnock-Jones wrote:
> Matthias Radestock wrote:
>> That's end-to-end dedup you are thinking of. Nothing wrong with that,
>> and it doesn't require the broker to do/know anything. The context of
>> the discussion here was a "broker dedups publishes" feature.
>
> Well couldn't the broker take responsibility for the delivery itself by acking?
> And use the same protocol (including perhaps a fresh message ID) to relay a
> message out to a receiver on the outbound leg?
>
> Perhaps it's a distraction: I guess I was really wondering what the New York vs
> Australia part had to do with it.

1) publisher connects to cloud; ends up connecting to Australia node
2) publisher sends message
3) connection drops, publisher didn't get ack so must resend ...
4) publisher connects to cloud; ends up connecting to NY node
5) publisher re-sends message

At that point the cloud messaging service has two copies of the same
message in different locations. For the duplication to be detected, some
information needs to flow between the two locations. Which is expensive.


Regards,

Matthias.

Tony Garnock-Jones

Aug 7, 2010, 4:47:00 PM
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com
Matthias Radestock wrote:
> At that point the cloud messaging service has two copies of the same
> message in different locations. For the duplication to be detected, some
> information needs to flow between the two locations. Which is expensive.

Aha! I see. Thank you. I had the *publishers* being in different locations in
my mind. Which is of course a different problem.

In the context in which you originally posted the example ("using
publisher-supplied message ids for de-duping simply does not scale") I suspect
that the mention of publisher-supplied message IDs is not relevant: it's the
maintenance of any kind of dedup buffer at all in more than one place at the
same time that's not scalable.

Isn't the fundamental state-synchronisation-between-server-nodes problem
independent of the choice of message ID?

Tony

Matthew Sackman

Aug 7, 2010, 5:25:27 PM
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com
On Sat, Aug 07, 2010 at 08:46:50PM +0100, Matthias Radestock wrote:
> I believe Matthew was simply trying to point out that many of the
> supposed guarantees of messaging systems are a lot softer than most
> people think.

Well, in my mind, if you "guarantee" something, then you can offer a
proof in some branch of maths that the property you are guaranteeing can
never be violated except in the circumstances you qualify the statement
with.

That is why, as far as I'm concerned, no messaging system, regardless
of whether it's implemented in a computer or not, can ever guarantee
exactly-once semantics: I can prove (or rather, other people have
proven) that provided any individual message can be lost, you can
achieve either "at least once" semantics or "at most once" semantics.

Quantum mechanics may offer a way to avoid messages ever being lost, but
I suspect that's probably not going to be sorted out in time for
RabbitMQ 3.0.

> In reality a "guarantee" is little more than an
> increase in the probability that the right thing will happen.

Indeed. And importantly, under different circumstances, that probability
can vary wildly. Once you start combining different systems, and the
error rates compound, the probability that the whole thing works end to
end can end up looking very shaky. I suspect the insurance industry will
do well here.

> > What has not been touched on in this little discussion so far is
> > the question of transactions. ... Updating DBs from queues and
> > posting the results of those updates to queues should be atomic; and
> > if I want my debit/credit to happen once rather than many times or
> > not at all, then a combination of transactions and guaranteed
> > delivery becomes very attractive both to the designer and the
> > developers. Yes, ACID comes to mind here...and it is indeed what I
> > am referring to.

Distributed transactions rely on 2-phase or 3-phase commit or some
variant thereof (e.g. there's an improved 3-phase commit built on
Paxos). Whilst the protocols themselves are sound, my understanding is
(and please correct me if I'm wrong) that they all rely on the
assumption that, once everyone has agreed there is no problem
committing the transaction, no participant will renege on that promise.
But of course, no participant can actually guarantee that. Thus
distributed transactions can never guarantee ACID.

All you can get is increasing probability that the outcome will be
atomic, consistent and isolated. But you can never guarantee it.

Matthew

Alexis Richardson

Aug 7, 2010, 5:32:49 PM
to Matthias Radestock, John Apps, rabbitmq...@lists.rabbitmq.com
On Sat, Aug 7, 2010 at 10:25 PM, Matthew Sackman <mat...@rabbitmq.com> wrote:
> On Sat, Aug 07, 2010 at 08:46:50PM +0100, Matthias Radestock wrote:
>> I believe Matthew was simply trying to point out that many of the
>> supposed guarantees of messaging systems are a lot softer than most
>> people think.
>
> Well, in my mind, if you "guarantee" something, then you can offer a
> proof in some branch of maths that the property you are guaranteeing can
> never be violated except in the circumstances you qualify the statement
> with.
>
> That is why, as far as I'm concerned, no messaging system, regardless of
> whether it's implemented in a computer or not, can ever guarantee
> exactly once semantics: I can prove (or rather, other people have
> proven) that provided any individual message can be lost, you can either
> achieve "at least once" semantics or "at most once" semantics.

AFAICT, the normal way this plays out is that people seek to provide
>=1 semantics for publisher-broker, and <=1 semantics for
broker-consumer. Which for most people is 'good enough'.


> Quantum mechanics may offer a way to avoid messages ever being lost, but
> I suspect that's probably not going to be sorted out in time for
> RabbitMQ 3.0.

This would be renamed QbitMQ. Delivery would be determined by opening
a box. No bunnies would be harmed in this experiment.

Martin Sustrik

Aug 8, 2010, 6:30:04 AM
to Alexis Richardson, rabbitmq...@lists.rabbitmq.com
Alexis Richardson wrote:

>> Quantum mechanics may offer a way to avoid messages ever being lost, but
>> I suspect that's probably not going to be sorted out in time for
>> RabbitMQ 3.0.
>
> This would be renamed QbitMQ. Delivery would be determined by opening
> a box. No bunnies would be harmed in this experiment.

Even better:

If a message is missing, try to guess what it was that the sender
intended to send, and deliver that to the receiver. If it turns out
later on that the guess was incorrect, cancel the transaction. An
additional advantage is that you can get negative latencies this way.

Martin

John Apps

Aug 9, 2010, 4:41:24 AM
to Martin Sustrik, Alexis Richardson, rabbitmq...@lists.rabbitmq.com
On Sun, Aug 8, 2010 at 12:30, Martin Sustrik <sus...@250bpm.com> wrote:
Alexis Richardson wrote:

Quantum mechanics may offer a way to avoid messages ever being lost, but
I suspect that's probably not going to be sorted out in time for
RabbitMQ 3.0.

This would be renamed QbitMQ.  Delivery would be determined by opening
a box.  No bunnies would be harmed in this experiment.
 
Thus spoke the CEO - end of discussion!



Even better:

If message is missing, try to guess what was that the sender intended to send. Deliver it to the receiver. If it turns out later on that the guess was incorrect, cancel the transaction. Additional advantage is that you can get negative latencies this way.

Martin

It is good to see humour in discussions of this nature; it would be even better if those implementing the applications were to share the same humour! I suspect the world of open source is at times a different one to that which I seem to work in. Oh well, back to the drawing board.

Alexis Richardson

Aug 9, 2010, 4:46:38 AM
to John Apps, rabbitmq...@lists.rabbitmq.com
John

On Mon, Aug 9, 2010 at 9:41 AM, John Apps <john...@gmail.com> wrote:
>
> It is good to see humour in discussions of this nature; it would be even
> better if those implementing the applications were to share the same humour!
> I suspect the world of open source is at times a different one to that which
> I seem to work in. Oh well, back to the drawing board.

John, we take this stuff as seriously as anyone. That's why we do it
professionally.

We've often found customers whose requirements include "please break
the laws of physics, and cure cancer". I'm sure you know what I mean.
If you could advise us on how to best help in such cases, we're all
ears.

In the meantime we are stuck in a world where "guaranteed" has no fixed
connotation.

alexis

Tim Fox

Aug 9, 2010, 6:49:42 AM
to rabbitmq...@lists.rabbitmq.com
On 07/08/10 12:50, Matthew Sackman wrote:
> On Fri, Aug 06, 2010 at 10:43:56PM +0100, Tim Fox wrote:
>
>> The way we do it in HornetQ is we have a well defined header key
>> "_HQ_DUP_ID". The client can set this with some unique value of it's
>> choice before sending (e.g. a GUID). When the server receives the
>> message if the _HQ_DUP_ID header is set, it looks up the value in
>> it's cache, and if it's seen it before it ignores it. The cache can
>> optionally be persisted.
>>
> How do you prevent the cache from growing without bound?
>
Currently we use a circular buffer. It's up to the user to make it big
enough for their use case. For JMS, where persistent sends/tx commits
must be synchronous, the buffer size would need to be somewhat larger
than the max number of producers, since there's never more than one
"in-flight" message at any time per producer. But there's certainly an
element of guesswork here.
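
A bounded cache along those lines might look like this (an illustrative
Python sketch; HornetQ itself is Java, and its actual implementation
will differ):

    from collections import OrderedDict

    class CircularDedupCache:
        # Fixed capacity: the oldest dup ids fall off the end. If it is
        # sized too small for the number of in-flight producers, an id
        # can be evicted while a resend is still possible - the element
        # of guesswork mentioned above.
        def __init__(self, capacity):
            self.capacity = capacity
            self.ids = OrderedDict()

        def seen_before(self, dup_id):
            if dup_id in self.ids:
                return True
            self.ids[dup_id] = True
            if len(self.ids) > self.capacity:
                self.ids.popitem(last=False)   # evict the oldest entry
            return False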

In HornetQ we also provide an interface above and beyond JMS, which
allows the user to receive an *asynchronous* ack that the message they
sent (or tx commit) has been received ok on the server, so they can
clear it from their local resend cache. Since this is async, it's not
limited by network latency as in the blocking JMS case. The downside is
that many messages can be in flight at any time per producer, so the
caches need to be larger.

To do all of this without being limited by an arbitrary cache size
would need some kind of "ack of ack" (we don't implement this yet) -
i.e. 1) the client sends a message to the server, 2) the server sends
an ack back to the client to say "received-ok", 3) the client sends a
further ack back to the server acknowledging the received-ok. At point
2) the client can clear its resend cache. At point 3) the server can
clear its cache. I believe AMQP 1.0 specifies something similar to this
too (?)
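
As a sketch of the client-side bookkeeping for that handshake (all
names are hypothetical; the transport is assumed to expose publish and
ack_ack operations):

    class ResendCache:
        def __init__(self, transport):
            self.transport = transport
            self.pending = {}   # msg id -> body awaiting "received-ok"

        def send(self, msg_id, body):
            self.pending[msg_id] = body        # step 1: keep until acked
            self.transport.publish(msg_id, body)

        def on_received_ok(self, msg_id):
            self.pending.pop(msg_id, None)     # step 2: stop resending
            self.transport.ack_ack(msg_id)     # step 3: server may now
                                               # clear its dedup entry

        def resend_pending(self):              # run after a reconnect
            for msg_id, body in self.pending.items():
                self.transport.publish(msg_id, body)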

So... this could scale. You'd have a further buffer per producer on the
server side. If you're using TCP on the server, then every connection
will have its own buffer anyway. The extra buffer per producer should
be of the same order of size as the TCP buffer, since it's effectively
defined by a window, kind of similar to the TCP window size.

Like others have said, 100% once-and-only-once delivery doesn't happen.
To get very near to 100% you can implement stuff like the above, and
also make sure your storage is highly redundant and mirrored in
geographically distributed sites in case the building blows up. Then
there are bugs in your own app, device drivers or the operating system
that screw your once-and-only-once (for example, last week I hit a bug
in the Linux kernel TCP implementation which can cause packets to be
lost at high load), and you might hit those well before having to take
quantum effects into account :)


> Matthew

--
Sent from my BBC Micro Model B

Tim Fox
JBoss


Tony Garnock-Jones

Aug 9, 2010, 11:35:59 AM
to Tim Fox, rabbitmq...@lists.rabbitmq.com
Tim Fox wrote:
> To do all of this without being limited by arbitrary cache size, would
> need some kind of "ack of ack"

That's an occasionally-useful optimization, but it really only works
well with message UUIDs (i.e. never repeating a message ID). Even then,
since in the general case you're likely to want to bound the time you
wait for the message transmission to complete anyway (consider what
happens when an ack or ack-ack goes missing), it seems simpler (not to
mention much less chatty) to me to avoid ack-acks and go with a Delta-T
style timeout-bounded buffer. At that point you're free to choose any
kind of message ID space, including compressible fixed-length reusable
spaces like those in TCP/SCTP etc.

> Like others have said, 100% once and only once delivery doesn't happen.
> To get very near at 100% you can implement stuff like the above, and
> also make sure your storage is highly redundant, also mirrored in
> geographically distributed sites in case the building blows up. Then
> you've got bugs in your own app, device drivers or the operating system
> that screw your once and only once (for example last week I hit a bug in
> the Linux kernel TCP impl which can cause packets to be lost at high
> load) that you might hit well before having to take quantum effects into
> account :)

Amen :-)

Tony

Martin Sustrik

Aug 10, 2010, 3:52:08 AM
to Alexis Richardson, rabbitmq...@lists.rabbitmq.com
Alexis Richardson wrote:

> We've often found customers whose requirements include "please break
> the laws of physics, and cure cancer". I'm sure you know what I mean.
> If you could advise us on how to best help in such cases, we're all
> ears.
>
> In the meantime we are stuck in world where "guaranteed" has no fixed
> connotation.

Yes. People tend to have unrealistic expectations.

It kind of reminds me of requirements for "security". Yes, an
application can be made more resistant to attacks; however, "security"
is not a purely technical issue. It's a technical and _operational_
issue. To achieve "security" you have to use good software _and_
introduce a sane security policy in your organisation.

Same with guaranteed delivery. While applications can do all kinds of
tricks to improve reliability, at some point you'll have to load the
data onto a USB key and ride across the country to deliver it.

So the goal of designing "guaranteed delivery" is twofold IMO. First,
it's doing all the tweaking of the software necessary to get as many
nines of reliability as possible. Second, it's promoting sane
organisational patterns (patterns that would prevent message loss or
duplication).

Let me give a simple example:

1. Messages are confirmed by a simple ack. There's no replay
functionality. This guarantees no duplicates.
2. The sender has a timeout for getting an ack. When the ack does not
arrive, the message is moved to a dead letter queue. (Note that
messages in the DLQ are "dubious", i.e. they may or may not have been
delivered; we don't know.)
3. There's a person in the organisation responsible for the dead letter
queue. He goes through the queue once a day and tries to find out what
the actual state of the dubious messages is (using personal
conversation, phone, checking the production system etc.). Delivered
messages can simply be deleted from the DLQ. Lost messages can be
resubmitted in different ways (doing a new transaction, reading the
details of the transaction over the phone etc.)
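
Steps 1 and 2 might look roughly like this (hypothetical transport and
queue objects; the point is that a timed-out message becomes "dubious"
and is parked for a human, not replayed blindly):

    import queue

    ACK_TIMEOUT = 30.0   # seconds to wait for the simple ack

    def send_and_track(msg, transport, acks, dead_letters):
        # acks: queue.Queue of ack notifications fed by the receiver
        # dead_letters: where dubious messages go for manual resolution
        transport.publish(msg)
        try:
            acks.get(timeout=ACK_TIMEOUT)   # assumes one msg in flight
        except queue.Empty:
            # No ack: the message may or may not have been delivered.
            # Park it rather than replaying it blindly (no duplicates).
            dead_letters.put(msg)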

My 2c.
Martin
