RabbitMQ message deduplication plugin

noxdafox

Mar 1, 2018, 12:52:59 PM
to rabbitmq-users
Hi everybody,

I often found myself needing to deduplicate RabbitMQ messages, and I usually ended up relying on producer/consumer implementations backed by third-party data stores.

To learn a bit more about RabbitMQ and Elixir, I decided to build a simple message deduplication plugin.

It is pretty simple and I haven't managed to test it at scale yet, but so far it looks promising.

I shared it on my GitHub account:
https://github.com/noxdafox/rabbitmq-message-deduplication

Contributions and comments are more than welcome :)

Matteo.

Michael Klishin

Mar 1, 2018, 1:15:20 PM
to rabbitm...@googlegroups.com
Thank you!

It's interesting that you decided to deduplicate at the exchange level but I can see how
this is much easier than any other alternative :)

It would be nice if you could

1) Document what RabbitMQ versions are supported
2) Produce and distribute binary builds via GitHub releases

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

noxdafox

Mar 2, 2018, 3:04:55 AM
to rabbitmq-users
On Thursday, 1 March 2018 20:15:20 UTC+2, Michael Klishin wrote:
> Thank you!
>
> It's interesting that you decided to deduplicate at the exchange level but I can see how
> this is much easier than any other alternative :)

As this was my first attempt at RabbitMQ plugin development, I chose the easiest alternative. The plugin development process is not well documented, and if you're new to Erlang it can be overwhelming at first.
There were quite a few exchange-based examples on the Web, so I could at least follow some reference.

Moreover, what I needed was a way to filter duplicates as early as possible. With this approach I can build several routes without worrying about duplicates on each route.

Later on, I will add queue-level deduplication as well.
 

> It would be nice if you could
>
> 1) Document what RabbitMQ versions are supported
> 2) Produce and distribute binary builds via GitHub releases

Do you have any reference examples of how the RabbitMQ community distributes binaries via GitHub?
 


Michael Klishin

Mar 2, 2018, 11:58:15 AM
to rabbitm...@googlegroups.com
Just produce a release (an .ez file), tag it, and create a GitHub release with it.


noxdafox

Mar 3, 2018, 6:19:21 AM
to rabbitmq-users
Done as requested.

I built a binary release and attached it to the git tag.

I also added installation instructions listing the supported RabbitMQ versions.

noxdafox

May 10, 2018, 12:07:32 PM
to rabbitmq-users
UPDATE: I added support for queue deduplication.

Since version 0.2.0 of the plugin, it is possible to deduplicate messages both at the exchange and at the queue level.

As a general guideline: queue-level deduplication ensures that no duplicate messages are stored within a given queue at the same time.
Exchange-level deduplication, instead, deduplicates messages over a time window. In other words, it prevents the same message from being routed more than once within a given period of time.
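To make the difference between the two levels concrete, here is a minimal pure-Python model of the semantics described above. It is not the plugin's code (which is Elixir running inside the broker); the class names, the `ttl` parameter, and the choice to measure the window from the first time a value is routed are all assumptions made for illustration.

```python
import time


class ExchangeDeduplicator:
    """Hypothetical model of exchange-level deduplication over a time window.

    A value seen within the last `ttl` seconds is a duplicate; once the
    window (measured from first routing, an assumption) expires, the same
    value is routed again.
    """

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.seen = {}  # dedup value -> time it was first routed

    def should_route(self, dedup_value):
        if dedup_value is None:
            return True  # messages without the header are never filtered
        now = self.clock()
        first_seen = self.seen.get(dedup_value)
        if first_seen is not None and now - first_seen < self.ttl:
            return False  # duplicate inside the window: drop it
        self.seen[dedup_value] = now
        return True


class QueueDeduplicator:
    """Hypothetical model of queue-level deduplication.

    A value is a duplicate only while a message carrying it is still stored
    in the queue; after that message is consumed, the value is accepted again.
    """

    def __init__(self):
        self.stored = set()

    def should_enqueue(self, dedup_value):
        if dedup_value is None:
            return True
        if dedup_value in self.stored:
            return False  # an identical message is already in the queue
        self.stored.add(dedup_value)
        return True

    def on_consumed(self, dedup_value):
        self.stored.discard(dedup_value)
```

The key difference: the exchange model forgets a value only when time passes, while the queue model forgets it only when the stored message leaves the queue.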





Michael Klishin

May 10, 2018, 12:32:17 PM
to rabbitm...@googlegroups.com
Interesting. How does it work for queues, implementation-wise?


Michael Klishin

May 10, 2018, 12:32:29 PM
to rabbitm...@googlegroups.com
And thank you for your work on this!


noxdafox

May 10, 2018, 1:57:43 PM
to rabbitmq-users
np, I hope it will be useful to somebody.

Both queues and exchanges rely on dedicated caches to check whether a message is a duplicate. Every time a message carrying an `x-deduplication-header` header is published, its value is looked up in the exchange/queue cache. A cache hit results in the message being dropped; otherwise, the header value is added to the cache and the message continues on its route.

The use of caches significantly simplifies the implementation. For queues in particular, this strategy results in a very thin layer that can be added on top of any queue implementation (variable queue, priority queue, etc.). I implemented the `rabbit_backing_queue` behaviour, and most of the logic simply delegates operations to the underlying backing queue.

The cache itself is built on top of Mnesia.
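The pass-through design can be sketched in a few lines of Python. This is a hypothetical analogue of a `rabbit_backing_queue` wrapper, not the plugin's Elixir implementation: `SimpleQueue` stands in for the underlying backing queue, messages are plain dicts, the cache is an in-memory set rather than Mnesia, and clearing the value on fetch (rather than, say, on acknowledgement) is an assumption.

```python
from collections import deque

HEADER = "x-deduplication-header"


class SimpleQueue:
    """Hypothetical stand-in for an underlying backing queue (FIFO)."""

    def __init__(self):
        self.items = deque()

    def publish(self, message):
        self.items.append(message)

    def fetch(self):
        return self.items.popleft() if self.items else None

    def depth(self):
        return len(self.items)


class DeduplicatingQueue:
    """Thin layer: filters duplicates on publish, delegates everything else."""

    def __init__(self, backing):
        self.backing = backing
        self.cache = set()  # dedup values of messages currently in the queue

    @staticmethod
    def _key(message):
        return message.get("headers", {}).get(HEADER)

    def publish(self, message):
        key = self._key(message)
        if key is not None:
            if key in self.cache:
                return False  # cache hit: drop the duplicate
            self.cache.add(key)  # cache miss: remember the value
        self.backing.publish(message)
        return True

    def fetch(self):
        message = self.backing.fetch()
        if message is not None:
            # The message left the queue, so its value may be enqueued again.
            self.cache.discard(self._key(message))
        return message

    def __getattr__(self, name):
        # Called only for attributes not defined above: pass them through.
        return getattr(self.backing, name)
```

Only `publish` and `fetch` touch the cache; `depth()` is not defined on the wrapper and reaches the backing queue through `__getattr__`. Python special methods such as `__len__` bypass `__getattr__`, so a fuller wrapper would have to forward those explicitly.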
 

Michael Klishin

May 10, 2018, 2:26:51 PM
to rabbitm...@googlegroups.com
I see, so it is a `rabbit_backing_queue` implementation that passes through most operations. Cool!


noxdafox

May 10, 2018, 3:05:23 PM
to rabbitmq-users
Yep! The goal was to be as non-intrusive as possible.

The cache keeps it simple and fast, at the cost of some memory or disk space.

Akshaya

Jul 6, 2018, 5:03:51 AM
to rabbitmq-users
Hi,

Could you please provide a sample rabbitmqadmin command for declaring the queue or exchange? How do I specify the x-message-deduplication header?
Also, is it possible to edit existing messages and queues?

- Akshaya

Michael Klishin

Jul 6, 2018, 5:25:47 AM
to rabbitm...@googlegroups.com
Should be no different from any other header, which has certainly been discussed before on this list and on the Internet.
The right thing to do in the vast majority of cases is to use a policy [1].



Akshaya

Jul 7, 2018, 1:05:15 AM
to rabbitmq-users
Hi,

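Thank you. I am new to RabbitMQ. Could you please let me know if this is correct?

        import pika

        connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
        channel = connection.channel()
        topublish = b'message body'  # placeholder payload

        properties = pika.BasicProperties(headers={'x-deduplication-header': 'true'})
        channel.basic_publish(exchange='test_exch',
                              routing_key='',
                              properties=properties,
                              body=topublish)

But I can still find duplicate messages in test_queue.
How can I fix this?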

- Akshaya


noxdafox

Jul 7, 2018, 3:09:07 AM
to rabbitmq-users
You can check this gist for a reference example of how to deduplicate messages at the exchange:
https://gist.github.com/noxdafox/ad1fb4c3769e06a888c3a542fc08c544

The `x-deduplication-header` must contain a value that uniquely identifies the message itself. The plugin will drop all subsequent messages carrying the same header value.
If the body of the message is what you want to be unique, you can use its hash (MD5, SHA1, ...) as the `x-deduplication-header` value.
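As a minimal sketch of the hashing approach, the helper below derives the header value from the message body with Python's standard `hashlib`. The function name `dedup_headers` is hypothetical; it only builds the `headers` dict, which could then be passed as `pika.BasicProperties(headers=dedup_headers(topublish))` in a publishing snippet like the one earlier in this thread.

```python
import hashlib


def dedup_headers(body, algorithm="sha1"):
    """Return message headers whose dedup value is a hash of `body`.

    `body` may be bytes or str (str is encoded as UTF-8 first).
    `algorithm` is any name accepted by hashlib.new, e.g. "md5" or "sha1".
    """
    if isinstance(body, str):
        body = body.encode("utf-8")
    digest = hashlib.new(algorithm, body).hexdigest()
    # Equal bodies -> equal digests -> the plugin sees the same header value.
    return {"x-deduplication-header": digest}
```

Identical bodies always map to the same value, so repeated publishes are treated as duplicates, while any change to the body yields a new value. This assumes the target exchange was declared with the plugin's deduplicating exchange type (`x-message-deduplication`, per the plugin README); an ordinary exchange simply ignores the header.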