RabbitMQ queue behaviour internal implementations

noxdafox

Mar 5, 2023, 10:47:54 AM

to rabbitmq-users

Greetings,

I maintain a RabbitMQ plugin that de-duplicates messages published to the broker.

https://github.com/noxdafox/rabbitmq-message-deduplication

At a high level, the logic simply checks the value of a message header against a cache; if that value is already present, the message is not forwarded.
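As an illustration only, the check described above can be sketched in a few lines of Python. This is not the plugin's code: the header name `x-deduplication-header`, the `DedupCache` class and the `should_forward` function are assumptions made for the example, and the cache here is unbounded and in-memory.

```python
from typing import Any, Dict, Optional, Set

# Assumed header name, used only for this illustration.
DEDUP_HEADER = "x-deduplication-header"


class DedupCache:
    """Unbounded in-memory cache of de-duplication keys already seen."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def insert_if_absent(self, key: str) -> bool:
        """Return True if the key was new, False if it was already cached."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def should_forward(headers: Optional[Dict[str, Any]], cache: DedupCache) -> bool:
    """Forward messages that lack the header or whose key is not cached yet."""
    key = (headers or {}).get(DEDUP_HEADER)
    if key is None:
        # No de-duplication key: the message is never treated as a duplicate.
        return True
    return cache.insert_if_absent(str(key))
```

A real cache would also need a size limit or a TTL so that keys do not accumulate forever.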

De-duplication is implemented both at the exchange level (using the `rabbit_exchange_type` behaviour) and at the queue level (using the `rabbit_backing_queue` behaviour).

The latter has always been a bit problematic, as it feels like the `rabbit_backing_queue` behaviour (initially shared in `rabbit_common` and then moved into `rabbit`) is meant to be internal and not to be implemented by third-party plugins. Yet it has worked for years, even with its ups and downs.

I've been receiving lots of interest in extending the support to other queue implementations such as mirrored queues and the new streams.

For example, this issue: https://github.com/noxdafox/rabbitmq-message-deduplication/issues/37

The challenge is that, apart from variable and priority queues, none of the other queue types implement the `rabbit_backing_queue` behaviour. Hence it would become too expensive for me to "hack" each and every supported queue (most likely it is not even possible).

My question for the development team is whether there is a plan to unify the internal behaviour of all queues to simplify this task. Apart from message de-duplication, I see several behaviours that would be common to all queue implementations (max-size, ttl, ...). More specifically, is there a way for me to help the users of my plugin?

I would be glad to contribute myself, but as this is not a trivial patch to submit, I would need guidance and also hints about future interfaces. I am familiar with the RMQ code (though the pace at which it changes makes things pretty hard), but I cannot see where to "hook the logic" without polluting the code base with the needs of third-party plugins. That would indeed be less than desirable.

KR,

Matteo.

kjnilsson

Mar 6, 2023, 5:21:09 AM

to rabbitmq-users

Hi Matteo,

Ah message de-duplication! A thorny subject and very hard to implement well, especially in a clustered high-throughput environment.

Firstly, I don't think you need to worry about implementing anything for mirrored queues. They are deprecated and will be removed in the not-too-distant future.

That leaves quorum queues and streams. Streams already support sequence-based deduplication when using the stream protocol. This is reasonably efficient (but not free!) and does a good job. Quorum queues also use sequence-based de-duplication internally (between channel and queue), but this isn't available externally to publishers. We are considering it, but there are challenges, and it pushes the problem onto the client applications (maintaining a publisher id and assigning a sequence number to all messages).
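To show the general idea of sequence-based de-duplication (not how streams or quorum queues implement it), here is a minimal Python sketch. It assumes each publisher has a stable id and assigns strictly increasing sequence numbers; the `SequenceDeduplicator` class and its method names are hypothetical.

```python
from typing import Dict


class SequenceDeduplicator:
    """Track the highest sequence number accepted per publisher id."""

    def __init__(self) -> None:
        self._last_seq: Dict[str, int] = {}

    def accept(self, publisher_id: str, seq: int) -> bool:
        """Return True for a new message, False for a duplicate or replay."""
        last = self._last_seq.get(publisher_id)
        if last is not None and seq <= last:
            # Already seen this sequence number (or a later one): drop it.
            return False
        self._last_seq[publisher_id] = seq
        return True
```

The state is one integer per publisher, which is why this approach is cheap in space compared with remembering arbitrary message ids.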

Any approach that uses arbitrary message ids for deduplication is unlikely to provide the space and consistency properties we need for clustered high-throughput use cases, so we aren't planning any work in that area for the near future. For message id deduplication we tend to suggest consumer-side de-duplication, where the deduplication key-space is kept outside of RabbitMQ (say in redis).
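A rough sketch of what consumer-side de-duplication could look like, in Python. The `InMemoryKeyStore` class stands in for an external store and is an assumption of this example, as are `handle_delivery` and its parameters; with Redis, the `set_if_absent` step would typically map to a `SET` with the `NX` and `EX` options.

```python
import time
from typing import Callable, Dict


class InMemoryKeyStore:
    """Stand-in for an external key store with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: Dict[str, float] = {}

    def set_if_absent(self, key: str, ttl_seconds: float) -> bool:
        """Claim the key; return False if it is already claimed and unexpired."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True


def handle_delivery(
    message_id: str,
    body: object,
    store: InMemoryKeyStore,
    process: Callable[[object], object],
    ttl_seconds: float = 3600.0,
) -> bool:
    """Process the message unless its id was seen recently; return True if processed."""
    if not store.set_if_absent(message_id, ttl_seconds):
        return False  # duplicate: skip processing (the caller would still ack it)
    process(body)
    return True
```

Note that the key is claimed before processing, so a message whose processing fails would be skipped on redelivery unless the key is released again; whether that trade-off is acceptable depends on the application.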

Cheers

Karl

noxdafox

Mar 12, 2023, 12:53:22 PM

to rabbitmq-users

Hello,

On Monday, 6 March 2023 at 12:21:09 UTC+2 kjnilsson wrote:

Hi Matteo,

...

That leaves quorum queues and streams. Streams already support sequence-based deduplication when using the stream protocol. This is reasonably efficient (but not free!) and does a good job. Quorum queues also use sequence-based de-duplication internally (between channel and queue), but this isn't available externally to publishers. We are considering it, but there are challenges, and it pushes the problem onto the client applications (maintaining a publisher id and assigning a sequence number to all messages).

This is not the same use case as the plugin's. The plugin enables application-level de-duplication, covering cases where producers might send messages that lead to duplicated outcomes on the consumer side. Messages can be distinct from the broker's perspective but duplicates from the distributed application's perspective.

Any approach that uses arbitrary message ids for deduplication is unlikely to provide the space and consistency properties we need for clustered high-throughput use cases, so we aren't planning any work in that area for the near future. For message id deduplication we tend to suggest consumer-side de-duplication, where the deduplication key-space is kept outside of RabbitMQ (say in redis).

I guess this is what community plugins are for? This plugin has been around since 2019 and has some degree of adoption. It certainly is not designed for high-throughput use cases, but it still provides good value for most of the simple cases.

I'm not suggesting the core development team should focus on this functionality. I am asking whether it would be possible to improve/unify the queue interfaces (e.g. `rabbit_queue_decorator`) so that we could easily extend them, as was the case with `rabbit_backing_queue`. The idea would be to allow extending queue implementations in a non-intrusive way that would enable plugins to add value to them. As I said, I can help with the necessary work, so it would not be too resource-consuming from the development team's perspective (PR reviews and guidance).

In this particular case, the needs would be:

* Detecting if the queue has de-duplication enabled

* Allowing the plugin to report whether a message should be forwarded to it or not

* Understanding when existing messages are removed from it (ack, ttl, ...)

This is easily done (with some caveats which would be easy to fix) with classic (variable and priority) queues already.
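To make the three needs above concrete, here is a rough Python sketch of the shape such a hook could take. Everything in it is hypothetical: no such interface exists in RabbitMQ, and the names `QueueHook`, `DedupHook`, the queue argument `x-message-deduplication` and the header `x-deduplication-header` are assumptions made purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Set

# Assumed names, for illustration only.
QUEUE_ARG = "x-message-deduplication"
DEDUP_HEADER = "x-deduplication-header"


class QueueHook(ABC):
    """Hypothetical extension point covering the three needs listed above."""

    @abstractmethod
    def is_enabled(self, queue_args: Dict[str, Any]) -> bool:
        """Does this queue have the feature enabled?"""

    @abstractmethod
    def should_enqueue(self, headers: Dict[str, Any]) -> bool:
        """Should the message be forwarded to the queue?"""

    @abstractmethod
    def on_removed(self, headers: Dict[str, Any], reason: str) -> None:
        """Called when a message leaves the queue (ack, ttl, ...)."""


class DedupHook(QueueHook):
    """De-duplicate on a header value while the message is still in the queue."""

    def __init__(self) -> None:
        self._keys: Set[str] = set()

    def is_enabled(self, queue_args: Dict[str, Any]) -> bool:
        return bool(queue_args.get(QUEUE_ARG, False))

    def should_enqueue(self, headers: Dict[str, Any]) -> bool:
        key = headers.get(DEDUP_HEADER)
        if key is None:
            return True  # no key: nothing to de-duplicate on
        if str(key) in self._keys:
            return False
        self._keys.add(str(key))
        return True

    def on_removed(self, headers: Dict[str, Any], reason: str) -> None:
        key = headers.get(DEDUP_HEADER)
        if key is not None:
            # Forget the key so a later message with the same key is accepted.
            self._keys.discard(str(key))
```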