Sharding plugin - how does it make sure all shards are consumed? Also, how it balances?

292 views
Skip to first unread message

Ariel B

unread,
Jun 7, 2018, 4:30:51 AM6/7/18
to rabbitmq-users
I was wondering - if i have for example 4 nodes and one logical queue (with 4 shards).
the plugin says it chooses the queue with the least number of subscriber.

Let's say i start 4 consumers all together, isn't there a chance that only 3 shards will be chosen, and the 4th will not be consumed?

if that's the case - i have to constantly monitor number of consumers and balance accordingly?

2nd Question - the publishing as i understand is round robin style.
What if one of the shards is getting full, and the other is relevantly empty? What's the recommendation to handle those situations? (is the solution Federated queue along with the sharding?)

Thanks

Michael Klishin

unread,
Jun 7, 2018, 5:43:56 AM6/7/18
to rabbitm...@googlegroups.com
The shard with the fewest consumers will be chosen. Which implies a roughly even distribution.

Sharded queues are not concerned with publishing, that's why the plugin mentions an exchange you have to use. It handles
message distribution using hashing the routing key. So as long as the routing keys don't all fall into a single hashing bin
(which in practice means there's more than a few of them), all shards will be selected.

As with all hashing-based techniques, if the input is always the same there won't be much sharding going on.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Daniil Fedotov

unread,
Jun 7, 2018, 6:02:25 AM6/7/18
to rabbitmq-users
Hi,

In your case, if you have only one shard per node, number of consumers will not affect consumers as they connect to a local shard only. Number of consumers is taken to account only if there are more than one shard per node.

Ariel B

unread,
Jun 7, 2018, 7:17:33 AM6/7/18
to rabbitmq-users
I understood that the shard with least consumer will be chosen - but i expect to do basic.consume only once, then continue reading - if i have 4 nodes - can i 100% be sure that if i consume 4 times all shards will be read?

what do you mean more than one shard per node? i thought shard is per node (distribute the logical q into physical several ones).

and i understand that it doesnt care if target shard is full when publishing to an exchange.

Michael Klishin

unread,
Jun 7, 2018, 7:38:25 AM6/7/18
to rabbitm...@googlegroups.com
Please take a look at the README of the plugin. There can verify and typically are more than one shard per node. [1] explains why. It doesn’t really change much as far as message distribution goes.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.
--
Staff Software Engineer, Pivotal/RabbitMQ

Daniil Fedotov

unread,
Jun 7, 2018, 7:39:39 AM6/7/18
to rabbitmq-users
Hi, 

There is a policy `shards-per-node`, which defines how many queue shards will be started on a single node.
Load balancing of messages is not handled by the plugin and is done by the exchange, to which you apply the policy. For example if it's a random exchange, messages will be routed randomly to all the shards, if it is a `x-modulus-hash` exchange, provided by the plugin - they will be partitioned based on a hash of the message routing key. It does not make much sense to apply the policy to fanout or direct exchanges.

Ariel B

unread,
Jun 7, 2018, 8:48:39 AM6/7/18
to rabbitmq-users
And about the subscription? am i guaranteed that subscription will be done to all my shards? i prefer not to manually subscribe to each shard. 

if i have 4 shards, i open 4 subscriptions in parallel - all shards will be connected?
i want to make sure there's no shard left unsubscribed.

i can make another process that monitors the queues and raises subscribers if no one reads from it, but again - i want to avoid those side processes.

Michael Klishin

unread,
Jun 7, 2018, 8:59:16 AM6/7/18
to rabbitm...@googlegroups.com
I believe this is covered in the plugin docs. Your clients are entirely unaware of sharding. They consume from a logical queue.

You are not guaranteed that all shards will be consumed from. If you have 4 shards on a node but only 2 consumers, 2 shards won’t have any. The plugin assumes that there are at least as many consumers as there are shards *on a given node*, at least most of the time.
--

Ariel B

unread,
Jun 11, 2018, 3:38:21 AM6/11/18
to rabbitmq-users
I'm actually referring to edge cases - where the plugin would choose 3 shards instead of 4 (even though i have 4 subscribers).
this can be devastating as i'll have a queue that will fill up.

I'm guessing i'll have to dig down the plugin code or just monitor the queue and raise more subscribers.

Michael Klishin

unread,
Jun 11, 2018, 1:29:37 PM6/11/18
to rabbitm...@googlegroups.com
As stated earlier this plugin uses an exchange type that relies on a hashing function, so if
the hashed values don't have a reasonably even distribution, so won't the data stored in each shard.

It works the same way in a lot of other systems where data distribution is controlled by a hashing function.

We are not aware of scenarios where the plugin would not add consumers to a shard that has least of them. Consumers
can fail, come and go, so consumers won't be perfectly in balance at all times.

Regardless of whether this plugin is used, monitoring and metrics of both RabbitMQ and applications are critically important
for production systems [1].


To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Ariel B

unread,
Jun 12, 2018, 5:52:26 AM6/12/18
to rabbitmq-users
Got it, thank you.

Vidhyashankar Madheswaraswamy

unread,
Sep 28, 2021, 3:49:45 PM9/28/21
to rabbitmq-users
Hi Michael.

I read multiple docs and discussions related to sharding plugin. Below is my understanding of current capability of the plugin.

The plugin is responsible for load balancing the consumers on a single node. But it is client's(application) responsibility to ensure that atleast a connection (rmqConection) exists to each node in the cluster. Considering 2 shards per node on a 3 node cluster (total 6 shards), Application should ensure that atleast one connection is established to each node and number of consumers/channels on each connection should be atleast 2.

i.e if i create only one connection and 64 consumers then all consumers will be assigned only to the 2 shards which are local to the connected node. This leaves the other 4 shards unconsumed.

Please clarify on this.

Thanks,
Vidhyashankar M
Reply all
Reply to author
Forward
0 new messages