Random exchange behavior on node failure


Mário

May 24, 2023, 6:37:18 AM
to rabbitmq-users
Hi,

I currently have a RabbitMQ cluster set up with 3 nodes, with a mix of classic queues and quorum queues.
For a specific type of message I use a random exchange (x-random) to load balance across 3 classic queues, each on a different node.
Recently in production one of the 3 nodes went down, and what I saw was that some messages sent to that exchange failed.

I was expecting that the random exchange would stop trying to send to the queue that was unavailable, and resume when the node was restarted.

Is this expected behavior for this type of exchange? And if so, what alternatives do I have to work around this? Any help is appreciated.

Best,
Mário

Michal Kuratczyk

May 24, 2023, 6:46:45 AM
to rabbitm...@googlegroups.com
Routing (the process of deciding where to send a message, performed by the exchange module) is too performance critical to perform a network check whenever a message is published. Setting the mandatory flag when publishing a message should solve the problem - your publisher would know that the message wasn't delivered. Alternatively, you can set up an alternate exchange.
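
To illustrate the publisher side of the mandatory-flag suggestion, here is a minimal sketch of a retry loop. It is not tied to any client library: the `publish` callable and the `Unroutable` exception are hypothetical stand-ins for whatever your client exposes when the broker returns a message.

```python
class Unroutable(Exception):
    """Hypothetical signal that the broker returned a mandatory message."""


def publish_with_retry(publish, body, max_attempts=3):
    """Call publish(body) until it succeeds or attempts run out.

    Returns the number of attempts used. Re-raises Unroutable if the
    last attempt also fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            publish(body)
            return attempt
        except Unroutable:
            if attempt == max_attempts:
                raise


# Stand-in for a real client call: fails twice, then succeeds.
outcomes = iter([False, False, True])


def flaky_publish(body):
    if not next(outcomes):
        raise Unroutable(body)


attempts = publish_with_retry(flaky_publish, b"request")
print(attempts)
```

With a random exchange, each retry is routed independently, so a retry has a good chance of landing on a queue whose node is up.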

Can you say a bit about your use case? I think it's the first time I've heard of anyone actually using the random exchange.

Thanks,

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/54381b1c-0d42-4d85-80cf-749fd40554e4n%40googlegroups.com.


--
Michał
RabbitMQ team

Karl Nilsson

May 24, 2023, 8:11:03 AM
to rabbitm...@googlegroups.com
The random exchange makes a random selection from all bindings, and for durable bindings that may include queues on nodes that are not available, so this is expected behaviour.
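
A toy model of this behaviour (not RabbitMQ's actual code) makes the effect visible. The queue names and the `available` set are made up for illustration; the only assumption is that selection is uniform over all bindings and ignores node health.

```python
import random


def route_random(bindings, rng):
    """Pick one bound queue uniformly at random, ignoring node health."""
    return rng.choice(bindings)


def simulate(publishes, bindings, available, seed=0):
    """Count publishes that land on a queue whose node is down."""
    rng = random.Random(seed)
    failed = 0
    for _ in range(publishes):
        if route_random(bindings, rng) not in available:
            failed += 1
    return failed


bindings = ["q-node1", "q-node2", "q-node3"]  # hypothetical queue names
available = {"q-node1", "q-node2"}            # node3 is down
failed = simulate(30_000, bindings, available)
print(f"{failed / 30_000:.1%} of publishes hit the unavailable queue")
```

With one of three equally weighted bindings unavailable, roughly a third of publishes are affected.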

The best option here may be to use something like quorum queues, which will remain available through at least a single node failure.

--
Karl Nilsson

Mário

May 24, 2023, 10:05:49 AM
to rabbitmq-users
I was actually trying to avoid the performance impact of replicated queues.

I have two types of requests for my specific problem:
The first type is mission critical: speed/latency is key, errors are uncommon, and clients will retry (the mandatory flag is set). These are sent to the random exchange and "balanced" to a queue on each of my nodes. Each machine has consumers that listen to all of the queues bound to the exchange, so if a host is down (or even two!), at least one is still processing requests.

The other type of request goes to quorum queues and is typically issued asynchronously during the processing of the first type. I use these for batching, auditing, etc. I don't want to lose these messages, but they can be processed later if that's unavoidable.

If I used quorum queues for the first type, even if I could bear the performance penalty, they wouldn't survive two down nodes.
This is pertinent because my infrastructure has two physical server rooms, so if a specific room is "down" I can lose two of the nodes at once, and quorum queues don't survive that. I would like to keep processing the mission-critical requests in this case.
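
The arithmetic behind this can be sketched in a few lines. The function name is made up; the only rule assumed is that a quorum queue needs a strict majority of its members online.

```python
def quorum_available(members, down):
    """True if a strict majority of the queue's members is still online."""
    return (members - down) > members // 2


for down in range(4):
    print(f"3 members, {down} down -> available: {quorum_available(3, down)}")
```

With 3 nodes split across two rooms, one room necessarily holds 2 of them, so losing that room leaves 1 of 3 members and no majority.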

Regarding alternate exchanges, the problem is deciding where to send the requests to be processed.

Something like asynchronously checking whether a queue is available every N seconds, to determine whether a binding is "active", would solve my case; also, the cluster knows (?) whether a queue is available. But I guess something like this would require some custom code.
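
As a rough sketch of that idea on the client side: the class below caches which queues are available and re-checks at most every `interval` seconds. Everything here is hypothetical, including the `probe` callable (how you would actually test a queue is left open), and it refreshes lazily on read rather than in a background task.

```python
import random
import time


class ActiveBindings:
    """Cache per-queue availability, refreshed at most every `interval` seconds."""

    def __init__(self, queues, probe, interval=5.0, clock=time.monotonic):
        self.queues = list(queues)
        self.probe = probe        # hypothetical callable: queue name -> bool
        self.interval = interval
        self.clock = clock
        self._active = []
        self._checked_at = None

    def active(self):
        """Return the queues seen as available at the most recent check."""
        now = self.clock()
        if self._checked_at is None or now - self._checked_at >= self.interval:
            self._active = [q for q in self.queues if self.probe(q)]
            self._checked_at = now
        return self._active

    def pick(self, rng=random):
        """Choose a random active queue, or None if none is active."""
        candidates = self.active()
        return rng.choice(candidates) if candidates else None


# Fake health data and a fake clock so the example runs without a broker.
up = {"q-node1": True, "q-node2": True, "q-node3": False}
fake_now = [0.0]
bindings = ActiveBindings(up, probe=up.get, interval=5.0, clock=lambda: fake_now[0])
print(bindings.active())
```

The trade-off is a staleness window: for up to `interval` seconds after a node goes down, messages can still be sent to its queue, so retries remain necessary.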

Am I overthinking this?

Karl Nilsson

May 24, 2023, 10:27:06 AM
to rabbitm...@googlegroups.com
You may be interested in our experiments with a new "local" exchange type. Essentially you ensure you have classic (v2) queues on all nodes, all consumers consume from all queues but published messages are only routed to the queues which are local to the connection.




--
Karl Nilsson

Mário

May 24, 2023, 10:57:38 AM
to rabbitmq-users
That sounds really cool, and I think I could work with that. Is it close to a release?

Thank you.

kjnilsson

May 26, 2023, 4:45:08 AM
to rabbitmq-users
It's not quite ready; we need to do some more testing and discuss the exact behaviour, but if you want an early preview, here is a Docker image with the exchange included:


Here is the PR if you want to follow or report any findings if you do decide to take it for a spin: https://github.com/rabbitmq/rabbitmq-server/pull/8334

Cheers
Karl

Mário

May 26, 2023, 5:15:28 AM
to rabbitmq-users
Thanks! I am already in the process of testing this; I found building RabbitMQ quite straightforward.

Best,
Mário

Mário

Jun 5, 2023, 12:41:39 PM
to rabbitmq-users
From what I've seen while testing on my dev machine, it looks good. I've built an ez for the local exchange plugin and I'm going forward with trying this in the real environment.
Is there anything I should be specifically careful about? When (if?) the local exchange plugin is actually released, will there be an issue with mixing plugin versions when upgrading?

Thanks,
Mário

kjnilsson

Jun 7, 2023, 3:54:47 AM
to rabbitmq-users
Hi, glad it is working out for you. I wouldn't suggest you deploy to production yet, though! There are still a few things we ought to iron out. For example, if the exchange can find no locally bound queues, it will select a random one from the non-local queues. I am not sure this is necessarily what should happen, as that means it is not local anymore. I will discuss this with my colleagues today.

Mário

Jun 7, 2023, 5:52:03 AM
to rabbitmq-users
I'm not taking this lightly, and I understand that you guys are still working on the specifics, but I do have a rather grave problem with my current topology.
The random part is not relevant in my case: my topology is fixed and the queues are all durable, so a node will always have a local queue bound to the exchange.

The random choice of a locally bound queue is interesting though; it would work to make use of more CPU cores.

I'm still assessing this. I've exercised it in a QA environment and I'm trying to cover my bases. Maybe it would be better to make a custom version of this so as not to clash with a possible (?) official release; the time frame is the key here.

kjnilsson

Jun 7, 2023, 5:57:34 AM
to rabbitmq-users
Yes, we've pretty much settled on the approach that this should be strictly local, so just delivering to local queues with no built-in fallback. If there is more than one locally bound queue, it will perform a random selection of one of the queues. So in your use case you _could_ have more than one local queue per node to possibly scale further.
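
The strictly local semantics described here can be modelled in a few lines. This is a toy illustration only, not the plugin's implementation; the function name, queue names, and the `(queue, node)` binding representation are all made up.

```python
import random


def route_local(bindings, publisher_node, rng=random):
    """Route to a queue on the publisher's node, or to nothing at all.

    bindings: list of (queue_name, node) pairs.
    Returns a queue name, or None when no queue is local.
    """
    local = [queue for queue, node in bindings if node == publisher_node]
    if not local:
        return None  # strictly local: no fallback to queues on other nodes
    return rng.choice(local)  # random choice among the local queues


bindings = [("q1a", "node1"), ("q1b", "node1"), ("q2", "node2")]
print(route_local(bindings, "node1"))  # one of the two node1 queues
print(route_local(bindings, "node2"))
print(route_local(bindings, "node3"))
```

In this model a publisher connected to a node with no locally bound queue gets no route, so a publisher that needs to know about it would still rely on something like the mandatory flag.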

We may consider further restricting the exchange to only allow bindings to classic queues, as locality has little meaning for replicated queue types like streams or quorum queues, or even other exchanges. In this case the name may change a bit, so there would be no clash with your installation. I can't guarantee that, however.