Error in quorum queue


Radu Marian

Aug 26, 2024, 3:51:13 AM
to rabbitmq-users
Hi RabbitMQ community members,

We recently ran some failover tests on RabbitMQ 3.13.6 with Khepri enabled, and at some point one of our quorum queues reported the following error:


2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> ** State machine '%2F_***' terminating
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> ** Reason for termination = error:function_clause
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {ra_server_proc,handle_effects,5,
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> num_waiting_queries => 0,
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {ra_server_proc,handle_effects,5,
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0>
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> effective_machine_module => rabbit_fifo,
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> effective_handle_aux_fun => {handle_aux,6}}}]
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> ** Stacktrace =
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553914,[127717831|1102]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> [{file,"lists.erl"},{line,568}]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> machine_versions => [{127646284,3}],
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553908,[127717825|1121]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553906,[127717823|1102]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {ra_server_proc,follower,3,[{file,"src/ra_server_proc.erl"},{line,826}]}]
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553912,[127717829|1123]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> ** [{lists,zipwith,
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> [{file,"src/ra_server_proc.erl"},{line,1289}]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553909,[127717826|1123]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553905,[127717822|1123]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> fail],
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {ra_server_proc,handle_effect,5,
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> [{file,"src/ra_server_proc.erl"},{line,1289}]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553918,[127717835|1121]}],
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> [{553904,[127717821|1102]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553907,[127717824|1123]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553915,[127717832|1123]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553911,[127717828|1102]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> [#Fun<rabbit_fifo.64.59258718>,[],
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553917,[127717834|1121]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> [{file,"src/ra_server_proc.erl"},{line,1373}]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553910,[127717827|1123]},
2024-08-21 22:28:12.347610+00:00 [error] <0.804.0> {553913,[127717830|1121]},

Do you know if this error is fatal for the quorum queue? Can the queue recover from it on its own and still receive and replicate messages? Since our test restarted the affected node shortly after this error, we are not sure whether we should be worried about these logs.

BR,
Radu.

Michal Kuratczyk

Aug 26, 2024, 4:41:38 AM
to rabbitm...@googlegroups.com
Hi,

Yes, the queue should recover from this without issues. This can happen with Mnesia as well (it's related
to the fact that some QQ metadata is not written synchronously).
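
One way to confirm recovery after the node restart is to check that the queue has a leader and that a majority of its members are online. Below is a minimal sketch in Python, assuming the JSON returned by the management plugin's GET /api/queues/{vhost}/{name} endpoint includes `leader`, `members` and `online` fields for quorum queues. The helper name `quorum_queue_health`, the queue name and the node names are hypothetical and only serve the illustration.

```python
def quorum_queue_health(queue_info):
    """Summarise a quorum queue from an already decoded management API response.

    Assumes (not confirmed in this thread) that the response carries
    'leader', 'members' and 'online' keys for quorum queues.
    """
    members = queue_info.get("members", [])
    online = queue_info.get("online", [])
    leader = queue_info.get("leader")
    # A Raft group needs a strict majority of its members to make progress.
    majority = len(members) // 2 + 1
    return {
        "leader": leader,
        "has_leader": bool(leader),
        "members": len(members),
        "online": len(online),
        "has_quorum": len(members) > 0 and len(online) >= majority,
        "offline": sorted(set(members) - set(online)),
    }


# Hypothetical sample: three members, one still down after the failover test.
sample = {
    "name": "orders",
    "type": "quorum",
    "leader": "rabbit@node-1",
    "members": ["rabbit@node-1", "rabbit@node-2", "rabbit@node-3"],
    "online": ["rabbit@node-1", "rabbit@node-3"],
}

report = quorum_queue_health(sample)
print(report)
```

If `has_leader` and `has_quorum` are both true, the queue can accept and replicate messages even while one member is still catching up. The same membership information should also be available from the CLI via `rabbitmq-queues quorum_status <queue name>`.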

It's great to hear you are playing with Khepri. Let us know if you have any feedback. There are still significant
changes coming to Khepri and I'm afraid at this point 3.13.x doesn't really represent the state of Khepri support we have,
but feedback is welcome nonetheless. There should be a RabbitMQ 4.0 beta with all the new Khepri stuff soon and it'd be
great if you could give it a try.

Lastly, please keep in mind that we are not planning on having a migration from "old Khepri support" (what's currently in 3.13)
to "new Khepri support" (what's coming in 4.0 and might be backported to 3.13 at some point). There are significant changes
that would require implementing a migration that we don't think is necessary given that Khepri is experimental in 3.13
and therefore should not be relied upon for anything serious.

Best,


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/82c05c8d-00df-422f-8bb1-00cdf7264ac9n%40googlegroups.com.


--
Michal
RabbitMQ Team

This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.

Radu Marian

Aug 26, 2024, 10:01:00 AM
to rabbitmq-users
Thanks a lot, Michal, for the prompt support.

As we are looking to become early adopters of Khepri, do you know if the non-beta 4.0 release will become available soon?

Is there any timeline for that?

Thanks,
Radu.

Michal Kuratczyk

Aug 27, 2024, 7:04:49 AM
to rabbitm...@googlegroups.com
Release candidates are coming soon. For the 4.0.0 release, we are targeting October.

Any particular reason you are so eager to use Khepri?

Radu Marian

Aug 29, 2024, 10:25:58 AM
to rabbitmq-users
Because of this: https://github.com/rabbitmq/rabbitmq-server/discussions/4237

And because of other issues, such as the cluster going into split brain after a net split even with pause_minority configured.

We need good eventual consistency, and Mnesia does not seem to provide it. We tested with Khepri, and these issues no longer reproduce with it.