Stuck synchronization

Carl Hörberg

Apr 4, 2018, 4:40:10 AM

to rabbitmq-users

I have a problem with queue synchronization. It seems to happen if the master node runs out of RAM during synchronization. After that the sync can't be cancelled, restarting the second node does nothing, the policy can't be removed, and nothing can be consumed from or published to the queue. I've included some screenshots; note how large the Erlang mailbox and gen_server2 buffers are for the stuck queues. Is there a way to "unstick" them without restarting the whole cluster?

RabbitMQ 3.7.4, Erlang 20.1

Michael Klishin

Apr 4, 2018, 5:47:08 AM

to rabbitm...@googlegroups.com

Sync cancel operations are likely sitting in the mailbox or gen_server2 buffer, waiting for their turn. You can try forcing the queue master process to terminate using `rabbitmqctl eval`.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

MK

Staff Software Engineer, Pivotal/RabbitMQ

Carl Hörberg

Apr 4, 2018, 10:22:28 AM

to rabbitm...@googlegroups.com

I did; unfortunately, that causes all (transient) messages in the queue to be lost.


Ayanda

Apr 4, 2018, 10:52:41 AM

to rabbitmq-users

Configurable timeouts for the slave sync receive loop [1] could be useful for such cases. The timeout could default to infinity (changeable by the user), or to some significantly large value that at least guarantees slaves will eventually exit synchronisation, returning an error or notification and writing to the logs if the timeout occurred. The user could then re-attempt the sync if necessary.

[1] https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_mirror_queue_sync.erl#L325-L367
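
To make the idea concrete, here is a rough Python sketch of a receive loop with a configurable per-message timeout. It is only an illustration of the pattern, not RabbitMQ code, and it does not mirror what rabbit_mirror_queue_sync.erl actually does: the names `receive_sync_batches` and `SyncTimeout`, and the ("batch" / "done") message shapes, are all hypothetical.

```python
import queue


class SyncTimeout(Exception):
    """Raised when no sync message arrives within the configured timeout."""


def receive_sync_batches(mailbox, timeout=None):
    """Collect sync batches from `mailbox` until a ("done", None) marker arrives.

    timeout=None waits forever for each message (the "infinity" default);
    a number bounds the wait for each message, in seconds.
    All names here are hypothetical and only illustrate the pattern.
    """
    received = []
    while True:
        try:
            # Blocks until a message arrives or the timeout expires.
            kind, payload = mailbox.get(timeout=timeout)
        except queue.Empty:
            # Leave the loop instead of hanging, and report how far we got
            # so the caller can log it and decide whether to retry.
            raise SyncTimeout(
                f"no sync message within {timeout}s; "
                f"{len(received)} batches received"
            ) from None
        if kind == "done":
            return received
        received.append(payload)
```

With `timeout=None` the loop behaves as it would today and waits indefinitely; with a finite value a stalled sender results in a `SyncTimeout` that the caller can log before re-attempting the sync.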
