RabbitMQ cannot be recovered if master node has restarted


Ruochao Zheng

Nov 26, 2019, 4:18:32 AM
to rabbitmq-users
I'm using Spring Boot with RabbitMQ, and RabbitMQ is deployed on k8s.

k8s creates 3 pods called rabbitmq-0, rabbitmq-1 and rabbitmq-2. Assume rabbitmq-0 is the master and the queue was created on this pod.

The queue cannot be recovered after rabbitmq-0 is restarted; restarting the other pods does not cause any issue.

2019-11-26 01:03:21.800  WARN 2957 --- [ntContainer#0-2] o.s.a.r.listener.BlockingQueueConsumer   : Failed to declare queue: queue-job
2019-11-26 01:03:21.801 ERROR 2957 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Consumer received fatal=false exception on startup
org.springframework.amqp.rabbit.listener.QueuesNotAvailableException: Cannot prepare queue for listener. Either the queue doesn't exist or the broker will not allow us to use it.
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.handleDeclarationException(BlockingQueueConsumer.java:661) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.passiveDeclarations(BlockingQueueConsumer.java:601) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:581) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.initialize(SimpleMessageListenerContainer.java:1196) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1041) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_211]
Caused by: org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[queue-job]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:710) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.passiveDeclarations(BlockingQueueConsumer.java:594) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
... 4 common frames omitted
Caused by: java.io.IOException: null
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:126) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:122) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:144) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.ChannelN.queueDeclarePassive(ChannelN.java:1006) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.ChannelN.queueDeclarePassive(ChannelN.java:52) ~[amqp-client-5.4.3.jar:5.4.3]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_211]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_211]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_211]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_211]
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$CachedChannelInvocationHandler.invoke(CachingConnectionFactory.java:1140) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
at com.sun.proxy.$Proxy165.queueDeclarePassive(Unknown Source) ~[na:na]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:689) ~[spring-rabbit-2.1.7.RELEASE.jar:2.1.7.RELEASE]
... 5 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node 'rab...@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local' of durable queue 'queue-job' in vhost '/' is down or inaccessible, class-id=50, method-id=10)
at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:66) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:36) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:494) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:288) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:138) ~[amqp-client-5.4.3.jar:5.4.3]
... 14 common frames omitted
Caused by: com.rabbitmq.client.ShutdownSignalException: channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - home node 'rab...@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local' of durable queue 'queue-job' in vhost '/' is down or inaccessible, class-id=50, method-id=10)
at com.rabbitmq.client.impl.ChannelN.asyncShutdown(ChannelN.java:516) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.ChannelN.processAsync(ChannelN.java:346) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:178) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:111) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQConnection.readFrame(AMQConnection.java:670) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQConnection.access$300(AMQConnection.java:48) ~[amqp-client-5.4.3.jar:5.4.3]
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:597) ~[amqp-client-5.4.3.jar:5.4.3]
... 1 common frames omitted
2019-11-26 01:03:21.804 ERROR 2957 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Stopping container from aborted consumer
2019-11-26 01:03:21.804 INFO 2957 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Waiting for workers to finish.
2019-11-26 01:03:21.804 INFO 2957 --- [ntContainer#0-2] o.s.a.r.l.SimpleMessageListenerContainer : Successfully waited for workers to finish.

Wesley Peng

Nov 26, 2019, 4:21:39 AM
to rabbitm...@googlegroups.com


on 2019/11/26 17:18, Ruochao Zheng wrote:
> home node 'rab...@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local' of durable queue 'queue-job' in vhost '/' is down or inaccessible, class-id=50, method-id=10)

This means some error occurred at the filesystem level or elsewhere at the
system level.

Can you check syslog and the k8s logs for more details?

regards.

Ruochao Zheng

Nov 26, 2019, 4:32:48 AM
to rabbitmq-users
Actually, I ran kubectl delete pod rabbitmq-0 just to restart the pod. But the system should recover automatically, right?

Wesley Peng

Nov 26, 2019, 7:32:39 AM
to rabbitm...@googlegroups.com
If durable queues were deleted and you don't have queue mirrors, how
would the RabbitMQ cluster recover automatically?

Gary Russell

Nov 26, 2019, 8:55:16 AM
to rabbitm...@googlegroups.com
If you are using auto-delete queues, you must set the master locator to client-local (either using a queue argument or a policy) so that an auto-delete queue is created on the node we are connected to. If it's created on the local node and it goes down, Spring will redeclare it on the next node it connects to.

If the queue is created on a different node and it goes down, Spring does not recover from that condition - it is effectively the same as deleting the queue using the management UI.
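
As a minimal sketch of the queue-argument route mentioned above: the master locator is selected with the `x-queue-master-locator` queue argument. The class and method names below are hypothetical, and the example only builds the argument map; in a Spring AMQP application the map would be passed as the `arguments` parameter of `new Queue(name, durable, exclusive, autoDelete, arguments)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class ClientLocalQueueArgs {

    // RabbitMQ queue argument that selects how the queue master node is chosen.
    public static final String MASTER_LOCATOR_ARG = "x-queue-master-locator";

    // Builds the arguments for a queue whose master should be placed on the
    // node the declaring client is connected to.
    public static Map<String, Object> clientLocalArgs() {
        Map<String, Object> args = new HashMap<>();
        args.put(MASTER_LOCATOR_ARG, "client-local");
        return args;
    }

    public static void main(String[] args) {
        // Pass this map as the queue's declaration arguments.
        System.out.println(clientLocalArgs());
    }
}
```

Setting the same key through a policy instead of a per-queue argument keeps the application code unchanged, which may be preferable when several services declare queues.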


Ruochao Zheng

Nov 26, 2019, 3:50:58 PM
to rabbitmq-users
spring.rabbitmq.listener.simple.missing-queues-fatal=false

I set this property so that Spring Boot keeps retrying until the master node, which holds the queue declaration, comes back. That solves this issue.

By the way, the queue is not auto-delete, and it is not being deleted. k8s killed one of the nodes, which caused it to restart. While the node that holds the queue is missing, the other nodes have no information about the queue.


Gary Russell

Nov 26, 2019, 3:59:04 PM
to rabbitm...@googlegroups.com
> spring.rabbitmq.listener.simple.missing-queues-fatal=false

Yes, that is needed for non-HA non-auto-delete queues, if the node is down for more than 15 seconds (5 second intervals * 3 attempts at passive declaration).

The 5 seconds and 3 attempts are configurable, but not using boot properties.
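
The 15-second figure above is just the retry interval multiplied by the number of attempts. The sketch below works through that arithmetic with a hypothetical helper class. In Spring AMQP the two inputs should correspond to the listener container's `setFailedDeclarationRetryInterval` and `setDeclarationRetries` setters, but treat that mapping as an assumption to check against your version.

```java
import java.time.Duration;

// Hypothetical helper class, for illustration only.
public class DeclarationRetryWindow {

    // Total time spent retrying passive declaration: interval * attempts.
    public static Duration window(int attempts, Duration interval) {
        return interval.multipliedBy(attempts);
    }

    // True when the node stays down longer than the retry window, which is
    // the case where missing-queues-fatal=false is needed.
    public static boolean outlastsWindow(Duration outage, int attempts, Duration interval) {
        return outage.compareTo(window(attempts, interval)) > 0;
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(5); // default interval quoted above
        int attempts = 3;                          // default attempts quoted above

        System.out.println(window(attempts, interval).getSeconds());
        System.out.println(outlastsWindow(Duration.ofSeconds(20), attempts, interval));
        System.out.println(outlastsWindow(Duration.ofSeconds(10), attempts, interval));
    }
}
```

A pod restart that takes longer than this window is the situation described in the thread, so either the window has to be widened or the container has to be told that a missing queue is not fatal.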


To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/5055e174-0fa8-4780-adb1-4ff66d0f7b7f%40googlegroups.com.

Ruochao Zheng

Nov 26, 2019, 5:26:24 PM
to rabbitmq-users
Hi Gary,

Thanks for the reply. I created RabbitMQ on k8s with this chart (https://github.com/helm/charts/tree/master/stable/rabbitmq-ha), so I thought it was HA. If it's HA, this configuration shouldn't be needed, right? Can you explain more?

Ruochao Zheng

Nov 26, 2019, 5:43:53 PM
to rabbitmq-users
Sorry, I take it back. It looks like I had not configured ha-mode. After setting ha-mode=all, no exception is thrown when any node is disconnected.
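
For reference, `ha-mode=all` is a policy definition key, and one common way to apply it is `rabbitmqctl set_policy <name> <pattern> <definition>`. The sketch below only assembles that command line. The class name, the policy name `ha-all` and the pattern `^queue-` are hypothetical, and how the policy is actually applied (chart values, management UI, or `rabbitmqctl`) depends on the deployment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class HaAllPolicy {

    // Policy definition that mirrors matching queues to all nodes in the cluster.
    public static final String DEFINITION = "{\"ha-mode\":\"all\"}";

    // Arguments for: rabbitmqctl set_policy <name> <pattern> <definition>
    public static List<String> setPolicyCommand(String name, String pattern) {
        return Arrays.asList("rabbitmqctl", "set_policy", name, pattern, DEFINITION);
    }

    public static void main(String[] args) {
        // "ha-all" and "^queue-" are placeholder values chosen for this example;
        // the pattern is meant to match the queue-job queue from the thread.
        System.out.println(String.join(" ", setPolicyCommand("ha-all", "^queue-")));
    }
}
```

When running the printed command in a shell, the pattern and the JSON definition need to be quoted.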