deleted queue is restored after restarting the cluster


Yurii

Oct 15, 2020, 11:56:10 AM
to rabbitmq-users
On prod we use RabbitMQ 3.8.3 and Erlang 22.3.4.1.

Some time ago we created a queue for testing and deleted it afterwards, but it sometimes comes back after the cluster restarts, and we have to delete it again. This has happened a couple of times.

Why can't I delete the queue?

We cannot reproduce this in other environments.

Michael Klishin

Oct 15, 2020, 5:04:59 PM
to rabbitmq-users
We don't have enough information here to suggest much. It may be that something in your system redeclares the queue, or that the deletion operation did not actually succeed (that would leave an error in the log, if the operation even reached the node at all).

For quorum queues, nearly all operations require a majority of nodes online in order for the queue to make progress.
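To rule out the second possibility (a deletion that never actually succeeded), the management HTTP API can be used instead of the UI, since it returns an explicit status code. A minimal sketch, assuming the management plugin on its default port 15672 and placeholder credentials — none of this comes from the thread:

```python
import base64
import urllib.error
import urllib.parse
import urllib.request

def queue_api_url(host: str, port: int, vhost: str, queue: str) -> str:
    """Build /api/queues/{vhost}/{queue}; the default vhost '/' encodes as %2F."""
    return "http://%s:%d/api/queues/%s/%s" % (
        host,
        port,
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(queue, safe=""),
    )

def delete_queue(host, port, user, password, vhost, queue):
    """Issue DELETE on the queue; returns the HTTP status code."""
    req = urllib.request.Request(
        queue_api_url(host, port, vhost, queue), method="DELETE")
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 if the queue is already gone
```

A 204 response means the broker accepted the deletion; a 404 means the queue does not exist on that node.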

Yurii

Nov 12, 2020, 10:22:40 AM
to rabbitmq-users
Hi, this happened again recently. I have logs; maybe someone can help me now. I can provide more logs if required.


Nov 10, 2020 @ 12:55:55.310 <0.28858.5012> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: 0 messages to synchronise info
Nov 10, 2020 @ 12:55:55.310 <0.28858.5012> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: batch size: 100 info
Nov 10, 2020 @ 12:55:55.314 <0.7128.4984> Mirrored queue 'q.broker.trash.retry' in vhost '/': Slave <rab...@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.esp.svc.cluster.local.2.7128.4984> saw deaths of mirrors <rab...@rabbitmq-rabbitmq-ha-1.rabbitmq-rabbitmq-ha-discovery.esp.svc.cluster.local.1.681.0> info
Nov 10, 2020 @ 12:55:55.315 <0.7189.7334> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: all slaves already synced info
Nov 10, 2020 @ 13:05:25.441 <0.28858.5012> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: batch size: 100 info
Nov 10, 2020 @ 13:05:25.441 <0.28858.5012> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: 0 messages to synchronise info
Nov 10, 2020 @ 13:05:25.502 <0.1002.7335> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: all slaves already synced info
Nov 10, 2020 @ 13:34:06.331 operation queue.declare caused a channel exception not_found: queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor
Nov 10, 2020 @ 13:34:06.335 operation queue.declare caused a channel exception not_found: queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor
Nov 10, 2020 @ 13:34:06.335 <0.8226.0> Shovel 'Move from q.broker.t_rash.tmp' in virtual host '/' is stopping, reason: {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor">>}},{gen_server,call,[<0.8353.0>,{call,{'queue.declare',0,<<"q.broker.trash.retry">>,false,true,false,false,false,[]},none,<0.8226.0>},60000]}} error
Nov 10, 2020 @ 13:34:06.337 <0.8224.0> Supervisor {<0.8224.0>,rabbit_shovel_dyn_worker_sup} had child {<<"/">>,<<"Move from q.broker.t_rash.tmp">>} started with rabbit_shovel_worker:start_link(dynamic, {<<"/">>,<<"Move from q.broker.t_rash.tmp">>}, [{<<"ack-mode">>,<<"on-confirm">>},{<<"dest-add-forward-headers">>,false},{<<"dest-protocol">>,<<"...">>},...]) at <0.8226.0> exit with reason {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor">>}},{gen_server,call,[<0.8353.0>,{call,{'queue.declare',0,<<"q.broker.trash.retry">>,false,true,false,false,false,[]},none,<0.8226.0>},60000]}} in context child_terminated error
Nov 10, 2020 @ 13:34:06.337 ** {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor">>}},{gen_server,call,[<0.8353.0>,{call,{'queue.declare',0,<<"q.broker.trash.retry">>,false,true,false,false,false,[]},none,<0.8226.0>},60000]}}
Nov 10, 2020 @ 13:34:06.337 <0.8226.0> CRASH REPORT Process <0.8226.0> with 0 neighbours exited with reason: {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor">>}},{gen_server,call,[<0.8353.0>,{call,{'queue.declare',0,<<"q.broker.trash.retry">>,false,true,false,false,false,[]},none,<0.8226.0>},60000]}} in gen_server2:terminate/3 line 1183 error
Nov 10, 2020 @ 13:34:06.337 ** When Server state == {state,undefined,undefined,undefined,undefined,{<<"/">>,<<"Move from q.broker.t_rash.tmp">>},dynamic,#{ack_mode => on_confirm,dest => #{current => {<0.8292.0>,<0.8315.0>,<<"amqp:///%2F">>},dest_queue => <<"q.broker.trash.retry">>,fields_fun => #Fun<rabbit_shovel_parameters.11.85917464>,module => rabbit_amqp091_shovel,props_fun => #Fun<rabbit_shovel_parameters.12.85917464>,resource_decl => #Fun<rabbit_shovel_parameters.10.85917464>,uris => ["amqp:///%2F"]},name => <<"Move from q.broker.t_rash.tmp">>,reconnect_delay => 5,shovel_type => dynamic,source => #{current => {<0.8258.0>,<0.8278.0>,<<"amqp:///%2F">>},delete_after => 'queue-length',module => rabbit_amqp091_shovel,prefetch_count => 1000,queue => <<"q.broker.t_rash.tmp">>,resource_decl => #Fun<rabbit_shovel_parameters.14.85917464>,source_exchange_key => <<>>,uris => ["amqp:///%2F"]}},undefined,undefined,undefined,undefined,undefined}

On Friday, October 16, 2020 at 00:04:59 UTC+3, Michael Klishin wrote:

Alexander Gorshkov

Nov 13, 2020, 9:33:49 AM
to rabbitmq-users
Hello,
We've had exactly the same issue during node restarts while upgrading to 3.8.9. It looks like it's caused by the "move messages" / shovel functionality, but we have no idea why it ran right after the node restart. We have the following rabbitmq.conf; could something in it cause the issue?
---
listeners.tcp.default = ***
listeners.ssl.default = ***
ssl_options.cacertfile = ***
ssl_options.certfile = ***
ssl_options.keyfile = ***
ssl_options.verify = verify_none
ssl_options.fail_if_no_peer_cert = false
ssl_options.versions.1 = tlsv1.2
management.tcp.port = ***
vm_memory_high_watermark.relative = 0.85
disk_free_limit.relative = 1.0
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = ***
cluster_formation.classic_config.nodes.2 = ***
cluster_formation.classic_config.nodes.3 = ***
cluster_partition_handling = pause_minority
---
Or maybe it's just some kind of bug: queues that were created with "move messages" and manually deleted later are automatically recreated by the shovel after a node restart.
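One way to tell whether a leftover shovel is behind the recreation: dynamic shovels are stored as runtime parameters under the `shovel` component, so they survive node restarts until they complete or are cleared. A hedged sketch that lists them via the management HTTP API (host, port and credentials are assumptions):

```python
import base64
import json
import urllib.parse
import urllib.request

def shovel_params_url(host: str, port: int, vhost: str) -> str:
    """Build /api/parameters/shovel/{vhost} - where dynamic shovel definitions live."""
    return "http://%s:%d/api/parameters/shovel/%s" % (
        host, port, urllib.parse.quote(vhost, safe=""))

def list_shovels(host, port, user, password, vhost):
    """Return the dynamic shovel definitions for a vhost as parsed JSON."""
    req = urllib.request.Request(shovel_params_url(host, port, vhost))
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same information is available from the CLI via `rabbitmqctl list_parameters`, and a stuck shovel can be removed with `rabbitmqctl clear_parameter shovel <name>`.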



On Thursday, November 12, 2020 at 18:22:40 UTC+3, juri...@gmail.com wrote:

Luke Bakken

Nov 13, 2020, 12:53:44 PM
to rabbitmq-users
Hi Alexander,

Please provide exact instructions to reproduce what you see. We don't have time to guess how you set up your environment using "move messages".

Ideally, you would provide a script that uses rabbitmqadmin or the HTTP API to set up an environment that exactly matches yours.

Thank you,
Luke

Alexander Gorshkov

Nov 17, 2020, 11:33:39 AM
to rabbitmq-users
Hello Luke.
I've finally recreated this issue using the following steps:
---
0. Create a fresh 3-node cluster;
1. Enable the management and shovel plugins;
2. Create an admin user and access the UI on node 1;
# All following actions are performed via the UI
3. Create the following HA policy:
ha-mode: exactly, ha-params: 2, ha-sync-mode: automatic, queue-master-locator: min-masters
4. Create a queue on each node (e.g. test-1 on node 1, test-2 on node 2, test-3 on node 3);
5. Do "move messages" into a non-existing queue, e.g. twice for each queue (restart-1 and restart-2 for test-1, restart-3 and restart-4 for test-2, restart-5 and restart-6 for test-3);
6. Delete all automatically created queues;
7. Restart node 2 via systemctl stop/start rabbitmq-server - no restored queues;
8. Restart node 3 via systemctl stop/start rabbitmq-server - no restored queues;
9. Access the UI on node 3;
10. Restart node 1 via systemctl stop/start rabbitmq-server - queues restart-2, restart-4 and restart-5 have been recreated.
---
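For reference, step 5's "move messages" button creates a dynamic shovel, much like the one in the state dump earlier in the thread (ack-mode on-confirm, delete-after queue-length). A hedged sketch of roughly the JSON body a script would PUT to /api/parameters/shovel/{vhost}/{name} to imitate it — the exact key names are assumptions, not taken from the thread:

```python
import json

def move_messages_payload(src_queue: str, dest_queue: str) -> str:
    """JSON body approximating what the management UI's "Move messages"
    button sets up: shovel from src_queue to dest_queue on the local node,
    stopping once the current backlog has been transferred."""
    return json.dumps({
        "value": {
            "src-uri": "amqp://",                # local connection, same node
            "src-queue": src_queue,
            "src-delete-after": "queue-length",  # stop after draining the backlog
            "dest-uri": "amqp://",
            "dest-queue": dest_queue,
            "ack-mode": "on-confirm",
        }
    })
```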
This just confirms that queues can be recreated automatically; it's not clear why.
Thank you.
On Friday, November 13, 2020 at 20:53:44 UTC+3, Luke Bakken wrote:

Luke Bakken

Nov 17, 2020, 7:02:40 PM
to rabbitmq-users
Hello,

Unfortunately I could not reproduce the problem using your manual steps or the equivalent set of steps in this script:


I tested using Erlang 23.1.3 and the latest RabbitMQ code.

If you'd like, review my script to see if I missed anything.

Thanks,
Luke

Alexander Gorshkov

Nov 18, 2020, 4:00:39 AM
to rabbitmq-users
Hello,

Thank you for your reply!
I think it's necessary to use the management panel and not to declare the destination queues before moving messages into them.
I've repeated all my steps and got the same result again - the queues were recreated.

Please, check the following screenshots:
1. Create a fresh cluster with admin user and enable management and shovel plugins.
1.PNG

2. Create policy for HA
2.PNG

3. Create queue on each node.
3.PNG

4. Do "move messages"  
4.PNG

5. Queues have been automatically created by "move messages"
5.PNG

6. Check cluster status, then delete all "retry*" queues
6.PNG

7. Stop node 3 - no recreated queues
7.PNG
8. Stop Node 2 - no recreated queues
8.PNG

9. Switch to management panel on Node 3 and stop Node 1 - queues "retry_[2,4,6]" have been recreated
9.PNG

Log entries on Node 2 during Node 1 restart:
---
2020-11-18 11:33:50.699 [error] <0.937.0> Channel error on connection <0.896.0> (<rab...@kafka01.1605688375.896.0>, vhost: '/', user: 'none'), channel 2:
operation queue.declare caused a channel exception not_found: no queue 'retry_2' in vhost '/'
2020-11-18 11:33:50.705 [error] <0.970.0> Channel error on connection <0.909.0> (<rab...@kafka01.1605688375.909.0>, vhost: '/', user: 'none'), channel 2:
operation queue.declare caused a channel exception not_found: no queue 'retry_4' in vhost '/'
2020-11-18 11:33:50.706 [error] <0.974.0> Channel error on connection <0.922.0> (<rab...@kafka01.1605688375.922.0>, vhost: '/', user: 'none'), channel 2:
operation queue.declare caused a channel exception not_found: no queue 'retry_6' in vhost '/'
2020-11-18 11:33:52.887 [info] <0.521.0> rabbit on node 'rabbit@elk-tst-01' down
2020-11-18 11:33:52.932 [info] <0.992.0> Mirrored queue 'retry_2' in vhost '/': Adding mirror on node rabbit@kafkanode01: <16671.1428.0>
2020-11-18 11:33:53.906 [info] <0.995.0> Mirrored queue 'retry_4' in vhost '/': Adding mirror on node rabbit@kafkanode01: <16671.1452.0>
2020-11-18 11:33:53.907 [info] <0.999.0> Mirrored queue 'retry_6' in vhost '/': Adding mirror on node rabbit@kafkanode01: <16671.1455.0>
2020-11-18 11:33:53.970 [info] <0.521.0> Node rabbit@elk-tst-01 is down, deleting its listeners
2020-11-18 11:33:54.040 [info] <0.521.0> node 'rabbit@elk-tst-01' down: connection_closed
2020-11-18 11:33:54.132 [info] <0.992.0> Mirrored queue 'retry_2' in vhost '/': Synchronising: 0 messages to synchronise
2020-11-18 11:33:54.132 [info] <0.992.0> Mirrored queue 'retry_2' in vhost '/': Synchronising: batch size: 4096
2020-11-18 11:33:54.137 [info] <0.876.0> Shovel ''Move from test_1' in virtual host '/'' is stopping (it was configured to autodelete and transfer is completed)
2020-11-18 11:33:54.252 [info] <0.999.0> Mirrored queue 'retry_6' in vhost '/': Synchronising: 0 messages to synchronise
2020-11-18 11:33:54.252 [info] <0.999.0> Mirrored queue 'retry_6' in vhost '/': Synchronising: batch size: 4096
2020-11-18 11:33:54.258 [info] <0.869.0> Shovel ''Move from test_3' in virtual host '/'' is stopping (it was configured to autodelete and transfer is completed)
2020-11-18 11:33:54.316 [info] <0.1071.0> Mirrored queue 'retry_2' in vhost '/': Synchronising: all mirrors already synced
2020-11-18 11:33:54.324 [info] <0.871.0> Shovel ''Move from test_2' in virtual host '/'' is stopping (it was configured to autodelete and transfer is completed)
2020-11-18 11:33:54.383 [info] <0.1083.0> Mirrored queue 'retry_6' in vhost '/': Synchronising: all mirrors already synced
2020-11-18 11:33:54.383 [info] <0.995.0> Mirrored queue 'retry_4' in vhost '/': Synchronising: 0 messages to synchronise
2020-11-18 11:33:54.383 [info] <0.995.0> Mirrored queue 'retry_4' in vhost '/': Synchronising: batch size: 4096
2020-11-18 11:33:54.439 [info] <0.1104.0> Mirrored queue 'retry_4' in vhost '/': Synchronising: all mirrors already synced
2020-11-18 11:34:54.018 [info] <0.521.0> node 'rabbit@elk-tst-01' up
2020-11-18 11:35:04.227 [info] <0.521.0> rabbit on node 'rabbit@elk-tst-01' up
---

Sorry for the large post.
RabbitMQ version - 3.8.9, Erlang version - 23.1, OS - CentOS 7, no rabbitmq.conf or rabbitmq-env.conf used.
Thank you.
On Wednesday, November 18, 2020 at 03:02:40 UTC+3, Luke Bakken wrote:

Luke Bakken

Nov 18, 2020, 8:49:56 AM
to rabbitmq-users
Hi Alexander,

Thanks for the comprehensive sequence of steps. I did do these steps manually exactly as you described before automating them with a script.

In step 3, when you create the test_* queues, it appears they are created with queue arguments. Notice the "Args" in the blue box. Can you hover over that to show the arguments?

I used the latest RabbitMQ code and will re-try with version 3.8.9. There is a chance this has been fixed.

Thanks,
Luke



Alexander Gorshkov

Nov 18, 2020, 9:31:47 AM
to rabbitmq-users
It's the "queue-type: classic" argument.
10.PNG
On Wednesday, November 18, 2020 at 16:49:56 UTC+3, Luke Bakken wrote:

Luke Bakken

Nov 18, 2020, 11:50:32 AM
to rabbitmq-users
Thanks Alexander.

I have re-run the steps you provided using version 3.8.9 and I do see the same behavior you report. It's certainly a bug, but not a serious one.

Like I said, it may already be fixed so I will confirm that first.

Thanks for the detailed reproduction steps!
Luke

Alexander Gorshkov

Nov 18, 2020, 2:16:44 PM
to rabbitmq-users
Thanks for your help, Luke!

On Wednesday, November 18, 2020 at 19:50:32 UTC+3, Luke Bakken wrote:

Luke Bakken

Nov 22, 2020, 11:08:32 AM
to rabbitmq-users
Hi again Alexander,

You can follow this issue if you'd like - https://github.com/rabbitmq/rabbitmq-server/issues/2655

I have confirmed that this is a bug that only affects clustered environments. An HA policy is not necessary to reproduce it. I've simplified my script to reproduce the issue a bit here: