deleted queue is restored after restarting the cluster


Yurii

Oct 15, 2020, 11:56:10 AM
to rabbitmq-users
On prod we use RabbitMQ 3.8.3 and Erlang 22.3.4.1.

Some time ago we created a queue for testing and deleted it afterwards, but it sometimes comes back after the cluster restarts, and we have to delete it again. This has happened a couple of times.

Why can't I delete the queue?

We cannot reproduce this in other environments.

Michael Klishin

Oct 15, 2020, 5:04:59 PM
to rabbitmq-users
We don't have enough information here to suggest much. It may be that something in your system redeclares the queue, or that the deletion operation did not actually succeed (that would leave an error in the log, if the operation even reached the node at all).

For quorum queues, nearly all operations require a majority of nodes online in order for the queue to make progress.
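To rule out the second possibility (a deletion that never actually succeeded), the management HTTP API can be used instead of the UI, since it returns an explicit status code. A minimal sketch, assuming the management plugin on its default port 15672 and placeholder credentials — none of this comes from the thread:

```python
import base64
import urllib.error
import urllib.parse
import urllib.request

def queue_api_url(host: str, port: int, vhost: str, queue: str) -> str:
    """Build /api/queues/{vhost}/{queue}; the default vhost '/' encodes as %2F."""
    return "http://%s:%d/api/queues/%s/%s" % (
        host,
        port,
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(queue, safe=""),
    )

def delete_queue(host, port, user, password, vhost, queue):
    """Issue DELETE on the queue; returns the HTTP status code."""
    req = urllib.request.Request(
        queue_api_url(host, port, vhost, queue), method="DELETE")
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 if the queue is already gone
```

A 204 response means the broker accepted the deletion; a 404 means the queue does not exist on that node.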

Yurii

Nov 12, 2020, 10:22:40 AM
to rabbitmq-users
Hi, this happened again recently. I have logs; maybe someone can help me now. I can provide more logs if required.


Nov 10, 2020 @ 12:55:55.310 <0.28858.5012> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: 0 messages to synchronise info
Nov 10, 2020 @ 12:55:55.310 <0.28858.5012> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: batch size: 100 info
Nov 10, 2020 @ 12:55:55.314 <0.7128.4984> Mirrored queue 'q.broker.trash.retry' in vhost '/': Slave <rab...@rabbitmq-rabbitmq-ha-0.rabbitmq-rabbitmq-ha-discovery.esp.svc.cluster.local.2.7128.4984> saw deaths of mirrors <rab...@rabbitmq-rabbitmq-ha-1.rabbitmq-rabbitmq-ha-discovery.esp.svc.cluster.local.1.681.0> info
Nov 10, 2020 @ 12:55:55.315 <0.7189.7334> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: all slaves already synced info
Nov 10, 2020 @ 13:05:25.441 <0.28858.5012> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: batch size: 100 info
Nov 10, 2020 @ 13:05:25.441 <0.28858.5012> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: 0 messages to synchronise info
Nov 10, 2020 @ 13:05:25.502 <0.1002.7335> Mirrored queue 'q.broker.trash.retry' in vhost '/': Synchronising: all slaves already synced info
Nov 10, 2020 @ 13:34:06.331 operation queue.declare caused a channel exception not_found: queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor
Nov 10, 2020 @ 13:34:06.335 operation queue.declare caused a channel exception not_found: queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor
Nov 10, 2020 @ 13:34:06.335 <0.8226.0> Shovel 'Move from q.broker.t_rash.tmp' in virtual host '/' is stopping, reason: {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor">>}},{gen_server,call,[<0.8353.0>,{call,{'queue.declare',0,<<"q.broker.trash.retry">>,false,true,false,false,false,[]},none,<0.8226.0>},60000]}} error
Nov 10, 2020 @ 13:34:06.337 <0.8224.0> Supervisor {<0.8224.0>,rabbit_shovel_dyn_worker_sup} had child {<<"/">>,<<"Move from q.broker.t_rash.tmp">>} started with rabbit_shovel_worker:start_link(dynamic, {<<"/">>,<<"Move from q.broker.t_rash.tmp">>}, [{<<"ack-mode">>,<<"on-confirm">>},{<<"dest-add-forward-headers">>,false},{<<"dest-protocol">>,<<"...">>},...]) at <0.8226.0> exit with reason {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor">>}},{gen_server,call,[<0.8353.0>,{call,{'queue.declare',0,<<"q.broker.trash.retry">>,false,true,false,false,false,[]},none,<0.8226.0>},60000]}} in context child_terminated error
Nov 10, 2020 @ 13:34:06.337 ** {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor">>}},{gen_server,call,[<0.8353.0>,{call,{'queue.declare',0,<<"q.broker.trash.retry">>,false,true,false,false,false,[]},none,<0.8226.0>},60000]}}
Nov 10, 2020 @ 13:34:06.337 <0.8226.0> CRASH REPORT Process <0.8226.0> with 0 neighbours exited with reason: {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - queue 'q.broker.trash.retry' in vhost '/' process is stopped by supervisor">>}},{gen_server,call,[<0.8353.0>,{call,{'queue.declare',0,<<"q.broker.trash.retry">>,false,true,false,false,false,[]},none,<0.8226.0>},60000]}} in gen_server2:terminate/3 line 1183 error
Nov 10, 2020 @ 13:34:06.337 ** When Server state == {state,undefined,undefined,undefined,undefined,{<<"/">>,<<"Move from q.broker.t_rash.tmp">>},dynamic,#{ack_mode => on_confirm,dest => #{current => {<0.8292.0>,<0.8315.0>,<<"amqp:///%2F">>},dest_queue => <<"q.broker.trash.retry">>,fields_fun => #Fun<rabbit_shovel_parameters.11.85917464>,module => rabbit_amqp091_shovel,props_fun => #Fun<rabbit_shovel_parameters.12.85917464>,resource_decl => #Fun<rabbit_shovel_parameters.10.85917464>,uris => ["amqp:///%2F"]},name => <<"Move from q.broker.t_rash.tmp">>,reconnect_delay => 5,shovel_type => dynamic,source => #{current => {<0.8258.0>,<0.8278.0>,<<"amqp:///%2F">>},delete_after => 'queue-length',module => rabbit_amqp091_shovel,prefetch_count => 1000,queue => <<"q.broker.t_rash.tmp">>,resource_decl => #Fun<rabbit_shovel_parameters.14.85917464>,source_exchange_key => <<>>,uris => ["amqp:///%2F"]}},undefined,undefined,undefined,undefined,undefined}

On Friday, October 16, 2020 at 00:04:59 UTC+3, Michael Klishin wrote:

Alexander Gorshkov

Nov 13, 2020, 9:33:49 AM
to rabbitmq-users
Hello,
We've had exactly the same issue during node restarts while upgrading to 3.8.9. It looks like it's caused by the "move messages" / shovel functionality, but we have no idea why it ran right after the node restart. We have the following rabbitmq.conf; could something in it cause the issue?
---
listeners.tcp.default = ***
listeners.ssl.default = ***
ssl_options.cacertfile = ***
ssl_options.certfile = ***
ssl_options.keyfile = ***
ssl_options.verify = verify_none
ssl_options.fail_if_no_peer_cert = false
ssl_options.versions.1 = tlsv1.2
management.tcp.port = ***
vm_memory_high_watermark.relative = 0.85
disk_free_limit.relative = 1.0
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_classic_config
cluster_formation.classic_config.nodes.1 = ***
cluster_formation.classic_config.nodes.2 = ***
cluster_formation.classic_config.nodes.3 = ***
cluster_partition_handling = pause_minority
---
Or maybe it's just some kind of bug: queues that were created with "move messages" and manually deleted later are automatically recreated by the shovel after a node restart.
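One way to tell whether a leftover shovel is behind the recreation: dynamic shovels are stored as runtime parameters under the `shovel` component, so they survive node restarts until they complete or are cleared. A hedged sketch that lists them via the management HTTP API (host, port and credentials are assumptions):

```python
import base64
import json
import urllib.parse
import urllib.request

def shovel_params_url(host: str, port: int, vhost: str) -> str:
    """Build /api/parameters/shovel/{vhost} - where dynamic shovel definitions live."""
    return "http://%s:%d/api/parameters/shovel/%s" % (
        host, port, urllib.parse.quote(vhost, safe=""))

def list_shovels(host, port, user, password, vhost):
    """Return the dynamic shovel definitions for a vhost as parsed JSON."""
    req = urllib.request.Request(shovel_params_url(host, port, vhost))
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same information is available from the CLI via `rabbitmqctl list_parameters`, and a stuck shovel can be removed with `rabbitmqctl clear_parameter shovel <name>`.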



On Thursday, November 12, 2020 at 18:22:40 UTC+3, juri...@gmail.com wrote:

Luke Bakken

Nov 13, 2020, 12:53:44 PM
to rabbitmq-users
Hi Alexander,

Please provide exact instructions to reproduce what you see. We don't have time to guess how you set up your environment using "move messages".

Ideally, you would provide a script that uses rabbitmqadmin or the HTTP API to set up an environment that exactly matches yours.

Thank you,
Luke

Alexander Gorshkov

Nov 17, 2020, 11:33:39 AM
to rabbitmq-users
Hello Luke.
I've finally recreated this issue using the following steps:
---
0. Create a fresh 3-node cluster;
1. Enable the management and shovel plugins;
2. Create an admin user and access the UI on node 1;
# All following actions are performed via the UI
3. Create the following HA policy:
ha-mode: exactly, ha-params: 2, ha-sync-mode: automatic, queue-master-locator: min-masters
4. Create a queue on each node (e.g. test-1 on node 1, test-2 on node 2, test-3 on node 3);
5. Do "move messages" into a non-existing queue, e.g. twice for each queue (restart-1 and restart-2 for test-1, restart-3 and restart-4 for test-2, restart-5 and restart-6 for test-3);
6. Delete all automatically created queues;
7. Restart node 2 via systemctl stop/start rabbitmq-server - no restored queues;
8. Restart node 3 via systemctl stop/start rabbitmq-server - no restored queues;
9. Access the UI on node 3;
10. Restart node 1 via systemctl stop/start rabbitmq-server - queues restart-2, restart-4 and restart-5 have been recreated.
---
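For reference, step 5's "move messages" button creates a dynamic shovel, much like the one in the state dump earlier in the thread (ack-mode on-confirm, delete-after queue-length). A hedged sketch of roughly the JSON body a script would PUT to /api/parameters/shovel/{vhost}/{name} to imitate it — the exact key names are assumptions, not taken from the thread:

```python
import json

def move_messages_payload(src_queue: str, dest_queue: str) -> str:
    """JSON body approximating what the management UI's "Move messages"
    button sets up: shovel from src_queue to dest_queue on the local node,
    stopping once the current backlog has been transferred."""
    return json.dumps({
        "value": {
            "src-uri": "amqp://",                # local connection, same node
            "src-queue": src_queue,
            "src-delete-after": "queue-length",  # stop after draining the backlog
            "dest-uri": "amqp://",
            "dest-queue": dest_queue,
            "ack-mode": "on-confirm",
        }
    })
```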
This just confirms that queues can be recreated automatically; it's not clear why.
Thank you.
On Friday, November 13, 2020 at 20:53:44 UTC+3, Luke Bakken wrote:

Luke Bakken

Nov 17, 2020, 7:02:40 PM
to rabbitmq-users
Hello,

Unfortunately I could not reproduce the problem using your manual steps or the equivalent set of steps in this script:


I tested using Erlang 23.1.3 and the latest RabbitMQ code.

If you'd like, review my script to see if I missed anything.

Thanks,
Luke

Alexander Gorshkov

Nov 18, 2020, 4:00:39 AM
to rabbitmq-users
Hello,

Thank you for your reply!
I think it's necessary to use the management panel and not to declare the destination queues before moving messages into them.
I've repeated all my steps and got the same result again - the queues were recreated.

Please, check the following screenshots:
1. Create a fresh cluster with admin user and enable management and shovel plugins.
1.PNG

2. Create policy for HA
2.PNG

3. Create queue on each node.
3.PNG

4. Do "move messages"  
4.PNG

5. Queues have been automatically created by "move messages"
5.PNG

6. Check cluster status, then delete all "retry*" queues
6.PNG

7. Stop node 3 - no recreated queues
7.PNG
8. Stop Node 2 - no recreated queues
8.PNG

9. Switch to management panel on Node 3 and stop Node 1 - queues "retry_[2,4,6]" have been recreated
9.PNG

Log entries on Node 2 during Node 1 restart:
---
2020-11-18 11:33:50.699 [error] <0.937.0> Channel error on connection <0.896.0> (<rab...@kafka01.1605688375.896.0>, vhost: '/', user: 'none'), channel 2:
operation queue.declare caused a channel exception not_found: no queue 'retry_2' in vhost '/'
2020-11-18 11:33:50.705 [error] <0.970.0> Channel error on connection <0.909.0> (<rab...@kafka01.1605688375.909.0>, vhost: '/', user: 'none'), channel 2:
operation queue.declare caused a channel exception not_found: no queue 'retry_4' in vhost '/'
2020-11-18 11:33:50.706 [error] <0.974.0> Channel error on connection <0.922.0> (<rab...@kafka01.1605688375.922.0>, vhost: '/', user: 'none'), channel 2:
operation queue.declare caused a channel exception not_found: no queue 'retry_6' in vhost '/'
2020-11-18 11:33:52.887 [info] <0.521.0> rabbit on node 'rabbit@elk-tst-01' down
2020-11-18 11:33:52.932 [info] <0.992.0> Mirrored queue 'retry_2' in vhost '/': Adding mirror on node rabbit@kafkanode01: <16671.1428.0>
2020-11-18 11:33:53.906 [info] <0.995.0> Mirrored queue 'retry_4' in vhost '/': Adding mirror on node rabbit@kafkanode01: <16671.1452.0>
2020-11-18 11:33:53.907 [info] <0.999.0> Mirrored queue 'retry_6' in vhost '/': Adding mirror on node rabbit@kafkanode01: <16671.1455.0>
2020-11-18 11:33:53.970 [info] <0.521.0> Node rabbit@elk-tst-01 is down, deleting its listeners
2020-11-18 11:33:54.040 [info] <0.521.0> node 'rabbit@elk-tst-01' down: connection_closed
2020-11-18 11:33:54.132 [info] <0.992.0> Mirrored queue 'retry_2' in vhost '/': Synchronising: 0 messages to synchronise
2020-11-18 11:33:54.132 [info] <0.992.0> Mirrored queue 'retry_2' in vhost '/': Synchronising: batch size: 4096
2020-11-18 11:33:54.137 [info] <0.876.0> Shovel ''Move from test_1' in virtual host '/'' is stopping (it was configured to autodelete and transfer is completed)
2020-11-18 11:33:54.252 [info] <0.999.0> Mirrored queue 'retry_6' in vhost '/': Synchronising: 0 messages to synchronise
2020-11-18 11:33:54.252 [info] <0.999.0> Mirrored queue 'retry_6' in vhost '/': Synchronising: batch size: 4096
2020-11-18 11:33:54.258 [info] <0.869.0> Shovel ''Move from test_3' in virtual host '/'' is stopping (it was configured to autodelete and transfer is completed)
2020-11-18 11:33:54.316 [info] <0.1071.0> Mirrored queue 'retry_2' in vhost '/': Synchronising: all mirrors already synced
2020-11-18 11:33:54.324 [info] <0.871.0> Shovel ''Move from test_2' in virtual host '/'' is stopping (it was configured to autodelete and transfer is completed)
2020-11-18 11:33:54.383 [info] <0.1083.0> Mirrored queue 'retry_6' in vhost '/': Synchronising: all mirrors already synced
2020-11-18 11:33:54.383 [info] <0.995.0> Mirrored queue 'retry_4' in vhost '/': Synchronising: 0 messages to synchronise
2020-11-18 11:33:54.383 [info] <0.995.0> Mirrored queue 'retry_4' in vhost '/': Synchronising: batch size: 4096
2020-11-18 11:33:54.439 [info] <0.1104.0> Mirrored queue 'retry_4' in vhost '/': Synchronising: all mirrors already synced
2020-11-18 11:34:54.018 [info] <0.521.0> node 'rabbit@elk-tst-01' up
2020-11-18 11:35:04.227 [info] <0.521.0> rabbit on node 'rabbit@elk-tst-01' up
---

Sorry for the large post.
RabbitMQ version - 3.8.9, Erlang version - 23.1, OS - CentOS 7, no rabbitmq.conf or rabbitmq-env.conf used.
Thank you.
On Wednesday, November 18, 2020 at 03:02:40 UTC+3, Luke Bakken wrote:

Luke Bakken

Nov 18, 2020, 8:49:56 AM
to rabbitmq-users
Hi Alexander,

Thanks for the comprehensive sequence of steps. I did do these steps manually exactly as you described before automating them with a script.

In step 3, when you create the test_* queues, it appears they are created with queue arguments. Notice the "Args" in the blue box. Can you hover over that to show the arguments?

I used the latest RabbitMQ code and will re-try with version 3.8.9. There is a chance this has been fixed.

Thanks,
Luke



Alexander Gorshkov

Nov 18, 2020, 9:31:47 AM
to rabbitmq-users
It's the "queue-type: classic" argument.
10.PNG
On Wednesday, November 18, 2020 at 16:49:56 UTC+3, Luke Bakken wrote:

Luke Bakken

Nov 18, 2020, 11:50:32 AM
to rabbitmq-users
Thanks Alexander.

I have re-run the steps you provided using version 3.8.9 and I do see the same behavior you report. It's certainly a bug, but not a serious one.

Like I said, it may already be fixed so I will confirm that first.

Thanks for the detailed reproduction steps!
Luke

Alexander Gorshkov

Nov 18, 2020, 2:16:44 PM
to rabbitmq-users
Thanks for your help, Luke!

On Wednesday, November 18, 2020 at 19:50:32 UTC+3, Luke Bakken wrote:

Luke Bakken

Nov 22, 2020, 11:08:32 AM
to rabbitmq-users
Hi again Alexander,

You can follow this issue if you'd like - https://github.com/rabbitmq/rabbitmq-server/issues/2655

I have confirmed that this is a bug that only affects clustered environments. An HA policy is not necessary to reproduce it. I've simplified my script to reproduce the issue a bit here: