rabbit_stream_coordinator: failed to stop member


Arnaud Morin

Sep 23, 2025, 5:18:31 AM
to rabbitm...@googlegroups.com
Hello!

We are facing a strange issue on our cluster.
First, the context:
a cluster of 3 nodes:
rabbit-neutron1
rabbit-neutron3
rabbit-neutron4

As you may have noticed, the numbering is odd because the cluster evolved and some
nodes were removed in the past (node 2).
The cluster_status is all green.
The RabbitMQ version is 4.1.3 (with Erlang 26); it was upgraded from a
3.13 cluster (not sure this is relevant info, but better to mention it).

We now have some stream queues that are working, but the "writer" is
logging errors about one of them:

rabbit_stream_coordinator: failed to stop member __l3_agent_fanout_1691487324783411676 'rabbit@rabbit-neutron2' Error: {{nodedown,'rabbit@rabbit-neutron2'},{gen_server,call,[{osiris_server_sup,'rabbit@rabbit-neutron2'},{terminate_child,[95,95,108,51,95,97,103,101,110,116,95,102,97,110,111,117,116,95,49,54,57,49,52,56,55,51,50,52,55,56,51,52,49,49,54,55,54]},infinity]}}


The node rabbit-neutron2 is not in the cluster anymore.
But the queue l3_agent_fanout still sees it.
What could we do to actually remove the node from this queue?

Thanks for your answers!

Arnaud

jo...@cloudamqp.com

Sep 24, 2025, 3:50:28 PM
to rabbitmq-users
Hi,

I thought this was fixed in https://github.com/rabbitmq/rabbitmq-server/pull/9293. Was this cluster ever on 3.12.4 or earlier?
Can you delete the stream and re-create it?
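For example, something along these lines (just a sketch; the default vhost here is a guess, adjust it and the queue name to your setup):

rabbitmqctl delete_queue l3_agent_fanout --vhost /

The client application should then re-declare the stream on its next connection.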

/Johan

Arnaud Morin

Sep 25, 2025, 3:29:07 AM
to 'jo...@cloudamqp.com' via rabbitmq-users
Yes, the cluster was on 3.12.12 previously:
- 3.12.12, then
- 3.13.7, then
- 4.1.3

And yes, we tried deleting the queue, but right after it is created
again, the "writer" node still complains with the same exact error.

Arnaud

Karl Nilsson

Sep 25, 2025, 5:40:34 AM
to rabbitm...@googlegroups.com
Are you sure the broker doesn’t still think neutron2 is in the cluster? Can you check what cluster status reports?
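For example, something like this on each node (the grep is just a quick way to spot any remaining reference to the old node name):

rabbitmqctl cluster_status --formatter json | grep -i neutron2

If that prints nothing on any node, the membership itself is clean and the stale reference must be somewhere else.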

Karl Nilsson


Arnaud Morin

Sep 25, 2025, 11:11:03 AM
to rabbitm...@googlegroups.com

I ran rabbitmqctl cluster_status on the three nodes, and they report:

Disk Nodes

rabbit@rabbit-neutron1
rabbit@rabbit-neutron3
rabbit@rabbit-neutron4

Running Nodes

rabbit@rabbit-neutron1
rabbit@rabbit-neutron3
rabbit@rabbit-neutron4



Is my way of checking relevant?

Arnaud Cogoluègnes

Sep 29, 2025, 8:33:21 AM
to rabbitmq-users
Can you check the status of the stream with rabbitmq-streams stream_status and share the output?

Depending on the output you can control the replicas with rabbitmq-streams add_replica/delete_replica [1].
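For example (a sketch; adjust the vhost and the stream name to your setup):

rabbitmq-streams stream_status --vhost / l3_agent_fanout
rabbitmq-streams delete_replica --vhost / l3_agent_fanout rabbit@rabbit-neutron2

delete_replica only makes sense if the old node actually shows up as a replica in the stream_status output.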

Arnaud Morin

Sep 30, 2025, 3:08:41 AM
to 'Arnaud Cogoluègnes' via rabbitmq-users
Here is a partial output:

┌─────────┬────────────────────────┬
│ role │ node │
├─────────┼────────────────────────┼
│ replica │ rabbit@rabbit-neutron1 │
├─────────┼────────────────────────┼
│ writer │ rabbit@rabbit-neutron3 │
├─────────┼────────────────────────┼
│ replica │ rabbit@rabbit-neutron4 │
└─────────┴────────────────────────┴


There is no trace of rabbit-neutron2 here.

I am lost ...

Arnaud Cogoluègnes

Oct 1, 2025, 7:21:10 AM
to rabbitmq-users
OK, thanks.

Can you run rabbitmqctl eval 'rabbit_stream_coordinator:state().' and share the output?

Just share the part about the l3_agent_fanout queue if the output is too large. Make sure to redact anything you consider sensitive, e.g. node names.
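If the output is very large, something like this should narrow it down (the 30 lines of context is an arbitrary number):

rabbitmqctl eval 'rabbit_stream_coordinator:state().' | grep -A 30 l3_agent_fanout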

Arnaud Morin

Oct 3, 2025, 8:47:21 AM
to 'Arnaud Cogoluègnes' via rabbitmq-users
Nice!
I think we hit something: I have multiple l3_agent_fanout entries in there:
https://plik.ovh/file/u9JSnUT10STnKfjH/SBggcnLwsqoyXi6R/b

Only the last one seems to be the active one.
How can I get rid of these stale entries?

Regards,
Arnaud

Arnaud Cogoluègnes

Oct 6, 2025, 8:38:44 AM
to rabbitmq-users
The state is not consistent; I wonder how it ended up like this.

It should be possible to fix it; I'll try to come up with a command.

Do you have an environment where you can test commands or is it just the production cluster?

Do you think you could reproduce the issue?

Arnaud Morin

Oct 6, 2025, 8:48:26 AM
to 'Arnaud Cogoluègnes' via rabbitmq-users
Thanks for the help.
This is a production cluster and I do not have any way to reproduce it,
unfortunately...

Arnaud Cogoluègnes

Oct 6, 2025, 9:55:22 AM
to rabbitmq-users
OK, here is a command that deletes a stream in the stream coordinator state. I tested it locally, but I can't guarantee it will work in your environment. Use it at your own risk, especially on a production cluster. Note that you may need to delete the stream directory on each node as well; we'll get to this below.

rabbitmqctl eval 'rabbit_stream_coordinator:process_command({delete_stream, "__l3_agent_fanout_1691487324783411676", #{}}).'

It should return something like {ok, ok, ...}.

You can run rabbitmqctl eval 'rabbit_stream_coordinator:state().' to see if the corresponding entry is gone.

Then search for a directory with the same name as the stream ID (e.g. __l3_agent_fanout_1691487324783411676) in the "stream" sub-directory in the Mnesia directory (e.g. /var/lib/rabbitmq/mnesia/$NODENAME/stream under Debian) and delete it if it exists (so stream/__l3_agent_fanout_1691487324783411676, but not stream). Do this for each node of the cluster.
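For example, on each node, something along these lines (the path is the Debian default mentioned above, adjust it to your installation, and double-check what the ls shows before running the rm):

ls /var/lib/rabbitmq/mnesia/$NODENAME/stream/ | grep l3_agent_fanout
rm -rf /var/lib/rabbitmq/mnesia/$NODENAME/stream/__l3_agent_fanout_1691487324783411676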

Then follow the same procedure for all the streams that still refer to the rabbit-neutron2 node (I counted 4 of them).

Arnaud Morin

Oct 6, 2025, 5:25:29 PM
to 'Arnaud Cogoluègnes' via rabbitmq-users
Thank you Arnaud for this tip, I think it worked like a charm!
I still don't know how I ended up in that situation, though.

And for the record, the delete_stream command was executed only on one
node, and that triggered the deletion on the other nodes (including the
data directories).

Cheers,
Arnaud.

Arnaud Cogoluègnes

Oct 7, 2025, 1:57 AM
to rabbitmq-users
Thanks for the follow-up, glad it worked as expected.

Determining whether the Erlang processes listed in the state were still alive would have required more commands, which is why I suggested making sure the directories were gone. These ghost streams were apparently still operational, even though they were referring to a decommissioned node, so they managed to clean up their own directories.