offline members of quorum queue

M

Nov 4, 2022, 4:18:02 AM
to rabbitmq-users
Hi there,

I am looking into a "problem" we have on our RabbitMQ cluster, hosted in Kubernetes. It seems that occasionally some members of a quorum queue go offline. I'm not sure what causes this. Also, what would the risk be if, in this case, node 6 were to go offline?

It's a 7-node cluster, and it seems that this occurred after a reboot of 2 nodes (1 hour apart). Our quorum queue initial size is 3; is this too low, maybe?
[Attachment: offline members rabbitmq.png]

Hopefully someone has seen this before!

Kind regards,

Mathijs
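
On the group-size question, a minimal sketch of the majority arithmetic may help. It assumes only the general Raft rule that a quorum queue needs more than half of its own members online to make progress; the member count is the queue's group size (3 here), not the number of nodes in the cluster. The function names are made up for illustration.

```python
def majority(members: int) -> int:
    """Smallest number of members that forms a Raft majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be offline while a majority remains."""
    return members - majority(members)


# Group sizes of 3, 5 and 7 members.
for n in (3, 5, 7):
    print(f"members={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

With a group size of 3, a single offline member is tolerated. If two members are already offline, the queue has no majority, and losing the remaining member takes the last online replica with it.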

Karl Nilsson

Nov 4, 2022, 5:24:44 AM
to rabbitm...@googlegroups.com
The quorum queue should have recovered. Only the server logs will be able to tell you why that did not happen.

Which RabbitMQ version is this?

--
Karl Nilsson

M

Nov 4, 2022, 5:58:03 AM
to rabbitmq-users
Here are the logs that mention a specific queue that has this problem:

2022-11-01 17:44:05.045198+00:00 [info] <0.1401.2> queue 'c***blurred-out-queue-name***e' in vhost '/': follower did not have entry at 51 in 6. Requesting {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-5.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} from 51
rabbitmq-cockpit-6
2022-11-01 17:44:05.045605+00:00 [info] <0.1401.2> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-5.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 6
rabbitmq-cockpit-6
2022-11-01 17:45:07.447874+00:00 [info] <0.1401.2> queue 'c***blurred-out-queue-name***e' in vhost '/': Leader monitor down with shutdown, setting election timeout
rabbitmq-cockpit-6
2022-11-01 17:45:07.508254+00:00 [info] <0.1401.2> queue 'c***blurred-out-queue-name***e' in vhost '/': granting vote for {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-1.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} with last indexterm {52,6} for term 7 previous term was 6
rabbitmq-cockpit-6
2022-11-01 17:45:07.526700+00:00 [info] <0.1401.2> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-1.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 7
rabbitmq-cockpit-6
2022-11-01 17:46:18.093030+00:00 [info] <0.1552.2> queue 'c***blurred-out-queue-name***e' in vhost '/': follower did not have entry at 54 in 7. Requesting {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-1.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} from 53
rabbitmq-cockpit-5
2022-11-01 17:46:18.093445+00:00 [info] <0.1552.2> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-1.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 7
rabbitmq-cockpit-5
2022-11-01 17:56:17.550988+00:00 [info] <0.1552.2> queue 'c***blurred-out-queue-name***e' in vhost '/': Leader monitor down with noconnection, setting election timeout
rabbitmq-cockpit-5
2022-11-01 17:56:17.541542+00:00 [info] <0.1401.2> queue 'c***blurred-out-queue-name***e' in vhost '/': Leader monitor down with noconnection, setting election timeout
rabbitmq-cockpit-6
2022-11-01 17:56:17.944493+00:00 [info] <0.1552.2> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-1.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 7
rabbitmq-cockpit-5
2022-11-01 17:56:18.117480+00:00 [info] <0.1552.2> queue 'c***blurred-out-queue-name***e' in vhost '/': granting vote for {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} with last indexterm {55,7} for term 8 previous term was 7
rabbitmq-cockpit-5
2022-11-01 17:56:18.194122+00:00 [notice] <0.1401.2> queue 'c***blurred-out-queue-name***e' in vhost '/': candidate -> leader in term: 8 machine version: 2
rabbitmq-cockpit-6
2022-11-01 17:56:18.227776+00:00 [info] <0.1552.2> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 8
rabbitmq-cockpit-5
2022-11-01 17:57:23.911705+00:00 [notice] <0.1401.2> queue 'c***blurred-out-queue-name***e' in vhost '/': leader -> terminating_leader in term: 8 machine version: 2
rabbitmq-cockpit-6
2022-11-01 17:57:23.914643+00:00 [notice] <0.1552.2> queue 'c***blurred-out-queue-name***e' in vhost '/': terminating with reason 'delete'
rabbitmq-cockpit-5
2022-11-01 17:57:28.974909+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': leader call - leader not known. Command will be forwarded once leader is known.
rabbitmq-cockpit-2
2022-11-01 17:57:28.990134+00:00 [info] <0.30121.41> queue 'c***blurred-out-queue-name***e' in vhost '/': granting vote for {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} with last indexterm {0,0} for term 1 previous term was 0
rabbitmq-cockpit-4
2022-11-01 17:57:28.990849+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': granting vote for {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} with last indexterm {0,0} for term 1 previous term was 0
rabbitmq-cockpit-2
2022-11-01 17:57:28.995953+00:00 [notice] <0.16083.67> queue 'c***blurred-out-queue-name***e' in vhost '/': candidate -> leader in term: 1 machine version: 2
rabbitmq-cockpit-6
2022-11-01 17:57:28.996140+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 1
rabbitmq-cockpit-2
2022-11-01 17:57:28.996528+00:00 [info] <0.30121.41> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 1
rabbitmq-cockpit-4
2022-11-02 12:11:37.719861+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 1
rabbitmq-cockpit-2
2022-11-02 12:11:38.343868+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': granting vote for {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-4.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} with last indexterm {11,1} for term 2 previous term was 2
rabbitmq-cockpit-2
2022-11-02 12:11:39.110207+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': granting vote for {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-4.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} with last indexterm {11,1} for term 3 previous term was 2
rabbitmq-cockpit-2
2022-11-02 12:11:39.412752+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-4.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 3
rabbitmq-cockpit-2
2022-11-02 12:13:07.969246+00:00 [info] <0.7879.2> queue 'c***blurred-out-queue-name***e' in vhost '/': follower did not have entry at 13 in 3. Requesting {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-4.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} from 12
rabbitmq-cockpit-6
2022-11-02 12:13:07.969830+00:00 [info] <0.7879.2> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-4.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 3
rabbitmq-cockpit-6
2022-11-02 14:59:17.531109+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': granting vote for {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} with last indexterm {13,3} for term 4 previous term was 3
rabbitmq-cockpit-2
2022-11-02 14:59:17.546606+00:00 [notice] <0.7879.2> queue 'c***blurred-out-queue-name***e' in vhost '/': candidate -> leader in term: 4 machine version: 2
rabbitmq-cockpit-6
2022-11-02 14:59:17.549332+00:00 [info] <0.30876.16> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 4
rabbitmq-cockpit-2
2022-11-02 14:59:25.737714+00:00 [notice] <0.30121.41> queue 'c***blurred-out-queue-name***e' in vhost '/': leader -> follower in term: 4 machine version: 2
rabbitmq-cockpit-4
2022-11-02 14:59:27.624989+00:00 [info] <0.30121.41> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 4
rabbitmq-cockpit-4
2022-11-03 15:17:34.495878+00:00 [info] <0.30121.41> queue 'c***blurred-out-queue-name***e' in vhost '/': detected a new leader {'%2F_c***blurred-out-queue-name***e','rab...@rabbitmq-cockpit-6.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local'} in term 4
rabbitmq-cockpit-4
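
For anyone reading along, here is a rough sketch of how log lines like the ones above could be scanned for leader changes. It is only an illustration: the regular expression is written against the line shape pasted in this thread, and the sample lines use a made-up queue name `q` and made-up node names (`rabbit@node-6`), not the real ones. It also flags places where the term goes backwards, as happens above when the term drops from 8 to 1 right after the `terminating with reason 'delete'` entry, which may mean the queue was deleted and declared again.

```python
import re

# Matches "detected a new leader" lines in the shape shown in this thread.
LEADER_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \[\w+\] <[\d.]+> queue '[^']*' in vhost '[^']*': "
    r"detected a new leader \{'[^']*','(?P<leader>[^']*)'\} in term (?P<term>\d+)"
)


def leader_changes(lines):
    """Return (timestamp, leader, term) for every leader-detection line."""
    changes = []
    for line in lines:
        m = LEADER_RE.match(line.strip())
        if m:
            changes.append((m["ts"], m["leader"], int(m["term"])))
    return changes


def term_regressions(changes):
    """Return consecutive pairs of changes where the term went backwards."""
    return [(a, b) for a, b in zip(changes, changes[1:]) if b[2] < a[2]]


# Hypothetical sample lines; queue and node names are placeholders.
sample = [
    "2022-11-01 17:56:18.227776+00:00 [info] <0.1552.2> queue 'q' in vhost '/': "
    "detected a new leader {'%2F_q','rabbit@node-6'} in term 8",
    "2022-11-01 17:57:23.914643+00:00 [notice] <0.1552.2> queue 'q' in vhost '/': "
    "terminating with reason 'delete'",
    "2022-11-01 17:57:28.996140+00:00 [info] <0.30876.16> queue 'q' in vhost '/': "
    "detected a new leader {'%2F_q','rabbit@node-6'} in term 1",
]

changes = leader_changes(sample)
for before, after in term_regressions(changes):
    print(f"term went from {before[2]} to {after[2]} between {before[0]} and {after[0]}")
```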

We did have the option "force_boot" enabled; I'm not sure whether this is relevant.

On Friday, November 4, 2022 at 10:24:44 UTC+1, kjnilsson wrote:

M

Nov 4, 2022, 6:00:39 AM
to rabbitmq-users
Sorry, I forgot to mention the RabbitMQ version:
RabbitMQ 3.10.7 Erlang 24.3.4

On Friday, November 4, 2022 at 10:24:44 UTC+1, kjnilsson wrote:
The quorum queue should have recovered. Only the server logs will be able to tell you why that did not happen.

M

Nov 8, 2022, 9:11:11 AM
to rabbitmq-users
┌────────────────────────────────────────────────────────────────────────────────────────────┬────────────┬───────────┬──────────────┬────────────────┬──────┬─────────────────┐
│ Node Name                                                                                  │ Raft State │ Log Index │ Commit Index │ Snapshot Index │ Term │ Machine Version │
├────────────────────────────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rab...@rabbitmq-cockpit-3.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local │ noproc     │           │              │                │      │                 │
├────────────────────────────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rab...@rabbitmq-cockpit-4.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local │ follower   │ 0         │ 0            │ undefined      │ 0    │ 2               │
├────────────────────────────────────────────────────────────────────────────────────────────┼────────────┼───────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rab...@rabbitmq-cockpit-2.rabbitmq-cockpit-headless.rabbitmq-cockpit-new.svc.cluster.local │ pre_vote   │ 52        │ 52           │ undefined      │ 1    │ 2               │
└────────────────────────────────────────────────────────────────────────────────────────────┴────────────┴───────────┴──────────────┴────────────────┴──────┴─────────────────┘
When I check a specific queue, I see that there is no process running for it on one of the nodes. Does anybody know how I can repair a queue in this state?
On Friday, November 4, 2022 at 11:00:39 UTC+1, M wrote:
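
To make the table above easier to reason about, here is a small sketch that parses a status table of that shape and summarizes the Raft states. It assumes the box-drawing layout shown above (the kind of table `rabbitmq-queues quorum_status` prints, though how the table was produced is not stated in the thread). Counting `leader` and `follower` as "active" is a rough heuristic for illustration, not a health check, and the node names in the sample are placeholders.

```python
ACTIVE_STATES = {"leader", "follower"}  # heuristic, see note above


def parse_status_table(text):
    """Turn a box-drawn status table into a list of dicts keyed by column header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("│") and line.endswith("│")):
            continue  # skip border lines and anything that is not a data row
        rows.append([cell.strip() for cell in line[1:-1].split("│")])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def summarize(members):
    """Count members, the majority needed, and members in an active Raft state."""
    states = [m["Raft State"] for m in members]
    return {
        "members": len(members),
        "majority": len(members) // 2 + 1,
        "active": sum(state in ACTIVE_STATES for state in states),
        "has_leader": "leader" in states,
    }


# Placeholder node names; states and indexes mirror the table above.
sample_table = """
┌───────────────┬────────────┬───────────┬──────┐
│ Node Name     │ Raft State │ Log Index │ Term │
├───────────────┼────────────┼───────────┼──────┤
│ rabbit@node-3 │ noproc     │           │      │
│ rabbit@node-4 │ follower   │ 0         │ 0    │
│ rabbit@node-2 │ pre_vote   │ 52        │ 1    │
└───────────────┴────────────┴───────────┴──────┘
"""

summary = summarize(parse_status_table(sample_table))
print(summary)
```

For a table like the one above, the summary shows three members, a required majority of two, and no member reporting itself as leader.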

kjnilsson

Nov 9, 2022, 6:40:30 AM
to rabbitmq-users
It is hard to say without seeing the full logs covering when this event started.

Have you manually removed any data from the disk for the restarted nodes or changed any disk-related configuration?