Hey folks,
We're running RMQ in production on k8s, with a cluster of 3 nodes. We recently switched from mirrored queues to quorum queues.
Our use case involves a lot of queues; the cluster currently holds around 850.
Our main issue is that when one of the pods restarts, it takes more than an hour to complete the bootstrap. The bootstrap used to be much faster, and it looks like it's degrading as time goes by.
Example logs from a restart:
2021-07-04 10:03:23.337 [info] <0.31825.47> queue 'production...' in vhost '/': detected a new leader {'_production...','rab...@rabbitmq-cluster-1.rabbitmq-cluster-discovery.rabbitmq.svc.cluster.local'} in term 10
2021-07-04 10:03:30.821 [info] <0.2196.48> queue 'production...' in vhost '/': detected a new leader {'_production...','rab...@rabbitmq-cluster-0.rabbitmq-cluster-discovery.rabbitmq.svc.cluster.local'} in term 15
2021-07-04 10:03:36.351 [info] <0.3615.12> queue 'production...' in vhost '/': follower did not have entry at 79749 in 14. Requesting {'_production...','rab...@rabbitmq-cluster-0.rabbitmq-cluster-discovery.rabbitmq.svc.cluster.local'} from 78892
2021-07-04 10:09:08.755 [info] <0.8319.47> queue 'production...' in vhost '/': granting vote for {'_production...','rab...@rabbitmq-cluster-1.rabbitmq-cluster-discovery.rabbitmq.svc.cluster.local'} with last indexterm {66986,14} for term 14094 previous term was 14093
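In case it helps with diagnosis, here is a rough sketch for tallying the Raft terms that show up in log lines like the ones above. The message wording in the regex is taken only from these excerpts, so it is an assumption that other RabbitMQ versions log the same text, and `summarize_terms` is just a hypothetical helper name. A term that keeps climbing for a queue (such as 14094 above) may point to repeated leader elections, but that is an interpretation, not a confirmed cause.

```python
import re

# Assumed message formats, copied from the log excerpts in this post.
TERM_RE = re.compile(
    r"queue '(?P<queue>[^']*)' in vhost '(?P<vhost>[^']*)': "
    r"(?:detected a new leader .* in term (?P<leader_term>\d+)"
    r"|granting vote for .* for term (?P<vote_term>\d+))"
)


def summarize_terms(lines):
    """Return {(vhost, queue): {"min": int, "max": int, "events": int}}.

    Only "detected a new leader" and "granting vote" lines are counted;
    every other line is skipped.
    """
    summary = {}
    for line in lines:
        match = TERM_RE.search(line)
        if match is None:
            continue
        term = int(match.group("leader_term") or match.group("vote_term"))
        key = (match.group("vhost"), match.group("queue"))
        entry = summary.setdefault(key, {"min": term, "max": term, "events": 0})
        entry["min"] = min(entry["min"], term)
        entry["max"] = max(entry["max"], term)
        entry["events"] += 1
    return summary
```

To run it against a real log, pass an open file object (it iterates line by line) and print the queues with the largest "max" term first.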
Any help would be appreciated.