We are using RabbitMQ 3.8.2 and have a consumer microservice (MS) that reads from a quorum queue and publishes the messages to S3. Three instances of the MS consume from the same quorum queue. Often, one instance stops consuming and the load is rebalanced across the other two consumers, and at times a second instance also stops after a few hours. Once we restart the microservices, they start consuming messages again and work fine.
There are no errors or exceptions in the microservice logs, but the RabbitMQ log says the acknowledgement timed out after 18000ms (https://www.rabbitmq.com/consumers.html#acknowledgement-timeout). We use manual acknowledgement with the Reactive RabbitMQ client library, and in the code we can see acknowledgements being sent correctly in both the ack and nack scenarios.
Is there any way to identify which message is missing its manual acknowledgement? When the consumer disconnects, we don't see any messages pending acknowledgement in the RabbitMQ Management UI (https://www.rabbitmq.com/confirms.html#automatic-requeueing). The MS gives us no clue; the connection appears to be terminated by the RabbitMQ broker. Is there any way to get details on what went wrong, or to troubleshoot this further so we can identify and fix the root cause of the broker closing the connection? We have not been able to reproduce the problem, because it happens sporadically rather than consistently.
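A minimal sketch of one way to narrow down which message is never settled, under stated assumptions: record every delivery tag in-process when a message is received, remove it on both the ack and the nack paths, and periodically log any tag that has been outstanding longer than a threshold well below the broker's timeout. `UnackedTracker`, `received`, `settled` and `overdue` are hypothetical names for illustration only; they are not part of reactor-rabbitmq. In reactor-rabbitmq each `AcknowledgableDelivery` exposes `getEnvelope().getDeliveryTag()`, `ack()` and `nack(boolean)`, which is presumably where `received` and `settled` would be hooked in.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (NOT a reactor-rabbitmq API): tracks delivery tags that
 * have been handed to application code but not yet acked or nacked.
 */
public class UnackedTracker {
    // delivery tag -> time the message was received by our consumer code
    private final Map<Long, Instant> pending = new ConcurrentHashMap<>();

    // Call when a delivery arrives, before any processing starts.
    public void received(long deliveryTag, Instant at) {
        pending.put(deliveryTag, at);
    }

    // Call from BOTH the ack and the nack paths.
    public void settled(long deliveryTag) {
        pending.remove(deliveryTag);
    }

    // Tags outstanding for longer than maxAge, oldest first.
    public List<Long> overdue(Instant now, Duration maxAge) {
        return pending.entrySet().stream()
                .filter(e -> Duration.between(e.getValue(), now).compareTo(maxAge) > 0)
                .sorted(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        UnackedTracker tracker = new UnackedTracker();
        Instant t0 = Instant.parse("2020-01-01T00:00:00Z");
        tracker.received(1L, t0);
        tracker.received(2L, t0.plusSeconds(5));
        tracker.received(3L, t0.plusSeconds(10));
        tracker.settled(2L); // tag 2 was acked; tags 1 and 3 are still open
        // Only tag 1 has been open for more than 12 seconds at t0 + 20s.
        System.out.println(tracker.overdue(t0.plusSeconds(20), Duration.ofSeconds(12))); // prints [1]
    }
}
```

Delivery tags are scoped to a channel, so if a service consumes on more than one channel the map key would also need to identify the channel. Logging a message ID or the target S3 key next to each tag would make an overdue entry easier to trace back to a specific message.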
Please let us know of any approaches we can try to troubleshoot and resolve this issue.
Thank you,
Regards,
Ramalingam.V