Hi Luke,
Thanks for your response.
I just received one information from support guys that due to the continuous issues they restarted rabbitmq nodes. So, maybe some logs belong to that occurrences.
Today morning we encountered the network partitioning issue again. I have attached the splunk messages of the same.
If possible please try to provide any resolution/configuration changes of the problem that we are facing so frequently.
If you need some additional information then please let me know as well.
PS: With 3.6.10 network partitioning issues used to come once a while (at a frequency of 6 month approx).
Just one thing comes to my mind, is there some evident issue in Erlang (22.2.8) or any new configuration introduced which will work in combination with auto-heal and that can solve the problem.
I have seen tick time somewhere, is it playing any role?
For my local debugging, is there a way by which I can reproduce this behavior on my local server? Maybe on my local deployment I can try to apply your suggested solution for quick turn-around.
Pardon me for raising so many questions in a single email, but since this is a production blocker so it is getting critical for us.