Guys, for the past three to four months my team had the same issue - empty queues with disks almost full. The mnesia folder was full of messages that were not being GCed. What seems to have fixed it was splitting our single Rabbit into N separate instances (not clustering).
This is how it used to happen:
- One queue would build up to 500k messages (small messages - usually a DL/Failure queue)
- After the queue was consumed, disk usage would start to behave abnormally
- It would only go up, never down, until it hit the disk alarm
- No abnormal load
Our load is usually:
- 1k/s writes and reads
- Spikes to 5-6k/s writes
- Message size: 4 KB to 20-30 MB (depends on which service, which queue, which task)
- Rabbit 3.12.16 (All running Classic V2 Queues, Persistent Messages and Durable Queues)
- ~2k Connections
- ~4k Channels
- 2TB of Disk
Michal, about your suggestion to move the Mnesia files: since our data contains PHI and PII, is there any tool we can use internally to debug this if it ever happens again? Or anything in particular we should be concerned about? We've been down this rabbit hole for quite a while.
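
To make the question concrete, this is roughly the kind of content-free check we could run ourselves. It is only a sketch, not an official RabbitMQ tool: it walks a data directory and totals file counts and sizes per top-level subdirectory and file extension, reading file metadata only and never message bodies, so no PHI/PII leaves the box. The function names `summarize_disk_usage` and `format_report` are made up for illustration, and the directory you point it at is whatever your node's data directory happens to be.

```python
import os
from collections import defaultdict


def summarize_disk_usage(root):
    """Total file count and bytes per (top-level subdirectory, extension).

    Only file metadata is read (lstat); file contents are never opened.
    Returns a dict: {(top_level_dir, extension): [file_count, total_bytes]}.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files directly under root are grouped under ".".
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                # The broker may delete a file while we are walking; skip it.
                continue
            ext = os.path.splitext(name)[1] or "(none)"
            entry = totals[(top, ext)]
            entry[0] += 1
            entry[1] += size
    return dict(totals)


def format_report(totals):
    """Render the totals as text lines, largest groups first."""
    rows = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    lines = []
    for (top, ext), (count, size) in rows:
        lines.append(f"{size / 1024 ** 2:12.1f} MiB  {count:8d} files  {top}  {ext}")
    return "\n".join(lines)
```

Running `print(format_report(summarize_disk_usage(path)))` on a schedule and comparing the output against the queue depths reported by the management UI would at least show which group of files keeps growing while the queues are empty, and that summary could be shared without exposing any payloads.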