> by *lowering* vm_memory_high_watermark. Our flow problems only occurred
> when we had a burst of a million messages or so about 160k in size
> instead of our usual much smaller sizes like 4k, 6k and 12k. If your
> cause is similar to ours, vm_memory_high_watermark of 0.03 (yes, that's
> 3 hundredths) will get rid of flow problems.
My reading of why this happened:
The "flow control problem" could be more accurately described as "the
queue is too busy paging out transient messages to accept new ones at
anything other than quite a low rate".
When the burst of larger messages comes in, the queue will fill memory
with them, up until the moment when memory use goes over the paging
threshold (vm_memory_high_watermark_paging_ratio times the high
watermark). Then it will start to push them to disk.
When vm_memory_high_watermark is normal, that means it has a lot of
messages to page out at once, and this problem becomes quite visible.
When vm_memory_high_watermark is low, that means it gets to this state
much earlier, and thus has less of a backlog of messages to page out. So
it reaches a steady state, with messages getting paged out as they
arrive, much sooner.
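
To put rough numbers on the backlog, here is a minimal Python sketch. It
assumes a hypothetical broker with 8 GiB of RAM, a paging ratio of 0.5
(the default), and that each message costs exactly its 160k payload in
memory, with per-message overhead ignored. paging_backlog is an
illustrative name, not a RabbitMQ function.

```python
def paging_backlog(total_ram_bytes, watermark, paging_ratio, message_bytes):
    """Rough count of messages that fit in RAM before paging starts.

    Paging is assumed to begin once memory use exceeds
    total_ram * watermark * paging_ratio.
    """
    paging_threshold = total_ram_bytes * watermark * paging_ratio
    return int(paging_threshold // message_bytes)


RAM = 8 * 1024**3   # hypothetical machine: 8 GiB
MSG = 160 * 1024    # the 160k messages from the burst

# Default watermark (0.4) versus the lowered one (0.03).
normal = paging_backlog(RAM, 0.4, 0.5, MSG)
low = paging_backlog(RAM, 0.03, 0.5, MSG)

print("backlog at 0.4: ", normal, "messages")
print("backlog at 0.03:", low, "messages")
```

Under those assumptions the lowered watermark leaves the queue with a
backlog more than ten times smaller when paging kicks in.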
For the same effect you might reduce
vm_memory_high_watermark_paging_ratio to a tiny number instead of
vm_memory_high_watermark - that will give the same early-paging effect
but give RabbitMQ more memory for everything else.
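
A small Python sketch of that trade-off, again assuming a hypothetical
8 GiB machine. memory_limits is an illustrative helper, and the 0.0375
ratio is simply the value that reproduces the paging point of a 0.03
watermark with the default 0.5 ratio, not a tested recommendation.

```python
import math


def memory_limits(total_ram_bytes, watermark, paging_ratio):
    """Return (memory alarm limit, paging threshold) in bytes."""
    limit = total_ram_bytes * watermark
    return limit, limit * paging_ratio


RAM = 8 * 1024**3   # hypothetical machine: 8 GiB

# Option A: lower the watermark, keep the default paging ratio.
low_watermark = memory_limits(RAM, 0.03, 0.5)

# Option B: keep the default watermark, lower the paging ratio.
low_ratio = memory_limits(RAM, 0.4, 0.0375)

for name, (limit, paging) in (("low watermark", low_watermark),
                              ("low ratio", low_ratio)):
    print(f"{name}: alarm at {limit / 2**20:.0f} MiB, "
          f"paging from {paging / 2**20:.0f} MiB")
```

Both options start paging at the same point, but option B leaves the
memory alarm at its usual level.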
Also, the persister performance improvements in 3.5.0 might help you -
but they mostly target smaller messages so I don't know how much they
would help.
Ultimately, queues should have a limit on how much time they devote to
paging out messages versus everything else that's going on. In an ideal
world that might be in 3.6.0, but I don't want to give any guarantees.
Cheers, Simon