Hi Michael,
Apologies for using "channel" and "connection" a bit loosely. I have translated our POC into PerfTest commands so that things are a bit clearer.
Here is the use case I tried to describe yesterday. We have a producer that publishes 5 million messages.
bin/runjava com.rabbitmq.perf.PerfTest -x 1 -y 0 -u "non-lazy" --id "test 1" --pmessages 5000000
After this has completed we can see that there are 224,013 messages in memory and 4,775,987 paged out. So far so good!
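(For reference, the producer side of our POC boils down to something like the sketch below with the Java client - the host, queue properties and payload size are illustrative, and it publishes via the default exchange for simplicity.)

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class Producer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumes the broker from the compose file below

        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Durable, non-exclusive, non-auto-delete queue with no extra arguments.
        channel.queueDeclare("non-lazy", true, false, false, null);

        byte[] body = new byte[100]; // payload size is illustrative
        for (int i = 0; i < 5_000_000; i++) {
            // Default exchange, non-persistent messages (no properties set).
            channel.basicPublish("", "non-lazy", null, body);
        }

        conn.close();
    }
}
```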
We now run the consumer with the following settings.
bin/runjava com.rabbitmq.perf.PerfTest -x 0 -y 1 -u "non-lazy" --id "test 1" -q 0 -A 5000001
We are not crazy :) - we deliberately run the consumer with a QoS of 0 (unlimited prefetch) and an acknowledgement only after 5,000,001 messages, so that we simulate a misconfigured or misbehaving consumer.
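In Java client terms, the consumer we are simulating looks roughly like the sketch below (the host is illustrative; the commented-out basicQos call is the obvious fix):

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

import java.io.IOException;

public class MisbehavingConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // channel.basicQos(1000); // the fix: bound the number of unacknowledged deliveries

        // Manual acks, but nothing is acked until "all" 5,000,001 messages have arrived,
        // so with unlimited prefetch the broker just keeps pushing deliveries.
        channel.basicConsume("non-lazy", false, new DefaultConsumer(channel) {
            private long received = 0;

            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                                       AMQP.BasicProperties properties, byte[] body) throws IOException {
                received++;
                if (received >= 5_000_001) {
                    // Cumulative ack of everything delivered so far.
                    getChannel().basicAck(envelope.getDeliveryTag(), true);
                }
            }
        });
    }
}
```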
What we observe is that the paged-out messages start moving back into memory (as expected) and RabbitMQ eventually dies (hmmm!). One easy way to avoid this is to set the QoS to something > 0, but why does this happen? Why would a server choose death over, say, not handing out further messages?
This can be reproduced with the following Docker Compose file:
```
version: '3'
services:
  rabbitmq:
    image: rabbitmq:3.6.12-management
    hostname: rabbit
    ports:
      - "5672:5672"
      - "15672:15672"
    environment:
      - RABBITMQ_ERLANG_COOKIE='secret cookie here'
```
on a host with 8 CPUs and 7.2 GB of memory.
# Publish observations
During publishing we observe that we hit the memory high watermark alarm repeatedly - in some cases memory usage goes up to 3.7 GB (the high watermark is at 2.8 GB). In order to control these fluctuations we found that the following configuration gives better results:
# rabbit.config
```
[
  { rabbit, [
    { loopback_users, [ ] },
    { tcp_listeners, [ 5672 ] },
    { ssl_listeners, [ ] },
    { hipe_compile, false },
    { vm_memory_high_watermark_paging_ratio, 0.1 },
    { vm_memory_high_watermark, 0.3 }
  ] },
  { rabbitmq_management, [ { listener, [
    { port, 15672 },
    { ssl, false }
  ] } ] }
].
```
For comparison, the defaults are:
{ vm_memory_high_watermark_paging_ratio, 0.5 },
{ vm_memory_high_watermark, 0.4 }
Is there any guidance on how one would choose these values based on a particular configuration? Do you have any other insights which we might be missing?
During the investigation we have also tried lazy queues. There seems to be a general feeling that these should be enabled when the producer publishes at a faster rate than the consumer can keep up with (our use case). In our experiments with lazy queues we cannot see any perceivable performance degradation - on the contrary, memory usage is far more stable (as expected). In our case we typically have a 4:1 ratio between producer and consumer rates; are there any recommendations on when one should enable lazy queues?
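(For completeness, switching a queue to lazy mode from the application boils down to something like the sketch below, using the x-queue-mode queue argument - the queue name and host are illustrative, and the same effect can be achieved with a policy so that application code stays unchanged.)

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.Collections;
import java.util.Map;

public class LazyQueueDeclare {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Declare the queue in lazy mode so messages go to disk as soon as possible.
        Map<String, Object> queueArgs =
                Collections.singletonMap("x-queue-mode", (Object) "lazy");
        channel.queueDeclare("lazy-queue", true, false, false, queueArgs);

        conn.close();
    }
}
```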
Thanks a lot for your time
--
Mark