Hi Luke,
Just to recap for readers: there are currently 2 open questions.
Question №1: I/O Performance
Question №2: Proper Flow Control under pressure
Regarding Question №1:
We measured RAID performance with the bonnie++ utility (plus plain old dd with oflag=direct). The results are in the attached bonnie.bench file.
The storage is a software RAID10 with XFS on top of it.
At runtime, we use iotop and iostat to check I/O.
Here is what it looks like:
Please find more information attached. Relevant command > file mapping:
cat /etc/fstab > fstab.conf
sysctl -a > sysctl.conf
mdadm --detail /dev/md127 > mdadm.conf
cat /etc/rabbitmq.config > rmq.conf
cat /etc/rabbitmq-env.conf >> rmq.conf
xfs_info /dev/md127 > xfs.conf
rabbitmqctl status > rmq.status
Regarding Question №2:
We are observing the following lifecycle:
1) RabbitMQ enqueues messages into a lazy queue (for clarity we experiment with a single queue, as described previously) at a high rate (3-7 Gbps), but actual disk writes are ~3 Gbps.
2) RabbitMQ reports that it has written 100 GB to disk, but in fact we can see that almost all of that data still resides in cached RAM.
3) In the background, the size of /data (the message store location) increases, so RabbitMQ is writing to disk.
4) At some point, when the amount of cached RAM approaches the total available RAM, the RabbitMQ process starts growing towards the high watermark (currently 38 GB RSS), and only at that point does it apply flow control.
5) Under these conditions, free RAM drops to 600-700 MB.
6) In the majority of cases the outcome is that the Erlang VM crashes with "Slogan: binary_alloc: Cannot allocate 1XXXXXX bytes of memory (of type "binary")." So it can't even
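To put rough numbers on steps 1-4, here is a back-of-the-envelope sketch in Python. It assumes that Gbps means decimal gigabits per second, that ingest and disk rates are steady, and that every byte not yet written to disk accumulates in RAM. The last assumption is a simplification, since the page cache also holds pages that have already been written. The function names are illustrative only and are not part of any RabbitMQ tooling.

```python
def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (decimal units)."""
    return gbps / 8.0


def seconds_to_fill(buffer_gb: float, ingest_gbps: float, drain_gbps: float) -> float:
    """Time until a RAM buffer of buffer_gb fills, given steady ingest and disk rates.

    Returns infinity when the disk keeps up with (or outpaces) the ingest rate.
    """
    surplus_gb_per_s = gbps_to_gbytes_per_s(ingest_gbps - drain_gbps)
    if surplus_gb_per_s <= 0:
        return float("inf")
    return buffer_gb / surplus_gb_per_s


if __name__ == "__main__":
    # 104 GB is the cached figure from the free output in this post;
    # 3 Gbps is the observed disk write rate, 3-7 Gbps the ingest range.
    for ingest in (3.0, 5.0, 7.0):
        t = seconds_to_fill(104.0, ingest, 3.0)
        print(f"ingest {ingest} Gbps -> buffer full after {t} s")
```

Under these assumptions, a sustained 7 Gbps ingest against ~3 Gbps of disk writes leaves a surplus of 0.5 GB/s, which would fill about 104 GB of RAM in a few minutes. That is consistent with flow control kicking in only after memory is nearly exhausted.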
Here is the usual layout when resources are exhausted. The RMQ server process occupies only 16 GB, but 104 GB of RAM is used as cache:
# ps aux | grep beam
rabbitmq 7510 335 12.2 49440656 16187832 ? Sl 06:29 385:10 /usr/lib/erlang/erts-9.3/bin/beam.smp -W w -A 192 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000 -stbt db -zdbbl 1280000 -K true -A 1024 -sub true -B i -- -root /usr/lib/erlang
-progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.7.6/ebin -noshell -noinput -s rabbit boot -sname rabbit@rmq-14-cl-1 -boot start_sasl -config /etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error
-sasl sasl_error_logger false -rabbit lager_log_root "/var/log/rabbitmq" -rabbit lager_default_file "/var/log/rabbitmq/rab...@rmq-14-cl-1.log" -rabbit lager_upgrade_file "/var/log/rabbitmq/rabbit@rmq-14-cl-1_upgrade.log" -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins"
-rabbit plugins_dir "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.7.6/plugins" -rabbit plugins_expand_dir "/data/rabbit@rmq-14-cl-1-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/data/rabbit@rmq
-14-cl-1" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672
# root@rmq-14-cl-1:~# free -hm
             total       used       free     shared    buffers     cached
Mem:          125G       125G       673M       880K        20M       104G
-/+ buffers/cache:        20G       105G
Swap:          15G        39M        15G
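The "cached" column in free does not say how much of that cache is still waiting to be written to disk. A minimal sketch for checking this, assuming a Linux host where /proc/meminfo exposes the Cached, Dirty and Writeback fields (values in kB), might look like the following. The helper names are hypothetical and the script has not been run on the affected node.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: integer_value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_breakdown(fields: dict) -> dict:
    """Report how much of the page cache is dirty or under writeback."""
    cached = fields.get("Cached", 0)
    dirty = fields.get("Dirty", 0)
    writeback = fields.get("Writeback", 0)
    pending = dirty + writeback
    return {
        "cached_kb": cached,
        "dirty_kb": dirty,
        "writeback_kb": writeback,
        # Fraction of the cache that has not yet reached the disk.
        "pending_fraction": pending / cached if cached else 0.0,
    }


def read_breakdown(path: str = "/proc/meminfo") -> dict:
    """Read the given meminfo file and return the cache breakdown."""
    with open(path) as fh:
        return cache_breakdown(parse_meminfo(fh.read()))
```

Calling read_breakdown() periodically while the queue fills would show whether the 104G of cache is mostly dirty pages (the disk is behind) or mostly clean pages that the kernel can drop on demand.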