Thanks Michael. In an effort to understand the cause of the RMQ memory
growth, I monitored the RX/TX bytes per second on the broker host as well
as the top output. Before I explain my findings, I'd like to go over our
design a bit. We have a management web-app co-located with the broker
process on the same virtual machine (12 GB RAM). The web-app talks to the
broker over the loopback interface. There are also about 500 other
external applications that talk to the broker over the public interface
'mgmt'. Now, on to my findings.
1) RMQ memory use hovers around 400-600 MB during normal messaging, until
the web-app publishes a big message (~15 MB) to the broker over the
loopback interface. This message is to be fanned out to all 500 external
applications over the 'mgmt' interface.
RMQ Memory:
Date                         PID   USER    PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
Thu Apr 16 21:06:27 UTC 2015 24724 urabbit 20 0  798m  383m 3436 S 51   3.2  1:19.01 beam.smp
Thu Apr 16 21:06:30 UTC 2015 24724 urabbit 20 0  4839m 4.3g 3436 S 219  37.0 1:23.43 beam.smp
Thu Apr 16 21:06:32 UTC 2015 24724 urabbit 20 0  5790m 5.2g 3436 S 268  44.8 1:30.27 beam.smp
Thu Apr 16 21:06:35 UTC 2015 24724 urabbit 20 0  6597m 6.0g 3436 S 257  51.3 1:36.63 beam.smp
Thu Apr 16 21:06:37 UTC 2015 24724 urabbit 20 0  7402m 6.8g 3436 S 249  58.0 1:43.31 beam.smp
Thu Apr 16 21:06:40 UTC 2015 24724 urabbit 20 0  8071m 7.4g 3436 S 221  63.5 1:49.49 beam.smp
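To put the RES column in a single unit, here is a small Python sketch that converts the values above to MiB and prints the growth relative to the first sample. It assumes top's m/g suffixes are binary units (MiB/GiB) and that an unsuffixed number is KiB; res_to_mib is a hypothetical helper, not part of any tool. For this episode it shows RES going from 383m to 7.4g in about 13 seconds.

```python
# Hypothetical helper: quantify beam.smp RES growth from the top samples above.
# Assumption: top's "m"/"g" suffixes are MiB/GiB and plain numbers are KiB.

def res_to_mib(res: str) -> float:
    """Convert a top RES field such as '383m' or '7.4g' to MiB."""
    units = {"k": 1 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 ** 2}
    suffix = res[-1].lower()
    if suffix in units:
        return float(res[:-1]) * units[suffix]
    return float(res) / 1024  # no suffix: top reports KiB


# (timestamp, RES) pairs copied from the first episode
samples = [
    ("21:06:27", "383m"),
    ("21:06:30", "4.3g"),
    ("21:06:32", "5.2g"),
    ("21:06:35", "6.0g"),
    ("21:06:37", "6.8g"),
    ("21:06:40", "7.4g"),
]

baseline = res_to_mib(samples[0][1])
for ts, res in samples:
    mib = res_to_mib(res)
    print(f"{ts}  RES={mib:8.0f} MiB  growth={mib - baseline:8.0f} MiB")
```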
Loopback interface:
Thu Apr 16 21:06:27 UTC 2015 TX lo: 145 kB/s RX lo: 145 kB/s
Thu Apr 16 21:06:30 UTC 2015 TX lo: 11699 kB/s RX lo: 11699 kB/s
Thu Apr 16 21:06:32 UTC 2015 TX lo: 15 kB/s RX lo: 15 kB/s
Thu Apr 16 21:06:34 UTC 2015 TX lo: 11 kB/s RX lo: 11 kB/s
'mgmt' interface:
Thu Apr 16 21:06:38 UTC 2015 TX mgmt: 1513 kB/s RX mgmt: 12 kB/s
Thu Apr 16 21:06:40 UTC 2015 TX mgmt: 101721 kB/s RX mgmt: 854 kB/s
Thu Apr 16 21:06:42 UTC 2015 TX mgmt: 885220 kB/s RX mgmt: 7594 kB/s
Thu Apr 16 21:06:44 UTC 2015 TX mgmt: 1758334 kB/s RX mgmt: 14969 kB/s
Thu Apr 16 21:06:46 UTC 2015 TX mgmt: 1133121 kB/s RX mgmt: 9737 kB/s
Thu Apr 16 21:06:48 UTC 2015 TX mgmt: 2402 kB/s RX mgmt: 61 kB/s
Thu Apr 16 21:06:50 UTC 2015 TX mgmt: 46463 kB/s RX mgmt: 363 kB/s
Thu Apr 16 21:06:52 UTC 2015 TX mgmt: 23265 kB/s RX mgmt: 201 kB/s
Thu Apr 16 21:06:54 UTC 2015 TX mgmt: 1 kB/s RX mgmt: 0 kB/s
Thu Apr 16 21:06:56 UTC 2015 TX mgmt: 190 kB/s RX mgmt: 69 kB/s
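As a back-of-envelope check, the TX samples on 'mgmt' can be summed to estimate how much data left the box during this episode and compared with one ~15 MB message sent to ~500 consumers. This sketch assumes each sample is an average over the ~2 s between timestamps and that "kB" means 1000 bytes; both are assumptions about the monitoring script, and the constant names are hypothetical.

```python
# Back-of-envelope: bytes sent on 'mgmt' during episode 1 vs. a 15 MB x 500 fan-out.
# Assumptions: each rate is an average over the ~2 s interval between samples,
# and "kB" means 1000 bytes.

SAMPLE_INTERVAL_S = 2  # the timestamps above are 2 s apart

tx_mgmt_kb_per_s = [1513, 101721, 885220, 1758334, 1133121,
                    2402, 46463, 23265, 1, 190]

MESSAGE_SIZE_KB = 15_000  # "~15 MB", taken as 15,000 kB
RECIPIENTS = 500          # "about 500 other external applications"

sent_kb = sum(rate * SAMPLE_INTERVAL_S for rate in tx_mgmt_kb_per_s)
expected_kb = MESSAGE_SIZE_KB * RECIPIENTS

print(f"observed TX on mgmt : {sent_kb / 1e6:.2f} GB")
print(f"15 MB x 500 fan-out : {expected_kb / 1e6:.2f} GB")
print(f"ratio               : {sent_kb / expected_kb:.2f}")
```

With these numbers the two totals come out at roughly 7.9 GB and 7.5 GB, which is consistent with every consumer receiving its own copy of the body over the network. By itself this says nothing about how many copies the broker holds in memory at the same time.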
2) RMQ memory use goes back to the 400-600 MB range until the web-app
publishes another big message (~15 MB). We see a similar pattern as
before:
RMQ Memory:
Date                         PID   USER    PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
Thu Apr 16 21:09:26 UTC 2015 24724 urabbit 20 0  838m  435m 3436 S 2    3.6  2:23.60 beam.smp
Thu Apr 16 21:09:29 UTC 2015 24724 urabbit 20 0  1144m 730m 3436 S 83   6.1  2:24.34 beam.smp
Thu Apr 16 21:09:31 UTC 2015 24724 urabbit 20 0  6265m 5.7g 3436 S 292  48.9 2:32.14 beam.smp
Thu Apr 16 21:09:34 UTC 2015 24724 urabbit 20 0  7555m 7.0g 3436 S 380  59.5 2:41.31 beam.smp
Thu Apr 16 21:09:37 UTC 2015 24724 urabbit 20 0  8498m 7.9g 3436 S 341  67.2 2:50.59 beam.smp
Thu Apr 16 21:09:39 UTC 2015 24724 urabbit 20 0  8867m 8.2g 1140 S 237  70.2 2:57.13 beam.smp
Thu Apr 16 21:09:42 UTC 2015 24724 urabbit 20 0  8955m 8.3g 1228 S 204  70.8 3:01.53 beam.smp
Thu Apr 16 21:09:45 UTC 2015 24724 urabbit 20 0  9063m 8.4g 1260 S 184  71.5 3:06.42 beam.smp
Thu Apr 16 21:09:48 UTC 2015 24724 urabbit 20 0  9137m 8.4g 1260 S 84   72.0 3:09.83 beam.smp
Thu Apr 16 21:09:51 UTC 2015 24724 urabbit 20 0  9297m 8.6g 1232 S 251  73.0 3:15.27 beam.smp
Thu Apr 16 21:09:54 UTC 2015 24724 urabbit 20 0  9342m 8.6g 1232 S 26   73.4 3:17.07 beam.smp
Thu Apr 16 21:09:56 UTC 2015 24724 urabbit 20 0  9376m 8.6g 1244 S 49   73.5 3:18.60 beam.smp
Thu Apr 16 21:09:59 UTC 2015 24724 urabbit 20 0  9404m 8.6g 1212 S 41   73.7 3:20.05 beam.smp
Thu Apr 16 21:10:02 UTC 2015 24724 urabbit 20 0  9446m 8.7g 1224 S 71   73.9 3:21.64 beam.smp
Thu Apr 16 21:10:05 UTC 2015 24724 urabbit 20 0  9470m 8.7g 1224 S 55   74.1 3:22.98 beam.smp
Thu Apr 16 21:10:07 UTC 2015 24724 urabbit 20 0  9498m 8.7g 1236 S 20   74.3 3:23.91 beam.smp
Thu Apr 16 21:10:10 UTC 2015 24724 urabbit 20 0  9512m 8.7g 1236 S 20   74.4 3:24.64 beam.smp
Thu Apr 16 21:10:13 UTC 2015 24724 urabbit 20 0  9531m 8.7g 1236 S 79   74.5 3:25.67 beam.smp
Thu Apr 16 21:10:15 UTC 2015 24724 urabbit 20 0  9547m 8.7g 1236 S 28   74.7 3:26.40 beam.smp
Thu Apr 16 21:10:18 UTC 2015 24724 urabbit 20 0  9559m 8.8g 1236 S 22   74.8 3:27.18 beam.smp
Thu Apr 16 21:10:21 UTC 2015 24724 urabbit 20 0  9565m 8.8g 1236 S 12   74.8 3:27.81 beam.smp
Thu Apr 16 21:10:24 UTC 2015 24724 urabbit 20 0  9571m 8.8g 1236 S 45   74.9 3:28.34 beam.smp
Thu Apr 16 21:10:27 UTC 2015 24724 urabbit 20 0  9583m 8.8g 1248 S 16   75.0 3:29.11 beam.smp
Thu Apr 16 21:10:30 UTC 2015 24724 urabbit 20 0  9589m 8.8g 1248 S 22   75.1 3:29.84 beam.smp
Thu Apr 16 21:10:33 UTC 2015 24724 urabbit 20 0  9606m 8.8g 1256 S 46   75.2 3:30.78 beam.smp
Thu Apr 16 21:10:36 UTC 2015 24724 urabbit 20 0  9621m 8.8g 1256 S 29   75.3 3:31.57 beam.smp
Thu Apr 16 21:10:39 UTC 2015 24724 urabbit 20 0  9635m 8.8g 1280 S 24   75.5 3:32.44 beam.smp
Thu Apr 16 21:10:42 UTC 2015 24724 urabbit 20 0  9638m 8.8g 1280 S 75   75.5 3:33.77 beam.smp
Thu Apr 16 21:10:44 UTC 2015 24724 urabbit 20 0  9673m 8.9g 1280 S 30   75.8 3:35.18 beam.smp
Thu Apr 16 21:10:47 UTC 2015 24724 urabbit 20 0  9687m 8.9g 1280 S 95   75.9 3:36.00 beam.smp
Thu Apr 16 21:10:49 UTC 2015 24724 urabbit 20 0  9709m 8.9g 1256 S 86   76.0 3:37.94 beam.smp
Thu Apr 16 21:10:53 UTC 2015 24724 urabbit 20 0  9746m 8.9g 1272 S 91   76.2 3:40.29 beam.smp
Thu Apr 16 21:10:56 UTC 2015 24724 urabbit 20 0  9764m 8.9g 1276 S 8    76.3 3:41.74 beam.smp
Thu Apr 16 21:10:58 UTC 2015 24724 urabbit 20 0  9775m 9.0g 1292 S 85   76.4 3:42.91 beam.smp
Thu Apr 16 21:11:01 UTC 2015 24724 urabbit 20 0  9799m 9.0g 1292 S 79   76.6 3:44.57 beam.smp
Thu Apr 16 21:11:04 UTC 2015 24724 urabbit 20 0  9829m 9.0g 1316 S 107  76.8 3:46.36 beam.smp
Thu Apr 16 21:11:07 UTC 2015 24724 urabbit 20 0  0     0    0    Z 82   0.0  3:49.24 beam.smp <defunct>
Loopback interface:
Thu Apr 16 21:09:28 UTC 2015 TX lo: 0 kB/s RX lo: 0 kB/s
Thu Apr 16 21:09:30 UTC 2015 TX lo: 11581 kB/s RX lo: 11581 kB/s
Thu Apr 16 21:09:32 UTC 2015 TX lo: 0 kB/s RX lo: 0 kB/s
'mgmt' interface:
Thu Apr 16 21:09:38 UTC 2015 TX mgmt: 132494 kB/s RX mgmt: 1156 kB/s
Thu Apr 16 21:09:41 UTC 2015 TX mgmt: 141215 kB/s RX mgmt: 1286 kB/s
Thu Apr 16 21:09:43 UTC 2015 TX mgmt: 138487 kB/s RX mgmt: 1222 kB/s
Thu Apr 16 21:09:45 UTC 2015 TX mgmt: 147107 kB/s RX mgmt: 1282 kB/s
Thu Apr 16 21:09:48 UTC 2015 TX mgmt: 61329 kB/s RX mgmt: 633 kB/s
Thu Apr 16 21:09:50 UTC 2015 TX mgmt: 40657 kB/s RX mgmt: 373 kB/s
Thu Apr 16 21:09:52 UTC 2015 TX mgmt: 93964 kB/s RX mgmt: 772 kB/s
Thu Apr 16 21:09:54 UTC 2015 TX mgmt: 26843 kB/s RX mgmt: 230 kB/s
Thu Apr 16 21:09:56 UTC 2015 TX mgmt: 13467 kB/s RX mgmt: 121 kB/s
Thu Apr 16 21:09:59 UTC 2015 TX mgmt: 7103 kB/s RX mgmt: 51 kB/s
Thu Apr 16 21:10:01 UTC 2015 TX mgmt: 17099 kB/s RX mgmt: 145 kB/s
Thu Apr 16 21:10:03 UTC 2015 TX mgmt: 16268 kB/s RX mgmt: 140 kB/s
Thu Apr 16 21:10:05 UTC 2015 TX mgmt: 7208 kB/s RX mgmt: 66 kB/s
Thu Apr 16 21:10:08 UTC 2015 TX mgmt: 14068 kB/s RX mgmt: 119 kB/s
Thu Apr 16 21:10:10 UTC 2015 TX mgmt: 2798 kB/s RX mgmt: 26 kB/s
Thu Apr 16 21:10:13 UTC 2015 TX mgmt: 168 kB/s RX mgmt: 2 kB/s
Thu Apr 16 21:10:15 UTC 2015 TX mgmt: 3250 kB/s RX mgmt: 32 kB/s
Thu Apr 16 21:10:17 UTC 2015 TX mgmt: 1320 kB/s RX mgmt: 41 kB/s
Thu Apr 16 21:10:19 UTC 2015 TX mgmt: 2154 kB/s RX mgmt: 30 kB/s
Thu Apr 16 21:10:22 UTC 2015 TX mgmt: 2443 kB/s RX mgmt: 21 kB/s
Thu Apr 16 21:10:24 UTC 2015 TX mgmt: 1616 kB/s RX mgmt: 15 kB/s
Thu Apr 16 21:10:27 UTC 2015 TX mgmt: 3395 kB/s RX mgmt: 30 kB/s
Thu Apr 16 21:10:29 UTC 2015 TX mgmt: 3862 kB/s RX mgmt: 24 kB/s
Thu Apr 16 21:10:31 UTC 2015 TX mgmt: 3058 kB/s RX mgmt: 30 kB/s
Thu Apr 16 21:10:33 UTC 2015 TX mgmt: 316 kB/s RX mgmt: 3 kB/s
Thu Apr 16 21:10:36 UTC 2015 TX mgmt: 8004 kB/s RX mgmt: 98 kB/s
Thu Apr 16 21:10:38 UTC 2015 TX mgmt: 8548 kB/s RX mgmt: 84 kB/s
Thu Apr 16 21:10:40 UTC 2015 TX mgmt: 3427 kB/s RX mgmt: 30 kB/s
Thu Apr 16 21:10:42 UTC 2015 TX mgmt: 22626 kB/s RX mgmt: 206 kB/s
Thu Apr 16 21:10:44 UTC 2015 TX mgmt: 24337 kB/s RX mgmt: 223 kB/s
Thu Apr 16 21:10:47 UTC 2015 TX mgmt: 13415 kB/s RX mgmt: 126 kB/s
Thu Apr 16 21:10:49 UTC 2015 TX mgmt: 9635 kB/s RX mgmt: 143 kB/s
Thu Apr 16 21:10:51 UTC 2015 TX mgmt: 58734 kB/s RX mgmt: 523 kB/s
Thu Apr 16 21:10:54 UTC 2015 TX mgmt: 56883 kB/s RX mgmt: 435 kB/s
Thu Apr 16 21:10:56 UTC 2015 TX mgmt: 38141 kB/s RX mgmt: 358 kB/s
Thu Apr 16 21:10:58 UTC 2015 TX mgmt: 16653 kB/s RX mgmt: 156 kB/s
Thu Apr 16 21:11:01 UTC 2015 TX mgmt: 23908 kB/s RX mgmt: 207 kB/s
Thu Apr 16 21:11:04 UTC 2015 TX mgmt: 39644 kB/s RX mgmt: 386 kB/s
Thu Apr 16 21:11:08 UTC 2015 TX mgmt: 110407 kB/s RX mgmt: 1015 kB/s
Thu Apr 16 21:11:10 UTC 2015 TX mgmt: 222 kB/s RX mgmt: 438 kB/s
Thu Apr 16 21:11:12 UTC 2015 TX mgmt: 24 kB/s RX mgmt: 32 kB/s
This time, it looks like RMQ gets killed (beam.smp ends up <defunct>)
before it can empty the queues and release its memory.
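For reference, RabbitMQ's memory alarm defaults to vm_memory_high_watermark = 0.4 of installed RAM. Assuming that default is in effect here (not confirmed for this broker) and treating the VM's 12 GB as 12 GiB for simplicity, this sketch compares the peak RES of both episodes with that threshold. The alarm blocks publishing connections rather than deliveries to consumers, so a fan-out that is already in progress can keep pushing memory past it.

```python
# Rough comparison of the observed peak RES with RabbitMQ's memory alarm threshold.
# Assumptions: the default vm_memory_high_watermark (0.4) is unchanged, and the
# VM's "12 GB" is what the broker detects as installed RAM (treated as GiB).

VM_RAM_GIB = 12.0
HIGH_WATERMARK = 0.4  # RabbitMQ default, assumed not overridden in the config

threshold_gib = VM_RAM_GIB * HIGH_WATERMARK

# Peak RES values read from the top samples above (GiB)
peaks = {"episode 1 (21:06:40)": 7.4, "episode 2 (21:11:04)": 9.0}

print(f"alarm threshold ~ {threshold_gib:.1f} GiB")
for label, res_gib in peaks.items():
    over = res_gib - threshold_gib
    print(f"{label}: RES {res_gib:.1f} GiB, {over:+.1f} GiB vs threshold, "
          f"{res_gib / VM_RAM_GIB:.0%} of RAM")
```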
My question for you is: why does RMQ memory jump by about 5-6 GB within a
few seconds when a single ~15 MB message is published to be fanned out to
~500 queues? Do you think my analysis of why RMQ crashes is on the mark?
In addition, does anything else strike you as unusual or interesting in
the above data?
Regards
Kapil
On 4/15/15, 8:43 PM, "Michael Klishin" <mkli...@pivotal.io> wrote:
> On 16 April 2015 at 04:41:09, Kapil Goyal (goy...@vmware.com) wrote:
>> Trouble is that when the broker gets into this state, rabbitmqctl stops
>> responding and I cannot get any information, for example the list of
>> queues or the report. How do you recommend I go about troubleshooting
>> this?
>
>This suggests the OS is swapping. Resource alarms do not affect
>`rabbitmqctl`. Tools such as iostat and vmstat will provide more details.
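To confirm or rule out swapping on the broker VM, here is a minimal Linux-only Python sketch that samples the kernel's cumulative pswpin/pswpout counters (pages swapped in and out) from /proc/vmstat; vmstat's si/so columns give similar information. The function names are hypothetical. Sustained non-zero swap-out while RMQ memory is climbing would support the swapping explanation.

```python
# Minimal Linux-only sketch: measure swap activity from /proc/vmstat.
# pswpin/pswpout are cumulative counts of pages swapped in/out since boot.
import time
from typing import Tuple


def read_swap_counters(text: str) -> Tuple[int, int]:
    """Return (pswpin, pswpout) parsed from the contents of /proc/vmstat."""
    values = {}
    for line in text.splitlines():
        key, _, val = line.partition(" ")
        if key in ("pswpin", "pswpout"):
            values[key] = int(val)
    return values.get("pswpin", 0), values.get("pswpout", 0)


def swap_rate(interval_s: float = 1.0) -> Tuple[float, float]:
    """Pages swapped in/out per second over one sampling interval."""
    with open("/proc/vmstat") as f:
        in1, out1 = read_swap_counters(f.read())
    time.sleep(interval_s)
    with open("/proc/vmstat") as f:
        in2, out2 = read_swap_counters(f.read())
    return (in2 - in1) / interval_s, (out2 - out1) / interval_s
```

For example, calling swap_rate() in a loop while the ~15 MB message is being fanned out, and logging both values next to the top samples, would show whether the memory spike coincides with swap-out.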
>
>If OS swapping is indeed going on, your best bet is to monitor queue size
>before swapping happens.
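One way to monitor queue size before the box starts swapping, assuming the management plugin is enabled, is to poll GET /api/queues and flag queues whose messages or memory fields cross a limit. This is a sketch: the host, credentials and thresholds are placeholders, and large_queues/fetch_queues are hypothetical helpers.

```python
# Hypothetical sketch: flag large queues using the management plugin's HTTP API.
import base64
import json
import urllib.request
from typing import Iterable, List, Tuple


def large_queues(queues: Iterable[dict], max_messages: int,
                 max_memory_bytes: int) -> List[Tuple[str, str]]:
    """(vhost, name) of queues over either limit, given /api/queues objects."""
    flagged = []
    for q in queues:
        # Fields can be missing or null before stats are collected; treat as 0.
        too_many = (q.get("messages") or 0) > max_messages
        too_big = (q.get("memory") or 0) > max_memory_bytes
        if too_many or too_big:
            flagged.append((q.get("vhost", ""), q["name"]))
    return flagged


def fetch_queues(base_url: str, user: str, password: str) -> list:
    """Fetch the list of queue objects from GET {base_url}/api/queues."""
    req = urllib.request.Request(f"{base_url}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Usage, assuming the default management port: `large_queues(fetch_queues("http://localhost:15672", "guest", "guest"), 10_000, 500 * 1024 ** 2)`.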
>
>There are multiple features that can be used via policies (so, without
>modifying any apps) to limit queue growth:
>
>https://www.rabbitmq.com/blog/2014/01/23/preventing-unbounded-buffers-with-rabbitmq/
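As an illustration of the policy route, here is a Python sketch that builds a policy applying max-length and message-ttl to queues whose names match a pattern, using the management API's PUT /api/policies/{vhost}/{name}. The policy name, pattern and limits are hypothetical, and which policy keys are supported depends on the RabbitMQ version in use, so check the documentation for that version. The same policy can also be set with rabbitmqctl set_policy.

```python
# Hypothetical sketch: build (and optionally send) a queue-limiting policy via
# the management HTTP API. Name, pattern and limits are placeholders.
import base64
import json
import urllib.parse
import urllib.request
from typing import Tuple


def policy_request(base_url: str, vhost: str, name: str, pattern: str,
                   max_length: int, message_ttl_ms: int) -> Tuple[str, bytes]:
    """Return the URL and JSON body for a policy that caps queue growth."""
    url = (f"{base_url}/api/policies/"
           f"{urllib.parse.quote(vhost, safe='')}/{urllib.parse.quote(name, safe='')}")
    body = {
        "pattern": pattern,     # regex matched against queue names
        "apply-to": "queues",
        "priority": 0,
        "definition": {
            "max-length": max_length,       # cap on messages held per queue
            "message-ttl": message_ttl_ms,  # expire messages older than this (ms)
        },
    }
    return url, json.dumps(body).encode()


def put_policy(url: str, body: bytes, user: str, password: str) -> int:
    """Send the policy; returns the HTTP status code."""
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("content-type", "application/json")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, `policy_request("http://localhost:15672", "/", "limit-fanout", "^fanout\\.", 1000, 60000)` targets the default vhost, which is URL-encoded as %2F.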