vm_memory_high_watermark set: RabbitMQ server is out of memory without clear cause


Mark Engelsman

Jun 6, 2018, 4:43:02 AM
to rabbitmq-users
Hi,

We are running a three-node RabbitMQ cluster and regularly have trouble keeping it up and running. Here is some information about our usage volumes:




At the busiest time of the day, we see a publish rate of around 1,300 messages/s, with all other numbers (connections, channels, consumers, etc.) staying roughly the same.

Last Sunday, the vm_memory_high_watermark alarm was set, so all publishers were blocked from publishing to the cluster.
Memory usage went up like crazy (from 0.5 GB to over 9 GB within 10 minutes), but there were no spikes in other numbers such as channels, number of connections, message rate, etc.




Here is the log of osmrabbit03 (the server with memory issues):

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

=INFO REPORT==== 3-Jun-2018::16:53:13 ===
connection <0.18188.5690> (10.10.0.210:60616 -> 10.10.0.123:5672): user 'gbadmin' authenticated and granted access to vhost '/'

=WARNING REPORT==== 3-Jun-2018::17:14:09 ===
closing AMQP connection <0.18188.5690> (10.10.0.210:60616 -> 10.10.0.123:5672, vhost: '/', user: 'gbadmin'):
client unexpectedly closed TCP connection

=ERROR REPORT==== 3-Jun-2018::17:15:31 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.25283.5808> exit with reason: [{reason,{timeout,{gen_server,call,[<0.572.0>,{fetch,#Fun<rabbit_mgmt_db.22.104142656>, ... (all queues and exchanges)

(... about 10 more error reports like the one above)

=ERROR REPORT==== 3-Jun-2018::17:19:31 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.20964.5822> exit with reason: [{reason,{timeout,{gen_server,call,[<0.572.0>,{fetch,#Fun<rabbit_mgmt_db.22.104142656>, ... (all queues and exchanges)

=INFO REPORT==== 3-Jun-2018::17:19:38 ===
vm_memory_high_watermark set. Memory used:7125663744 allowed:6871733043

=WARNING REPORT==== 3-Jun-2018::17:19:38 ===
memory resource limit alarm set on node rabbit@osmrabbit03.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO REPORT==== 3-Jun-2018::17:19:39 ===
vm_memory_high_watermark clear. Memory used:3993567232 allowed:6871733043

=WARNING REPORT==== 3-Jun-2018::17:19:39 ===
memory resource limit alarm cleared on node rabbit@osmrabbit03

=WARNING REPORT==== 3-Jun-2018::17:19:39 ===
memory resource limit alarm cleared across the cluster

=ERROR REPORT==== 3-Jun-2018::17:20:01 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.14631.5824> exit with reason: [{reason,{timeout,{gen_server,call,[<0.572.0>,{fetch,#Fun<rabbit_mgmt_db.22.104142656>, ... (all queues and exchanges)
=ERROR REPORT==== 3-Jun-2018::17:20:24 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.22413.5825> exit with reason: [{reason,{timeout,{gen_server,call,[<0.567.0>,{fetch,#Fun<rabbit_mgmt_db.21.104142656>,[]},60000]}}},{mfa,{rabbit_mgmt_wm_connections,to_json,2}},{stacktrace,[{gen_server,call,3,[{file,"gen_server.erl"},{line,214}]},{rabbit_mgmt_db,submit_cached,2,[{file,"src/rabbit_mgmt_db.erl"},{line,707}]},{rabbit_mgmt_wm_connections,augmented,2,[{file,"src/rabbit_mgmt_wm_connections.erl"},{line,55}]},{rabbit_mgmt_wm_connections,to_json,2,[{file,"src/rabbit_mgmt_wm_connections.erl"},{line,43}]},{cowboy_rest,call,3,[{file,"src/cowboy_rest.erl"},{line,976}]},{cowboy_rest,set_resp_body,2,[{file,"src/cowboy_rest.erl"},{line,858}]},{cowboy_protocol,execute,4,[{file,"src/cowboy_protocol.erl"},{line,442}]}]},{req,[{socket,#Port<0.2067080>},{transport,ranch_tcp},{connection,close},{pid,<0.22413.5825>},{method,<<"GET">>},{version,'HTTP/1.1'},{peer,{{10,10,0,252},38085}},{host,<<"10.10.0.123">>},{host_info,undefined},{port,15672},{path,<<"/api/connections">>},{path_info,undefined},{qs,<<>>},{qs_vals,[]},{bindings,[]},{headers,[{<<"te">>,<<"deflate,gzip;q=0.3">>},{<<"connection">>,<<"TE, close">>},{<<"authorization">>,<<"Basic bW9uaXRvcmluZzpQN0tLTVo2V2tnSHpkUTF2">>},{<<"host">>,<<"10.10.0.123:15672">>},{<<"user-agent">>,<<"check_rabbitmq_connections libwww-perl/6.08">>}]},{p_headers,[{<<"if-modified-since">>,undefined},{<<"if-none-match">>,undefined},{<<"if-unmodified-since">>,undefined},{<<"if-match">>,undefined},{<<"accept">>,undefined},{<<"connection">>,[<<"te">>,<<"close">>]}]},{cookies,undefined},{meta,[{media_type,{<<"application">>,<<"json">>,[]}},{charset,undefined}]},{body_state,waiting},{buffer,<<>>},{multipart,undefined},{resp_compress,true},{resp_state,waiting},{resp_headers,[{<<"vary">>,[<<"accept">>,[<<", ">>,<<"accept-encoding">>],[<<", ">>,<<"origin">>]]},{<<"content-type">>,[<<"application">>,<<"/">>,<<"json">>,<<>>]},{<<"vary">>,<<"origin">>}]},{resp_body,<<>>},{onresponse,#Fun<rabbit_cowboy_middleware.onresponse.4>}]},{state,{context,{user,<<"monitoring">>,[monitoring],[{rabbit_auth_backend_internal,none}]},<<"P7KKMZ6WkgHzdQ1v">>,undefined}}]

=ERROR REPORT==== 3-Jun-2018::17:20:31 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.2776.5826> exit with reason: [{reason,{timeout,{gen_server,call,[<0.572.0>,{fetch,#Fun<rabbit_mgmt_db.22.104142656>, ... (all queues and exchanges)
=ERROR REPORT==== 3-Jun-2018::17:21:01 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.27697.5827> exit with reason: [{reason,{timeout,{gen_server,call,[<0.572.0>,{fetch,#Fun<rabbit_mgmt_db.22.104142656>, ... (all queues and exchanges)
=ERROR REPORT==== 3-Jun-2018::17:21:31 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.15965.5829> exit with reason: [{reason,{timeout,{gen_server,call,[<0.572.0>,{fetch,#Fun<rabbit_mgmt_db.22.104142656>, ... (all queues and exchanges)
=ERROR REPORT==== 3-Jun-2018::17:21:36 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.24446.5829> exit with reason: [{reason,{timeout,{gen_server,call,[<0.567.0>,{fetch,#Fun<rabbit_mgmt_db.21.104142656>,[]},60000]}}},{mfa,{rabbit_mgmt_wm_connections,to_json,2}},{stacktrace,[{gen_server,call,3,[{file,"gen_server.erl"},{line,214}]},{rabbit_mgmt_db,submit_cached,2,[{file,"src/rabbit_mgmt_db.erl"},{line,707}]},{rabbit_mgmt_wm_connections,augmented,2,[{file,"src/rabbit_mgmt_wm_connections.erl"},{line,55}]},{rabbit_mgmt_wm_connections,to_json,2,[{file,"src/rabbit_mgmt_wm_connections.erl"},{line,43}]},{cowboy_rest,call,3,[{file,"src/cowboy_rest.erl"},{line,976}]},{cowboy_rest,set_resp_body,2,[{file,"src/cowboy_rest.erl"},{line,858}]},{cowboy_protocol,execute,4,[{file,"src/cowboy_protocol.erl"},{line,442}]}]},{req,[{socket,#Port<0.2041008>},{transport,ranch_tcp},{connection,close},{pid,<0.24446.5829>},{method,<<"GET">>},{version,'HTTP/1.1'},{peer,{{10,10,0,252},38680}},{host,<<"10.10.0.123">>},{host_info,undefined},{port,15672},{path,<<"/api/connections">>},{path_info,undefined},{qs,<<>>},{qs_vals,[]},{bindings,[]},{headers,[{<<"te">>,<<"deflate,gzip;q=0.3">>},{<<"connection">>,<<"TE, close">>},{<<"authorization">>,<<"Basic bW9uaXRvcmluZzpQN0tLTVo2V2tnSHpkUTF2">>},{<<"host">>,<<"10.10.0.123:15672">>},{<<"user-agent">>,<<"check_rabbitmq_connections libwww-perl/6.08">>}]},{p_headers,[{<<"if-modified-since">>,undefined},{<<"if-none-match">>,undefined},{<<"if-unmodified-since">>,undefined},{<<"if-match">>,undefined},{<<"accept">>,undefined},{<<"connection">>,[<<"te">>,<<"close">>]}]},{cookies,undefined},{meta,[{media_type,{<<"application">>,<<"json">>,[]}},{charset,undefined}]},{body_state,waiting},{buffer,<<>>},{multipart,undefined},{resp_compress,true},{resp_state,waiting},{resp_headers,[{<<"vary">>,[<<"accept">>,[<<", ">>,<<"accept-encoding">>],[<<", ">>,<<"origin">>]]},{<<"content-type">>,[<<"application">>,<<"/">>,<<"json">>,<<>>]},{<<"vary">>,<<"origin">>}]},{resp_body,<<>>},{onresponse,#Fun<rabbit_cowboy_middleware.onresponse.4>}]},{state,{context,{user,<<"monitoring">>,[monitoring],[{rabbit_auth_backend_internal,none}]},<<"P7KKMZ6WkgHzdQ1v">>,undefined}}]

=ERROR REPORT==== 3-Jun-2018::17:22:01 ===
Ranch listener rabbit_web_dispatch_sup_15672 had connection process started with cowboy_protocol:start_link/4 at <0.5944.5831> exit with reason: [{reason,{timeout,{gen_server,call,[<0.572.0>,{fetch,#Fun<rabbit_mgmt_db.22.104142656>, ... (all queues and exchanges)
=INFO REPORT==== 3-Jun-2018::17:22:25 ===
vm_memory_high_watermark set. Memory used:10770513920 allowed:6871733043

=WARNING REPORT==== 3-Jun-2018::17:22:25 ===
memory resource limit alarm set on node rabbit@osmrabbit03.

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************

=INFO REPORT==== 3-Jun-2018::17:22:26 ===
vm_memory_high_watermark clear. Memory used:6206062592 allowed:6871733043

=WARNING REPORT==== 3-Jun-2018::17:22:26 ===
memory resource limit alarm cleared on node rabbit@osmrabbit033

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

After this, we restarted the node that had memory issues:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
=INFO REPORT==== 3-Jun-2018::17:40:14 ===
Starting RabbitMQ 3.6.15 on Erlang 20.0
Copyright (C) 2007-2018 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 3-Jun-2018::17:40:14 ===
node           : rabbit@osmrabbit03
home dir       : C:\Windows
config file(s) : c:/Users/ADMINI~1/AppData/Roaming/RabbitMQ/rabbitmq.config
cookie hash    : k4DIRENyx5vP/nEsXKZKKg==
log            : C:/Users/ADMINI~1/AppData/Roaming/RabbitMQ/log/RABBIT~1.LOG
sasl log       : C:/Users/ADMINI~1/AppData/Roaming/RabbitMQ/log/RABBIT~2.LOG
database dir   : c:/Users/ADMINI~1/AppData/Roaming/RabbitMQ/db/RABBIT~1

=INFO REPORT==== 3-Jun-2018::17:40:20 ===
Memory high watermark set to 6553 MiB (6871733043 bytes) of 16383 MiB (17179332608 bytes) total

=INFO REPORT==== 3-Jun-2018::17:40:20 ===
Enabling free disk space monitoring

=INFO REPORT==== 3-Jun-2018::17:40:20 ===
Disk free limit set to 50MB

=INFO REPORT==== 3-Jun-2018::17:40:20 ===
Limiting to approx 8092 file handles (7280 sockets)

=INFO REPORT==== 3-Jun-2018::17:40:20 ===
FHC read buffering:  OFF
FHC write buffering: ON

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Waiting for Mnesia tables for 30000 ms, 9 retries left

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Priority queues enabled, real BQ is rabbit_variable_queue

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Starting rabbit_node_monitor

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Management plugin: using rates mode 'basic'

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
msg_store_transient: using rabbit_msg_store_ets_index to provide index

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
msg_store_persistent: using rabbit_msg_store_ets_index to provide index

=WARNING REPORT==== 3-Jun-2018::17:40:21 ===
msg_store_persistent: rebuilding indices from scratch

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
started TCP Listener on [::]:5672

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
started TCP Listener on 0.0.0.0:5672

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
rabbit on node rabbit@osmrabbit01 up

=WARNING REPORT==== 3-Jun-2018::17:40:21 ===
Could not find handle.exe, please install from sysinternals

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
rabbit on node rabbit@osmrabbit02 up

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Management plugin started. Port: 15672

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Statistics database started.

=INFO REPORT==== 3-Jun-2018::17:40:21 ===
Server startup complete; 6 plugins started.
 * rabbitmq_management
 * rabbitmq_management_agent
 * rabbitmq_web_dispatch
 * cowboy
 * amqp_client
 * cowlib

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Then we encountered the next issue: osmrabbit01 tried to sync with osmrabbit02 (to form the new master/slave setup), but the syncing took forever.
I ran 'rabbitmqctl status' before I restarted osmrabbit01 as well, and this was the result:




After this, publishing worked again, but the queues were corrupt and no data reached the consumers until the next morning (8 a.m.), when we noticed that nothing was coming into our analytical systems.
All queues were corrupt, and I had to force-delete and re-create them.

This memory issue happens far too often (on average more than 1.5 times per month) and is really hurting our business (and my team's reputation).

TL;DR:
How is it possible that this node runs out of memory so quickly without any clear cause? There are no other processes running on that server.
If more information is needed, I'll try to post anything we have. This is really frustrating me. Thanks in advance!






Karl Nilsson

Jun 6, 2018, 9:07:03 AM
to rabbitm...@googlegroups.com
Hi,

The errors you see in the log are timeouts that happen when the management API tries to aggregate connection metrics. How are you monitoring your RabbitMQ cluster? Do you have agents that poll it periodically?

The RabbitMQ top plugin is very good for getting a nice breakdown of memory use by Erlang process. I suggest you enable it, if it isn't already, and report which processes use the most memory. Then we can take it from there.
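(For reference, a minimal sketch of how the plugin can be enabled on Windows, assuming rabbitmq_top ships with your 3.6.x distribution and using the standard rabbitmq-plugins tool in the sbin directory; run it on each node you want to inspect:)

C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.15\sbin>rabbitmq-plugins.bat enable rabbitmq_top

Once enabled, the management UI gains "Top Processes" and "Top ETS Tables" views under the Admin tab, showing memory use per Erlang process.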


Cheers
Karl


Mark Engelsman

Jun 6, 2018, 10:02:50 AM
to rabbitmq-users
Hi Karl,

Thanks a lot for the tip about the Top plugin! I wasn't aware of it, but it does indeed provide the memory breakdown we are looking for in troubled times.
We monitor our RabbitMQ cluster using a New Relic plugin by 203 Solutions. It's not the best (it crashes regularly), but it does the job while it's running. A better monitoring tool would help us too.

The cluster has been running well since Monday morning, so it's hard to provide memory usage metrics for the problem periods (unless the plugin keeps a memory usage history somewhere, but I estimate the chances of that to be close to nil).

I was hoping that someone had encountered the same issue, with memory usage going up very fast for no obvious reason, and could share some gotchas or fixes.
In that case we could maybe fix it preventively instead of reacting to the next incident.

Nonetheless, thanks a lot! 



On Wednesday, June 6, 2018 at 15:07:03 UTC+2, Karl Nilsson wrote:

Luke Bakken

Jun 6, 2018, 10:24:56 AM
to rabbitmq-users
Hi Mark,

At the time of the memory usage increase, were your consumers keeping up with producers? Looking at your screenshot, at that moment your consumers were just a bit behind your producers.

Do you track the messages_ready and messages_unacknowledged statistics? Another indication that consumers aren't keeping up would be a spike in the io_write_count stat.

https://cdn.rawgit.com/rabbitmq/rabbitmq-management/v3.7.5/priv/www/doc/stats.html

If consumers aren't keeping up or acknowledging messages correctly, RabbitMQ will continue to store those messages in memory until various thresholds are met, and then will start paging them to disk. With a fast enough ingress rate, this can cause the memory watermark to be hit like you saw, blocking publishers.
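(A quick way to spot-check this from the command line, using standard rabbitmqctl queue info items; a sketch, not output from Mark's environment:)

rabbitmqctl.bat list_queues name messages_ready messages_unacknowledged consumers

A queue whose messages_ready or messages_unacknowledged count keeps growing is one where consumers are falling behind.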

Thanks,
Luke

Mark Engelsman

Jun 6, 2018, 10:41:34 AM
to rabbitmq-users
Hi Luke,

Thanks for your response. The consumers can easily handle the volume; we monitor the 'Consumer utilisation' and it's always 95%+.
The screenshot was a bit 'unlucky' in that respect; you can see here (taken two minutes ago) that the two rates are usually equal.


We have also set alarms on the number of messages in the queues. The cluster should still be able to cope with just over 1M messages stored, but an alert fires whenever a queue has held more than 50K messages at any point in the past 5 minutes. That alert didn't trigger, so it was definitely not the number of messages that caused the memory usage.

Thanks for thinking along with me, and for the overview you linked; I didn't know that one either!
Mark


On Wednesday, June 6, 2018 at 16:24:56 UTC+2, Luke Bakken wrote:

Luke Bakken

Jun 6, 2018, 11:12:44 AM
to rabbitmq-users
Hi Mark,

I'm glad you monitor those numbers - so much for the obvious cause.

Basically, nothing seemed out of the ordinary except the memory usage spike on one node?

You say this happens at least once a month. Is the memory spike always on a single node? If so, is it always the same node? Does this happen at about the same day of the month, time of day, etc? I'm just looking for anything consistent :-)

Since you're running on Windows, do you monitor the memory consumption of the erl.exe process itself or the memory usage of the entire server? The reason I ask is that maybe there's a discrepancy between the internal Erlang memory reporting and what the OS actually reports.

Thanks,
Luke

Mark Engelsman

Jun 6, 2018, 11:43:21 AM
to rabbitmq-users
Hi Luke,

Basically, nothing seemed out of the ordinary except the memory usage spike on one node?
Indeed; one node spikes. It's always just one node.

Is the memory spike always on a single node? If so, is it always the same node?
It's always on a single node, yes. Is it always the same node? Yes and no. It's always the master node (I think; I'm 95% sure), which hosts all the (master) queues, but it's not always the same physical machine.
Because we have a three-node setup, the last node to go down is always the one that does basically 'nothing' once the cluster is healthy again; the queues etc. then live on the other two nodes (as master and slave).


Does this happen at about the same day of the month, time of day, etc?
In the evening, our application has the most users online (200K or so) and there are some in-game events then, so it's the busiest time, and it seems fairly obvious that when there are issues, it's around that time.
So yes, normally in the evening. I don't think it has much to do with the load, though, because on 28 out of 30 days a month we have absolutely no problems at all with our cluster. (It would be great if it happened during office hours, but of course it doesn't ;-).)

Since you're running on Windows, do you monitor the memory consumption of the erl.exe process itself or the memory usage of the entire server?
The monitoring covers the entire server, but we know RabbitMQ is using a lot because of the 'vm_memory_high_watermark set' entries we see in the logs.

Just asking these questions already helps us to think in the right direction, thank you.

Mark



On Wednesday, June 6, 2018 at 17:12:44 UTC+2, Luke Bakken wrote:

Luke Bakken

Jun 6, 2018, 12:17:34 PM
to rabbitmq-users
Hi Mark,

With regard to memory use and vm_memory_high_watermark, you may want to read this:

https://www.rabbitmq.com/memory-use.html

The reason I asked about monitoring memory use of your server as opposed to just RabbitMQ is due to how RMQ calculates its own memory usage. In version 3.6.15, RabbitMQ on Windows uses its internal memory statistics to reason about its own memory usage. This might be different enough from the actual memory usage to trigger the memory alarm when in fact your server has plenty of free RAM.

Could you check your monitoring system to see what the actual free memory is for the problematic node during the memory usage spike? Since your server has 16GiB of memory, and RMQ claims to be using 9GiB, I would expect your monitoring system to report about 7GiB free system-wide.
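(One way to compare the broker's own accounting with what the OS reports, assuming the standard rabbitmqctl eval interface; this is a sketch, not an official diagnostic:)

rabbitmqctl.bat eval "erlang:memory()."

This prints the Erlang VM's own view (total, processes, binary, ets, and so on). If that total is far below what Task Manager shows for erl.exe, the difference is typically memory the allocators have reserved but not returned to the OS, rather than a leak inside RabbitMQ itself.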

Thanks,
Luke

Michael Klishin

Jun 6, 2018, 1:48:21 PM
to rabbitm...@googlegroups.com
http://www.rabbitmq.com/memory-use.html should be your first step in investigating anything
related to memory usage. Don't guess, collect data instead.




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Mark Engelsman

Jun 11, 2018, 3:11:21 AM
to rabbitmq-users
Hi Michael,

I agree, and that's what we try to do too. Last Saturday we were in trouble again (again at the weekend).
The strange thing is that 'osmrabbit02' was our 'main node', the one that held all the master queues. 'osmrabbit03' held the replica (the slave), but 01 ran out of memory really, really fast:


Within 10 minutes, we had serious problems again.
We had installed the top plugin as advised in this topic, but the management plugin was not responding, so we could not see which process was using all of our memory.

As I said, there is nothing else running on this server. The monitoring covers the whole server, but we know the usage is coming from erl.exe:


So the problem is that we can't see which Erlang process is using all of our memory in times of trouble, which is when we need that information the most.
Could upgrading to 3.7.x fix the problem? We're currently running 3.6.15 on Erlang 20.0.

Kind regards,
Mark



On Wednesday, June 6, 2018 at 19:48:21 UTC+2, Michael Klishin wrote:

Mark Engelsman

Jun 11, 2018, 3:14:27 AM
to rabbitmq-users
Hi Luke,

Thank you for thinking along with me. As you can see from the post above, it's not only RabbitMQ reporting high memory usage; the server reports it too (see the second screenshot).

Kind regards and thanks,
Mark

On Wednesday, June 6, 2018 at 18:17:34 UTC+2, Luke Bakken wrote:

Luke Bakken

Jun 11, 2018, 12:19:35 PM
to rabbitmq-users
Hi Mark,

Thanks for that info and screenshot. That's a very strange issue, especially since 01 isn't a queue master. Let me come up with a command you can run to find Erlang VM processes that are using a large amount of memory.

One thing you could try out is enabling background GC to see if it prevents this memory issue from happening again. Here is the relevant configuration:
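(A minimal sketch of that setting in the classic rabbitmq.config Erlang-terms format, assuming the background_gc_enabled and background_gc_target_interval keys that come up later in this thread:)

[
  {rabbit, [
    %% run an additional garbage-collection sweep in the background
    {background_gc_enabled, true},
    %% target interval between sweeps, in milliseconds
    {background_gc_target_interval, 60000}
  ]}
].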


I'm wondering if there is a "rush" of incoming binary data that, for whatever reason, the VM thinks it has to wait to collect. You'll have to restart your nodes after applying that setting.

3.7.X might fix the issue, but it would basically be guessing as we don't know the cause.

Thanks for working with us on this. It's a baffling issue.
Luke

Luke Bakken

Jun 11, 2018, 10:28:05 PM
to rabbitmq-users
Hi Mark,

I checked out the recon library, compiled it on Windows, and attached an archive of the beam files to this response (https://github.com/ferd/recon)

Prior to the next memory event, extract the archive to C:\recon or some other path of your choice.

Then, during the next memory event, please open a "RabbitMQ Command Prompt (sbin dir)" command prompt (it's installed to the start menu) and run the following command (using the path to recon if you chose a different one):

rabbitmqctl.bat eval "code:add_patha(""C:/recon""), recon:bin_leak(5)."

This will provide the top five processes whose memory dropped the most after running garbage collection. If you're able to run that during a memory event we may get some valuable information. If not, don't worry, I'll come up with something else. You can run this any time if you'd like to test it out - it shouldn't add much load to your machine.
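(If bin_leak doesn't point at anything, a related recon call lists the processes currently holding the most memory without forcing a garbage collection first; this is a suggestion on top of Luke's instructions, using the same code path setup:)

rabbitmqctl.bat eval "code:add_patha(""C:/recon""), recon:proc_count(memory, 5)."

recon:proc_count/2 returns the top N processes for the given attribute, here memory.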

Thanks,
Luke
recon.zip

Mark Engelsman

Jun 12, 2018, 8:11:06 AM
to rabbitmq-users
Hi Luke,

Thanks a lot for thinking with me, really appreciate it.
I set 'background_gc_enabled' to true; the interval was already at 60,000 (ms).

I also tried to run recon:bin_leak(5), but it throws errors:

C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.15\sbin>rabbitmqctl eval "code:add_patha(""C:/recon""), recon:bin_leak(5)."
Error: {badarith,[{recon,'-bin_leak/1-lc$^1/1-1-',1,
                         [{file,"src/recon.erl"},{line,324}]},
                  {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,670}]},
                  {rpc,'-handle_call_call/6-fun-0-',5,
                       [{file,"rpc.erl"},{line,197}]}]}

C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.15\sbin>rabbitmqctl eval "code:add_path(""C:/recon""), recon:bin_leak(5)."
Error: {badarith,[{recon,'-bin_leak/1-lc$^1/1-1-',1,
                         [{file,"src/recon.erl"},{line,324}]},
                  {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,670}]},
                  {rpc,'-handle_call_call/6-fun-0-',5,
                       [{file,"rpc.erl"},{line,197}]}]}

C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.15\sbin>rabbitmqctl.bat eval "code:add_patha(""C:/recon""), recon:bin_leak(5)."
Error: {badarith,[{recon,'-bin_leak/1-lc$^1/1-1-',1,
                         [{file,"src/recon.erl"},{line,324}]},
                  {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,670}]},
                  {rpc,'-handle_call_call/6-fun-0-',5,
                       [{file,"rpc.erl"},{line,197}]}]}

C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.15\sbin>rabbitmqctl.bat eval "code:add_path(""C:/recon""), recon:bin_leak(5)."
Error: {badarith,[{recon,'-bin_leak/1-lc$^1/1-1-',1,
                         [{file,"src/recon.erl"},{line,324}]},
                  {erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,670}]},
                  {rpc,'-handle_call_call/6-fun-0-',5,
                       [{file,"rpc.erl"},{line,197}]}]}

(I thought add_patha might be a typo for add_path, but I wasn't sure. I tried four different commands, all with the same output.)

Kind regards and thanks again,
Mark

On Tuesday, June 12, 2018 at 04:28:05 UTC+2, Luke Bakken wrote:

Luke Bakken

Jun 12, 2018, 9:43:06 AM
to rabbitmq-users
Hi Mark,

That's surprising! I bet compiling recon yourself will resolve the issue. Here are the steps:

Clone these repositories -



Open a cmd shell, and add Erlang to your PATH:

C:\>set PATH=C:\Program Files\erl9.0\bin;%PATH%

Stay in the same shell, cd to the rebar clone and compile it:

C:\>cd \Users\me\rebar
C:\Users\me\rebar>.\bootstrap.bat

Copy a couple files into your recon clone:

C:\Users\me\rebar>copy /Y rebar C:\Users\me\recon
C:\Users\me\rebar>copy /Y rebar.cmd C:\Users\me\recon

Change to your recon clone, and compile it with rebar.cmd:

C:\>cd \Users\me\recon
C:\Users\me\recon>.\rebar.cmd compile

The files you want are in .\ebin:

C:\Users\me\recon>copy /Y .\ebin\*.* C:\recon

Then, give it another try. I've attached a screenshot of a successful run from my Windows VM.

FYI, code:add_patha/1 adds the directory to the front of the code search path (http://erlang.org/doc/man/code.html#add_patha-1).

Thanks -
Luke
recon-success.png

Mark Engelsman

Jun 12, 2018, 11:03:20 AM
to rabbitmq-users
Hi Luke,

Alright, thanks. I'll give it a try.

By the way, after setting 'background_gc_enabled' to true, I no longer see rabbit_mgmt_db_cache_connections and rabbit_mgmt_db_cache_queues at the top of the process list. The queue cache uses about 2.8 MB now, where it was over 19 MB before. I suspect it was some management-related process that used all of the memory (which would also be in line with last Saturday, when the node that was doing nothing ran out of memory). I have good hopes that this will fix our problems.

I'll try to install recon, following the how-to you provided. Thanks!

Kind regards,
Mark 

On Tuesday, June 12, 2018 at 15:43:06 UTC+2, Luke Bakken wrote:

Mark Engelsman

Jun 13, 2018, 8:20:07 AM
to rabbitmq-users

Hi Luke,

I did all the steps, but got the same result:


Pretty strange, because the compilation all went fine.

By the way, I installed (a trial version of) WombatOAM for monitoring Erlang and RabbitMQ. Do you have any experience with this tool, and any tips on where to look that could point us in the right direction for this particular issue? I know it's a long shot, but I also know your knowledge of RabbitMQ and Erlang is much greater than mine, so it's worth a try.

Thanks!

Mark




On Tuesday, June 12, 2018 at 15:43:06 UTC+2, Luke Bakken wrote:

Luke Bakken

Jun 13, 2018, 8:27:06 AM
to rabbitmq-users
Hi Mark,

I don't have any experience with WombatOAM, but maybe someone else who reads this list does.

I will investigate the badarith error today to see what may be causing it. I'll have some other commands for you to run.

Thanks,
Luke

Ayanda Dube

Jun 13, 2018, 9:10:07 AM
to rabbitm...@googlegroups.com
Hi Mark

Regarding WombatOAM's capabilities around RabbitMQ, read [1][2]. You can apply similar principles to other pools of exposed metrics (e.g. threshold alarming based on message queue lengths), which could help troubleshoot this problem.

PS: We're closely following this issue as well and are brainstorming workarounds, following recent dialogue with some of our engineers on pinpointing the root cause (which could lead to something more concrete).


Mark Engelsman

Jun 13, 2018, 9:18:38 AM
to rabbitmq-users
Hi Luke,

Don't bother anymore; I got it working now. The output looks like the output you got, so next time we should be able to get some interesting statistics from it.
(I didn't do anything different; I just tried running it again and it worked.)

Thanks!

On Wednesday, June 13, 2018 at 14:27:06 UTC+2, Luke Bakken wrote:

Luke Bakken

Jun 13, 2018, 11:51:55 AM
to rabbitmq-users
Hi Mark,

That is very strange. Let me know if you run into another badarith error and we can get to the bottom of it.

Could you double-check that you are compiling from the master branch of recon? The only place where I think an uncaught badarith error could happen is here, but I have no idea how a non-integer value could be in the Val variable.

Thanks,
Luke

Luke Bakken

Jun 19, 2018, 1:32:07 PM
to rabbitmq-users
Hi Mark,

Do you have any updates from this environment?



Mark Engelsman

Jun 20, 2018, 5:38:30 AM
to rabbitmq-users
Hi Luke,

Since I set 'background_gc_enabled' to true, I haven't run into the 'vm_memory_high_watermark set' issue anymore. That's the good news.
What we did see was that one of our workers, which publishes a lot of messages to RabbitMQ at a certain time of day, became really slow.
We decided to stop publishing from these workers on Friday and saw that the average processing time of the workers dropped sharply.



1: The moment we had downtime again;
2: The moment we enabled the background GC;
3: The moment we stopped publishing to RabbitMQ.

I think the configuration change affects the publishing time (this graph points in that direction).

We still don't know which process used all the memory, but the good news is that we don't run into memory issues anymore.
The bad news is, of course, that we no longer use RabbitMQ for everything we want to. To be honest, I don't really know how to proceed at this point.

Thank you for still showing interest in our issues and your help.

Kind regards,
Mark 

On Tuesday, June 19, 2018 at 19:32:07 UTC+2, Luke Bakken wrote:

Michael Klishin

Jun 20, 2018, 6:46:15 AM
to rabbitm...@googlegroups.com
Background GC is rarely a fundamental solution. Understanding which processes individually use the most memory, and tweaking allocators [1] on Erlang 20.3.8
(20.0 doesn't have certain options that [1] recommends), is what I'd do.

Can a memory usage breakdown [2] *chart* and rabbitmq-top results be posted in this thread?
A screenshot with numbers is quite painful to work with.

Since background GC primarily helps with lazy binary heap collection, it's worth collecting metrics on how large the messages are. There are two time-proven ways to reduce them:

 * Payload compression (publishers should also provide a hint in the content-encoding property so that your consumers can support both compressed and uncompressed payloads during a transition period; see the sketch below)
 * Storing large payloads in a key/value store of some kind and passing around keys
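A sketch of the first option, using the Erlang amqp_client purely for illustration; the exchange name, routing key and payload variable are made up, and any client library can set the same content_encoding property:

%% requires: -include_lib("amqp_client/include/amqp_client.hrl").
Payload = zlib:gzip(term_to_binary(SimulationResult)),
Props   = #'P_basic'{content_encoding = <<"gzip">>,
                     delivery_mode    = 2},            %% 2 = persistent
Publish = #'basic.publish'{exchange    = <<"simulations">>,
                           routing_key = <<"league.results">>},
amqp_channel:cast(Channel, Publish, #amqp_msg{props = Props, payload = Payload}).

Consumers can then check the content_encoding property and only gunzip when it says "gzip", so compressed and uncompressed producers can coexist during the transition.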



Michael Klishin

Jun 20, 2018, 6:47:56 AM
to rabbitm...@googlegroups.com
3.7.x also has `rabbitmq-diagnostics memory_breakdown`, which reports the percentage of each category
as well as absolute values, but the only version mentioned in this thread that I could find is 3.6.15.
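(For example, once on 3.7.x, a sketch assuming the standard -n flag for targeting a node:)

rabbitmq-diagnostics.bat memory_breakdown -n rabbit@osmrabbit03

That shows the same categories as the memory section of rabbitmqctl status, but with percentages, which makes it easier to spot which category is growing.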

--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Jun 20, 2018, 6:56:36 AM
to rabbitm...@googlegroups.com
Also, are your applications using durable queues and publishing messages as persistent? This can be seen
on the queue page in the management UI (both the queue properties and the message breakdown: RAM vs. disk).

If publishers explicitly tell RabbitMQ to keep messages in memory [1], it will increase GC pressure and likely change the behavior
of the allocators. You may want to consider making the queues lazy in that case [2], until the apps can be updated to not publish messages
as transient.
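(Making queues lazy does not require an application change; it can be applied with a policy. A sketch, assuming every queue should match; note the doubled quotes needed on Windows:)

rabbitmqctl.bat set_policy lazy-queues "^" "{""queue-mode"":""lazy""}" --apply-to queues

Lazy queues move message bodies to disk as early as possible, which keeps per-queue memory small at the cost of extra disk I/O.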

Luke Bakken

Jun 20, 2018, 11:41:50 AM
to rabbitmq-users
Hi Mark,

It's not immediately apparent what the red and blue lines in your chart represent. Is it memory consumption? For now, I'm assuming it's the memory stat from RabbitMQ.

Just so I understand your chart and the steps you took:

* At point #1, you had a node failure due to hitting the memory watermark. You restarted one node, which caused the dip in the chart.

* Memory was increasing, so background GC was enabled. Since enabling this setting requires a node restart, I'm wondering why we don't see a dip like #1.

* #3 shows what happened when you stopped a worker that publishes a lot of messages to RabbitMQ - you can see the memory consumption drop (at least, that's what it looks like).

> one of our workers, that publishes a lot of messages to RabbitMQ at a certain time, became really slow.

What is "a lot of messages" exactly? What size is each message? What does "really slow" mean in comparison to the expected publish rate? What happened to other publishers and consumers?

Thanks,
Luke

On Wednesday, June 20, 2018 at 2:38:30 AM UTC-7, Mark Engelsman wrote:

Mark Engelsman

Jun 21, 2018, 3:59:38 AM
to rabbitmq-users
Hi Michael,

Thank you for your tips. I will upgrade to 3.7.x next week, when we have planned downtime.

I tried to provide as much memory usage information as possible. Some of it is still in screenshots, because the layout got all messed up when I copied the tables here.

osmrabbit01:

{memory,
     [{connection_readers,2929864},
      {connection_writers,173416},
      {connection_channels,1346816},
      {connection_other,2960136},
      {queue_procs,25109360},
      {queue_slave_procs,0},
      {plugins,17234376},
      {other_proc,12157888},
      {metrics,1386872},
      {mgmt_db,15314408},
      {mnesia,385232},
      {other_ets,2436472},
      {binary,36164256},
      {msg_index,816480},
      {code,25094855},
      {atom,1156305},
      {other_system,15955792},
      {allocated_unused,212211776},
      {reserved_unallocated,0},
      {total,372834304}]},
 {alarms,[]},



osmrabbit02:
{memory,
     [{connection_readers,2512152},
      {connection_writers,196704},
      {connection_channels,1353392},
      {connection_other,2556928},
      {queue_procs,672128},
      {queue_slave_procs,17646048},
      {plugins,3576376},
      {other_proc,11022560},
      {metrics,1290888},
      {mgmt_db,3902504},
      {mnesia,382984},
      {other_ets,5984880},
      {binary,24335712},
      {msg_index,799936},
      {code,24980007},
      {atom,1041593},
      {other_system,12712616},
      {allocated_unused,138329232},
      {reserved_unallocated,0},
      {total,250150912}]},
 {alarms,[]},


osmrabbit03:
{memory,
     [{connection_readers,2531136},
      {connection_writers,49792},
      {connection_channels,205448},
      {connection_other,2511960},
      {queue_procs,44720},
      {queue_slave_procs,0},
      {plugins,21592888},
      {other_proc,351472},
      {metrics,1278392},
      {mgmt_db,18566400},
      {mnesia,393544},
      {other_ets,2326944},
      {binary,9210112},
      {msg_index,44688},
      {code,24980007},
      {atom,1041593},
      {other_system,11419272},
      {allocated_unused,50842096},
      {reserved_unallocated,0},
      {total,147390464}]},
 {alarms,[]},



Regarding the size of the messages, I hope this provides some insight:


I will answer your other questions (and also Luke's message) later today; I have a meeting with some colleagues about RabbitMQ now. Thank you very much for your time, and if you need more information, let me know and I'll try to provide whatever insight into RabbitMQ and Erlang I can.

Kind regards,
Mark Engelsman


On Wednesday, June 20, 2018 at 12:46:15 UTC+2, Michael Klishin wrote:

Mark Engelsman

Jun 21, 2018, 6:58:25 AM
to rabbitmq-users
Hi Luke,

Sorry about being unclear on this. We use RabbitMQ to send data from our simulations to an external storage system.
In our system (a game), we simulate the outcome of matches once a day. This starts around 16:00 and ends around 22:00.
This means our load is not spread evenly over the day, but peaks during the time window I just mentioned.

What you see in the graph is the average simulation time per league. The moment we had downtime, we didn't publish anything (publishers were blocked because of the memory issues). At point 3, we disabled sending simulation data to RabbitMQ, so we are missing data in our external storage system from that point on.

You can see that when we disabled publishing to RabbitMQ, the simulation per league was quick again.

Kind regards,
Mark 

On Wednesday, June 20, 2018 at 17:41:50 UTC+2, Luke Bakken wrote:

Michael Klishin

Jun 21, 2018, 8:57:30 AM
to rabbitm...@googlegroups.com
All those values are pretty low and nothing really stands out (at the moment of capture, of course).

I suspect that tweaking runtime allocator flags (or running on 3.7.6 with Erlang 20.3, where a more efficient combination is used by default)
could help. Something might be causing the allocators to try allocating very large blocks, but without Recon
data it is impossible to guess what that might be.


Luke Bakken

Jul 2, 2018, 10:41:00 AM
to rabbitmq-users
Hi Mark,

Did you get a chance to upgrade? I'm wondering if the updated memory allocator flags helped your situation out.

Thanks,
Luke

Mark Engelsman

Jul 6, 2018, 2:41:38 AM
to rabbitmq-users
Hi Luke,

Sorry for the delayed response.
Last week I upgraded RabbitMQ to 3.7.6.
We haven't run into the memory issues anymore since I enabled the garbage collector in the background, and after the upgrade we still don't have any issues. background_gc_enabled is still set to true.

I think we will leave it like this, because it looks like it's finally a stable platform. 

What do you mean by the 'updated memory allocator flags'?

Thanks a lot for your help (and also @Michael Klishin). Really appreciated!

Mark

On Monday, July 2, 2018 at 16:41:00 UTC+2, Luke Bakken wrote:

Michael Klishin

Jul 6, 2018, 5:46:21 AM
to rabbitm...@googlegroups.com
Hi Mark,

It's OK to keep background GC on if it makes a difference for you; that's why we kept it around,
even though on some workloads it has a negative effect [on latency and productive CPU utilization].

RabbitMQ 3.7.6 uses a new set of runtime memory allocator flags if they are available (Erlang 20.2.1 or later, IIRC);
see [1].

