High number of Erlang processes after a while


Alexander Birkner

Apr 26, 2015, 8:49:24 AM
to rabbitm...@googlegroups.com
Hello,

I'm running a RabbitMQ cluster with 3 nodes at the moment. After a while I always see the same strange behaviour: one of the nodes ends up with a very high number of Erlang processes running. The memory limit is also reached, which means no new connections are allowed.

The cluster replicates all queues to all nodes. It looks like something is wrong with the replication: if I restart bunny02 and bunny03, the problem on bunny01 resolves itself after a few seconds. Really strange.

Does anyone have an idea why this happens? I can't really find anything helpful in the log files of the nodes.

Best regards
Alexander 




Michael Klishin

Apr 26, 2015, 12:00:59 PM
to rabbitm...@googlegroups.com, Alexander Birkner
On 26 April 2015 at 15:49:26, Alexander Birkner (alex.b...@gmail.com) wrote:
> I'm running a RabbitMQ cluster with 3 nodes at the moment. After
> a while I always see the same strange behaviour: one of the nodes
> ends up with a very high number of Erlang processes running. The
> memory limit is also reached, which means no new connections are
> allowed.
>
> The cluster replicates all queues to all nodes. It looks like
> something is wrong with the replication: if I restart bunny02 and
> bunny03, the problem on bunny01 resolves itself after a few
> seconds. Really strange.

You haven't specified what version you are running. If it is pre-3.5, I suspect it may be a lack of inter-node
flow control, which was introduced in 3.5.0. However, that doesn't typically result in a higher number of processes.

Another reason may be that the stats DB gets overloaded; it has a known bottleneck, and it tries
to drop enqueued collected events to cope. That is the case when you have tens or hundreds of thousands of
stats-emitting things, primarily connections, channels, and queues. Could that be the case?

Can you post `rabbitmqctl status`?
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Alexander Birkner

Apr 26, 2015, 12:36:02 PM
to rabbitm...@googlegroups.com, alex.b...@gmail.com
Hello Michael,

I'm currently using RabbitMQ 3.3.4; I will try a newer version.

This is the current status of the bunny01 node, but I've restarted it because the memory limit was reached (a dirty fix):
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.3.4"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.3.4"},
      {webmachine,"webmachine","1.10.3-rmq3.3.4-gite9359c7"},
      {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.3.4-git680dba8"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.3.4"},
      {rabbit,"RabbitMQ","3.3.4"},
      {os_mon,"CPO  CXC 138 46","2.2.14"},
      {inets,"INETS  CXC 138 49","5.9.7"},
      {mnesia,"MNESIA  CXC 138 12","4.11"},
      {amqp_client,"RabbitMQ AMQP Client","3.3.4"},
      {xmerl,"XML parser","1.3.5"},
      {sasl,"SASL  CXC 138 11","2.3.4"},
      {stdlib,"ERTS  CXC 138 10","1.19.4"},
      {kernel,"ERTS  CXC 138 10","2.16.4"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:30] [kernel-poll:true]\n"},
 {memory,
     [{total,558885808},
      {connection_procs,305227720},
      {queue_procs,27174408},
      {plugins,496376},
      {other_proc,17877368},
      {mnesia,3727992},
      {mgmt_db,11840},
      {msg_index,7479656},
      {other_ets,16781960},
      {binary,133763016},
      {code,19878051},
      {atom,703377},
      {other_system,25764044}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,3349222195},
 {disk_free_limit,50000000},
 {disk_free,39997566976},
 {file_descriptors,
     [{total_limit,102300},
      {total_used,1250},
      {sockets_limit,92068},
      {sockets_used,315}]},
 {processes,[{limit,1048576},{used,54762}]},
 {run_queue,1},
 {uptime,9566}]
...done.

So I will update the instances to the latest version from your website. If this happens again I will send you a newer `rabbitmqctl status` output.
The stats DB thing is possible, but currently there is no high load on the cluster. Maybe it's really the first thing you mentioned, the lack of inter-node flow control.

I will give you an update on whether the upgrade was the solution.

Thank you very much!

Best regards,
Alexander

Michael Klishin

Apr 26, 2015, 4:14:18 PM
to rabbitm...@googlegroups.com, Alexander Birkner
On 26 April 2015 at 19:36:03, Alexander Birkner (alex.b...@gmail.com) wrote:
> I'm currently using RabbitMQ 3.3.4; I will try a newer version.
>
> This is the current status of the bunny01 node, but I've restarted
> it because the memory limit was reached (a dirty fix)

Yes, we need memory use breakdown when the issue happens.

3.3.4 is multiple releases behind; take a look at the change log:
http://rabbitmq.com/changelog.html

Of course, trying to guess which of the fixed problems it may be, without seeing a
memory use breakdown or having some information on what those processes are (their "types"),
wouldn't be very effective.

Alexander Birkner

Apr 27, 2015, 6:34:47 PM
to rabbitm...@googlegroups.com, alex.b...@gmail.com
Hello Michael,

it seems like the RabbitMQ upgrade has resolved this issue.
No more problems with increasing processes/memory.

If the problems come back (hopefully not), I will let you know. :)

Thank you very much! 

Best regards
Alexander

Alexander Birkner

Apr 29, 2015, 5:42:48 PM
to rabbitm...@googlegroups.com, alex.b...@gmail.com
Hello Michael,

it happened again. I've created a status dump of all 3 instances for you. It seems like the software update didn't help. I suspect it comes from the replication: the cluster had been running fine for a few weeks, but since I enabled the replication of all queues last week I have had these problems.

Maybe there is something wrong? I can't see any error messages.

Or does the cluster need more memory for ~1000 consumers? There are currently only ~5 messages/second in the whole cluster.
The cluster is sleeping most of the time :) so I can't imagine that 3 GB per node is too little.


Best regards,
Alexander


On Sunday, 26 April 2015 at 22:14:18 UTC+2, Michael Klishin wrote:
node03.txt
node02.txt
node01.txt

Michael Klishin

Apr 29, 2015, 6:08:19 PM
to rabbitm...@googlegroups.com, Alexander Birkner
On 30 April 2015 at 00:42:49, Alexander Birkner (alex.b...@gmail.com) wrote:
> Or does the cluster need more memory for ~1000 consumers? There
> are currently only ~5 messages/second in the whole cluster. The
> cluster is sleeping most of the time :) so I can't imagine that
> 3 GB per node is too little.

If this only happens when mirroring is enabled, the issue is unlikely to be consumers.

However, *connections* can take a fair share of RAM with default TCP buffer sizes. Those
can be tuned, e.g. to 8 or 16 KB, to significantly reduce the RAM cost per connection. This will negatively
affect throughput, but for low-volume workloads it is quite reasonable:
https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L163-173

So this is something that is probably a good idea for your case regardless of this issue.
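
For illustration, a minimal rabbitmq.config sketch of that kind of tuning could look like the following (the 16 KB buffer sizes are just example values, not a recommendation for every workload):

%% /etc/rabbitmq/rabbitmq.config (classic Erlang-terms format used by 3.x)
[
  {rabbit, [
    {tcp_listen_options, [
      {backlog, 128},
      {nodelay, true},
      %% smaller per-connection TCP buffers: less RAM per connection,
      %% at the cost of some throughput
      {sndbuf,  16384},
      {recbuf,  16384}
    ]}
  ]}
].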

In your case connection channels and writers use most of the RAM on node1.

On node3, queue processes use about 0.1 GB. Binary data use is also about the same amount.
About 0.6 GB is used by plugins (!) We fixed an issue with high memory use by STOMP several versions ago.
I'm not sure how mirroring can be related, especially since RAM use by queue processes seems to be low.

Can you send me log files off list? There's a STOMP plugin issue that may leave connections around
when certain I/O exceptions happen:
https://github.com/rabbitmq/rabbitmq-stomp/issues/7 — I can send you a STOMP plugin .ez build from the stable
branch. 

Michael Klishin

Apr 29, 2015, 6:23:52 PM
to rabbitm...@googlegroups.com, Alexander Birkner
On 30 April 2015 at 01:08:15, Michael Klishin (mkli...@pivotal.io) wrote:
> On node3, queue processes use about 0.1 GB. Binary data use is
> also about the same amount.
> About 0.6 GB is used by plugins (!)

On node2, 70% of RAM is used by connection_channels and connection_other.
Again, I'm curious whether mirroring is really the root cause here.

I'm attaching a STOMP plugin build that will be in 3.5.2. Simply replace the STOMP .ez in the plugins
directory and restart the node. Please give it a try.
rabbitmq_stomp-3.5.1.99.ez

Michael Klishin

Apr 29, 2015, 6:39:04 PM
to rabbitm...@googlegroups.com, Alexander Birkner
On 30 April 2015 at 01:23:50, Michael Klishin (mkli...@pivotal.io) wrote:
> On node2, 70% of RAM is used by connection_channels and connection_other.

On node3 it is 37% for mgmt_db and 15% for plugins, while connection_channels and connection_other
collectively take 39% of RAM.

So at least the breakdown is fairly similar on nodes 2 and 3.

I've got logs from Alexander and am asking more questions off-list.

Alexander Birkner

May 1, 2015, 9:10:14 AM
to rabbitm...@googlegroups.com, alex.b...@gmail.com
Hello,

we've found the reason for the memory/process increase:
I had a channel leak in my app. Channels are now being closed correctly after use.
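
For anyone hitting the same thing, the pattern is simply to close every channel once you are done with it. Here is a minimal sketch using the Erlang amqp_client (just an illustration, not the actual client library or queue names our app uses):

%% Illustrative only: module name, host and routing key are made up for the example.
-module(channel_hygiene_example).
-include_lib("amqp_client/include/amqp_client.hrl").
-export([publish_once/0]).

publish_once() ->
    {ok, Connection} = amqp_connection:start(#amqp_params_network{host = "bunny01"}),
    {ok, Channel} = amqp_connection:open_channel(Connection),
    try
        amqp_channel:cast(Channel,
                          #'basic.publish'{exchange = <<"">>, routing_key = <<"example">>},
                          #amqp_msg{payload = <<"hello">>})
    after
        %% Closing the channel (and connection) releases the corresponding Erlang
        %% processes on the broker; leaked channels accumulate there and show up
        %% as growing process counts and memory use.
        amqp_channel:close(Channel),
        amqp_connection:close(Connection)
    end.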

Everything fine now :)

Thank you again, Michael, for helping me find the reason.

Best regards,
Alexander 

MARVIN JOSUE CORTEZ RODAS

Apr 27, 2023, 10:07:34 AM
to rabbitmq-users
How did you solve it?