RabbitMQ - Erlang crash on Windows Server


stefaan van hoof

Nov 21, 2014, 10:12:19 AM
to rabbitm...@googlegroups.com
I'm encountering this crash:

=erl_crash_dump:0.3
Fri Nov 21 15:33:24 2014
Slogan: eheap_alloc: Cannot allocate 441485240 bytes of memory (of type "heap").
System version: Erlang R16B03 (erts-5.10.4) [64-bit] [smp:4:4] [async-threads:30]
Compiled: Mon Dec  9 20:22:33 2013
Taints: 
Atoms: 23161
=memory
total: 4250698832
processes: 3862648616
processes_used: 3862648096




And here is some background on how it happened:

As a company, we are switching from MSMQ to RabbitMQ.
We have 50,000 embedded devices that are constantly sending and receiving messages.
So I made a setup that has 50,000 queues for 'to embedded' messages.
Next I made a load test that basically sends a couple of messages to such a queue every 10 minutes or so.

The trouble I have is making something that simulates 50,000 embedded devices.
What I did was make a (C#) parallel for loop that reuses the same connection but creates a channel per queue, reads from it until it is empty, and then destroys the channel.
The problem with this approach is that it eats CPU like crazy. I suppose that is what it is supposed to do in a tight loop.
Then things start getting weird. Creating the channel and dequeuing becomes really slow (5-9 sec instead of 50 ms), and if I wait long enough the whole thing crashes and the management portal stops working.
What I'm seeing is that the erl.exe process goes up to 100% CPU, and even when I stop the test it takes a long time for its CPU usage to recover.

Like I said, maybe it's because of the way I'm doing the load test (parallel looping and creating channels), but I have no idea how else to simulate 50,000 devices dequeuing messages.

Anyway, if I cannot solve this, our idiotic management will not switch to RabbitMQ and will try to find a different solution. And that makes me a sad bunny :<.
That being said, a full-blown crash of our messaging system in production would be a disaster.

I read in some posts that this is because of Windows (yes, the old "it's Windows" excuse) in combination with an older version of Erlang.
What I did not find in those posts was a clearly defined solution.

Some help would be appreciated. Thx



Michael Klishin

Nov 21, 2014, 1:08:11 PM
to rabbitm...@googlegroups.com, stefaan van hoof
On 21 November 2014 at 15:12:20, stefaan van hoof (stefaan....@gmail.com) wrote:
> I read some post this is because of windows (yes the old its windows
> excuse.) in combination with an older version of erlang.
> What i did not read in the posts was a clear defined solution.
>
> Some help would be appreciated. Thx

A common thing on Windows is when people install 32 bit Erlang on a 64 bit OS. That seems to
not be the case for you. However, I can see some issue reports where people say that
exactly the same setup works on Linux. 

Have you tried Erlang 17.1 or 17.3?
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Simon MacMullen

Nov 24, 2014, 4:59:49 AM
to Michael Klishin, rabbitm...@googlegroups.com, stefaan van hoof
On 21/11/2014 18:07, Michael Klishin wrote:
> On 21 November 2014 at 15:12:20, stefaan van hoof (stefaan....@gmail.com) wrote:
>> I read some post this is because of windows (yes the old its windows
>> excuse.) in combination with an older version of erlang.
>> What i did not read in the posts was a clear defined solution.
>>
>> Some help would be appreciated. Thx
>
> A common thing on Windows is when people install 32 bit Erlang on a 64 bit OS. That seems to
> not be the case for you.

Yes, but the underlying issue is that the Erlang VM was simply unable to
get more memory.

The RabbitMQ memory alarm tries to prevent memory exhaustion by stopping
accepting messages, since messages are usually the biggest user of
memory. But it doesn't attempt to stop anything else when running short
of memory. So if you were creating tons of connections and/or channels
it would be possible to simply exhaust memory like that, leading to a
crash like the above.

I am also interested to see the broker using 3.8 billion processes. That
implies either the OP has found a process leak or is creating a truly
enormous number of connections, channels or queues.

Cheers, Simon

Simon MacMullen

Nov 24, 2014, 5:40:01 AM
to Michael Klishin, rabbitm...@googlegroups.com, stefaan van hoof
On 24/11/2014 09:59, Simon MacMullen wrote:
> I am also interested to see the broker using 3.8 billion processes. That
> implies either the OP has found a process leak or is creating a truly
> enormous number of connections, channels or queues.

Oops, no, this is wrong. The VM is reporting 3.8GB memory used by
processes - which is much more plausible.

Cheers, Simon
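
For reference, the byte counts in the crash dump can be converted to more readable units. The following is a minimal Python sketch, not part of the original thread; the numbers are copied from the dump quoted in the first post, and the names `CRASH_DUMP`, `to_gib` and `summarize` are made up for illustration.

```python
# Figures copied from the crash dump quoted in the first post.
# The dictionary keys are illustrative labels, not Erlang API names.
CRASH_DUMP = {
    "failed_allocation": 441485240,  # from the "Slogan" line
    "total": 4250698832,             # "=memory" section
    "processes": 3862648616,
    "processes_used": 3862648096,
}


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


def summarize(dump: dict) -> dict:
    """Return the same keys with values in GiB, rounded to two decimals."""
    return {name: round(to_gib(value), 2) for name, value in dump.items()}


if __name__ == "__main__":
    for name, gib in summarize(CRASH_DUMP).items():
        print(f"{name}: {gib} GiB")
```

Read this way, `processes` is about 3.6 GiB (roughly 3.86 x 10^9 bytes), which fits the corrected reading that the figure is memory used by Erlang processes, not a process count.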

stefaan van hoof

Nov 24, 2014, 5:59:19 AM
to rabbitm...@googlegroups.com
I will try updating Erlang first and post the results here. But it's the end of a release, so no more experimenting with RabbitMQ this week.

stefaan van hoof

Nov 24, 2014, 10:24:14 AM
to rabbitm...@googlegroups.com
Is Erlang or RabbitMQ limiting my memory usage or something?
I have 16 GB of RAM available, and the last time I checked during the load tests, RAM was not an issue. CPU, however, was running up to 100%.

stefaan van hoof

Nov 24, 2014, 10:35:00 AM
to rabbitm...@googlegroups.com
I also noticed that my server is already using around 80% of its 16 GB of RAM for other processes,
meaning I only have 20% left. I think RabbitMQ has a default of 40% on 64-bit machines,
which would mean that under the high load I'm trying, things could go wrong.

Could this be the problem, or is Rabbit/Erlang smarter than that?
 

Simon MacMullen

Nov 24, 2014, 10:37:53 AM
to stefaan van hoof, rabbitm...@googlegroups.com
On 24/11/14 15:35, stefaan van hoof wrote:
> i also noticed that my Server is already using around 80 % of its 16 GB
> Ram to other processes.
> meaning i only have 20% left. I think rabbitMq had a default of 40 % on
> 64X machines.
> wich would mean that on the high load i'm trying. things could go wrong.

Sounds very plausible - we only detect the total physical memory, not
the memory available at any time.

Cheers, Simon
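
The arithmetic behind this can be sketched as follows. This is an illustrative Python sketch only, not RabbitMQ code: it assumes the 40% default and the 80% usage figure mentioned in the thread, and all function names are hypothetical.

```python
GIB = 2**30


def watermark_bytes(total_ram_bytes: int, fraction: float = 0.4) -> int:
    """Alarm threshold computed from TOTAL physical RAM, as described above.

    The 0.4 default is the figure quoted in the thread (an assumption here).
    """
    return int(total_ram_bytes * fraction)


def free_bytes(total_ram_bytes: int, used_by_others_fraction: float) -> int:
    """RAM left over once other processes on the host have taken their share."""
    return int(total_ram_bytes * (1 - used_by_others_fraction))


def alarm_can_fire_in_time(total_ram_bytes: int,
                           used_by_others_fraction: float,
                           fraction: float = 0.4) -> bool:
    """True if the broker could reach the threshold before the host runs dry."""
    threshold = watermark_bytes(total_ram_bytes, fraction)
    return threshold <= free_bytes(total_ram_bytes, used_by_others_fraction)


if __name__ == "__main__":
    total = 16 * GIB
    print("threshold (GiB):", watermark_bytes(total) / GIB)
    print("free (GiB):", free_bytes(total, 0.8) / GIB)
    print("alarm can fire in time:", alarm_can_fire_in_time(total, 0.8))
```

With 16 GB of RAM, a threshold at 40% of the total is about 6.4 GB, while only about 3.2 GB is actually free, so the VM can fail to allocate memory before the alarm ever triggers. Under that assumption, lowering the `vm_memory_high_watermark` setting below the free fraction, or freeing up RAM on the host, would be the things to try.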