Nodes of a cluster crashing randomly but frequently


Neeraj Bhargav

Sep 19, 2019, 6:03:08 PM
to rabbitm...@googlegroups.com
Hi Team,

Configuration of cluster:
All the nodes are physical machines running RabbitMQ 3.7.9 and Erlang 21.1.1, the same versions installed when this cluster was created.
Memory is 260 GB, CPU is 48 cores.

Issue:
Nodes of this cluster are crashing randomly and frequently. We have had around 4 crashes in the last 3 weeks.
In the crash dumps, we are seeing errors like the following:
Slogan: binary_alloc: Cannot allocate 25165855 bytes of memory (of type "binary").
Slogan: Absurdly large distribution output data buffer (2696898942 bytes) passed
The crash dump is huge and I am not sure how I can share it with you. Below is a screenshot of the memory tab from one of the crash dumps, which clearly shows binary_alloc using more than 20 GB of memory. I have seen this go above 30 GB in other crashes. I can provide additional screenshots if needed.
image.png
We have a good amount of system resources (configuration above) on all the nodes of this cluster. We have monitoring set up and we don't get any memory or disk alerts when this happens. Memory utilization remains normal and nothing on the system side appears to be causing this. The high watermark is at 40%.
Our investigation does not show that the nodes crash only when there is heavy traffic on this cluster.
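
Since the dump is too large to share, one option is to pull out only the slogan lines and convert the byte counts into readable units. The following is a minimal sketch, not an official tool: the function names are made up for illustration, and it assumes slogans appear on lines starting with "Slogan:" as in the two lines quoted above.

```python
import re

# Assumption: slogan lines look like the ones quoted in this post.
SLOGAN_RE = re.compile(r"^Slogan:\s*(?P<text>.*)$")
BYTES_RE = re.compile(r"(\d+) bytes")


def human(n):
    """Format a byte count using binary units (KiB, MiB, GiB)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024 or unit == "GiB":
            return f"{n:.2f} {unit}"
        n /= 1024


def summarize_slogans(lines):
    """Return (slogan text, byte count, readable size) for each slogan line."""
    found = []
    for line in lines:
        match = SLOGAN_RE.match(line.strip())
        if not match:
            continue
        text = match.group("text")
        size = BYTES_RE.search(text)
        count = int(size.group(1)) if size else None
        found.append((text, count, human(count) if count is not None else None))
    return found


# The two slogans quoted above, copied verbatim.
SAMPLE = [
    'Slogan: binary_alloc: Cannot allocate 25165855 bytes of memory (of type "binary").',
    "Slogan: Absurdly large distribution output data buffer (2696898942 bytes) passed",
]

for text, count, readable in summarize_slogans(SAMPLE):
    print(readable, "<-", text)
```

On a real dump the same function can be fed an open file object instead of the sample list. The second slogan works out to roughly 2.5 GiB, far larger than any single message we publish.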

We have another 3-node cluster (physical boxes) with exactly the same configuration and have never faced this issue there. We also have other clusters on VMs and have not faced this issue there either.

Additional details:
  • This cluster ran fine for more than a year without any issues; we started seeing the issues above last month.
  • All queues are durable and messages are persistent.
  • It gets traffic mostly from US datacenters, plus a small amount from non-US datacenters.
  • All publishers publish to an additional layer of servers called "Shovels". These servers have the shovel plugin installed, and we have configured static shovels to move the published messages to the clusters.
  • This cluster has a couple of vhosts and a limited number of queues, around 5-10 in each vhost. Of these queues, about 3-4 across all vhosts receive high traffic.
  • These queues receive around 5-6k msgs/sec at peak times, and the queues grow large at those times, to around 5-10 million messages. On average it is 2-3k msgs/sec.

Any help in this regard will be much appreciated. This cluster is one of our most critical clusters and receives very important messages which we can't lose; consistency and durability of these messages are very important.

Thanks,
NB

Michael Klishin

Sep 19, 2019, 7:39:41 PM
to rabbitmq-users
The message comes from the runtime. RabbitMQ does not allocate inter-node communication buffers directly.

Erlang 22 introduced chunking of inter-node messages. 22.1 even has metrics on the state of the buffer exposed to the new Prometheus plugin [3].

Please upgrade [1] to 3.7.18 (which would allow you to use Erlang 22.x, including 22.1) and a supported Erlang version [2].

If the cluster truly is critical to your operations you should not be asking for help on the public mailing list. You should get yourself
a commercial support subscription [4].

Michael Klishin

Sep 19, 2019, 7:44:23 PM
to rabbitmq-users
[1] is one relevant PR I could quickly find. It started the work that shipped in Erlang 22.

However, there were other changes going back to at least Erlang 17 that gradually reduced the frequency with which
we see this specific error. I can't remember seeing it on 21.3 and 22 (the latter makes sense).




--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Neeraj Bhargav

Sep 19, 2019, 9:52:29 PM
to rabbitm...@googlegroups.com
Hi Michael,
Thanks for your reply. 
image.png
Regarding the above: we have more than 50 clusters running, and all of them have been pretty stable for a long time. Only this one cluster has started throwing these errors, and only in the last couple of weeks, which surprises us because we have not changed anything on this cluster in more than a year.

I will go through the options you suggested and also through the commercial support article to understand if that suits our needs.

Also, regarding https://github.com/erlang/otp/pull/1569: we looked at it and understood that it is about a message larger than 2 GB crashing Erlang. But we monitor our message sizes, and none of our messages is larger than 30 MB (only a very small fraction of messages exceed 10 MB).

Michael Klishin

Sep 19, 2019, 10:03:48 PM
to rabbitmq-users
Those are Erlang messages, not RabbitMQ (messaging protocol) messages. A single Erlang message sent from node A to node B can
include multiple messaging protocol messages and other commands. RabbitMQ will reject messages from clients if they are greater than 2 GB,
and in 3.8, greater than 256 MiB (configurable up to 512 MiB).

In the process of contributing metrics for inter-node communication to Erlang, we have found out some specifics of
how it likely reaches that stage, which would be too much inside baseball for this list.

Chunking in Erlang 22 addresses the problem in a fundamental way [1].

1. http://blog.erlang.org/OTP-22-Highlights/, see "Fragmented Distribution Messages"

Neeraj Bhargav

Sep 19, 2019, 10:07:27 PM
to rabbitm...@googlegroups.com
This is very useful information for us. Thanks for your response, Michael, really appreciated.


Michael Klishin

Sep 19, 2019, 10:21:46 PM
to rabbitmq-users
Glad it was useful.

I'll try to explain roughly how the inter-node "command message" can grow to 2 GB.

 * A RabbitMQ component (Erlang process) such as a channel that has to send a bunch of messages to another node tries to do that
 * The buffer is full, the process is suspended by the runtime
 * It accumulates more messages coming from the connection
 * Now it has even more data to send
 * The cycle repeats itself

Of course in a highly concurrent system you would often see the cycle "break" but nonetheless the amount of unsent
data can grow. With chunking the problem is addressed fundamentally.
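
The cycle above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not how the runtime is implemented: the names (`simulate`, `inflow`, `drain`, `busy_limit`) are made up, units are arbitrary, and it models a single sender that hands everything it has accumulated to the buffer in one go whenever it is not suspended.

```python
def simulate(ticks, inflow, drain, busy_limit):
    """Toy model: return the largest single batch handed to the buffer."""
    buffered = 0  # data sitting in the inter-node output buffer
    pending = 0   # data the sender has accumulated but not yet sent
    largest = 0
    for _ in range(ticks):
        pending += inflow                    # more data arrives from the connection
        buffered = max(0, buffered - drain)  # the peer node drains part of the buffer
        if buffered < busy_limit:
            # Sender is not suspended: it sends everything it has at once.
            largest = max(largest, pending)
            buffered += pending
            pending = 0
        # Otherwise the buffer is busy, the sender stays suspended,
        # and pending keeps growing until the next chance to send.
    return largest


# Peer keeps up with the inflow: batches stay at the size of one tick's inflow.
print(simulate(ticks=100, inflow=10, drain=10, busy_limit=20))

# Peer drains slower than data arrives: each suspension is longer than the
# previous one, so each batch is larger than the previous one.
for ticks in (7, 14, 100):
    print(ticks, simulate(ticks=ticks, inflow=10, drain=4, busy_limit=20))
```

In this model the batch size stays flat when the peer keeps up and keeps growing when it does not, which is the shape of the problem described above.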

Note that you can adjust buffer size, even though it is not exactly small by default :) [1][2]
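
As a rough sketch of what the adjustment looks like: the runtime takes the limit in kilobytes via the `+zdbbl` emulator flag, and RabbitMQ exposes the same setting through the `RABBITMQ_DISTRIBUTION_BUFFER_SIZE` environment variable. The helper names below are hypothetical and the value 192000 is only an example, not a recommendation; check the documentation for your versions before changing anything.

```python
# Range accepted by the +zdbbl flag, in kilobytes (per the erl documentation).
ZDBBL_MIN_KB = 1
ZDBBL_MAX_KB = 2097151


def check_limit(limit_kb):
    """Validate a distribution buffer busy limit given in kilobytes."""
    if not isinstance(limit_kb, int) or not ZDBBL_MIN_KB <= limit_kb <= ZDBBL_MAX_KB:
        raise ValueError(
            f"limit must be an integer between {ZDBBL_MIN_KB} and {ZDBBL_MAX_KB} kB"
        )
    return limit_kb


def zdbbl_flag(limit_kb):
    """Build the emulator flag, e.g. for passing to erl."""
    return f"+zdbbl {check_limit(limit_kb)}"


def rabbitmq_env_line(limit_kb):
    """Build the equivalent line for rabbitmq-env.conf style configuration."""
    return f"RABBITMQ_DISTRIBUTION_BUFFER_SIZE={check_limit(limit_kb)}"


print(zdbbl_flag(192000))
print(rabbitmq_env_line(192000))
```

Note that a larger buffer only delays the point at which senders get suspended; it does not remove the cycle described above.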

