RabbitMQ persistence puts publishers in flow until they die

Chad Greenburg

Jan 16, 2017, 1:09:10 PM
to rabbitmq-users
Server: 
- AWS EC2 shared r4.2xlarge
- Ubuntu 14.04.5
- 8 vCPU
- 61GB of memory

Software:
- Erlang 18.3.4
- RabbitMQ 3.6.6
- vm_memory_high_watermark = 0.85 (51.85GB)
- vm_memory_high_watermark_paging_ratio = 0.75 (38.8875GB)
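Those two watermark values correspond to entries like the following in rabbitmq.config (the classic Erlang-term config format used by 3.6.x; a sketch, not my exact file):

    [
      {rabbit, [
        {vm_memory_high_watermark, 0.85},
        {vm_memory_high_watermark_paging_ratio, 0.75}
      ]}
    ].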

I'm using the RabbitMQ .NET client, and I've tried two versions of it: 3.6.6 and 4.1.1. I'm only publishing messages to the server; there are no consumers. Only one queue exists on the server, and it is declared durable. All messages published to the server are 18,180 bytes, consisting of a pointless filler string ("lalalal...etc").
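For context, the test publisher is essentially a tight loop along these lines (a trimmed-down sketch with placeholder host and queue names, not my exact test program):

    using System.Text;
    using RabbitMQ.Client;

    class PublisherSketch
    {
        static void Main()
        {
            // Placeholder host and queue names.
            var factory = new ConnectionFactory { HostName = "rabbit-host" };
            using (var connection = factory.CreateConnection())
            using (var channel = connection.CreateModel())
            {
                channel.QueueDeclare("test-queue", durable: true, exclusive: false,
                                     autoDelete: false, arguments: null);

                // 18,180-byte filler payload, as described above.
                var body = Encoding.UTF8.GetBytes(new string('l', 18180));

                var props = channel.CreateBasicProperties();
                props.Persistent = true; // persist messages to the durable queue

                while (true)
                {
                    // Publish via the default exchange, routed by queue name.
                    channel.BasicPublish("", "test-queue", props, body);
                }
            }
        }
    }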

The problem:
Once the server reaches the vm_memory_high_watermark_paging_ratio threshold (~39GB), it puts all client publisher connections into the flow state, which is expected, and starts persisting messages to disk. Memory spikes to 45GB. The server persists about 500 messages/s for roughly 5-10 seconds (according to the management plugin) and then stops persisting. All client publisher connections remain in the flow state after persistence finishes. While in flow, the publishers are still sending messages to the server even though the management plugin shows every connection in flow with a publish rate of 0/s. Eventually all of the publishers die. I tested this about 20 minutes ago and the server is still doing something with that memory: it keeps fluctuating between 39GB and 45GB roughly every 30 seconds even though there are no connections left.

The following are pictures from the management plugin during this whole event. At 11:45, all publishers are put into the flow state, halting the message rates; persistence kicks in for roughly 10 seconds and then stops; memory spikes to 45GB; and some time after that (maybe 15 seconds) the publishers all die.


I'd like to point out that I can run this application safely against a RabbitMQ server running version 3.5.4 and everything works fine. With any version after that release, though, this exact situation happens. If there is any more information you need, feel free to ask for it.

Chad Greenburg

Jan 16, 2017, 1:16:33 PM
to rabbitmq-users
Ignore the drop and spike in the message-rates picture before 11:40; I shut down the apps and restarted them, and they published at a higher rate for a little while.

Chad Greenburg

Jan 16, 2017, 1:21:38 PM
to rabbitmq-users
The exception the publishers get is:

Unhandled Exception: RabbitMQ.Client.Exceptions.AlreadyClosedException: Already closed: The AMQP operation was interrupted: AMQP close-reason, initiated by Library, code=0, text="End of stream", classId=0, methodId=0, cause=
System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at RabbitMQ.Client.Impl.Frame.ReadFrom(NetworkBinaryReader reader)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ReadFrame()
   at RabbitMQ.Client.Framing.Impl.Connection.MainLoopIteration()
   at RabbitMQ.Client.Framing.Impl.Connection.MainLoop()
   at RabbitMQ.Client.Impl.SessionBase.Transmit(Command cmd)
   at RabbitMQ.Client.Impl.ModelBase.ModelSend(MethodBase method, ContentHeaderBase header, Byte[] body)
   at RabbitMQ.Client.Framing.Impl.Model._Private_BasicPublish(String exchange, String routingKey, Boolean mandatory, IBasicProperties basicProperties, Byte[] body)
   at RabbitMQ.Client.Impl.ModelBase.BasicPublish(String exchange, String routingKey, Boolean mandatory, IBasicProperties basicProperties, Byte[] body)
   at RabbitMQTest.Program.Main(String[] args) in C:\Source\tests\RabbitMQTest\RabbitMQTest\Program.cs:line 235

Michael Klishin

Jan 16, 2017, 1:27:32 PM
to rabbitm...@googlegroups.com
The message means that a connection tried to read from a TCP socket and got an EOF.
That's because when publishers are blocked, RabbitMQ does not read from the socket
and (in this case) does not send anything until alarms clear.
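For what it's worth, the .NET client also raises connection.blocked / connection.unblocked notifications, so a publisher can pause instead of writing into a connection the broker has stopped reading from. A rough sketch (the helper name is hypothetical):

    using System;
    using RabbitMQ.Client;

    static class BlockedHandlingSketch
    {
        // Attach handlers so the application can pause publishing while a
        // resource alarm is in effect and resume once it clears.
        public static void Attach(IConnection connection)
        {
            connection.ConnectionBlocked += (sender, args) =>
                Console.WriteLine("Connection blocked by broker: " + args.Reason);

            connection.ConnectionUnblocked += (sender, args) =>
                Console.WriteLine("Connection unblocked, safe to resume publishing");
        }
    }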

If you have publishers that constantly outpace consumers, your options are limited. Lazy queues (http://www.rabbitmq.com/lazy-queues.html) can help prolong the inevitable, but at some point something will have to consume messages or mark them for deletion (e.g. with a TTL).
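As a rough sketch of what that looks like from the .NET client (the queue name and the 10-minute TTL are illustrative values, not recommendations):

    using System.Collections.Generic;
    using RabbitMQ.Client;

    static class LazyQueueSketch
    {
        // Declares a durable queue in lazy mode with a per-queue message TTL.
        public static void Declare(IModel channel)
        {
            var queueArgs = new Dictionary<string, object>
            {
                { "x-queue-mode", "lazy" },   // keep message bodies on disk rather than in memory
                { "x-message-ttl", 600000 }   // drop unconsumed messages after 10 minutes
            };
            channel.QueueDeclare("test-queue", durable: true, exclusive: false,
                                 autoDelete: false, arguments: queueArgs);
        }
    }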

If you want to store data without consumers on an ongoing basis, use a data store (or Kafka, which is a messaging
broker and data store hybrid in a way).


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Chad Greenburg

Jan 17, 2017, 3:37:07 PM
to rabbitmq-users
Okay, so I decided to run my production system against RabbitMQ 3.6.6, and the persistence/memory behavior is acting quite strange.

As soon as the limit is reached, memory goes on a roller coaster ride: it spikes up to 51GB, drops back to 45GB, climbs to 54GB, and comes back down again. After around 7 minutes of bouncing over and under the limit, the server finally settled back down to 34GB and my producers could continue on as normal.



I must be missing a setting that needs to be updated for the newer version, because I've had this software in production on version 3.5.4 for just over a year. On that version, once the memory threshold is reached, the server persists messages to disk to free up memory (as it should) while putting publishers into flow mode, and then carries on until memory hits the limit again.

Any advice or help would be great. Thanks.

Chad Greenburg

Jan 17, 2017, 3:40:50 PM
to rabbitmq-users
In the time it took me to write this message, the limit has been reached again and memory is spiking up and down between 45GB and 55GB. The server did one large write to the message store at the beginning, as shown in the pictures above, and has been doing small writes every few seconds since.

Michael Klishin

Jan 17, 2017, 3:47:12 PM
to rabbitm...@googlegroups.com
There is one known problem that causes really excessive GC runs; it was addressed in 3.6.7.
