RabbitMQ with 2,000,000 concurrent connections


Jan Granstam

Feb 19, 2016, 7:32:01 AM
to rabbitmq-users

Hi,

We currently use RabbitMQ for sending messages from a few publishers (Windows services) to many consumer clients running on Windows, Android and iOS. The clients can reply to these messages by publishing their response in a particular response queue (the same queue for all clients).

Our RabbitMQ server uses one virtual host per customer and each virtual host can have thousands of connected clients. The clients are connected to their own durable queue and are ready to consume messages.

Each client is identified by a unique ID that also serves as its username and queue name. Each message is about 1 KB, and we send a few messages per client per day through a direct exchange. Every message is unique, but a message is often sent to most of the clients on a virtual host at the same time, so we get a short peak in which thousands of messages are published within one virtual host. The messages are persisted and the clients acknowledge each message.

 

The number of connections is more important than throughput, i.e. it is no problem if a message takes a few seconds to be delivered. Right now we have a single RabbitMQ node with only a few thousand connected clients, and this works without any problems. We use the default config and have only set the ERL_MAX_PORTS environment variable to a higher value than the default.

 

The system is designed to cope with 20,000 connected clients, but we have been asked if it is possible to scale the system to handle up to 2,000,000 concurrent connections. That is 100 times the load the system was originally designed for. In addition to the requirement on the number of concurrent connections, we are also required to use at most three physical servers to accomplish this.

 

We have therefore started to run tests on our system to determine whether this is possible, and based on the tests we already seem to be close to the ceiling of what we can handle.

We have run the test on a single RabbitMQ node. In our production system we are also using SSL when communicating between the server and the clients. In our test system we haven't used SSL.

 

Up to 30,000 connected clients (and as many queues) the system runs smoothly, but at around 35,000 clients we seem to hit the ceiling and start getting this message in the RabbitMQ Management Interface: "The Management statistics database currently has a queue of XXXX events to process".

At around 40,000 concurrent connections and queues, the RabbitMQ server basically stops and we start getting timeouts.

 

The physical server, however, does not seem overloaded. We have lots of available memory; the processor runs at about 50% with an occasional peak up to 80%. We also have no disk queues and plenty of spare bandwidth between the server and the clients.

 

The server we have run our tests on is a virtual machine (the host machine is a lot bigger) that is set up as follows:

Windows Server 2012 R2 Standard
AMD Opteron(TM) Processor 6238, 2.60 GHz (2 processors)
Virtual processors: 16
Installed memory: 30.0 GB

 

We have tried to tweak the default rabbitmq.config and added this:

 

[
 {rabbit, [
   {tcp_listen_options, [
     {backlog, 100000},
     {nodelay, true},
     {sndbuf,  32768},
     {recbuf,  32768}
   ]},
   {handshake_timeout, 60000},
   {channel_max, 0},
   {vm_memory_high_watermark_paging_ratio, 0.9},
   {vm_memory_high_watermark, 0.8},
   {disk_free_limit, "50GB"},
   {collect_statistics_interval, 20000}
 ]},
 {rabbitmq_management, [
   {rates_mode, none}
 ]}
].

 

All other values are copies of the default config.

 

This is the status of the server when it has basically stopped responding:

 

C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.0\sbin>rabbitmqctl status
Status of node 'rabbit@WIN-AIE5K0LAFPP' ...
[{pid,5100},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.6.0"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.0"},
      {rabbit,"RabbitMQ","3.6.0"},
      {mnesia,"MNESIA  CXC 138 12","4.13.2"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.0"},
      {webmachine,"webmachine","git"},
      {mochiweb,"MochiMedia Web Server","2.13.0"},
      {compiler,"ERTS  CXC 138 10","6.0.2"},
      {ssl,"Erlang/OTP SSL application","7.2"},
      {public_key,"Public key infrastructure","1.1"},
      {crypto,"CRYPTO","3.6.2"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.0"},
      {rabbit_common,[],"3.6.0"},
      {os_mon,"CPO  CXC 138 46","2.4"},
      {syntax_tools,"Syntax tools","1.7"},
      {asn1,"The Erlang ASN1 compiler version 4.0.1","4.0.1"},
      {xmerl,"XML parser","1.3.9"},
      {inets,"INETS  CXC 138 49","6.1"},
      {sasl,"SASL  CXC 138 11","2.6.1"},
      {stdlib,"ERTS  CXC 138 10","2.7"},
      {kernel,"ERTS  CXC 138 10","4.1.1"}]},
 {os,{win32,nt}},
 {erlang_version,
     "Erlang/OTP 18 [erts-7.2.1] [64-bit] [smp:16:16] [async-threads:128]\n"},
 {memory,
     [{total,7391943376},
      {connection_readers,1174696816},
      {connection_writers,117726272},
      {connection_channels,311297792},
      {connection_other,1487932728},
      {queue_procs,1229591448},
      {queue_slave_procs,0},
      {plugins,234357544},
      {other_proc,91444128},
      {mnesia,78161776},
      {mgmt_db,820079392},
      {msg_index,34847432},
      {other_ets,80839064},
      {binary,1489783648},
      {code,27369945},
      {atom,992409},
      {other_system,212822982}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
 {vm_memory_high_watermark,0.8},
 {vm_memory_limit,25769426944},
 {disk_free_limit,50000000000},
 {disk_free,122271981568},
 {file_descriptors,
     [{total_limit,524188},
      {total_used,59403},
      {sockets_limit,471767},
      {sockets_used,43570}]},
 {processes,[{limit,16777216},{used,565644}]},
 {run_queue,0},
 {uptime,1285},
 {kernel,{net_ticktime,60}}]

 

Now over to my questions:

1) Is it possible to accomplish this with RabbitMQ? Has anyone attempted something similar before?
2) If it is possible, how should we design the hardware to cope with the load and still meet the requirement of a maximum of three servers?
3) Can we do anything in the configuration/settings to increase performance?
4) Should we switch OS from Windows Server to Linux?
5) Are there any other considerations for this use case?

 

Any help is much appreciated!

Best regards,

Jan

Michael Klishin

Feb 19, 2016, 7:36:52 AM
to rabbitm...@googlegroups.com, Jan Granstam
On 19 February 2016 at 15:32:04, Jan Granstam (enera.int...@gmail.com) wrote:
> The system is designed to cope with 20,000 connected clients,
> but we have been asked if it is possible to scale the system to handle
> up to 2,000,000 concurrent connections. 100 times the load the
> system originally were designed for. In addition to the requirement
> for the number of concurrent connections, we have also a requirement
> that we maximum may use three physical servers to accomplish
> this

I believe this is largely answered by http://rabbitmq.com/networking.html. Yes, you can do that,
but it will take effort and experimentation with multiple RabbitMQ and OS settings.

Those 3 nodes will have to have quite a bit of RAM (or you'd have to shrink TCP buffer
size quite a bit), so perhaps distributing connections reasonably evenly across a larger
number of less powerful nodes, using a proxy such as HAProxy, is an easier way.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
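
To put the RAM remark in rough numbers, here is a small Python sketch that extrapolates from the "rabbitmqctl status" output posted earlier in the thread (total memory 7,391,943,376 bytes with 43,570 sockets in use) and from the 32,768-byte sndbuf/recbuf values in the posted config. It assumes memory grows linearly with the connection count and that every connection fully uses both buffers. Both are simplifying assumptions rather than measured behaviour, and the function and variable names are made up for illustration.

```python
# Rough capacity estimate from figures posted in this thread.
# Assumption: memory scales linearly with connection count (a simplification).

TOTAL_MEMORY_BYTES = 7_391_943_376   # {memory, [{total, ...}]} from the status output
SOCKETS_USED = 43_570                # {sockets_used, ...} from the status output
SNDBUF_BYTES = 32_768                # from the posted tcp_listen_options
RECBUF_BYTES = 32_768
TARGET_CONNECTIONS = 2_000_000
NODES = 3

GIB = 1024 ** 3


def bytes_per_connection(total_bytes: int, connections: int) -> float:
    """Average node memory per connection (includes queues, stats, etc.)."""
    return total_bytes / connections


def estimate_gib(per_connection_bytes: float, connections: int) -> float:
    """Linear extrapolation to `connections`, expressed in GiB."""
    return per_connection_bytes * connections / GIB


per_conn = bytes_per_connection(TOTAL_MEMORY_BYTES, SOCKETS_USED)
total_gib = estimate_gib(per_conn, TARGET_CONNECTIONS)
buffers_gib = estimate_gib(SNDBUF_BYTES + RECBUF_BYTES, TARGET_CONNECTIONS)

print(f"observed: ~{per_conn / 1024:.0f} KiB per connection")
print(f"2M connections: ~{total_gib:.0f} GiB total, ~{total_gib / NODES:.0f} GiB per node")
print(f"TCP buffers alone at 32 KiB each way: ~{buffers_gib:.0f} GiB")
```

The per-connection figure here includes everything the node reported at the time (queues, the management database, binaries), so it is an upper-end estimate for connections alone. It is only meant to show the order of magnitude behind "quite a bit of RAM".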


Jan Granstam

Feb 19, 2016, 7:55:07 AM
to rabbitmq-users, enera.int...@gmail.com

Thanks for the fast reply.

Yes, I realize that we will need a lot of memory to meet the 2,000,000 connection requirement, but that in itself is no problem.

Right now we do not handle more than 30,000 connections before RabbitMQ crash and then we still have a lot of available memory left so obviously we do something wrong?

Best Regards,

Jan

Michael Klishin

Feb 19, 2016, 7:57:30 AM
to rabbitm...@googlegroups.com, Jan Granstam
On 19 February 2016 at 15:55:11, Jan Granstam (enera.int...@gmail.com) wrote:
> Right now we do not handle more than 30,000 connections before
> RabbitMQ crash and then we still have a lot of available memory
> left so obviously we do something wrong?

1. Read the doc guide. It really does explain every important knob that can be tweaked.
2. If you expect a response to "before RabbitMQ crash", consider posting your log files,
    syslog messages, and so on. 

Jan Granstam

Feb 19, 2016, 8:47:46 AM
to rabbitmq-users, enera.int...@gmail.com

I have read the doc guide (several times) but obviously missed something…


I have redone the test and attached the log-files from RabbitMQ. There is nothing strange in Windows event log.


Anything else I can provide?


Best Regards,

Jan

rabbit@WIN-AIE5K0LAFPP.zip
test_server.png

Michael Klishin

Feb 19, 2016, 8:54:28 AM
to rabbitm...@googlegroups.com, Jan Granstam
On 19 February 2016 at 16:47:53, Jan Granstam (enera.int...@gmail.com) wrote:
> I have redone the test and attached the log-files from RabbitMQ.
> There is nothing strange in Windows event log.

Oh, you're trying to support 2M concurrent connections on Windows. That's a road much less traveled. 

> Anything else I can provide?

The only unhandled exception in the logs that I could find is
https://github.com/rabbitmq/rabbitmq-server/issues/530, which is harmless (it simply
means a socket was closed before all channels were shut down) and is fixed in 3.6.1.

So if the RabbitMQ VM terminates, RabbitMQ itself is entirely unaware of it. The most common
reason for this on Windows is people running 32-bit Erlang, but your screenshot shows your
node uses 5 GB of RAM, so on 32-bit it would have been killed by the OS long before that.

I can't suggest anything from these log files.

Jan Granstam

Feb 19, 2016, 9:09:11 AM
to rabbitmq-users, enera.int...@gmail.com

Yes, if you read my question I specified that we run our RabbitMQ Server on Windows Server 2012 and that was why I also asked if it was worth changing OS ...

Perhaps there is a limit, that RabbitMQ cannot handle high loads on Windows. Is there anyone other than Michael who has suggestions on possible solutions to improve the performance (when running on Windows)?


Any help is much appreciated!

Best regards,

Jan

Michael Klishin

Feb 19, 2016, 9:18:59 AM
to rabbitm...@googlegroups.com, Jan Granstam
On 19 February 2016 at 17:09:14, Jan Granstam (enera.int...@gmail.com) wrote:
> Yes, if you read my question I specified that we run our RabbitMQ
> Server on Windows Server 2012 and that was why I also asked if it
> was worth changing OS

OK, my bad. Perhaps folks on erlang-questions can help find what might be bringing the VM down.

It's worth switching to Linux for one reason: a lot more people run systems with similar workloads
on it. You'd have a much easier time finding information (and hiring folks who've done it before)
on this subject if you deploy to Linux or *BSD.

We certainly know of teams that managed to reach more than 500K connections per node on Linux
(or maybe more, their stated goal was 500K per node).

The runtime can do over 2M connections per node — WhatsApp does that — and while that
doesn't at all mean that RabbitMQ can or that it's gonna be easy (WhatsApp has engineers
who improved the runtime for their specific needs to get to that kind of numbers), it's
a proof worth a dozen of scalability consultants.

You can find more about what they had to do (again, they use Erlang without RabbitMQ
but many points are just as relevant), circa 2012:

http://blog.whatsapp.com/196/1-million-is-so-2011?
http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf

Randall Richard

Feb 19, 2016, 12:00:51 PM
to rabbitmq-users, Jan Granstam
FWIW, an alternative for consideration is to have a proxy server that's suited for the high number of connections.  I used Vertx to handle web facing connections that are proxied to a RabbitMQ connection via channels.

-Randall

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send an email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jan Granstam

Feb 23, 2016, 1:46:41 AM
to rabbitmq-users, enera.int...@gmail.com
Thanks Randall, we will probably use some kind of proxy solution, assuming that we can get a higher number of concurrent connections than we can today.

 

I did as Michael suggested and switched to Ubuntu Server 15.10. My configuration is basically the same (I lowered sndbuf and recbuf to 16348) but unfortunately, I get the same behavior. 

Somewhere around 40,000 to 45,000 concurrent connections, RabbitMQ starts having problems and works more slowly. As before, there is a lot of memory left, the CPU (16 virtual processors) is operating at around 60-70% and there are no disk queues.

The node is not completely dead, it only runs very, very slowly: a response from "rabbitmqctl status" takes over 5 minutes and the clients get timeouts when trying to connect to the server. The RabbitMQ management plugin is basically dead and has several hundred thousand events in its queue. There are no errors in the logs.


Michael mentions teams that have managed over 500K concurrent connections on a single node. Have they described their work anywhere? Which versions of RabbitMQ and Erlang did they use? What did their configuration look like?
Best Regards,
Jan

dfed...@pivotal.io

Feb 23, 2016, 5:03:13 AM
to rabbitmq-users, enera.int...@gmail.com
You could be hitting some Erlang VM limits. You can try tuning the +A, +P and +s... flags described here: http://erlang.org/doc/man/erl.html
They can be passed to the RabbitMQ VM using RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS.
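
One way to see which limits are in play is to scale the counters from the "rabbitmqctl status" output posted earlier to the target load. The Python sketch below does that under the assumption that sockets, file descriptors and Erlang processes per connection stay constant as load grows, which is a guess rather than something measured. The +P value it prints for RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS is purely illustrative, and the helper names are hypothetical.

```python
# Scale the limits reported by `rabbitmqctl status` to a target load.
# Assumption: resources used per connection stay constant as load grows.

STATUS = {
    "sockets_used": 43_570,
    "sockets_limit": 471_767,
    "fds_used": 59_403,
    "fds_limit": 524_188,
    "processes_used": 565_644,
    "processes_limit": 16_777_216,
}


def projected(used: int, connections_now: int, connections_target: int) -> int:
    """Scale a resource count linearly to the target connection count."""
    return -(-used * connections_target // connections_now)  # ceiling division


def check(target_per_node: int) -> dict:
    """Return {resource: (projected_need, limit, fits)} for one node."""
    now = STATUS["sockets_used"]
    result = {}
    for name in ("sockets", "fds", "processes"):
        need = projected(STATUS[f"{name}_used"], now, target_per_node)
        limit = STATUS[f"{name}_limit"]
        result[name] = (need, limit, need <= limit)
    return result


def erl_args(process_need: int, headroom: float = 1.5) -> str:
    """Hypothetical +P (process limit) argument with some headroom."""
    return f"+P {int(process_need * headroom)}"


# Three nodes sharing the load evenly, then everything on a single node.
for target in (2_000_000 // 3, 2_000_000):
    print(f"{target:,} connections on one node:")
    report = check(target)
    for name, (need, limit, fits) in report.items():
        print(f"  {name}: need ~{need:,}, limit {limit:,}, {'ok' if fits else 'too low'}")
    if not report["processes"][2]:
        print("  e.g. RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS=" + erl_args(report["processes"][0]))
```

Under these assumptions the socket and file descriptor limits are exceeded well before the Erlang process limit, so the descriptor limit would need raising in addition to any VM flags. None of this explains the slowdown seen at around 40,000 connections, where every counter is still far below its limit.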

Randall Richard

Feb 23, 2016, 8:23:37 AM
to rabbitmq-users, Jan Granstam
Not sure it's clear in my last post, but just to clarify, the proxy solution (among other advantages) enables you to use a much lower number of connections on the RabbitMQ side by using multiple channels on a single connection.  

Channels are very lightweight and you can create a lot per connection.  So you'd be able to reduce your RabbitMQ connection requirement significantly (i.e. in the 100 to 1000 channels per connection range).

Just wanted to make sure you're aware of that.

-Randall
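
The channel arithmetic above can be sketched in a few lines of Python. This is not Randall's proxy, only an illustration of the bookkeeping such a proxy might do: how many broker connections are needed for a given number of channels per connection, and how a client could be mapped to a (connection, channel) slot. The function names are hypothetical. The only protocol fact relied on is that AMQP 0-9-1 channel numbers are 16-bit, with channel 0 reserved for the connection itself.

```python
# Sketch: broker connections needed by a multiplexing proxy, and a simple
# mapping from client to (connection, channel). All names are hypothetical.

AMQP_MAX_CHANNEL = 65_535  # 16-bit channel numbers; channel 0 is reserved


def connections_needed(clients: int, channels_per_connection: int) -> int:
    """Broker connections required if every client gets its own channel."""
    if not 1 <= channels_per_connection <= AMQP_MAX_CHANNEL:
        raise ValueError("channels_per_connection must be between 1 and 65535")
    return -(-clients // channels_per_connection)  # ceiling division


def slot_for(client_index: int, channels_per_connection: int) -> tuple[int, int]:
    """Map a zero-based client index to (connection index, channel number).

    Channel numbers start at 1 because channel 0 carries connection-level frames.
    """
    connection, offset = divmod(client_index, channels_per_connection)
    return connection, offset + 1


for per_conn in (100, 1000):
    print(per_conn, "channels/connection ->",
          connections_needed(2_000_000, per_conn), "broker connections")
```

This only reduces the number of TCP connections the broker sees. In the setup described at the start of the thread each client still has its own durable queue, so the queue count on the broker would stay the same.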


Jan Granstam

Feb 24, 2016, 1:32:10 AM
to rabbitmq-users, enera.int...@gmail.com

Thank you, we haven’t seen anything that indicates that it is the Erlang VM that runs out of resources, but it's worth a try. I will try tuning the Erlang VM with the parameters that you suggested and give Erlang some additional resources and see if it changes the test result.

Best Regards,

Jan

Jan Granstam

Feb 24, 2016, 1:34:48 AM
to rabbitmq-users, enera.int...@gmail.com

Randall, thank you for clarifying. I had missed that a proxy can be used in this way, and it is something I will try immediately. Provided that it really is the number of connections that is the problem for RabbitMQ and not something else, this should solve the problem for us.

Once again, thank you.

 

By the way, has anyone run this much load on a RabbitMQ node installed on a virtual server hosted in Hyper-V?

 

Best Regards,

Jan

Jan Granstam

Feb 24, 2016, 9:15:51 AM
to rabbitmq-users, enera.int...@gmail.com

Randall, just to be clear, is this the proxy you mentioned that you used? https://github.com/vert-x3/vertx-service-proxy

If so, does this mean that you wrote a proxy yourself that receives the client connections and opens channels to the RabbitMQ node?


Do you know of any "smart proxy" that supports this out of the box, i.e. many client connections mapped onto many channels over a few connections to a RabbitMQ node, or is this something we have to write ourselves?


Best Regards,

Jan


Randall Richard

Feb 24, 2016, 5:24:04 PM
to rabbitmq-users, Jan Granstam
I wrote my own Vertx based AmqpProxy server -- proxy servers are pretty straight-forward to implement in vertx.  I'm not aware of an out-of-the-box solution, but there may be something out there. 

-Randall

Fred Wang

Aug 8, 2016, 2:23:34 AM
to rabbitmq-users, enera.int...@gmail.com
Sounds awesome!!!  Have you open-sourced your Vertx based AmqpProxy server?

On Thursday, February 25, 2016 at 6:24:04 AM UTC+8, Randall Richard wrote:

gigi paul

Jan 26, 2018, 3:36:36 PM
to rabbitmq-users
Hi Jan, did you ever resolve this? I have a similar situation: more than 10,000 active connections, with RabbitMQ running on a cloud VM. Performance degrades as the number of connections grows, so I am looking at options to scale and accept more connections. If you could share what you learned or how you solved it, it would be very helpful and much appreciated!

Thanks
Gigi

Michael Klishin

Jan 26, 2018, 11:16:56 PM
to rabbitm...@googlegroups.com
Please start new threads for new questions.

There are existing threads about sustaining a large number of connections in this list's archives
and a dedicated section in the docs:
https://www.rabbitmq.com/networking.html#tuning-for-large-number-of-connections.

Consider providing more specifics than "the performance is getting degraded", too.

