Hi,
We currently use RabbitMQ for sending messages from a few publishers (Windows services) to many consumer clients running on Windows, Android and iOS. The clients can reply to these messages by publishing their response to a particular response queue (the same queue for all clients).
Our RabbitMQ server uses one virtual host per customer and each virtual host can have thousands of connected clients. The clients are connected to their own durable queue and are ready to consume messages.
Each client is identified by a unique id that is also its username and the name of its queue. Each message is about 1 KB and we send (using a direct exchange) a few messages per client per day. Each message is unique in itself, but often a message is sent out to most of the clients on a host simultaneously. This means we get a short peak when thousands of messages are published within a virtual host. The messages are persisted and the clients acknowledge each message.
The number of connections is more important than throughput, i.e. if it takes a few seconds for a message to be delivered, this is no problem. Right now, we have one single RabbitMQ node with only a few thousand connected clients and this works without any problems. We use the default config and have just set the ERL_MAX_PORTS environment variable to a higher value than the default.
The system is designed to cope with 20,000 connected clients, but we have been asked if it is possible to scale the system to handle up to 2,000,000 concurrent connections, i.e. 100 times the load the system was originally designed for. In addition to the requirement for the number of concurrent connections, we also have a requirement that we may use at most three physical servers to accomplish this.
We have therefore started to run tests on our system to determine whether this is possible and, based on the tests, we already seem to be close to the ceiling of what we can handle.
We have run the test on a single RabbitMQ node. In our production system we are also using SSL when communicating between the server and the clients. In our test system we haven't used SSL.
Up to 30,000 connected clients (and as many queues) the system runs smoothly, but at around 35,000 clients we seem to hit the ceiling and we start getting this message in the RabbitMQ Management Interface: "The Management statistics database currently has a queue of XXXX events to process".
Around 40,000 concurrent connections and queues, the RabbitMQ server basically stops and we start getting timeouts.
The physical server, however, does not seem overloaded. We have lots of available memory; the processor runs at about 50% with an occasional peak of up to 80%. We also have no disk queues and we have plenty of spare bandwidth between the server and the clients.
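To put the ceiling in perspective against the target, here is a quick back-of-the-envelope calculation using only the figures mentioned so far (2,000,000 target connections, at most three servers, and the roughly 40,000 connections at which the single test node stalls). It assumes the connections would be spread evenly across the servers, which is an assumption and not something that has been tested:

```python
# Figures taken from the post; the even spread across servers is an assumption.
TARGET_CONNECTIONS = 2_000_000
DESIGNED_CONNECTIONS = 20_000
MAX_SERVERS = 3
OBSERVED_CEILING = 40_000  # connections at which the single test node stalls

scale_factor = TARGET_CONNECTIONS / DESIGNED_CONNECTIONS
per_server = -(-TARGET_CONNECTIONS // MAX_SERVERS)  # ceiling division
gap_factor = per_server / OBSERVED_CEILING

print(f"target vs. original design: {scale_factor:.0f}x")
print(f"connections needed per server: {per_server:,}")
print(f"required improvement over observed ceiling: {gap_factor:.1f}x")
```

In other words, each node would have to hold many times more connections than the point where the test node currently stops responding.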
The server we have run our tests on is a virtual machine (the host machine is a lot bigger) that is set up as follows:
Windows Server 2012 R2 Standard
AMD Opteron(TM) Processor 6238 2,60GHz (2 processors)
Virtual Processors: 16
Installed memory: 30,0 GB
We have tried to tweak the default rabbitmq.config and added this:
[
  {rabbit, [
    {tcp_listen_options, [
      {backlog, 100000},
      {nodelay, true},
      {sndbuf, 32768},
      {recbuf, 32768}
    ]},
    {handshake_timeout, 60000},
    {channel_max, 0},
    {vm_memory_high_watermark_paging_ratio, 0.9},
    {vm_memory_high_watermark, 0.8},
    {disk_free_limit, "50GB"},
    {collect_statistics_interval, 20000}
  ]},
  {rabbitmq_management, [
    {rates_mode, none}
  ]},
All other values are copies of the default config.
This is the status of the server when it has basically stopped responding:
C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.0\sbin>rabbitmqctl status
Status of node 'rabbit@WIN-AIE5K0LAFPP' ...
[{pid,5100},
{running_applications,
[{rabbitmq_management,"RabbitMQ Management Console","3.6.0"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.0"},
{rabbit,"RabbitMQ","3.6.0"},
{mnesia,"MNESIA CXC 138 12","4.13.2"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.0"},
{webmachine,"webmachine","git"},
{mochiweb,"MochiMedia Web Server","2.13.0"},
{compiler,"ERTS CXC 138 10","6.0.2"},
{ssl,"Erlang/OTP SSL application","7.2"},
{public_key,"Public key infrastructure","1.1"},
{crypto,"CRYPTO","3.6.2"},
{ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
{amqp_client,"RabbitMQ AMQP Client","3.6.0"},
{rabbit_common,[],"3.6.0"},
{os_mon,"CPO CXC 138 46","2.4"},
{syntax_tools,"Syntax tools","1.7"},
{asn1,"The Erlang ASN1 compiler version 4.0.1","4.0.1"},
{xmerl,"XML parser","1.3.9"},
{inets,"INETS CXC 138 49","6.1"},
{sasl,"SASL CXC 138 11","2.6.1"},
{stdlib,"ERTS CXC 138 10","2.7"},
{kernel,"ERTS CXC 138 10","4.1.1"}]},
{os,{win32,nt}},
{erlang_version,
"Erlang/OTP 18 [erts-7.2.1] [64-bit] [smp:16:16] [async-threads:128]\n"},
{memory,
[{total,7391943376},
{connection_readers,1174696816},
{connection_writers,117726272},
{connection_channels,311297792},
{connection_other,1487932728},
{queue_procs,1229591448},
{queue_slave_procs,0},
{plugins,234357544},
{other_proc,91444128},
{mnesia,78161776},
{mgmt_db,820079392},
{msg_index,34847432},
{other_ets,80839064},
{binary,1489783648},
{code,27369945},
{atom,992409},
{other_system,212822982}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
{vm_memory_high_watermark,0.8},
{vm_memory_limit,25769426944},
{disk_free_limit,50000000000},
{disk_free,122271981568},
{file_descriptors,
[{total_limit,524188},
{total_used,59403},
{sockets_limit,471767},
{sockets_used,43570}]},
{processes,[{limit,16777216},{used,565644}]},
{run_queue,0},
{uptime,1285},
{kernel,{net_ticktime,60}}]
C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.0\sbin>
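As a rough reading of the status output above, the sketch below divides the reported totals by `sockets_used` (used here as a stand-in for the number of client connections) to estimate memory and Erlang processes per connection, and shows how far each reported limit is from being reached. The numbers are copied from the output. The linear extrapolation to 666,667 connections per server (2,000,000 spread evenly over three servers) is purely an assumption, since memory use may well not scale linearly:

```python
# Values copied from the `rabbitmqctl status` output above.
status = {
    "memory_total": 7_391_943_376,
    "vm_memory_limit": 25_769_426_944,
    "fd_used": 59_403,
    "fd_limit": 524_188,
    "sockets_used": 43_570,
    "sockets_limit": 471_767,
    "processes_used": 565_644,
    "processes_limit": 16_777_216,
}


def utilisation(used, limit):
    """Return used/limit as a percentage."""
    return 100.0 * used / limit


# Per-connection estimates, using sockets_used as the connection count.
bytes_per_conn = status["memory_total"] / status["sockets_used"]
procs_per_conn = status["processes_used"] / status["sockets_used"]

# Assumption: linear scaling and an even spread of 2,000,000 connections over 3 servers.
target_per_server = 666_667
projected_gib = bytes_per_conn * target_per_server / 2**30

for name, used, limit in [
    ("memory", status["memory_total"], status["vm_memory_limit"]),
    ("file descriptors", status["fd_used"], status["fd_limit"]),
    ("sockets", status["sockets_used"], status["sockets_limit"]),
    ("Erlang processes", status["processes_used"], status["processes_limit"]),
]:
    print(f"{name:<17} {utilisation(used, limit):5.1f}% of limit")

print(f"memory per connection:    {bytes_per_conn / 1024:.0f} KiB")
print(f"processes per connection: {procs_per_conn:.1f}")
print(f"projected memory per server at target: {projected_gib:.0f} GiB")
```

None of the limits reported in the status output is close to being exhausted at the point where the node stalls, which matches the observation that the machine itself does not look overloaded.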
Now over to my questions:
1) Is it possible to accomplish this with RabbitMQ? Has anyone attempted something similar before?
2) If possible, how should we design the hardware to cope with the load and still meet the requirement of a maximum of three servers?
3) Can we do anything in the configuration / setting to increase performance?
4) Should we switch OS from Windows Server to Linux?
5) Any other considerations in this use case?
Any help is much appreciated!
Best regards,
Jan
Thanks for the fast reply.
Yes, I realize that we need a lot of memory if we are to meet the 2,000,000 connection requirement, but that in itself is no problem.
Right now we cannot handle more than 30,000 connections before RabbitMQ crashes, and at that point we still have a lot of available memory left, so obviously we are doing something wrong?
Best Regards,
Jan
I have read the doc guide (several times) but obviously missed something…
I have redone the test and attached the log-files from RabbitMQ. There is nothing strange in Windows event log.
Anything else I can provide?
Best Regards,
Jan
Yes, if you read my question I specified that we run our RabbitMQ Server on Windows Server 2012 and that was why I also asked if it was worth changing OS ...
Perhaps there is a limit, that RabbitMQ cannot handle high loads on Windows. Is there anyone other than Michael who has suggestions on possible solutions to improve the performance (when running on Windows)?
Any help is much appreciated!
Best regards,
Jan
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send an email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I did as Michael suggested and switched to Ubuntu Server 15.10. My configuration is basically the same (I lowered sndbuf and recbuf to 16348) but unfortunately, I get the same behavior.
Somewhere around 40,000 to 45,000 concurrent connections, RabbitMQ starts having problems and works more slowly. As before, there is a lot of memory left, the CPU (16 virtual processors) is running at around 60-70% and there are no disk queues.
The node is not completely dead, it only runs very, very slowly: a response from "rabbitmqctl status" takes over 5 minutes and the clients get timeouts when trying to connect to the server. The RabbitMQ management plugin is basically dead and has several hundred thousand events in its queue. There are no errors in the logs.
Thank you, we haven’t seen anything that indicates that it is the Erlang VM that runs out of resources, but it's worth a try. I will try tuning the Erlang VM with the parameters that you suggested and give Erlang some additional resources and see if it changes the test result.
Best Regards,
Jan
Randall, thank you for clarifying; I missed that a proxy can be used in this way. This is something I will try immediately. Provided that it really is the number of connections that is the problem for RabbitMQ and not something else, this should solve the problem for us.
Once again, thank you.
By the way, has anyone run this much load on a RabbitMQ node installed on a virtual server hosted in Hyper-V?
Best Regards,
Jan
Randall, just to be clear; is this the proxy you mentioned that you used: https://github.com/vert-x3/vertx-service-proxy
If it was, does this mean that you wrote a proxy yourself that received the client connections and opened channels to the RabbitMQ node?
Do you know if there is any "smart proxy" that supports this out of the box, i.e. multiple client connections => multiple channels over a few connections to a RabbitMQ node, or is this something that we have to write ourselves?
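To make the question concrete, here is a minimal sketch of the mapping such a proxy would need to maintain: each client connection is assigned one channel on one of a small number of proxy-to-broker connections. This is not an existing tool; the names `Slot`, `assign_slot` and `broker_connections_needed`, and the figure of 2,000 channels per connection, are hypothetical and only for illustration. The one hard constraint is that AMQP 0-9-1 channel numbers are 16-bit, with channel 0 reserved for the connection itself:

```python
from dataclasses import dataclass

# AMQP 0-9-1 channel numbers are 16-bit; channel 0 is reserved for the connection.
AMQP_MAX_CHANNELS = 65_535


@dataclass(frozen=True)
class Slot:
    connection: int  # index of the proxy-to-broker connection (hypothetical numbering)
    channel: int     # AMQP channel number on that connection, starting at 1


def assign_slot(client_index: int, channels_per_connection: int) -> Slot:
    """Map the n-th client (0-based) to a broker connection and channel number."""
    if not 1 <= channels_per_connection <= AMQP_MAX_CHANNELS:
        raise ValueError("channels_per_connection must be between 1 and 65535")
    if client_index < 0:
        raise ValueError("client_index must be non-negative")
    connection, offset = divmod(client_index, channels_per_connection)
    return Slot(connection=connection, channel=offset + 1)


def broker_connections_needed(clients: int, channels_per_connection: int) -> int:
    """Number of broker connections needed to give every client its own channel."""
    return -(-clients // channels_per_connection)  # ceiling division


# Illustrative figures only: 2,000,000 clients, 2,000 channels per broker connection.
print(broker_connections_needed(2_000_000, 2_000))
print(assign_slot(0, 2_000))
print(assign_slot(2_000, 2_000))
```

Note that this only reduces the number of connections the RabbitMQ node sees. The number of queues stays the same, since each client still has its own queue.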
Best Regards,
Jan