Performance problem with RabbitMQ Management Plugin with a large number of concurrent connections


Kim Nygren

Dec 1, 2017, 2:53:30 AM
to rabbitmq-users

We have a system with a large number of concurrently connected devices (one queue and one user per device), but only a few messages are sent per device per day: about 5 messages per device per day on average, with peaks of around 50 messages per hour per device.
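
To put those per-device figures into broker-wide terms, here is a rough back-of-the-envelope sketch. The device count of 50 000 is an assumption for illustration (it matches the connection counts discussed further down), and the peak figure assumes the worst case where every device peaks in the same hour.

```python
# Rough message-rate estimate for the workload described above.
# DEVICES is an illustrative assumption; the per-device rates come from the post.
DEVICES = 50_000
AVG_MSGS_PER_DEVICE_PER_DAY = 5
PEAK_MSGS_PER_DEVICE_PER_HOUR = 50


def average_rate_per_second(devices: int) -> float:
    """Average broker-wide rate, spread evenly over 24 hours."""
    return devices * AVG_MSGS_PER_DEVICE_PER_DAY / 86_400


def peak_rate_per_second(devices: int) -> float:
    """Worst case: every device hits its peak hourly rate in the same hour."""
    return devices * PEAK_MSGS_PER_DEVICE_PER_HOUR / 3_600


print(f"average: {average_rate_per_second(DEVICES):.1f} msg/s")
print(f"peak:    {peak_rate_per_second(DEVICES):.1f} msg/s")
```

Under these assumptions the message rate is modest, which suggests the load is dominated by the number of connections rather than by message throughput.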

Depending on which customer the devices belong to, they connect to different virtual hosts; some virtual hosts have 10 000 devices and some only a few.

We are running this on a single RabbitMQ server node (and yes we are going to run this as a cluster in the future but we aren't doing it right now).

We use the RabbitMQ Management plugin mainly for monitoring and for creating/deleting users when devices are added to or removed from the system, so we can't disable the RabbitMQ Management plugin without developing a replacement.


The test we ran was meant to determine how many concurrent connections a single RabbitMQ node can handle. All test devices connected to the same virtual host.

The server is a fairly large virtual machine (Hyper-V) with 46 GB of memory and 16 cores running at 2.00 GHz (see screenshot). During the test there did not seem to be any resource problem on the machine: plenty of memory and CPU were left, the machine was responsive, and there was no indication of problems.




Up to around 40 000 concurrent connections there was no problem at all.
The server got slower and slower at around 50 000 connections, and at around 60 000 we ran into serious problems: clients started losing their connections and had to reconnect. The reconnects often timed out during the connection attempt.

In the RabbitMQ server logs we got a lot of:
ERROR REPORT==== 29-Nov-2017::09:48:10 ===
closing AMQP connection <0.13752.52> (192.168.2.112:5287 -> 192.168.1.15:5672):
{handshake_timeout,frame_header}
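
To see how the timeouts are distributed over time, entries like the one above could be counted per minute. This is a minimal sketch that assumes the log follows the three-line report layout shown here; the function name is hypothetical, and the leading `=` in the header pattern is optional because some log files prefix the report header with it.

```python
import re
from collections import Counter

# Matches report headers such as "ERROR REPORT==== 29-Nov-2017::09:48:10 ===".
HEADER = re.compile(
    r"^=?[A-Z]+ REPORT==== (\d{1,2}-\w{3}-\d{4})::(\d{2}):(\d{2}):\d{2} ==="
)


def handshake_timeouts_per_minute(lines):
    """Count lines mentioning handshake_timeout, grouped by report minute."""
    counts = Counter()
    current_minute = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            date, hour, minute = match.groups()
            current_minute = f"{date} {hour}:{minute}"
        elif "handshake_timeout" in line and current_minute is not None:
            counts[current_minute] += 1
    return counts


sample = """ERROR REPORT==== 29-Nov-2017::09:48:10 ===
closing AMQP connection <0.13752.52> (192.168.2.112:5287 -> 192.168.1.15:5672):
{handshake_timeout,frame_header}
""".splitlines()

for minute, count in sorted(handshake_timeouts_per_minute(sample).items()):
    print(minute, count)
```

Comparing these counts with the connection count at the same time would show whether the timeouts start at a particular load level.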

The whole RabbitMQ server got slower and slower; running rabbitmqctl status could take 1-2 minutes.

We are now running the server with about 50 000 connections to see whether any problems appear in the long run.


We also ran the same test without the Management plugin, and then we got at least 50% more connections before hitting the same problems, but that is not an option right now.

To reach the number of connections we got, we changed some settings in rabbitmq.config, for instance tcp_listen_options, handshake_timout and collect_statistics_interval; see rabbitmq.config below.

We have also added some environment variables:
ERL_MAX_PORTS = 1000000
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS = +A 256
RABBITMQ_SERVER_ERL_ARGS = +P 2097152

We suspect that we could get higher performance on Linux, but that isn't an option right now either.


Two questions:

1) Do you see anything obvious wrong with our configuration/setup?
2) Is there anything else we can try to better use the hardware?


Looking forward to hearing from the community.
Best regards,
Kim Nygren




rabbitmq.config
[
  {mnesia,
    [
      {dump_log_write_threshold, 10000},
      {dc_dump_limit, 40}
    ]
  },
  {rabbit,
    [
      {tcp_listen_options, [
          {backlog,    16384},
          {nodelay,    true},
          {sndbuf,     32768},
          {recbuf,     32768}
        ]},
      {handshake_timout,    30000},
      {vm_memory_high_watermark, 0.7},
      {vm_memory_high_watermark_paging_ratio, 0.9},
      {collect_statistics_interval, 90000},
      {log_levels,[{connection, error}]},
      {tcp_listeners, [5672]},
      {ssl_listeners, [5671]},
      {num_ssl_acceptors, 10},
      {ssl_handshake_timeout, 30000},
      {ssl_options,
        [
          {cacertfile,"./SSL/*****.pem"},
          {certfile,"./SSL/*****.pem"},
          {keyfile,"./SSL/*****.pem"},
          {verify,verify_none},
          {fail_if_no_peer_cert,false},
          {ciphers,
            [
              "ECDHE-RSA-AES256-SHA384",
              "ECDHE-RSA-AES256-SHA",
              "ECDHE-RSA-AES128-SHA256",
              "ECDHE-RSA-AES128-SHA",
              "AES256-SHA256",
              "AES256-SHA",
              "AES128-SHA256",
              "AES128-SHA",
              "DES-CBC3-SHA"
            ]
          },
          {honor_cipher_order, true},
          {password,"*****"}
        ]
      }
    ]
  },
  {rabbitmq_management,
    [
      {listener,
        [
          {port,15672},
          {ssl, false},
          {ssl_opts,
            [
              {cacertfile,"./SSL/*****.pem"},
              {certfile,"./SSL/*****.pem"},
              {keyfile,"./SSL/*****.pem"},
              {password,"*****"}
            ]
          }
        ]
      }
    ]
  }
].


C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.14\sbin>rabbitmqctl status
Status of node 'rabbit@RMQ-Test-02'
[{pid,1744},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.6.14"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.14"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.14"},
      {rabbit,"RabbitMQ","3.6.14"},
      {os_mon,"CPO  CXC 138 46","2.4.2"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.14"},
      {rabbit_common,
          "Modules shared by rabbitmq-server and rabbitmq-erlang-client",
          "3.6.14"},
      {syntax_tools,"Syntax tools","2.1.1"},
      {cowboy,"Small, fast, modular HTTP server.","1.0.4"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.3.0"},
      {ssl,"Erlang/OTP SSL application","8.1.1"},
      {public_key,"Public key infrastructure","1.4"},
      {cowlib,"Support library for manipulating Web protocols.","1.0.2"},
      {crypto,"CRYPTO","3.7.3"},
      {inets,"INETS  CXC 138 49","6.3.6"},
      {compiler,"ERTS  CXC 138 10","7.0.4"},
      {xmerl,"XML parser","1.3.13"},
      {asn1,"The Erlang ASN1 compiler version 4.0.4","4.0.4"},
      {recon,"Diagnostic tools for production use","2.3.2"},
      {mnesia,"MNESIA  CXC 138 12","4.14.3"},
      {sasl,"SASL  CXC 138 11","3.0.3"},
      {stdlib,"ERTS  CXC 138 10","3.3"},
      {kernel,"ERTS  CXC 138 10","5.2"}]},
 {os,{win32,nt}},
 {erlang_version,
     "Erlang/OTP 19 [erts-8.3] [64-bit] [smp:16:16] [async-threads:256]\n"},
 {memory,
     [{connection_readers,1345624880},
      {connection_writers,73039488},
      {connection_channels,302561128},
      {connection_other,3188278504},
      {queue_procs,1817899784},
      {queue_slave_procs,0},
      {plugins,3510367672},
      {other_proc,4331879248},
      {metrics,490333456},
      {mgmt_db,3260583712},
      {mnesia,357038704},
      {other_ets,136962552},
      {binary,2130599936},
      {msg_index,81372872},
      {code,24992450},
      {atom,1033401},
      {other_system,286947525},
      {allocated_unused,1994380880},
      {reserved_unallocated,0},
      {total,23134666752}]},
 {alarms,[]},
 {listeners,
     [{clustering,25672,"::"},
      {amqp,5672,"::"},
      {amqp,5672,"0.0.0.0"},
      {'amqp/ssl',5671,"::"},
      {'amqp/ssl',5671,"0.0.0.0"},
      {http,15672,"::"},
      {http,15672,"0.0.0.0"}]},
 {vm_memory_calculation_strategy,rss},
 {vm_memory_high_watermark,0.7},
 {vm_memory_limit,34574157004},
 {disk_free_limit,50000000},
 {disk_free,77834797056},
 {file_descriptors,
     [{total_limit,1048476},
      {total_used,62110},
      {sockets_limit,943626},
      {sockets_used,49983}]},
 {processes,[{limit,2097152},{used,802579}]},
 {run_queue,0},
 {uptime,80152},
 {kernel,{net_ticktime,60}}]

C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.14\sbin>
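
The memory section of the status output above is easier to read as shares of the total. The sketch below copies the figures from that output. Treating `plugins`, `mgmt_db`, and `metrics` together as management-related memory is an assumption made for illustration, as is using `sockets_used` as the connection count.

```python
# Figures copied from the rabbitmqctl status output above (bytes).
memory = {
    "connection_readers": 1345624880,
    "connection_writers": 73039488,
    "connection_channels": 302561128,
    "connection_other": 3188278504,
    "queue_procs": 1817899784,
    "plugins": 3510367672,
    "other_proc": 4331879248,
    "metrics": 490333456,
    "mgmt_db": 3260583712,
    "mnesia": 357038704,
    "binary": 2130599936,
    "allocated_unused": 1994380880,
}
total = 23134666752
sockets_used = 49983      # used here as the approximate connection count
processes_used = 802579

# Largest categories first, as a percentage of the reported total.
for name, size in sorted(memory.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {size / 2**30:6.2f} GiB  {100 * size / total:5.1f}%")

connection_bytes = sum(v for k, v in memory.items() if k.startswith("connection_"))
# Assumption: these three categories approximate the management plugin's cost.
management_bytes = memory["plugins"] + memory["mgmt_db"] + memory["metrics"]

print(f"per connection: {connection_bytes / sockets_used / 1024:.0f} KiB")
print(f"management-related share: {100 * management_bytes / total:.1f}%")
print(f"Erlang processes per connection: {processes_used / sockets_used:.1f}")
```

Running the same calculation on status output captured at different connection counts would show which categories grow fastest as load increases.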

Luke Bakken

Dec 1, 2017, 10:56:11 AM
to rabbitmq-users
Hi Kim -

When connection count reached 50,000 - 60,000 you say the server is getting "slower and slower". Is CPU utilization at maximum at this time? What is memory consumption like?

You may wish to add +zdbbl 32000 to RABBITMQ_SERVER_ERL_ARGS.

Another configuration option to try is to further reduce the size of sndbuf and recbuf, and benchmark the impact on memory use and performance.
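
As a rough guide to what smaller buffers could save, the sketch below multiplies the configured `sndbuf` and `recbuf` sizes by the connection count. It is only an upper bound on the requested buffer capacity: how much memory is actually committed depends on the operating system and on traffic. The candidate sizes and the 60 000 connection count are illustrative assumptions, not recommendations.

```python
def socket_buffer_bytes(connections: int, sndbuf: int, recbuf: int) -> int:
    """Upper bound on buffer capacity requested across all connections."""
    return connections * (sndbuf + recbuf)


CONNECTIONS = 60_000  # assumption: the level where problems were reported

# 32768 is the value from the posted rabbitmq.config; the rest are candidates.
for size in (32768, 16384, 8192, 4096):
    total = socket_buffer_bytes(CONNECTIONS, size, size)
    print(f"sndbuf=recbuf={size:>5}: up to {total / 2**30:.2f} GiB")
```

Because each device sends only a few small messages per day, smaller buffers are unlikely to limit throughput here, but that is worth confirming with a benchmark.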

Thanks -
Luke