Consul performance is only 10K QPS, while my logic server is 100K QPS.


wil zhang

Nov 2, 2016, 8:00:53 AM
to Consul
I've been running performance tests on a Consul cluster recently.
I got the published results from consul/bench/results-0.3.md, and my own benchmark numbers are close to them.
 =====  machine specs =====
 * 8 CPU Cores, 2Ghz
 * 16GB RAM
 * 160GB SSD disk
 * 1Gbps NIC
   
 ===== GET stale test =====
 GOMAXPROCS=4 boom -n 20480 -c 64 http://localhost:8500/v1/kv/bench?stale
 20480 / 20480 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00 %

 Summary:
      Total:    1.8706 secs.
      Slowest:  0.0271 secs.
      Fastest:  0.0011 secs.
      Average:  0.0058 secs.
      Requests/sec: 10948.2819
      Total Data Received:  2867200 bytes.
      Response Size per Request:    140 bytes.

My question: 10K QPS is quite slow for my use case, since my logic server handles 100K QPS. Currently I see two options for this problem:
1. Add a cache to my logic server, refreshed every 10 seconds. But this reduces how quickly a service going down is noticed.
2. Add more Consul servers. In practice this doesn't help, because each Consul agent sticks with the same Consul server for about 30 seconds at a time, so the other servers handle no requests in the meantime.
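Option 1 can be sketched as a small read-through cache with a TTL. This is a hypothetical sketch, not part of any Consul client library; `loader` stands in for whatever function actually queries Consul.

```python
import time

class TTLCache:
    """Read-through cache for option 1: serve a possibly-stale copy for
    up to `ttl` seconds before calling the loader again."""

    def __init__(self, loader, ttl=10.0, clock=time.monotonic):
        self.loader = loader   # e.g. a function that queries Consul
        self.ttl = ttl
        self.clock = clock     # injectable clock, so the cache is testable
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self.clock()
        if self._fetched_at is None or now - self._fetched_at >= self.ttl:
            self._value = self.loader()
            self._fetched_at = now
        return self._value
```

The trade-off is exactly the one noted above: a dead service can keep being served from the cache for up to `ttl` seconds before the next refresh notices it.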

I'd appreciate it if someone could enlighten me. Thanks a lot.

David Adams

Nov 2, 2016, 8:53:34 AM
to consu...@googlegroups.com
20480 total queries is hardly enough to correctly measure speed on the scale of 10kqps. What happens when you increase the values of -n, -c, and GOMAXPROCS?

It's not clear: are the "machine specs" given for the Consul server, or for the box where you're running the load generation?

What's the network latency between the two boxes? A 6 millisecond latency would entirely explain your result, as merely waiting for the responses--assuming established TCP connections--with zero server processing time expended would take 1.9 seconds to process 20480 requests on 64 simultaneous connections (ie the specs you provided).
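The arithmetic behind this bound can be sketched directly; it is pure arithmetic, nothing Consul-specific, with each connection completing at most one request per round trip:

```python
def max_qps(connections, rtt_seconds):
    # Each connection completes at most one request per round trip,
    # even if the server spends zero time processing.
    return connections / rtt_seconds

def total_time(requests, connections, rtt_seconds):
    return requests * rtt_seconds / connections

# 64 connections at a 6 ms round trip cap out near 10.7K QPS,
# and 20480 requests take about 1.92 s -- close to the 1.87 s measured.
```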

Most importantly, what does the CPU, memory, and network throughput look like on the Consul server and on the load generation box while running the test? What happens if you run the same load generation from more agent boxes simultaneously?

What tuning have you done based on this documentation? https://www.consul.io/docs/guides/performance.html

I suspect you aren't saturating Consul yet, but that your load test is simply hitting its own limits. It's impossible to be sure without more info, though: server statistics, a more sustained test, and a multi-step progressive load ramp until we hit actual saturation of the network or of the server's CPU/memory resources.

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/d29a98fc-5c68-4530-a2f9-cc378f629e7f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

wil zhang

Nov 3, 2016, 5:16:05 AM
to Consul
> 20480 total queries is hardly enough to correctly measure speed on the scale of 10kqps. What happens when you increase the values of -n, -c, and GOMAXPROCS?
More test results with increasing -c values and the maximum GOMAXPROCS:

Summary:
  Total:        10.6989 secs
  Slowest:      0.0430 secs
  Fastest:      0.0002 secs
  Average:      0.0041 secs
  Requests/sec: 15313.7677
  Total data:   286556160 bytes
  Size/request: 1749 bytes
  
Summary:
  Total:        10.4613 secs
  Slowest:      0.2456 secs
  Fastest:      0.0002 secs
  Average:      0.0320 secs
  Requests/sec: 15661.5184
  Total data:   286556160 bytes
  Size/request: 1749 bytes
  
Summary:
  Total:        9.7305 secs
  Slowest:      3.2540 secs
  Fastest:      0.0002 secs
  Average:      0.0555 secs
  Requests/sec: 16837.8352
  Total data:   286556160 bytes
  Size/request: 1749 bytes

  
> It's not clear, are the "machine specs" given for the Consul server or the box where you're running the load generation?
The topology: one VM for the load generator and Consul client, and three VMs for the Consul servers, all on the same LAN; ping latency is about 0.18 ms.
Load generator machine specs:
* 8 CPU Cores, 2.53GHz
* 8GB RAM
* 400GB DISK
Consul server machine specs:
* 8 CPU Cores, 2.53GHz
* 8GB RAM
* 400GB DISK


> What's the network latency between the two boxes? A 6 millisecond latency would entirely explain your result, as merely waiting for the responses--assuming established TCP connections--with zero server processing time expended would take 1.9 seconds to process 20480 requests on 64 simultaneous connections (ie the specs you provided).
ping shows a latency of about 0.15 ms:
64 bytes from 10.197.1.42: icmp_seq=480 ttl=64 time=0.155 ms
64 bytes from 10.197.1.42: icmp_seq=481 ttl=64 time=0.185 ms
64 bytes from 10.197.1.42: icmp_seq=482 ttl=64 time=0.208 ms
64 bytes from 10.197.1.42: icmp_seq=483 ttl=64 time=0.155 ms

> Most importantly, what does the CPU, memory, and network throughput look like on the Consul server and on the load generation box while running the test? What happens if you run the same load generation from more agent boxes simultaneously?
If I run the test from more agent boxes, total throughput should increase, because more Consul servers will handle requests. But the speed from a single box (about 15K QPS) still doesn't meet my requirement.

> What tuning have you done based on this documentation? https://www.consul.io/docs/guides/performance.html
Yes, I have read that document, and couldn't find an answer beyond replacing the hard disk with an SSD.

> I doubt you are saturating Consul yet, but instead that your load test is simply hitting its own limits. But it's impossible to be sure without more info like server statistics, a more sustained test, and a multi-step progressive load generation until we hit actual saturation of the network or the server CPU/memory resources.
Right, I've just started using Consul and hope to integrate it into my product, which has 10 million daily active users. But currently the QPS isn't good enough, and I'm trying to find out why.


On Wednesday, November 2, 2016 at 8:53:34 PM UTC+8, David Adams wrote:

Armon Dadgar

Nov 3, 2016, 1:10:18 PM
to consu...@googlegroups.com, wil zhang
Wil,

In general, Consul read performance is bottlenecked by CPU. I would investigate the CPU
utilization both on the server side and on the client side that is generating the load.

It’s not clear from this information what version of Consul you are running, or the setup
of the larger cluster. It looks like you are querying the “CMS” service, but how many instances
does that have, with associated health checks? A health-check lookup is more expensive
than a K/V lookup because the health checks have to be filtered.

In either case, the performance may be off by a constant (I would expect more than 15K QPS
on that grade of hardware), however I would not expect Consul to do 100K QPS on a single
machine. If your use case calls for that, making use of long polling with edge-triggered
updates, or applying a small amount of caching, will usually reduce the required QPS dramatically.

For example, instead of doing 100K lookups of the CMS service, you could do a blocking
query to just get the set of live servers and then do client side routing. In general, unless the
services are flapping, you would expect to reduce lookups to ~1 QPS.
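The blocking-query pattern described here can be sketched against Consul's HTTP API: pass the last `X-Consul-Index` back as `?index=` and the request blocks until the data changes. This is a sketch, not production code; the service name "cms" is illustrative, and the index-reset rules in `next_index` follow Consul's blocking-query documentation.

```python
import json
import urllib.request

def next_index(prev, new):
    """Per Consul's blocking-query docs: reset the index to 0 if the
    server returns a missing/zero index or one that went backwards."""
    if new is None or new < 1 or new < prev:
        return 0
    return new

def watch_service(base_url, service, handler, wait="5m"):
    """Long-poll /v1/health/service/<name>; `handler` fires only when
    the result changes, so steady-state load is ~1 request per `wait`
    window instead of one request per lookup."""
    index = 0
    while True:
        url = (f"{base_url}/v1/health/service/{service}"
               f"?passing=1&index={index}&wait={wait}")
        with urllib.request.urlopen(url) as resp:
            raw = resp.headers.get("X-Consul-Index")
            new = int(raw) if raw is not None else None
            entries = json.load(resp)
        new_index = next_index(index, new)
        if new_index != index:
            handler(entries)      # membership changed (or index was reset)
        index = new_index

if __name__ == "__main__":
    # "cms" is illustrative -- substitute your own service name.
    watch_service("http://localhost:8500", "cms", print)
```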

Hope that helps!

Best Regards,
Armon Dadgar

David Adams

Nov 3, 2016, 1:45:30 PM
to consu...@googlegroups.com
On Thu, Nov 3, 2016 at 4:16 AM, wil zhang <zhangw...@gmail.com> wrote:
> Most importantly, what does the CPU, memory, and network throughput look like on the Consul server and on the load generation box while running the test? What happens if you run the same load generation from more agent boxes simultaneously?
>> If I run the test from more agent boxes, total throughput should increase, because more Consul servers will handle requests. But the speed from a single box (about 15K QPS) still doesn't meet my requirement.

Can you collect CPU statistics from the agent and server? Are all 8 cores maxing out?
 
> I doubt you are saturating Consul yet, but instead that your load test is simply hitting its own limits. But it's impossible to be sure without more info like server statistics, a more sustained test, and a multi-step progressive load generation until we hit actual saturation of the network or the server CPU/memory resources.
>> Right, I've just started using Consul and hope to integrate it into my product, which has 10 million daily active users. But currently the QPS isn't good enough, and I'm trying to find out why.

Armon makes some good points about this. Setting watches on health checks and KV keys to be notified when they change will allow you to achieve the same consistency factor without requiring queries nearly as often.
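For example, a watch can be registered in the Consul agent configuration so a handler runs whenever a service's health data changes. A minimal sketch: the service name "cms" and the handler script path are illustrative, and the `watches` stanza shown here follows the agent configuration format of this era.

```json
{
  "watches": [
    {
      "type": "service",
      "service": "cms",
      "handler": "/usr/local/bin/on-cms-change.sh"
    }
  ]
}
```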

wil zhang

Nov 4, 2016, 7:37:04 AM
to Consul, zhangw...@gmail.com
Thanks a lot, Armon, much appreciated.

On Friday, November 4, 2016 at 1:10:18 AM UTC+8, Armon Dadgar wrote:

wil zhang

Nov 4, 2016, 7:39:10 AM
to Consul
I will collect and investigate the CPU utilization; thanks for your advice.

On Friday, November 4, 2016 at 1:45:30 AM UTC+8, David Adams wrote: