Performance tuning of Consul


dreieck...@gmail.com

Sep 12, 2016, 11:23:44 AM
to Consul
Hi!

I'm currently tasked with evaluating Consul as a KV store for a project where we need a throughput of at least 120k requests per second.
So far I have managed to get to about 23k req/s for reads, while our etcd test system reached about 130k req/s, so I thought I might have overlooked something.
The Consul data dir is placed on a ramdisk, the server is a 24-core, dual-processor Xeon system, and both InfiniBand and 10 GbE network adapters are available. The values will be around 200 bytes in size, so the IP stack was tuned for latency. GOMAXPROCS was set to 24, and no other programs are running besides Consul and the base system (Debian Linux). I also enabled "-dev" mode for testing.
I tried 0.6.4 as well as 0.7.0rc2 (with a raft multiplier of 1) and different node setups (from 1 client and 1 server up to 3 clients and 3 servers, with every client directed at a different server).
The 23k figure is an average for reads; for writes it is about 14k per second. Latency spikes are also common (latency ranges from 370 to 190,000 microseconds, with an average of 680 microseconds).

I would be really grateful if anyone can point me in a direction how I may increase the throughput. Maybe I'm overlooking something obvious.

Greetings,
          Martin Zahn


Lowe Schmidt

Sep 13, 2016, 6:47:37 PM
to consu...@googlegroups.com
Is a service discovery tool really what you should be using for high-throughput K/V operations? There are probably many better options out there than Consul for that use case.

--
Lowe Schmidt | +46 723 867 157

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/consul/issues
IRC: #consul on Freenode
---
You received this message because you are subscribed to the Google Groups "Consul" group.
To unsubscribe from this group and stop receiving emails from it, send an email to consul-tool+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/consul-tool/1041d012-0a9a-4df1-a547-7bd289bfdd86%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Adams

Sep 13, 2016, 6:47:37 PM
to consu...@googlegroups.com
I would expect "-dev" to slow things down (not Consul-specific knowledge, just intuition), so try turning that off. Writes will by nature be slower. But you will see a big jump in throughput if you add the `?stale` query parameter to your API queries. This lets the server you're connected to service the request itself instead of deferring to the leader (which is the default mode). In my testing (which was many months ago), the difference in throughput was about 50x.
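For illustration, here is a minimal Python sketch of what a stale KV read looks like at the HTTP level. It only builds the request URL and decodes a response body; the helper names `kv_read_url` and `decode_kv_body`, the agent address, and the key are assumptions made up for this example, not anything from the thread.

```python
import base64
import json
from urllib.parse import quote


def kv_read_url(base, key, stale=False):
    """Build the URL for a KV read against a Consul agent's HTTP API.

    `base` and `key` are placeholders chosen for this sketch.
    """
    url = f"{base.rstrip('/')}/v1/kv/{quote(key)}"
    # Appending ?stale lets any server answer, not only the leader.
    return url + "?stale" if stale else url


def decode_kv_body(body):
    """Decode the JSON body of a KV read; values arrive base64-encoded."""
    entries = json.loads(body)
    return {e["Key"]: base64.b64decode(e["Value"] or "") for e in entries}


if __name__ == "__main__":
    # Hypothetical local agent address and key.
    print(kv_read_url("http://127.0.0.1:8500", "app/config", stale=True))
```

A benchmark client would issue a GET against that URL; whether the roughly 50x difference reproduces will depend on the setup.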


James Phillips

Sep 20, 2016, 11:07:23 PM
to consu...@googlegroups.com
Using -dev would definitely affect write throughput - it uses the
InmemStore which is basically a map and a lock, so it's not designed
for high performance at all. If you drop -dev and point -data-dir at a
disk with high write throughput you should see a big improvement on
the write side. David's right on the read side, the ?stale argument
should allow reads to happen from any server, otherwise you are
possibly doing an RPC forward under the hood to the leader in your
multi-server setups.
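
If stale reads are an option at all, the response headers can be used to bound how stale a result is allowed to be. The sketch below assumes the `X-Consul-KnownLeader` and `X-Consul-LastContact` response headers (the latter in milliseconds); the function name `staleness_ok` and the 50 ms threshold are hypothetical choices for this example.

```python
def staleness_ok(headers, max_lag_ms=50):
    """Return True if a stale read looks fresh enough to use.

    `headers` is a dict of HTTP response headers. The threshold
    `max_lag_ms` is an arbitrary value picked for this sketch.
    """
    known_leader = headers.get("X-Consul-KnownLeader", "false").lower() == "true"
    last_contact_ms = int(headers.get("X-Consul-LastContact", "0"))
    # Reject results from a server that has no leader or has not
    # heard from the leader recently enough.
    return known_leader and last_contact_ms <= max_lag_ms


if __name__ == "__main__":
    sample = {"X-Consul-KnownLeader": "true", "X-Consul-LastContact": "12"}
    print(staleness_ok(sample))
```

A client could retry without `?stale` whenever this check fails, trading some throughput for freshness only when needed.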

We owe the community some more formal benchmarks and we've got that in
the queue to do, but please let me know how these work and I'll try to
help you get some good numbers.

Martin Zahn

Sep 21, 2016, 4:42:27 AM
to consu...@googlegroups.com
Hello,

thank you for your email. I have not been able to try further runs because our load balancing team has been saturating our network for some days now. But using 0.7.0rc2 in dev mode and a raft multiplier of 1, I was able to get 98k reads per second with 128 bytes of data per key, one server, and 10k clients using the YCSB benchmark. Stale reads will probably not be tested, as our final setup will need non-stale values, but I will see how much time I get when the network is available again.
The only thing baffling me is the wide range of latencies I'm getting (in the aforementioned run I got a minimum latency of 170 microseconds and a maximum of 590 milliseconds, with an average of about 120 milliseconds). But that might be an issue with the benchmark, which I will investigate. When I have my final numbers I will forward you our results and setup.
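
When looking into a spread like this, minimum, maximum, and average alone can hide where the time actually goes, so a percentile summary may help. This is a rough sketch using only the Python standard library; the function name `summarize_latencies` and the sample data are made up for illustration and are not benchmark output.

```python
import statistics


def summarize_latencies(samples_us):
    """Summarize latency samples (microseconds) with tail percentiles."""
    ordered = sorted(samples_us)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "mean": statistics.fmean(ordered),
        "p50": statistics.median(ordered),
        "p99": cuts[98],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Synthetic data: mostly fast requests plus one slow outlier.
    fake_samples = [100] * 99 + [10000]
    print(summarize_latencies(fake_samples))
```

If the median stays low while only the top percentiles are large, the average is being pulled up by a small number of slow requests, which would point toward the benchmark client or the network rather than steady-state server latency.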

Greetings,
       Martin Zahn