Tuning/sizing Vault for performance - PKI use case


Keshava Bharadwaj

Nov 27, 2017, 4:01:20 AM
to Vault
Hi,

We are trying to benchmark the PKI backend by requesting a certificate against a role.
We are using the AppRole backend for authentication (role_id and secret_id). All of this runs on AWS.

Use case: get a certificate from the PKI secrets backend for a role, using the AppRole auth backend.
Storage backend: Consul, running on the same node.

Our deployment is something like this:

AWS Route 53        -> ELB             -> Vault node
(vault.local.com)      (ELB address)      (Vault IP address)

I am using wrk to run the benchmarks; here are some results.
The benchmarks are against two instances of Vault: one on a t2.small (1 vCPU, 2 GB RAM) and another on an m4.large (2 vCPU, 8 GB RAM).

$ cat post.lua
wrk.method = "POST"
wrk.body   = '{"common_name": "exampleservice.local.com"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["X-Vault-Token"] = "7748fe79-61d7-81bc-03c9-8890918a9473"


0. Running with 1 thread and 1 connection for 1 second -


$ wrk -t1 -c1 -d1s --timeout 1s -s post.t2small.lua https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
Running 1s test @ https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   249.09ms   69.83ms 310.67ms   66.67%
    Req/Sec     3.67      1.15     5.00     66.67%
  3 requests in 1.00s, 14.15KB read
Requests/sec:      3.00
Transfer/sec:     14.14KB


$ wrk -t1 -c1 -d1s --timeout 1s -s post.lua https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
Running 1s test @ https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   239.77ms  103.09ms 381.69ms   75.00%
    Req/Sec     5.00      3.56    10.00     75.00%
  4 requests in 1.00s, 18.91KB read
Requests/sec:      3.99
Transfer/sec:     18.86KB


1. Running with 20 threads and 20 connections for 20 seconds-


$ wrk -t20 -c20 -d20s --timeout 20s -s post.t2small.lua https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
Running 20s test @ https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
  20 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.68s     2.40s   12.19s    65.45%
    Req/Sec     0.00      0.00     0.00    100.00%
  55 requests in 20.10s, 259.53KB read
Requests/sec:      2.74
Transfer/sec:     12.91KB


$ wrk -t20 -c20 -d20s --timeout 20s -s post.lua https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
Running 20s test @ https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
  20 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.74s     2.50s   16.15s    76.81%
    Req/Sec     0.00      0.00     0.00    100.00%
  68 requests in 20.05s, 321.41KB read
Requests/sec:      3.39
Transfer/sec:     16.03KB


2. Running with 50 threads and 50 connections for 20 seconds -

$ wrk -t50 -c50 -d20s --timeout 20s -s post.t2small.lua https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
Running 20s test @ https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
  50 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.20s     3.36s   17.46s    65.22%
    Req/Sec     0.00      0.00     0.00    100.00%
  46 requests in 20.10s, 217.07KB read
Requests/sec:      2.29
Transfer/sec:     10.80KB


$ wrk -t50 -c50 -d20s --timeout 20s -s post.lua https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
Running 20s test @ https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
  50 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.14s     3.10s   15.36s    71.43%
    Req/Sec     0.00      0.00     0.00    100.00%
  42 requests in 20.10s, 198.52KB read
  Socket errors: connect 29300, read 50, write 0, timeout 0
Requests/sec:      2.09
Transfer/sec:      9.88KB


Findings from the results:
1. There is no proportional increase in throughput, or decrease in latency, with more CPUs or memory on the Vault node (t2.small -> m4.large).
2. A single PKI request/response takes roughly 250-400 ms, while 50 concurrent requests take up to 18 seconds.

Questions:
1. How can one scale Vault nodes (a larger instance type doesn't seem to help) to get sub-second response times, even under a load of 50 concurrent requests?
2. Are there recommended Vault-specific Linux tunings for nodes running Vault?
3. Any other recommendations, or anything else to look into?

Thanks,
Keshava

Matt Button

Nov 27, 2017, 9:38:34 AM
to vault...@googlegroups.com
Hey Keshava,

I don't use the PKI backend so I can't help too much there, but here are a few general notes:

- Vault servers run in active/standby mode. Adding extra servers won't improve performance, as only one Vault node can service requests. I think Vault Enterprise can scale differently, but it sounds like you're using the standard open source version.
- t2 instances have unpredictable CPU performance by design; you probably shouldn't use them for benchmarking Vault. Our Vault cluster uses C3 machines and comfortably handles >400 requests/second to the transit backend. Our workload is almost entirely CPU-bound, and few of our requests hit the storage backend.
- It would probably help if you could share how the ELB is configured, e.g. whether it's a TCP or HTTP LB, and what health checks are configured. If the ELB routes requests to the standby servers, Vault will have to proxy or redirect those requests to the active node.
- I'd suggest looking into the performance of your storage backend. You can configure Vault to emit timings for how long it takes to get/set data in the underlying storage backend (the metrics will probably be named something like `vault.consul.*`).
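
As a reference point, those storage timings are enabled through a `telemetry` stanza in the Vault server config. A minimal sketch, assuming a local dogstatsd agent (the address below is an example, not a recommendation):

```hcl
# Sketch: emit Vault metrics, including vault.consul.* storage timings,
# to a local dogstatsd agent. The address is an assumed example.
telemetry {
  dogstatsd_addr   = "127.0.0.1:8125"
  disable_hostname = true
}
```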

Matt

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/vault/issues
IRC: #vault-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Vault" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vault-tool+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vault-tool/8476a11f-7b15-43d0-844f-d779ce218ae5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Keshava Bharadwaj

Nov 27, 2017, 10:20:44 PM
to Vault
Hi Matt,

Thanks for the responses.

Some responses:

1. Correct. I'm aware that Vault is active/standby, and hence it's scale-up rather than scale-out; that's why I kept increasing the instance sizes (t2.small -> m4.large -> m4.xlarge), but I did not see any improvement in throughput.
    Yes, we are using the open source version of Vault. I agree t2 should not be used for benchmarks; those numbers were only to show that there's no increase in throughput even with m4.large.

2. The ELB is a TCP LB, with health checks against /v1/sys/health so that all requests always go to the active node (rather than being redirected from standby to active); only one node is active at a given time.

I'll check the vault.consul.* telemetry to see if I can find something.

If there are other checks or tunings that need to be done specifically to scale PKI, I'd welcome any input.

- Keshava

Keshava Bharadwaj

Nov 27, 2017, 11:19:06 PM
to Vault
Also, I noticed a few metrics in Datadog while running the 20 threads / 20 connections / 20 seconds test.

The number of heap objects is around 55k, while sys_bytes and alloc_bytes are only around 20 MB (these are Vault telemetry metrics).
Is it normal to have such a large number of heap objects?

- Keshava

Jeff Mitchell

Nov 28, 2017, 12:46:33 AM
to Vault
Hi Keshava,

Some various notes about your posts:

* As was pointed out, micro and small instances tend to be poor candidates for Vault (both in benchmarking and running) due to unpredictable CPU performance, so I'd recommend avoiding those for this purpose.

* I have no experience with wrk; however, when I last did serious benchmarking I found that many of the quick-and-dirty benchmarking tools are actually pretty terrible at scaling, and worse, they can make it seem like the problem is with the application. The only benchmarking tool I found at the time that gave a good picture was Apache Bench. It was literally the difference between topping out at 500-800 requests per second vs. 35,000 requests per second, on a very, very contrived benchmark, but it shows that the request capabilities of the benchmarking tool matter quite a lot. I would recommend you throw Apache Bench into the mix as a control.

* In real-world usage (and in benchmarking) if you have clients making repeated requests you should take advantage of HTTP/2 and/or HTTP/1.1 connection reuse as it makes a huge difference. For simple operations, HTTP request setup time (TCP three way handshake, etc.) can be pretty dominating.

* If you are having the PKI backend issue the certificate and private key, as opposed to signing CSRs, you will be bound by the available entropy on the Vault server and the high CPU cost of computing key pairs. This can easily cause fairly linear scaling. There are a few ways to avoid this, but the most general-purpose one is to have clients generate CSRs and submit them for signing. This will be exacerbated by using instances like smalls that have heavy CPU throttling.

* Vault can easily be holding 50k heap objects. Your graph looks pretty normal, you can easily see when the garbage collector is running. Over the lifetime of your test the actual allocated bytes is staying pretty stable (again, taking GC into account) so I don't see anything worrying there.
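
A minimal sketch of the client-side CSR approach described above, assuming openssl is available; the file names, CN, and role name are illustrative, and the commented curl call assumes the standard pki/sign endpoint with placeholder token and hostname:

```shell
# Generate the key pair and CSR on the client, so Vault only signs.
# File names and CN are illustrative assumptions.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout exampleservice.key \
  -out exampleservice.csr \
  -subj "/CN=exampleservice.local.com"

# Then submit the CSR to the role's sign endpoint instead of issue
# (token and hostname are placeholders; the awk turns the PEM's
# newlines into literal \n so it can be embedded in the JSON body):
# curl -s -X POST -H "X-Vault-Token: $VAULT_TOKEN" \
#   --data "{\"csr\": \"$(awk 'BEGIN{ORS="\\n"} 1' exampleservice.csr)\"}" \
#   https://vault.local.com:8200/v1/pki/sign/exampleservice
```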

Best,
Jeff


Keshava Bharadwaj

Nov 28, 2017, 4:44:58 AM
to Vault
Hi Jeff,

Thanks for your detailed responses.

Some notes on them:

1. Correct, and I totally agree that t2 (burstable) instances should not be used; as mentioned before, I'm well aware of that. I just added that data to show there was no increase in scale when moving from t2 to m4.
    Similarly, there was no increase in scale when moving from m4.large to m4.2xlarge.

2. I tried Apache Bench (ab) first and got almost the same results, so I also tried wrk to make sure the results were comparable. The statistics from the two tools were similar.

3. I had tried Apache Bench's keep-alive (-k) option; I believe that's what you meant by connection reuse. But the results were not noticeably different; throughput was similar either way.

4. On almost all runs, CPU spiked to 100%, so you are correct. CSR signing might be good, but our use case involves getting both the certificate and the private key from Vault itself, as we rely on Vault to provide both.
    Since there are a lot of services, having each of them generate a CSR and submit it for signing would be laborious.

How many CPUs would you recommend or foresee for PKI usage that generates both certs and private keys, at a load of 20 requests/sec?

-Keshava

Keshava Bharadwaj

Nov 28, 2017, 4:53:02 AM
to Vault
Apache bench results:

$ ab -k -p post.json -T application/json -H "X-Vault-Token: fb0cca83-f22c-5049-16b8-e2d343395d2f" -c 50 -n 50 -t 20 -v 2 https://vault-m42xlarge.local.com:8200/v1/pki/issue/exampleservice > ab-c50-n50-dynamo-k.log 2>&1

Server Software:
Server Hostname:        vault-m42xlarge.local.com
Server Port:            8200
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256

Document Path:          /v1/pki/issue/exampleservice
Document Length:        4478 bytes

Concurrency Level:      50
Time taken for tests:   20.089 seconds
Complete requests:      267
Failed requests:        141
   (Connect: 0, Receive: 0, Length: 141, Exceptions: 0)
Write errors:           0
Keep-Alive requests:    0
Total transferred:      1226353 bytes
Total body sent:        95130
HTML transferred:       1196182 bytes
Requests per second:    13.29 [#/sec] (mean)
Time per request:       3761.926 [ms] (mean)
Time per request:       75.239 [ms] (mean, across all concurrent requests)
Transfer rate:          59.62 [Kbytes/sec] received
                         4.62 kb/s sent
                        64.24 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        6  164  88.9    173     536
Processing:   271 3127 1567.2   2818    9135
Waiting:      267 3127 1567.1   2818    9135
Total:        277 3291 1576.8   2980    9328

Percentage of the requests served within a certain time (ms)
  50%   2980
  66%   3643
  75%   4084
  80%   4357
  90%   5530
  95%   6457
  98%   7665
  99%   8025

$ grep "HTTP/1.0 200 OK" ab-c50-n50-dynamo-k.log
267

This is similar to the wrk results.

Keshava Bharadwaj

Nov 29, 2017, 8:42:04 AM
to Vault
Hi Jeff,

With this use case of Vault generating the keys and certs itself, instead of signing CSRs, are those numbers expected with 2-4 vCPUs?
What number of CPUs would you expect for a throughput of at least 50 req/sec for the PKI use case?

- Keshava

Jeff Mitchell

Nov 30, 2017, 10:16:19 AM
to Vault
Hi Keshava,

I don't have any such numbers to give you, sorry. If you must generate all keys on the Vault server, you may want to look at different bit sizes and RSA vs. ECDSA keys.
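
The key-type point can be checked locally with openssl; a rough sketch (machine-dependent timings, not a Vault benchmark), assuming your clients can accept ECDSA certificates:

```shell
# Compare key generation cost: P-256 ECDSA keys are typically much faster
# to generate than RSA keys, so an ECDSA-keyed PKI role can raise certs/sec.
time openssl genpkey -algorithm RSA \
  -pkeyopt rsa_keygen_bits:2048 -out rsa2048.key
time openssl ecparam -name prime256v1 -genkey -noout -out p256.key
```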

Best,
Jeff


Keshava Bharadwaj

Nov 30, 2017, 10:12:52 PM
to Vault
Hi Jeff,

Thanks. 

Also, are there any Vault-specific tunings or best practices one needs to follow in order to tune Vault for performance?

I've already gone through the production guide, and we have incorporated the notes from there: https://www.vaultproject.io/guides/production.html.

Thanks,
Keshava

Jeff Mitchell

Dec 4, 2017, 1:55:47 PM
to Vault
Hi Keshava,

As we think of specifics we put them into that guide, so nothing else offhand!

Best,
Jeff


Keshava Bharadwaj

Dec 4, 2017, 11:57:16 PM
to Vault
Hi Jeff,

OK, sure. We will keep a tab on that page.
Thanks for your time and valuable suggestions in this thread.

Thanks,
Keshava

Jeff Mitchell

Dec 5, 2017, 8:00:30 AM
to Vault
Happy to help!
