Tuning/sizing Vault for performance - PKI use case


Keshava Bharadwaj

Nov 27, 2017, 4:01:20 AM
to Vault
Hi,

We are trying to benchmark the PKI backend by requesting a certificate against a role.
We are using the AppRole backend for authentication (role_id and secret_id). All of this runs on AWS.

Use case: get a certificate from the PKI secrets backend for a role, using the AppRole auth backend.
Storage backend: Consul, running on the same node.

Our deployment is something like this:

AWS Route 53        -> ELB             -> Vault node
(vault.local.com)      (ELB address)      (Vault IP address)

I am using wrk to run the benchmarks; here are some results.
The benchmarks are against two instances of Vault: one on a t2.small (1 vCPU, 2 GB RAM) and another on an m4.large (2 vCPU, 8 GB RAM).

$ cat post.lua
wrk.method = "POST"
wrk.body   = '{"common_name": "exampleservice.local.com"}'
wrk.headers["Content-Type"] = "application/json"
wrk.headers["X-Vault-Token"] = "7748fe79-61d7-81bc-03c9-8890918a9473"


0. Running with 1 thread and 1 connection for 1 second -


$ wrk -t1 -c1 -d1s --timeout 1s -s post.t2small.lua https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
Running 1s test @ https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   249.09ms   69.83ms 310.67ms   66.67%
    Req/Sec     3.67      1.15     5.00     66.67%
  3 requests in 1.00s, 14.15KB read
Requests/sec:      3.00
Transfer/sec:     14.14KB


$ wrk -t1 -c1 -d1s --timeout 1s -s post.lua https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
Running 1s test @ https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   239.77ms  103.09ms 381.69ms   75.00%
    Req/Sec     5.00      3.56    10.00     75.00%
  4 requests in 1.00s, 18.91KB read
Requests/sec:      3.99
Transfer/sec:     18.86KB


1. Running with 20 threads and 20 connections for 20 seconds-


$ wrk -t20 -c20 -d20s --timeout 20s -s post.t2small.lua https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
Running 20s test @ https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
  20 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.68s     2.40s   12.19s    65.45%
    Req/Sec     0.00      0.00     0.00    100.00%
  55 requests in 20.10s, 259.53KB read
Requests/sec:      2.74
Transfer/sec:     12.91KB


$ wrk -t20 -c20 -d20s --timeout 20s -s post.lua https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
Running 20s test @ https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
  20 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.74s     2.50s   16.15s    76.81%
    Req/Sec     0.00      0.00     0.00    100.00%
  68 requests in 20.05s, 321.41KB read
Requests/sec:      3.39
Transfer/sec:     16.03KB


2. Running with 50 threads and 50 connections for 20 seconds -

$ wrk -t50 -c50 -d20s --timeout 20s -s post.t2small.lua https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
Running 20s test @ https://vault-t2small.local.com:8200/v1/pki/issue/exampleservice
  50 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.20s     3.36s   17.46s    65.22%
    Req/Sec     0.00      0.00     0.00    100.00%
  46 requests in 20.10s, 217.07KB read
Requests/sec:      2.29
Transfer/sec:     10.80KB


$ wrk -t50 -c50 -d20s --timeout 20s -s post.lua https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
Running 20s test @ https://vault-m4large.local.com:8200/v1/pki/issue/exampleservice
  50 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     8.14s     3.10s   15.36s    71.43%
    Req/Sec     0.00      0.00     0.00    100.00%
  42 requests in 20.10s, 198.52KB read
  Socket errors: connect 29300, read 50, write 0, timeout 0
Requests/sec:      2.09
Transfer/sec:      9.88KB


Findings from the results:
1. There is no proportional increase in throughput, or decrease in latency, with more CPUs or memory on the Vault node (t2.small -> m4.large).
2. A single PKI request/response takes roughly 250-400 ms, while 50 concurrent requests take up to 18 seconds.

Questions:
1. How can one scale Vault nodes (a larger instance type doesn't seem to help) to get sub-second response times, even under a load of 50 concurrent requests?
2. Are there recommended Vault-specific Linux tunings for nodes running Vault?
3. Any other recommendations, or anything else to look into?

Thanks,
Keshava

Matt Button

Nov 27, 2017, 9:38:34 AM
to vault...@googlegroups.com
Hey Keshava,

I don't use the PKI backend so I can't help too much there, but here are a few general notes:

- Vault servers run in active/standby mode. Adding extra servers won't improve performance, as only one Vault node can service requests. I think Vault Enterprise can scale differently, but it sounds like you're using the standard open source version.
- t2 instances have unpredictable CPU performance by design; you probably shouldn't use them for benchmarking Vault. Our Vault cluster uses C3 machines and comfortably handles >400 requests/second to the transit backend. Our workload is almost entirely CPU-bound, and few of our requests hit the storage backend.
- It would probably help if you could share how the ELB is configured, e.g. whether it's a TCP or HTTP LB, and what health checks are configured. If the ELB routes requests to the standby servers, Vault will have to proxy or redirect those requests to the active node.
- I'd suggest looking into the performance of your storage backend. You can configure Vault to emit timings for how long it takes to get/set data in the underlying storage backend (the metrics will probably be named something like `vault.consul.*`).
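
As a reference point, those storage timings are enabled through a `telemetry` stanza in the Vault server config. A minimal sketch, assuming a local dogstatsd agent (the address below is an example, not a recommendation):

```hcl
# Sketch: emit Vault metrics, including vault.consul.* storage timings,
# to a local dogstatsd agent. The address is an assumed example.
telemetry {
  dogstatsd_addr   = "127.0.0.1:8125"
  disable_hostname = true
}
```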

Matt

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/vault/issues
IRC: #vault-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Vault" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vault-tool+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vault-tool/8476a11f-7b15-43d0-844f-d779ce218ae5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Keshava Bharadwaj

Nov 27, 2017, 10:20:44 PM
to Vault
Hi Matt,

Thanks for the responses.

Some responses:

1. Correct. I'm aware that Vault is active/standby, and hence it's scale-up rather than scale-out; that's why I kept increasing the instance sizes (t2.small -> m4.large -> m4.xlarge), but I did not see any improvement in throughput.
    Yes, we are using the open source version of Vault. I agree t2 should not be used for benchmarks; those numbers were only to show that there's no increase in throughput even with m4.large.

2. The ELB is a TCP LB, with health checks against /v1/sys/health so that all requests always go to the active node (rather than being redirected from standby to active); only one node is active at a given time.

I'll check the vault.consul.* telemetry to see if I can find something.

If there are other checks or tunings that need to be done specifically to scale PKI, I'd welcome any input.

- Keshava

Keshava Bharadwaj

Nov 27, 2017, 11:19:06 PM
to Vault
Also, I noticed a few metrics in Datadog while running the 20 threads / 20 connections / 20 seconds test.

The number of heap objects is around 55k, while sys_bytes and alloc_bytes are only around 20 MB (these are Vault telemetry metrics).
Is it normal to have such a large number of heap objects?

- Keshava

Jeff Mitchell

Nov 28, 2017, 12:46:33 AM
to Vault
Hi Keshava,

Some various notes about your posts:

* As was pointed out, micro and small instances tend to be poor candidates for Vault (both in benchmarking and running) due to unpredictable CPU performance, so I'd recommend avoiding those for this purpose.

* I have no experience with wrk; however, when I last did serious benchmarking I found that many of the quick-and-dirty benchmarking tools are actually pretty terrible at scaling, and worse, they can make it seem like the problem is with the application. The only benchmarking tool I found at the time that gave a good picture was Apache Bench. It was literally the difference between topping out at 500-800 requests per second vs. 35,000 requests per second, on a very, very contrived benchmark, but it shows that the request capabilities of the benchmarking tool matter quite a lot. I would recommend you throw Apache Bench into the mix as a control.

* In real-world usage (and in benchmarking) if you have clients making repeated requests you should take advantage of HTTP/2 and/or HTTP/1.1 connection reuse as it makes a huge difference. For simple operations, HTTP request setup time (TCP three way handshake, etc.) can be pretty dominating.

* If you are having the PKI backend issue the certificate and private key, as opposed to signing CSRs, you will be bound by the available entropy on the Vault server and the high CPU cost of computing key pairs. This can easily cause fairly linear scaling. There are a few ways to avoid this, but the most general-purpose one is to have clients generate CSRs and submit them for signing. This will be exacerbated by using instances like smalls that have heavy CPU throttling.

* Vault can easily be holding 50k heap objects. Your graph looks pretty normal, you can easily see when the garbage collector is running. Over the lifetime of your test the actual allocated bytes is staying pretty stable (again, taking GC into account) so I don't see anything worrying there.
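
A minimal sketch of the client-side CSR approach described above, assuming openssl is available; the file names, CN, and role name are illustrative, and the commented curl call assumes the standard pki/sign endpoint with placeholder token and hostname:

```shell
# Generate the key pair and CSR on the client, so Vault only signs.
# File names and CN are illustrative assumptions.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout exampleservice.key \
  -out exampleservice.csr \
  -subj "/CN=exampleservice.local.com"

# Then submit the CSR to the role's sign endpoint instead of issue
# (token and hostname are placeholders; the awk turns the PEM's
# newlines into literal \n so it can be embedded in the JSON body):
# curl -s -X POST -H "X-Vault-Token: $VAULT_TOKEN" \
#   --data "{\"csr\": \"$(awk 'BEGIN{ORS="\\n"} 1' exampleservice.csr)\"}" \
#   https://vault.local.com:8200/v1/pki/sign/exampleservice
```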

Best,
Jeff


Keshava Bharadwaj

Nov 28, 2017, 4:44:58 AM
to Vault
Hi Jeff,

Thanks for your detailed responses.

Some notes on them:

1. Correct, and I totally agree that t2 (burstable) instances should not be used; as mentioned before, I'm well aware of that. I just added that data to show there was no increase in scale when moving from t2 to m4.
    Similarly, there was no increase in scale when moving from m4.large to m4.2xlarge.

2. I tried Apache Bench (ab) first and got almost the same results, so I also tried wrk to make sure the results were comparable. The statistics from the two tools were similar.

3. I had tried Apache Bench's keep-alive (-k) option; I believe that's what you meant by connection reuse. But the results were not noticeably different; throughput was similar either way.

4. On almost all runs, CPU spiked to 100%, so you are correct. CSR signing might be good, but our use case involves getting both the certificate and the private key from Vault itself, as we rely on Vault to provide both.
    Since there are a lot of services, having each of them generate a CSR and submit it for signing would be laborious.

How many CPUs would you recommend or foresee for PKI usage that generates both certs and private keys, at a load of 20 requests/sec?

-Keshava

Keshava Bharadwaj

Nov 28, 2017, 4:53:02 AM
to Vault
Apache bench results:

$ ab -k -p post.json -T application/json -H "X-Vault-Token: fb0cca83-f22c-5049-16b8-e2d343395d2f" -c 50 -n 50 -t 20 -v 2 https://vault-m42xlarge.local.com:8200/v1/pki/issue/exampleservice > ab-c50-n50-dynamo-k.log 2>&1

Server Software:
Server Hostname:        vault-m42xlarge.local.com
Server Port:            8200
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256

Document Path:          /v1/pki/issue/exampleservice
Document Length:        4478 bytes

Concurrency Level:      50
Time taken for tests:   20.089 seconds
Complete requests:      267
Failed requests:        141
   (Connect: 0, Receive: 0, Length: 141, Exceptions: 0)
Write errors:           0
Keep-Alive requests:    0
Total transferred:      1226353 bytes
Total body sent:        95130
HTML transferred:       1196182 bytes
Requests per second:    13.29 [#/sec] (mean)
Time per request:       3761.926 [ms] (mean)
Time per request:       75.239 [ms] (mean, across all concurrent requests)
Transfer rate:          59.62 [Kbytes/sec] received
                         4.62 kb/s sent
                        64.24 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        6  164  88.9    173     536
Processing:   271 3127 1567.2   2818    9135
Waiting:      267 3127 1567.1   2818    9135
Total:        277 3291 1576.8   2980    9328

Percentage of the requests served within a certain time (ms)
  50%   2980
  66%   3643
  75%   4084
  80%   4357
  90%   5530
  95%   6457
  98%   7665
  99%   8025

$ grep "HTTP/1.0 200 OK" ab-c50-n50-dynamo-k.log
267

This is similar to the wrk results.

Keshava Bharadwaj

Nov 29, 2017, 8:42:04 AM
to Vault
Hi Jeff,

With this use case of Vault generating the keys and certs itself, instead of signing CSRs, are those numbers expected with 2-4 vCPUs?
What number of CPUs would you expect for a throughput of at least 50 req/sec for the PKI use case?

- Keshava

Jeff Mitchell

Nov 30, 2017, 10:16:19 AM
to Vault
Hi Keshava,

I don't have any such numbers to give you, sorry. If you must generate all keys on the Vault server, you may want to look at different bit sizes and RSA vs. ECDSA keys.
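
The key-type point can be checked locally with openssl; a rough sketch (machine-dependent timings, not a Vault benchmark), assuming your clients can accept ECDSA certificates:

```shell
# Compare key generation cost: P-256 ECDSA keys are typically much faster
# to generate than RSA keys, so an ECDSA-keyed PKI role can raise certs/sec.
time openssl genpkey -algorithm RSA \
  -pkeyopt rsa_keygen_bits:2048 -out rsa2048.key
time openssl ecparam -name prime256v1 -genkey -noout -out p256.key
```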

Best,
Jeff


Keshava Bharadwaj

Nov 30, 2017, 10:12:52 PM
to Vault
Hi Jeff,

Thanks. 

Also, are there any Vault-specific tunings or best practices one needs to follow in order to tune Vault for performance?

I've already gone through the production guide, and we have incorporated the notes from there: https://www.vaultproject.io/guides/production.html.

Thanks,
Keshava

Jeff Mitchell

Dec 4, 2017, 1:55:47 PM
to Vault
Hi Keshava,

As we think of specifics we put them into that guide, so nothing else offhand!

Best,
Jeff


Keshava Bharadwaj

Dec 4, 2017, 11:57:16 PM
to Vault
Hi Jeff,

OK, sure. We will keep a tab on that page.
Thanks for your time and valuable suggestions in this thread.

Thanks,
Keshava

Jeff Mitchell

Dec 5, 2017, 8:00:30 AM
to Vault
Happy to help!
