Our experiences with around 600k secrets in Vault.


Craig Sawyer

Aug 28, 2017, 11:05:17 PM
to Vault
Hi everyone,

There have been occasional questions about how many secrets you can store in Vault. I haven't seen anyone else talk about their sizing, except to ask whether Vault can handle X amount of stuff.
Our experience is that things start to slow down when we hit ~600,000 secrets. Without getting into the details of our use case, which is definitely not fabulous (we are working on transitioning to the transit backend for this, as it would let us offload the storage onto end-user machines and fit better with Vault), here are our config and experience. We sort of just shoved an existing process into Vault to get some security, knowing it wasn't an ideal workload for Vault. Overall we are still happy with this decision while we transition to a better process that fits Vault properly.
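For anyone who hasn't used transit: the idea is that Vault only holds the named encryption key, and the encrypted blobs live wherever you put them (for us, that would be the end-user machines). A minimal sketch with the official Go client (github.com/hashicorp/vault/api) -- the mount path and the key name "app-key" are placeholders, not our actual setup:

  package main

  import (
  	"encoding/base64"
  	"fmt"
  	"log"
  	"os"

  	vault "github.com/hashicorp/vault/api"
  )

  func main() {
  	// DefaultConfig reads VAULT_ADDR (and TLS settings) from the environment.
  	client, err := vault.NewClient(vault.DefaultConfig())
  	if err != nil {
  		log.Fatalf("vault client: %v", err)
  	}
  	client.SetToken(os.Getenv("VAULT_TOKEN"))

  	// Transit expects base64-encoded plaintext.
  	plaintext := base64.StdEncoding.EncodeToString([]byte("per-user secret"))

  	// Encrypt under a named key. Vault never stores this payload, so the
  	// secret count in storage stays flat no matter how many users there are.
  	resp, err := client.Logical().Write("transit/encrypt/app-key", map[string]interface{}{
  		"plaintext": plaintext,
  	})
  	if err != nil {
  		log.Fatalf("encrypt: %v", err)
  	}

  	// The ciphertext is what would ship to the end-user machine; only
  	// decryption needs to round-trip through Vault again.
  	fmt.Println(resp.Data["ciphertext"])
  }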

Config:
  HA Vault on three 8 GB memory instances; they are not solely dedicated to Vault (yes, we want to fix this, but public school funding has its limits...)
  Consul storage backend on three different machines (also with at least 8 GB memory) -- also not dedicated to only Consul.
  One path in secret/APPNAME/ had 599k entries in it (plus we have about 1k secrets in other paths).
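For concreteness, the storage side of that layout looks roughly like this in the Vault server config (a sketch with placeholder addresses and file paths, not our actual file):

  # Each Vault node talks to a local Consul agent, which forwards to
  # the three Consul servers running on their own machines.
  storage "consul" {
    address = "127.0.0.1:8500"  # placeholder address
    path    = "vault/"
  }

  listener "tcp" {
    address       = "0.0.0.0:8200"
    tls_cert_file = "/etc/vault/tls/vault.crt"  # placeholder paths
    tls_key_file  = "/etc/vault/tls/vault.key"
  }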

Most of our load was on the Consul boxes when this hit. We had *some* impact on end users hitting the Vault instance; no timeouts were hit, but 30-second delays were possible. This happened last week. We don't normally run with 600k secrets, but we use the secrets backend to store temporary secrets we generate for end users. We have code that comes along and erases them when done, but a bug in the cleanup code was keeping it from running, and the secrets sort of piled up on us. We normally average around 100k secrets in production, without issues.

We had a noticeable slowdown, but no outage, when we hit 600k secrets in Vault. We didn't spend much time diagnosing the problem, other than fixing the cleanup code and getting it erasing the old secrets again. Plus we learned to keep a better eye on our cleanup code (it now alerts us on a failure).
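The job itself is conceptually just a list-and-delete over that path. A rough sketch of the shape of it in Go -- not our actual code, and a real version would check each entry's expiry rather than deleting everything:

  package cleanup

  import (
  	"fmt"
  	"strings"

  	vault "github.com/hashicorp/vault/api"
  )

  // cleanupSecrets deletes every entry directly under a KV path.
  // Keys ending in "/" are nested folders and would need recursion;
  // this sketch skips them.
  func cleanupSecrets(client *vault.Client, path string) error {
  	listing, err := client.Logical().List(path)
  	if err != nil {
  		return fmt.Errorf("list %s: %w", path, err)
  	}
  	if listing == nil {
  		return nil // nothing under this path
  	}
  	keys, _ := listing.Data["keys"].([]interface{})
  	for _, k := range keys {
  		name, _ := k.(string)
  		if strings.HasSuffix(name, "/") {
  			continue
  		}
  		if _, err := client.Logical().Delete(path + "/" + name); err != nil {
  			// Fail loudly -- a silent failure here is exactly how
  			// the 600k pile-up happened.
  			return fmt.Errorf("delete %s/%s: %w", path, name, err)
  		}
  	}
  	return nil
  }

Wiring something like that into a scheduled job that alerts on any returned error is the "keep a better eye on it" part.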

CPU and memory were not maxed out for either the Vault or Consul processes; the delays seemed to be in I/O (likely network), but we didn't dig into this.

For us, with our current production config, our upper limit seems to be around 600k before we have slowdown issues. It was taking about half a second of wall time per delete from that path for a while. We just cleaned up overnight, and everyone was back to normal the next day.

Thanks Vault for not just crashing and burning on us!

Jeff Mitchell

Aug 29, 2017, 1:23:14 PM
to Vault
Great to hear!

It's always hard to give benchmarks/scaling information to people because it can depend so much on *everything* -- instance sizing, network status, how fast secrets are being written/deleted, RPS, and so on. So it's always nice when people give us information on what worked for them.

One question for you -- did you play around at all with adjusting the cache size in Vault? (https://www.vaultproject.io/docs/configuration/index.html#cache_size)
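For anyone following along, that's a single top-level line in the server config -- it's the number of entries (not bytes) kept in Vault's in-memory read cache in front of the storage backend. The value below is just an illustration, not a recommendation:

  # Default is 32k entries; raising it trades memory for fewer
  # round-trips to Consul on hot read paths.
  cache_size = "65536"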

Best,
Jeff

