Automating vault instance restarts and storing root tokens

navi...@gmail.com

Apr 2, 2016, 10:53:52 AM
to Vault

Wondering if there are any patterns for unsealing Vault and accessing the root token so we can automate restarts if the Vault instance is rebooted. We are planning to have a RESTful service running on the Vault instance, which clients will invoke for mapping user IDs, logging in, requesting tokens, etc. We want the service to be initialized during instance startup so it's ready to serve requests. The challenge we are facing is where to store the encryption key and root password securely so we can use them when the instance boots up. FYI, we are using Consul as the storage backend.

Thanks
Navin

Jason Antman

Apr 3, 2016, 9:24:31 AM
to Vault
Navin,

The pattern is: DO NOT DO IT! Use Vault's High Availability functionality to have as many manually-unsealed standby instances as you need.

We're just in the process of our first production Vault deployment, so I'm sure that others will chime in. But we (and I specifically) have been working on it for a few months now, and I think we finally have a pretty solid plan.

Storing the keys:

I guess it depends on how secure you need things to be. Our procedure is that when we stand up a Vault cluster (or re-key an existing one), five humans (usually managers; they're our Vault Keyholders) are physically present when an engineer runs ``vault init``. Each keyholder is assigned a number from 1 to 5, and physically copies the corresponding key from the screen onto a single sheet of paper. They then swap, and verify the keys were copied correctly. The engineer records the root token, which we'll use (manually) in the provisioning process and then store offline. Each keyholder is responsible for securing their key (they're all technology managers or senior team members, well versed in security) with whatever method makes the most sense to them and gives both security and ready access to the key. I'm sure some people will use KeePass files or LastPass, and some will scroll the key on a tiny piece of paper and hide it somewhere in their wallet. But the idea is that (1) no more than 1 key has ever been entered into a given computer, and (2) nobody knows where anybody else stores their key, so even the odds of one keyholder obtaining multiple keys are very low.

If you use an automated method of unsealing the Vault, one which has enough keys to unseal without human interaction, you've effectively eliminated the security of Vault. EVERY secret in your Vault is only as secure as that automated process. Even if your security needs are less demanding than mine, keep in mind that any person, machine or process that can access the unseal keys can effectively access every secret in Vault.

How We Do It:

We run in EC2. We have a 5-node Vault cluster, with all 5 nodes manually unsealed by our keyholders when they boot. The 5 nodes are spread across three Availability Zones. All instances are monitored, including the status of the Vault instance on them; if an instance goes offline or a Vault becomes sealed, an on-call engineer is notified.

The key: If the active Vault instance goes down, one of **four** standbys takes its place. If *that* goes down, one of the remaining three standbys takes its place. If we lose one Vault instance in a given night or weekend (non-business-hours), we just wait for the next business day and have our keyholders unseal the replacement instance (or the newly-restarted vault service, whatever). If we lose two instances (down to a 3-node cluster; one active and two standby) we'll reach out to the keyholders after-hours to try to get the cluster back to capacity. If we lose three or more, we consider it a critical degradation and get keyholders to unseal replacement instances.
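
A minimal sketch of the escalation rule described above, which could sit behind whatever monitoring raises the alerts. The names `Action` and `escalation` are hypothetical, and the count of unsealed nodes is assumed to come from your own monitoring; nothing here talks to Vault.

```python
from enum import Enum


class Action(Enum):
    NONE = "cluster at full capacity"
    NEXT_BUSINESS_DAY = "unseal the replacement on the next business day"
    AFTER_HOURS = "reach out to keyholders after hours"
    CRITICAL = "critical degradation: get keyholders to unseal replacements"


def escalation(cluster_size: int, unsealed_nodes: int) -> Action:
    """Map the number of lost (offline or sealed) nodes to a response."""
    if not 0 <= unsealed_nodes <= cluster_size:
        raise ValueError("unsealed_nodes must be between 0 and cluster_size")
    lost = cluster_size - unsealed_nodes
    if lost == 0:
        return Action.NONE
    if lost == 1:
        return Action.NEXT_BUSINESS_DAY
    if lost == 2:
        return Action.AFTER_HOURS
    return Action.CRITICAL  # three or more nodes lost
```

For the 5-node cluster above, `escalation(5, 3)` yields `Action.AFTER_HOURS`, and anything with two or fewer unsealed nodes yields `Action.CRITICAL`.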

Vault's High Availability (HA) allows you to have a single active instance, and an effectively unlimited number of (unsealed) standby instances. You can have as many standby instances as you're willing to pay for, to avoid manual intervention.

-Jason

Jeff Mitchell

Apr 4, 2016, 10:27:07 AM
to vault...@googlegroups.com
Hi Navin,

Jason's answer was excellent, so I'm just going to add two things onto it:

1) Jason didn't mention the PGP support for unseal keys, but it's a
great way to ensure that nobody copies down more than one key.

2) A really nice benefit to manually unsealing is that you can have a
process to ensure that the Vault binary being run is unmodified from
your expectations. Vault is open source, and a malicious party could
easily modify it to do bad things. By allowing unseal key holders to
verify the binary before providing their key (for instance, with a
SHA256 sum against an expected value) you can know that what you're
running is what you're expecting to run.
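
As a rough illustration of the SHA256 check, here is a minimal sketch that hashes a file and compares it against an expected value. The function names and the idea of passing the expected digest as a hex string are assumptions for illustration; the expected value should come from a source the key holder trusts independently of the server being checked.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binary_matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the expected hex digest."""
    actual = sha256_of_file(path)
    # Normalise whitespace and case so a pasted checksum still compares equal.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A key holder (or a startup script) would call `binary_matches` with the path to the Vault binary and refuse to proceed if it returns `False`.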

Best,
Jeff

navi...@gmail.com

Apr 5, 2016, 9:32:07 AM
to Vault

Thanks Jason for the detailed explanation and sharing your design.

We are running on Google Cloud and have a very aggressive SLA for recovery from failure if a zone goes down. The rest of our stack is designed to be provisioned in an automated way, so we can effectively push a button and everything comes up. Google also does regular maintenance of their VMs, and even with the live migration feature there could be periods when we have to manually bring the Vault instances up even without an outage; our automated bootstrap process takes care of this for the rest of the stack. Requiring 3 unseal key holders plus the root token to be available for incidents of every severity introduces a significant bottleneck for uptime.

One way we could possibly get around this is, as you mentioned, to use the HA feature and have enough Vault instances running in a zone (maybe 6), but that adds to the cost.

navi...@gmail.com

Apr 5, 2016, 9:37:08 AM
to Vault

Thanks Jeff. So does the unseal command verify the binary before prompting for the keys? We were planning to do the checksum verification as part of the Vault startup script.

Jeff Mitchell

Apr 5, 2016, 9:42:58 AM
to vault...@googlegroups.com
Hi Navin,

The unseal command is remote, so it can't do any checksumming on the
server. The server itself doesn't report a checksum either, partly
because reporting a checksum is super easy to fake if someone wanted
to put in a bad build...so verifying by a third party (your startup
script) is the right way to go. Make sure your startup script is also
verified. And whatever verifies that is verified. :-D

Regarding your GCP quandary, please understand that you *can*
automatically unseal if you want via various means, but as with most
security there is a tradeoff for convenience. Ultimately it's up to
your organization to decide what security plan you are comfortable
with.

--Jeff

navi...@gmail.com

Apr 5, 2016, 10:14:39 AM
to Vault

Thanks Jeff. I understand the tradeoff and we might be OK with that, but we also want a good balance between automation and security. So, coming back to my original question: if we were to automate the unseal, what would be the best way to do it?

Matt Button

Apr 5, 2016, 5:34:40 PM
to vault...@googlegroups.com
Hi Navin,

We automate the unseal process on developer laptops. The approach is pretty straightforward - you just need to modify whatever supervisor script starts vault to run `vault unseal` after it starts. That way any node that starts up is automatically unsealed.
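
A minimal sketch of that supervisor step, for development machines only. It assumes the unseal keys are already available to the script as a list of strings and that the CLI accepts the key as an argument to `vault unseal` (newer Vault releases spell this `vault operator unseal`, as far as I know). The function names are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def unseal_commands(keys: Sequence[str], vault_bin: str = "vault") -> List[List[str]]:
    """Build one `vault unseal <key>` invocation per non-empty key."""
    return [[vault_bin, "unseal", key.strip()] for key in keys if key.strip()]


def run_unseal(
    keys: Sequence[str],
    run: Callable[[List[str]], object] = subprocess.check_call,
    vault_bin: str = "vault",
) -> int:
    """Run the unseal commands in order and return how many were run.

    Passing keys on the command line exposes them in the process list,
    which is one more reason this is a dev-laptop convenience only.
    """
    commands = unseal_commands(keys, vault_bin)
    for cmd in commands:
        run(cmd)  # the default raises CalledProcessError on a non-zero exit
    return len(commands)
```

The supervisor script would call `run_unseal(keys)` once the Vault server process is listening.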


Matt

navi...@gmail.com

Apr 7, 2016, 11:59:19 AM
to Vault

Thanks Matt. So it looks like you are storing the unseal keys on the filesystem, which won't pass our security gate for Production. Are you manually unsealing in Prod if the instance reboots?

Matt Button

Apr 7, 2016, 12:11:07 PM
to vault...@googlegroups.com
> Are you manually unsealing in Prod if the instance reboots?

Yes, we have to manually unseal nodes. We have 2 standby instances managed by Consul (MySQL is the physical backend).

> which won't pass our security gate for Production

Ok, can I ask what requirements your security gate imposes?

Matt

Brian Rodgers

Apr 8, 2016, 4:33:32 PM
to Vault
I've given a bit of thought to this too, though I'm also very aware of the severe downsides to doing so and don't currently plan to do it.  I just keep 3 nodes running at all times, which has worked well enough despite my desire to be as automated as possible.

One approach that I'd considered (but haven't developed or even fully fleshed out in my head) would be to store the keys to unseal vault in vault itself.  If vault is up and running, a new standby node could query vault to get those credentials and unseal on startup.  If the cluster has gone down completely you would still need to manually unseal your first node, but if I'm doing a rolling restart for vault or OS upgrades, that could be more automated.  I'd need to keep a set of credentials to query vault for the unseal keys secure somewhere, which in AWS I'd probably do using KMS and an IAM instance role. 
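
A rough sketch of what that flow might look like, under several assumptions: the unseal keys live at a hypothetical path `secret/vault-unseal` as a list under a `keys` field, the new node has already obtained a token allowed to read that path (the KMS/IAM part is out of scope here), and the `X-Vault-Token` header and `/v1/sys/unseal` endpoint behave as in Vault's HTTP API. The helper names are made up for illustration.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional

Http = Callable[[str, str, Dict[str, str], Optional[dict]], dict]


def http_json(method: str, url: str, headers: Dict[str, str],
              body: Optional[dict] = None) -> dict:
    """Send a JSON request and decode the JSON response (empty body -> {})."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers=headers)
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def unseal_from_cluster(active_url: str, local_url: str, token: str,
                        http: Http = http_json,
                        secret_path: str = "secret/vault-unseal") -> bool:
    """Fetch unseal keys from the active node and unseal the local node."""
    # 1. Read the keys from the already-unsealed active Vault.
    secret = http("GET", f"{active_url}/v1/{secret_path}",
                  {"X-Vault-Token": token}, None)
    keys = secret["data"]["keys"]  # hypothetical layout of the stored secret

    # 2. Submit keys to the local, sealed node until it reports unsealed.
    for key in keys:
        status = http("PUT", f"{local_url}/v1/sys/unseal", {}, {"key": key})
        if not status.get("sealed", True):
            return True
    return False
```

If the whole cluster is down there is no active node to read from, so the first node still has to be unsealed by hand, as noted above.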

It certainly still involves trading off on security, but it seems like a better approach than storing those keys on the file system somewhere.

navi...@gmail.com

Apr 9, 2016, 11:31:45 AM
to Vault

> Ok, can I ask what requirements your security gate imposes?

We won't be able to store unseal keys (even encrypted) on the Vault instance filesystem for automation.

navi...@gmail.com

Apr 9, 2016, 11:50:12 AM
to Vault

Thanks Brian. We discussed that option too and agree it's better than storing the keys on the filesystem. As you mentioned, the challenge is in securing the credentials used to query the unseal keys. We can use IAM for that (Google Cloud just launched IAM and it is still in beta), but again that is not completely secure. Still, I agree that it provides a good tradeoff between security and automation.