Need clarifications for HA mode Vault-Consul cluster


PJ

Feb 29, 2016, 5:51:54 PM
to Vault
First of all, thanks for the great community here. Many of the discussions here helped me immensely in setting up a three-node Vault-Consul cluster on three separate servers. After setting it up and playing around a little bit, I had some questions, as follows, that I couldn't find answers to in the docs at https://www.vaultproject.io/ or https://www.consul.io:

1. What are the ways to improve the performance of a Vault cluster?
    a. Does adding a new Vault node affect performance, or are redundant Vault servers installed purely for fault tolerance?
    b. Why can't any of the three Vault nodes answer queries itself, instead of redirecting to the master Vault node?

2. If a Consul node is down, is the corresponding Vault node marked as unavailable too?

3. What are the possible benefits of pairing Vault:Consul nodes in an n:1 or 1:n format, instead of the recommended 1:1 pairing?

4. In a multi-datacenter Vault-Consul scenario, if all the nodes in the Consul cluster inside one datacenter are unavailable, how do we ensure that requests to Vault route to the Consul cluster in another datacenter?

5. If the active Vault node fails, is it the responsibility of the REST API client to discover the hot-standby nodes?

6. How do hot-standby nodes detect if and when the active (master) Vault is unavailable?

7. Does the master Vault have an in-memory cache? Does Vault go to its backend to fetch results for each read request?

8. What are the impacts of a hardware, filesystem, or OS failure on one or multiple servers in a Vault-Consul cluster? What is the recovery mechanism in such cases?

This will really help improve my understanding of Vault. Any insights are greatly appreciated.

Jeff Mitchell

Feb 29, 2016, 8:26:14 PM
to vault...@googlegroups.com
Hi PJ,

> 1. What are the ways to improve performance of a vault cluster?

Answers to your specific questions are below; I'm not yet sure if I
can go into full detail, but I can say that I have managed to pass >
37k requests per second through Vault with audit logging off and > 24k
with file-based audit logging on, across a range of concurrent clients
numbering from 100 to 1000. So Vault's pretty speedy. Chances are
decent that the limiting factor in Vault's speed when you run it won't
be Vault itself but rather the networked physical and logical backends
(on the physical side, there is an LRU cache that helps quite a lot).

> a. Does adding new vault affect the performance or are redundant vault
> servers just installed for fault-tolerance?

Vault is active/standby, so extra standby nodes simply provide fault tolerance.

> b. Why cant any of the three vaults, answer the queries itself instead
> of redirecting to the master vault node?

One of the main reasons for this design is that it drastically
simplifies the operational model of Vault; as a result, it is easier
to have a good understanding of what might be happening inside Vault
at any given time. This is a really nice thing in a security product.
Introducing Raft or Paxos adds a lot of complexity, and even in a
leader/follower scenario like you might get with Raft or Paxos, if you
want strong consistency and no possibility of stale reads, you need to
forward queries to the master anyways.

Also, I mentioned the LRU cache on the physical side; we'd have to
have a much more complex networked cache (or forego a cache
altogether) if we had multiple Vault nodes writing. So that would
negate a lot of the potential speedups from having multiple active
nodes.

Altogether this is a simpler model (which is really nice for security)
and it's not at all clear that multiple-active would be any faster
(and it may be slower).
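In practice that redirection is visible to clients: a standby Vault node answers with an HTTP 307 redirect pointing at the active node's advertised address, and the client simply follows it. A minimal sketch of that client-side behavior, with a fake transport standing in for real HTTPS calls (the hostnames and the `fake_send` helper are illustrative, not part of Vault's API):

```python
# Sketch: follow a standby node's 307 redirect to the active node.
def request_with_redirect(send, url, max_redirects=2):
    """send(url) -> (status, payload); payload is the Location on a 307."""
    for _ in range(max_redirects + 1):
        status, payload = send(url)
        if status == 307:
            url = payload  # Location header: the active node's address
            continue
        return status, payload
    raise RuntimeError("too many redirects")

# Fake transport standing in for real HTTP calls to a Vault cluster.
def fake_send(url):
    if url.startswith("https://standby:8200/"):
        return 307, url.replace("standby", "active")
    return 200, {"value": "bar"}

status, body = request_with_redirect(fake_send, "https://standby:8200/v1/secret/foo")
print(status, body)  # 200 {'value': 'bar'}
```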

> 2. If a consul node is down, is the corresponding vault node marked as
> unavailable too?

I assume here you mean a local Consul agent rather than a Consul
server node? If you have Vault connecting through your local Consul
agent (which is the recommended approach), taking the Consul agent
down will affect Vault's ability to communicate, so if connectivity
isn't restored quickly then another node will take over active duty.

> 3. What are the possible benefits of pairing vault:consul nodes in n:1 or
> 1:n format, instead of the recommended 1:1 pairing?

I'm not sure where you saw a 1:1 pairing. Since standby nodes simply
increase fault tolerance, one or two should be enough, regardless of
how many Consul servers you have. We do recommend connecting Vault
through a local agent on the Vault node, because that way the Consul
agent handles such things as directing queries to the current Consul
leader to avoid request forwarding.

> 4. In a multi-datacenter scenario for vault-consul, if all the nodes in a
> consul cluster inside one datacenter are unavailable, how do we ensure
> request to vault, routes to the consul cluster in other datacenter?

Vault is not multi-datacenter aware. More importantly, the K/V stores
of Consul are per-datacenter. So you wouldn't want Vault to simply
redirect to the other datacenter, because it'd be a totally different
data set.

> 5. If an active instance of Vault node fails, is it the responsibility of
> the REST API client to discover hot-standby nodes?

There are a lot of ways to skin this cat, but if you're running with
Consul, using Consul health checks and connecting to the service
address for Vault should do this automatically. We're considering
putting in a Consul TTL-based health check directly in Vault for users
of the Consul backend; this way any such failover should happen very
quickly. Although you can already do health checks with e.g. 1 second
TTL, this would allow a Vault node to explicitly mark itself as
available/unavailable when its status changes (e.g. shutting down,
starting up, getting unsealed, etc.)

> 6. How do hot-standby nodes understand, if and when, the active(master)
> vault is unavailable?

Vault uses a lock in Consul; when the lock is released, either
explicitly or due to a session failure from that node's Consul client,
one of the other nodes is able to (atomically) grab it. Managing to
grab it lets that node know that it is now the leader. You can get more
information here:
https://www.consul.io/docs/guides/leader-election.html
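The lock-based failover described above can be sketched with an in-memory stand-in for Consul's session/KV acquire semantics. A real deployment uses Consul sessions against a KV key; the `FakeLock` class below is purely illustrative of the atomic first-grabber-wins behavior:

```python
import threading

class FakeLock:
    """Stands in for a Consul KV key acquired with a session."""
    def __init__(self):
        self._mu = threading.Lock()
        self.holder = None

    def acquire(self, node):
        # Atomic: succeeds only if nobody currently holds the lock.
        with self._mu:
            if self.holder is None:
                self.holder = node
                return True
            return False

    def release(self, node):
        # Explicit release, or what happens when the session fails.
        with self._mu:
            if self.holder == node:
                self.holder = None

lock = FakeLock()
assert lock.acquire("vault-a")      # vault-a becomes the active node
assert not lock.acquire("vault-b")  # vault-b remains a standby
lock.release("vault-a")             # active node fails or steps down
assert lock.acquire("vault-b")      # a standby grabs the lock and takes over
print(lock.holder)  # vault-b
```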

> 7. Does master vault, have a memory cache? For each read request, does vault
> go to its backend, to fetch the results?

Yes. There is an LRU cache for the physical store that is invalidated
on write. Some backends (e.g. transit) also have their own specialty
caches as well.
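To illustrate the invalidate-on-write behavior, here is a small sketch of a cache in front of a backing store. This is a toy under assumptions, not Vault's actual cache implementation; the class name, capacity, and backend hooks are made up:

```python
from collections import OrderedDict

class InvalidatingLRU:
    """Toy LRU cache that writes through and invalidates on write."""
    def __init__(self, backend_get, backend_put, capacity=4):
        self.backend_get, self.backend_put = backend_get, backend_put
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)    # mark as most recently used
            return self.cache[key]
        value = self.backend_get(key)      # miss: go to the backend
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False) # evict least recently used
        return value

    def put(self, key, value):
        self.backend_put(key, value)       # write through to the backend
        self.cache.pop(key, None)          # invalidate any stale entry

store = {}
lru = InvalidatingLRU(store.get, store.__setitem__)
lru.put("secret/foo", "v1")
print(lru.get("secret/foo"))  # v1 (fetched from backend, now cached)
lru.put("secret/foo", "v2")   # write invalidates the cached v1
print(lru.get("secret/foo"))  # v2
```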

> 8. What are the impacts of hardware or filesystem or O.S. failure on one or
> multiple servers for vault-consul cluster? What is the recovery mechanism in
> such cases?

It depends on the underlying physical store. Vault is basically a
specialty database storing its data in some other service, so disaster
recovery procedures for that service should be used to restore Vault's
data in case of disaster. At HC, we have snapshots of our Consul
cluster taken every five minutes. In the event of catastrophic failure
of our Consul cluster, we can simply restore one of the snapshots and
as long as the KV store is restored, Vault will be fine.

Best,
Jeff

PJ

Mar 1, 2016, 3:09:50 PM
to Vault
Thanks so much, Jeff, for your responses. I have a few follow-up questions/comments inline.


On Monday, February 29, 2016 at 5:26:14 PM UTC-8, Jeff Mitchell wrote:
Hi PJ,

> 1. What are the ways to improve performance of a vault cluster?

Answers to your specific questions are below; I'm not yet sure if I
can go into full detail, but I can say that I have managed to pass >
37k requests per second through Vault with audit logging off and > 24k
with file-based audit logging on, across a range of concurrent clients
numbering from 100 to 1000. So Vault's pretty speedy. Chances are
decent that the limiting factor in Vault's speed when you run it won't
be Vault itself but rather the networked physical and logical backends
(on the physical side, there is an LRU cache that helps quite a lot).


The data points you mention are difficult to put into context without knowledge of how powerful the servers running the vault nodes were. But I understand why you cannot go into details.

 
>     a. Does adding new vault affect the performance or are redundant vault
> servers just installed for fault-tolerance?

Vault is active/standby, so extra standby nodes simply provide fault tolerance.

>     b. Why cant any of the three vaults, answer the queries itself instead
> of redirecting to the master vault node?

One of the main reasons for this design is that it drastically
simplifies the operational model of Vault; as a result, it is easier
to have a good understanding of what might be happening inside Vault
at any given time. This is a really nice thing in a security product.
Introducing Raft or Paxos adds a lot of complexity, and even in a
leader/follower scenario like you might get with Raft or Paxos, if you
want strong consistency and no possibility of stale reads, you need to
forward queries to the master anyways.

Also, I mentioned the LRU cache on the physical side; we'd have to
have a much more complex networked cache (or forego a cache
altogether) if we had multiple Vault nodes writing. So that would
negate a lot of the potential speedups from having multiple active
nodes.

Altogether this is a simpler model (which is really nice for security)
and it's not at all clear that multiple-active would be any faster
(and it may be slower).

I was not suggesting we use Paxos or Raft in Vault. The deployment model I had in mind was this: without an LRU cache (or with only a node-local cache), Vault nodes could act as dumb front ends that service requests and fetch data from the backend. Consistency and replication of the data would then be the responsibility of the backend, i.e. Consul. This helps in scenarios where we are limited by how fast the servers hosting Vault can process requests and respond. It would be a good exercise to compare the performance of three active nodes on slower machines under that deployment model against one active node on a faster machine under the current model.
 
> 2. If a consul node is down, is the corresponding vault node marked as
> unavailable too?  

I assume here you mean a local Consul agent rather than a Consul
server node? If you have Vault connecting through your local Consul
agent (which is the recommended approach), taking the Consul agent
down will affect Vault's ability to communicate, so if connectivity
isn't restored quickly then another node will take over active duty.

I'm not really sure what you mean here by "another node will take over active duty". Is that a Consul node or a Vault node?

> 3. What are the possible benefits of pairing vault:consul nodes in n:1 or
> 1:n format, instead of the recommended 1:1 pairing?

I'm not sure where you saw a 1:1 pairing. Since standby nodes simply
increase fault tolerance, one or two should be enough, regardless of
how many Consul servers you have. We do recommend connecting Vault
through a local agent on the Vault node, because that way the Consul
agent handles such things as directing queries to the current Consul
leader to avoid request forwarding.

From the discussions on this topic: https://groups.google.com/d/topic/vault-tool/tyA8SKKq_Ic/discussion I understood that we need to have one Vault and one Consul "server" node on the same server. But it seems like I am missing something. Could you point me to resources that will help me deploy the recommended approach? Please note, I am not using Consul for anything but as a K/V store for Vault.


> 4. In a multi-datacenter scenario for vault-consul, if all the nodes in a
> consul cluster inside one datacenter are unavailable, how do we ensure
> request to vault, routes to the consul cluster in other datacenter?

Vault is not multi-datacenter aware. More importantly, the K/V stores
of Consul are per-datacenter. So you wouldn't want Vault to simply
redirect to the other datacenter, because it'd be a totally different
data set.

Wow, I wonder why the K/V stores are per-datacenter, because I always thought Consul had first-class support for multi-datacenter ecosystems. Is multi-datacenter support for the K/V store in the pipeline for Consul? What are the challenges or principles behind supporting or not supporting it?

> 5. If an active instance of Vault node fails, is it the responsibility of
> the REST API client to discover hot-standby nodes?

There are a lot of ways to skin this cat, but if you're running with
Consul, using Consul health checks and connecting to the service
address for Vault should do this automatically. We're considering
putting in a Consul TTL-based health check directly in Vault for users
of the Consul backend; this way any such failover should happen very
quickly. Although you can already do health checks with e.g. 1 second
TTL, this would allow a Vault node to explicitly mark itself as
available/unavailable when its status changes (e.g. shutting down,
starting up, getting unsealed, etc.)

 So basically, the REST API client can perform a status check right now using this API: https://www.vaultproject.io/docs/http/sys-health.html ?

> 6. How do hot-standby nodes understand, if and when, the active(master)
> vault is unavailable?

Vault uses a lock in Consul; when the lock is released, either
explicitly or due to a session failure from that node's Consul client,
one of the other nodes is able to (atomically) grab it. Managing to
grab it lets that node know that it is now the leader. You can get more
information here:
https://www.consul.io/docs/guides/leader-election.html

Thanks, I will read more and get back to you with any questions.

> 7. Does master vault, have a memory cache? For each read request, does vault
> go to its backend, to fetch the results?

Yes. There is an LRU cache for the physical store that is invalidated
on write. Some backends (e.g. transit) also have their own specialty
caches as well.  

Makes sense

> 8. What are the impacts of hardware or filesystem or O.S. failure on one or
> multiple servers for vault-consul cluster? What is the recovery mechanism in
> such cases?

It depends on the underlying physical store. Vault is basically a
specialty database storing its data in some other service, so disaster
recovery procedures for that service should be used to restore Vault's
data in case of disaster. At HC, we have snapshots of our Consul
cluster taken every five minutes. In the event of catastrophic failure
of our Consul cluster, we can simply restore one of the snapshots and
as long as the KV store is restored, Vault will be fine.

The idea of snapshots is very appealing. Thanks, this helps! I wonder whether the snapshots are moved to a separate machine that is not running Consul, something like a backup server.


Best,
Jeff

Jeff Mitchell

Mar 1, 2016, 3:53:47 PM
to vault...@googlegroups.com
On Tue, Mar 1, 2016 at 3:09 PM, PJ <pushkar.jo...@gmail.com> wrote:
> The data points you mention are difficult to put into context without
> knowledge of how powerful the servers running the vault nodes were. But I
> understand why you cannot go into details.

I know. I'm sorry about that. I'll say that the servers were...very commodity.

> I was not suggesting we use Paxos or Raft in Vault. The deployment model I
> had in mind was, without an LRU cache(or only local cache for that node), if
> Vault can act as dumb nodes to service requests and get data from backend.
> The consistency and replication of data should be the responsibility of a
> backend- consul. This helps in scenarios where we are limited by how fast
> servers that hosts vault, can process requests and respond. It would be a
> good exercise to measure performance between, three active nodes on slower
> machines according to the above deployment model compared to one active node
> on a faster machine, as per the current deployment model.

There's a blog post floating around about a replicated
multi-datacenter deployment strategy for Vault. It's not officially
sanctioned by HC (although I did help the author out a bit, including
plumbing in an option to turn off the LRU cache). So depending on your
needs you could run something like it. Really, the generic backend
isn't the issue here; it's other backends that issue leases and need
to revoke things. We need to ensure that these operations run, and
also that they run only once, so having one true active node is hugely
important.

In general I think foregoing the cache and trying to have multiple
Vault servers query a networked storage backend on each request is
likely to lead to much worse performance, not better.

>> > 2. If a consul node is down, is the corresponding vault node marked as
>> > unavailable too?
>>
>>
>> I assume here you mean a local Consul agent rather than a Consul
>> server node? If you have Vault connecting through your local Consul
>> agent (which is the recommended approach), taking the Consul agent
>> down will affect Vault's ability to communicate, so if connectivity
>> isn't restored quickly then another node will take over active duty.
>
>
> Not really sure, what you mean here "another node will take over active
> duty"? Is it consul or vault node?

Depends on whether you meant a Consul server or a Vault node goes
down. :-) In either case, another server should take over. But,
usually when you are using Consul, you'll also have Consul agents on
each host, including the Vault host. Connecting to Consul through the
local agent is the recommended way to go, but it does mean that if
that local agent is down, Vault will lose its connection to the Consul
servers.

> From the discussions on this topic:
> https://groups.google.com/d/topic/vault-tool/tyA8SKKq_Ic/discussion I
> understood, that we need to have 1 vault and 1 consul "server" node on the
> same server.

No -- if you are using Consul, and if you are connecting Vault to
Consul using our recommended approach of having Vault connect to a
local Consul agent, then you will have a local Consul agent on each
Vault server. These do not have to be Consul servers, and in fact we
generally recommend that they aren't -- purely for defense-in-depth
reasons we suggest that Vault servers run only Vault (and a Consul
agent, and your init daemon, and your syslog, and other normal
things).
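For reference, here is a minimal sketch of what that layout implies for a Vault server config of that era: the Consul backend points at the local agent, not at a remote Consul server. Addresses, the `vault/` path prefix, and TLS file locations below are illustrative; check the Consul storage backend docs for your version:

```hcl
# Vault talks to the local Consul agent; the agent forwards to the
# Consul servers and tracks the current Consul leader.
backend "consul" {
  address = "127.0.0.1:8500"  # local agent, not a Consul server
  path    = "vault/"          # KV prefix for Vault's data
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/cert.pem"
  tls_key_file  = "/etc/vault/tls/key.pem"
}
```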

> Wow, I wonder why k/v stores are per-datacenter, because I always thought
> that consul has first class support for multi-datacenter ecosystem. Is
> multi-datacenter support for k/v store in the pipeline for consul? What are
> the challenges or principles behind supporting or not supporting it?

Consul does have first-class support for multi-datacenter ecosystems,
but it depends what you're doing with it. Multi-datacenter support was
designed for the service discovery/querying side of things. I do know
that you can query K/V stores in other datacenters, but the K/V stores
in each datacenter are distinct. As to why, you'd have to ask the
Consul devs, sorry.

> So basically, the REST API client can perform a status check right now
> using this API: https://www.vaultproject.io/docs/http/sys-health.html ?

Correct. The return code indicates the node's state.
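A sketch of how a client might interpret those return codes; the mapping is taken from the sys/health docs of the time, so verify it against your Vault version, and the `node_state` helper is illustrative:

```python
# sys/health status codes -> node state (per the Vault docs of this era).
HEALTH_CODES = {
    200: "initialized, unsealed, and active",
    429: "unsealed and standby",
    501: "not initialized",
    503: "sealed",
}

def node_state(status_code):
    """Map a sys/health HTTP status code to a human-readable state."""
    return HEALTH_CODES.get(status_code, "unknown")

print(node_state(200))  # initialized, unsealed, and active
print(node_state(429))  # unsealed and standby
```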

>> > 8. What are the impacts of hardware or filesystem or O.S. failure on one
>> > or
>> > multiple servers for vault-consul cluster? What is the recovery
>> > mechanism in
>> > such cases?
>>
>> It depends on the underlying physical store. Vault is basically a
>> specialty database storing its data in some other service, so disaster
>> recover procedures for that service should be used to restore Vault's
>> data in case of disaster. At HC, we have snapshots of our Consul
>> cluster taken every five minutes. In the event of catastrophic failure
>> of our Consul cluster, we can simply restore one of the snapshots and
>> as long as the KV store is restored, Vault will be fine.
>
>
> The idea of snapshots is very appealing. Thanks this helps! Wonder if the
> snapshots are moved to a separate machine that is not running consul-
> something like a backup server.

IIRC they are encrypted and stored on a different machine. My general
advice when anyone is thinking of snapshots is to think of ZFS. :-D

Best,
Jeff

PJ

Mar 2, 2016, 2:15:01 PM
to Vault
Thanks, Jeff. You have been incredibly helpful. A few follow-up questions:

  1. I assume the session lock acquisition and release are already built into Vault, so the failover to a standby Vault node should be automatic. Is that correct?
  2. Can you confirm that the recommended deployment is as follows:
    • Vault A and Consul agent (client) A on server A
    • Vault B and Consul agent (client) B on server B
    • Optional: Vault C and Consul agent (client) C on server C

    • Consul agent (server) D on server D
    • Consul agent (server) E on server E
    • Consul agent (server) F on server F

    So the LAN members for Consul will be three clients and three servers
  3. Can you confirm that in case of a failover to standby, we lose the data in the LRU cache of the failed master? In other words, the contents of the LRU cache are not replicated to the new master.
Thanks again. I have a few follow-up questions about Consul, but as you suggested, I will ask the Consul devs.

Jeff Mitchell

Mar 2, 2016, 2:21:14 PM
to vault...@googlegroups.com
On Wed, Mar 2, 2016 at 2:15 PM, PJ <pushkar.jo...@gmail.com> wrote:
> Thanks Jeff. You have been incredibly helpful. Few follow up questions:

No problem, glad to help!

> I assume, the session locking, release and acquire is already built-in to
> Vault. So the failover to standby Vault node should be automatic. Is that
> correct?

That's correct, although keep in mind that only unsealed standby nodes
can take over.

> Can you confirm that the recommended deployment is as follows:
>
> Vault A and consul agent(client) A on server A
> Vault B and consul agent(client) B on server B
> Optional: vault C and consul agent(client) C on server C
>
> Consul agent(server) D on server D
> Consul agent(server) E on server E
> Consul agent(server) F on server F
>
> So the LAN members for consul will have three clients and three servers

That looks fine.

> Can you confirm, incase of failover to standby, we lose the data in LRU
> cache of the failed master? In other words, the contents of LRU cache are
> not replicated to the new master.

Correct, it's in-memory only. When a node takes over active duty, it
purges the cache to ensure that it gets a clean view of the state of
the data store.

Best,
Jeff