Multi-DC Vault architecture


Justin LaRose

Jan 8, 2016, 4:26:59 PM
to Vault
I'm considering using the approach from this blog post to replicate my Vault Consul data across multiple datacenters:
http://sysadminsjourney.com/blog/2015/10/30/replicating-hashicorp-vault-in-a-multi-datacenter-setup/

The blog post says not to replicate certain keys, such as those under /core/leader, /core/lock, and /sys/expire.

I am wondering if it would be OK to replicate the vault/auth/ keys. I'm assuming that if I replicate the paths under auth, I don't have to enable the different auth backends on each Vault host.

Jeff Mitchell

Jan 8, 2016, 4:34:42 PM
to vault...@googlegroups.com
Hi Justin,

Yes, that is both fine and recommended -- in fact, you don't want to
manually enable auth backends on different Vault hosts, because the
internal ID will be different and the data will be inaccessible. By
replicating all of the data, the mount information will also be
replicated, so this will happen automagically.
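
For illustration (untested, and the UUID below is made up), you can see
those internal IDs directly in Consul's KV store -- each auth backend's
data lives under a per-mount UUID, which is why both sides must agree on it:

$ curl -s 'http://localhost:8500/v1/kv/vault/auth/?keys&separator=/'
["vault/auth/3b2f8a61-8e2c-4d7f-9f33-aa1f0f2d8c10/"]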

Hope that helps,
Jeff

Justin LaRose

Jan 8, 2016, 5:12:52 PM
to Vault
Thanks!

Justin LaRose

Jan 13, 2016, 11:25:32 AM
to Vault
Hi Jeff,

Just to be clear, consul-replicate should replicate both:

vault/auth
vault/core/auth

The only reason I ask is that the blog post mentions only vault/core/auth.

Thanks,

Justin

Jeff Mitchell

Jan 13, 2016, 12:36:17 PM
to vault...@googlegroups.com
Hi Justin,

What's most important are the prefixes that you should *not*
replicate: /sys/expire, /core/leader, and /core/lock. I *think*
everything else should be replicated, but please understand that I
have not tested this myself, and this is not officially supported, so
due diligence is required :-)
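
As an untested sketch (the stanza names and DC labels here are
assumptions -- check them against your consul-replicate version's
docs), a config replicating everything except those prefixes might
look like this:

# consul-replicate config (HCL), run in the destination DC
consul = "127.0.0.1:8500"

prefix {
  # pull the whole vault/ tree from the primary DC
  source      = "vault@dc1"
  destination = "vault"
}

# do not replicate leader election, lock, or lease-expiry state
exclude {
  source = "vault/core/leader"
}
exclude {
  source = "vault/core/lock"
}
exclude {
  source = "vault/sys/expire"
}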

--Jeff

Justin LaRose

Jan 15, 2016, 12:45:15 AM
to Vault
Hi Jeff, 

I have also considered treating a Consul cluster as a single logical datacenter even though its nodes would be in physically different datacenters. This would make consul-replicate unnecessary. I'm assuming this would be OK if there is low latency between the nodes. Thoughts?

Thanks,

Justin

Jeff Mitchell

Jan 15, 2016, 9:51:43 AM
to vault...@googlegroups.com
Hi Justin,

You'd probably have to talk to the Consul people about it. In the
past, when I had higher-than-LAN latency, I had to make some code
modifications in Consul to adjust timings, but that was a while ago.

If you have a single Consul cluster across datacenters, you can run a
single Vault active/standby cluster as well.
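
Roughly, each server would join the same logical datacenter -- this is
a sketch with made-up hostnames, and per the note above you may need to
tune timings for the extra latency:

{
  "server": true,
  "bootstrap_expect": 5,
  "datacenter": "global",
  "retry_join": [
    "consul-a.dc1.example.com",
    "consul-a.dc2.example.com",
    "consul-a.dc3.example.com"
  ]
}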

--Jeff

Justin LaRose

Jan 15, 2016, 2:07:25 PM
to Vault
Someone brought to my attention that latency isn't the only concern with my proposal. Network blips or link failures between datacenters may kick off a leader election, which could result in two leaders in certain scenarios.

Jeff Mitchell

Jan 15, 2016, 2:41:59 PM
to vault...@googlegroups.com
Hi Justin,

Speaking in terms of Consul, I'm not sure where you found that
information, but it's not (generally) correct. That's referred to as a
split-brain problem, and consensus algorithms (like Raft, the one in
Consul) are engineered to avoid that situation. That's why you should
always have an odd number of servers in your Consul cluster; that way,
a network partition can be handled appropriately -- if there is still
a majority, business can continue as usual, but if you no longer have
a majority of the known cluster members, actions that require
consensus are refused.

This is true even if you start with an odd number of peers and lose
the majority entirely due to multiple network partitions (e.g. A, B,
and C are all isolated from one another) -- each node will know that
it had two other peers, and no node will declare itself the leader. If
you have an even number, a network partition can leave you with a 2/2
split, in which case no node can declare itself the leader -- so you
can have four nodes still online but no leader, whereas a 2/1 split
with three nodes will still allow a (single!) leader to be elected.

Network blips and outages can occur within a datacenter as well, so
there is nothing inherently different about going across datacenters.
You may get more frequent network partitions, so you may get more
frequent leader elections, but again -- network outages can happen
even within a datacenter, so Consul's consensus/leader election
handling is designed to handle this case without having a split brain.
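
To put numbers on it: Raft needs a majority, i.e. floor(N/2)+1 of the
N known servers, to elect a leader:

servers   quorum   failures tolerated
   3         2            1
   4         3            1
   5         3            2

Four servers tolerate no more failures than three, which is why odd
counts are recommended.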

--Jeff

Cyril Scetbon

Dec 4, 2016, 6:07:46 PM
to Vault
Hey Jeff, 

Regarding this architecture, you said it's not supported. What is on the roadmap regarding multi-DC support?
In the meantime, can you confirm that using two different Vault clusters with the same Cassandra secret backend (pointed at the same Cassandra cluster) is supported? I suppose there shouldn't be any issue?

Thank you

Jeff Mitchell

Dec 5, 2016, 11:24:38 AM
to vault...@googlegroups.com
Hi Cyril,

On Sun, Dec 4, 2016 at 6:07 PM, Cyril Scetbon <csce...@gmail.com> wrote:
> Hey Jeff,
>
> Regarding this architecture, you said it's not supported. What is on the
> roadmap regarding multi-DC support?

The initial featureset for multi-DC replication (which will be a Vault
Enterprise-only feature) is slated for Q1 2017.

> In the meantime, can you confirm that using two different Vault clusters
> with the same Cassandra secret backend (pointed at the same Cassandra
> cluster) is supported? I suppose there shouldn't be any issue?

If you mean two Vault clusters, each with their own mount of the
Cassandra secret backend, talking to the same Cassandra cluster, it
should be fine -- the generated names have unique IDs in them to avoid
collisions.
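
For example, with the 0.6-era CLI on each cluster -- connection details
below are placeholders, role creation is omitted, and the output is
abridged -- each mount embeds its own IDs in the usernames it creates:

$ vault mount cassandra
$ vault write cassandra/config/connection \
    hosts=cassandra.example.com username=cassandra password=cassandra
$ vault read cassandra/creds/readonly
Key         Value
lease_id    cassandra/creds/readonly/...
username    vault_readonly_root_26f577ec_2546_08bd_3bd4_e50001543e63_1481148020
password    ...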

Best,
Jeff

Cyril Scetbon

Dec 7, 2016, 10:30:54 PM
to Vault
Okay, thank you for the input.

I have an issue with an account surviving past its lease:

cassandra@cqlsh> select * from system_auth.users ;

 name                                                                | super
---------------------------------------------------------------------+-------
                                                           cassandra |  True
 vault_readonly_root_71fbdfd1_66a4_3cd1_ab17_939bfff2ab6d_1480907764 | False

(2 rows)

I tried unmounting cassandra on both sides (in my two-DC architecture) and the account is still there.
I then mounted cassandra on both sides again, re-created the same role, and generated a few credentials:

cassandra@cqlsh> select * from system_auth.users ;

 name                                                                | super
---------------------------------------------------------------------+-------
 vault_readonly_root_26f577ec_2546_08bd_3bd4_e50001543e63_1481148020 | False
 vault_readonly_root_8c58d4d0_4c72_8e1a_9921_620d2c0c9d17_1481147948 | False
                                                           cassandra |  True
 vault_readonly_root_71fbdfd1_66a4_3cd1_ab17_939bfff2ab6d_1480907764 | False
 vault_readonly_root_3e9eb1eb_ffe0_14ae_71f8_5c55321d5553_1481148176 | False

(5 rows)

After the lease period they all disappear, and I end up with only the original account (vault_readonly_root_71fbdfd1_66a4_3cd1_ab17_939bfff2ab6d_1480907764) left over. What do you think? The documentation says that unmounting removes all accounts created, but that's not what I have observed. A bug?

Jeff Mitchell

Dec 8, 2016, 10:04:56 AM
to vault...@googlegroups.com
Hi Cyril,

It's hard to say what might be going on without more information --
it'd be helpful to know your Cassandra backend config. Also, the fact
that neither of the two Vault servers thinks it has an outstanding
lease (or they wouldn't allow the unmount) suggests that maybe this
account was created by a previous Vault instance?
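
If it is indeed an orphan from an earlier instance, Vault no longer
tracks it, so you can drop it by hand in cqlsh (using the username
from your output):

cassandra@cqlsh> DROP USER vault_readonly_root_71fbdfd1_66a4_3cd1_ab17_939bfff2ab6d_1480907764;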

If you're having more trouble with it, please open a ticket.

Best,
Jeff

Cyril Scetbon

Dec 10, 2016, 10:16:16 PM
to Vault
Hey Jeff,

> If you're having more trouble with it, please open a ticket.