Hey Nico,
I think there are a couple of questions here, so I’ll try to answer them all, but let me know if I missed something!
WAN/LAN Separation: This split is done for a few reasons, independent of having a public or private IP. By design,
each DC operates independently of all others and can tolerate the loss of any other DC. Global visibility is managed
by forwarding RPCs instead of data replication. As part of this, each of the gossip rings is optimized for different use cases.
The LAN ring assumes LAN network timings (<100 msec RTT) while the WAN ring is much more relaxed (< 1s RTT).
Because leader election happens only within the LAN, we can afford tighter timing constraints there.
What this means in practice is that if you violate these assumptions, you will get unexpected failures. For example, if you treat
multiple data centers that are “far away” as part of a single DC, the latency will be higher than the LAN ring expects and you will get
spurious failures (false-positive node failures, excessive leader elections, etc.).
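To make the timing assumptions concrete, here is a minimal sketch that checks a measured RTT against the two budgets described above (< 100 msec for the LAN ring, < 1 s for the WAN ring). The names `suggested_ring`, `LAN_RTT_BUDGET_MS`, and `WAN_RTT_BUDGET_MS` are hypothetical; Consul has no such helper, and its real failure detection depends on more than a single RTT threshold.

```python
# Hypothetical helper, not part of Consul: compares a measured round-trip
# time against the rough budgets each gossip ring is designed for.
LAN_RTT_BUDGET_MS = 100   # LAN ring assumes < 100 msec RTT
WAN_RTT_BUDGET_MS = 1000  # WAN ring assumes < 1 s RTT


def suggested_ring(rtt_ms: float) -> str:
    """Return which gossip ring a link with this RTT fits."""
    if rtt_ms < LAN_RTT_BUDGET_MS:
        return "lan"          # close enough to share one DC (LAN ring)
    if rtt_ms < WAN_RTT_BUDGET_MS:
        return "wan"          # should be a separate DC joined over the WAN
    return "unsupported"      # beyond even the relaxed WAN assumption


# Two "far away" sites ~150 ms apart belong in separate DCs, not one.
for rtt in (2, 150, 1500):
    print(rtt, suggested_ring(rtt))
```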
WAN deployment: As for how to do a WAN deployment, you are right that a mesh network is required: the servers in each DC
need to be able to reach the servers in every other DC. Typically this is done with site-to-site VPNs. Consul handles the rest internally.
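A rough sketch of what "mesh" implies for the VPN setup, assuming (hypothetically) one site-to-site tunnel per pair of DCs: every DC links to every other, so n DCs need n(n-1)/2 tunnels. `vpn_links` and the DC names are illustrative only.

```python
from itertools import combinations


def vpn_links(datacenters):
    """List the site-to-site links a full mesh between DCs needs."""
    # Every unordered pair of DCs gets its own tunnel: n*(n-1)/2 links.
    return list(combinations(sorted(datacenters), 2))


links = vpn_links(["dc1", "dc2", "dc3"])
print(len(links), links)
```

With three DCs this yields three tunnels; each additional DC adds one tunnel per existing DC.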
Init Scripts: There are some around the internet; we use upstart and have a pretty simple script. Here is an example of
Security on public addresses: Consul can be run securely over the WAN if you enable all of its encryption features.
That means `-encrypt` for gossip encryption, plus the TLS settings with `verify_incoming` and `verify_outgoing` for RPC. See this page:
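A minimal sketch of an agent config with both encryption features turned on, built as a Python dict and dumped to JSON (Consul reads JSON config files via `-config-file` or `-config-dir`). The certificate paths and the helper names `new_gossip_key` and `secure_config` are hypothetical placeholders; `consul keygen` is the usual way to produce the gossip key, and every agent must share the same one.

```python
import base64
import json
import os


def new_gossip_key() -> str:
    """Generate a random 16-byte key, base64-encoded, for `encrypt`."""
    # Generate once and distribute; all agents must use the same key.
    return base64.b64encode(os.urandom(16)).decode("ascii")


def secure_config(gossip_key, ca_file, cert_file, key_file) -> dict:
    """Build agent settings enabling gossip encryption and verified TLS."""
    return {
        "encrypt": gossip_key,      # gossip encryption (same as -encrypt)
        "ca_file": ca_file,         # CA used to verify peer certificates
        "cert_file": cert_file,     # this agent's certificate
        "key_file": key_file,       # this agent's private key
        "verify_incoming": True,    # require valid certs on incoming RPC
        "verify_outgoing": True,    # require valid certs on outgoing RPC
    }


# Hypothetical placeholder paths; substitute your own files.
config = secure_config(new_gossip_key(), "/etc/consul.d/ca.pem",
                       "/etc/consul.d/agent.pem", "/etc/consul.d/agent.key")
print(json.dumps(config, indent=2))
```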
Non-Authorized Nodes: If you have encryption enabled as described above, then a node without the proper
TLS certificates and keys cannot join the cluster and therefore cannot query anything from it. Once a node is in the cluster, the
ACL system can be used to apply finer-grained access controls.
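Once ACLs are enabled, each HTTP API request is checked against the token it carries. A minimal sketch, assuming the default local HTTP address (127.0.0.1:8500) and passing the token via the `?token=` query parameter; the key name, the token value, and the `kv_url` helper are hypothetical.

```python
from urllib.parse import quote, urlencode


def kv_url(key, token, addr="http://127.0.0.1:8500"):
    """Build a KV read URL that carries an ACL token."""
    # The request is allowed or denied according to the token's ACL policy.
    return f"{addr}/v1/kv/{quote(key)}?{urlencode({'token': token})}"


url = kv_url("app/config", "hypothetical-token")
print(url)
# Sending it requires a running agent, e.g.:
#   import urllib.request; urllib.request.urlopen(url).read()
```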
Exec Security: To use exec, you need to be a member of the cluster, or alternatively have access to the HTTP API
of a node that is (by default, the HTTP API is only available on loopback). Exec relies on the KV system, so you can again use the
ACL system to lock it down further. It can also be disabled entirely. Hopefully finer-grained control is coming to the
ACL system soon.
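A sketch of the two lockdown options, assuming that exec stores its jobs under the `_rexec/` KV prefix and that your Consul version supports the `disable_remote_exec` agent option; check both against the docs for your version. The variable names are hypothetical.

```python
import json

# Option 1 (assumed option name): turn remote exec off entirely on an agent.
agent_config = {"disable_remote_exec": True}

# Option 2: keep exec, but give tokens that should not run commands an
# ACL policy denying the KV prefix exec is assumed to use (rules are HCL).
EXEC_DENY_RULES = '''
key "_rexec/" {
  policy = "deny"
}
'''

print(json.dumps(agent_config))
print(EXEC_DENY_RULES.strip())
```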
LAN/WAN connect: In a multi-datacenter setup, at least one DC is typically expected to always exist, so people
just have the servers run `consul join -wan <DC_A_1> <DC_A_2>` … or equivalent. With Consul 0.5 there are more
configuration options, such as `start_join_wan`, to do this automatically on start.
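A sketch of the manual and automatic forms side by side, using hypothetical server addresses in the always-present DC. `wan_join_command` and `wan_join_config` are illustrative helpers, not part of Consul; `start_join_wan` belongs in a server agent's config.

```python
import json
import shlex


def wan_join_command(addrs):
    """Manual form: the command a server runs to join the WAN pool."""
    return shlex.join(["consul", "join", "-wan", *addrs])


def wan_join_config(addrs):
    """Automatic form: join the WAN pool on start via start_join_wan."""
    return {"start_join_wan": list(addrs)}


# Hypothetical addresses for two servers in the always-present DC.
dc_a = ["10.0.1.10", "10.0.1.11"]
print(wan_join_command(dc_a))
print(json.dumps(wan_join_config(dc_a)))
```

The config form means a restarted server attempts the WAN join itself, without anyone running the command by hand.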
Hope that helps!
Best Regards,
Armon Dadgar