LAN vs. WAN, Security and Design

Nico Schottelius

Jan 4, 2015, 11:57:14 AM
to consu...@googlegroups.com
Good morning,

I wonder how to understand and use the WAN/LAN separation of consul, as most of our nodes have only public IP addresses (only a small number of embedded devices, such as Linux webcams, run in a private IP address space).

Some background:

We currently run services in 4 datacenters in Switzerland and are thinking about discovering/monitoring our services with consul.
Two of the datacenters contain mostly customer nodes, which should only be able to report information, but should neither exec commands nor
query information from any other datacenter.

However, at the moment we are not sure what status consul is in (alpha, beta, production?) or how exactly to use it.
Thus I have the following questions:

Design/Implementation wise:

- Can we combine those 4 datacenters into one consul cluster in a way that only 1 or 2 dcs can see all services/checks?
- How should we design the nodes to connect to the other datacenters? Is a mesh typical?
- Init/Startup scripts: We deploy consul using cdist [1] and can write our own init.d/upstart/systemd/whatever scripts, but if such scripts already exist, we would like to reuse them.

Regarding security, I wonder:

- Is it "secure" to run consul on public ip addresses, when encryption using -encrypt= is turned on?
- Can non-authorized nodes access any information from a consul cluster by default?
- How "secure" is the exec functionality? Can it in theory be accessed from a non client socket?
- Can exec be used by *any* node that is joined in a consul cluster?

Regarding LAN/WAN: 

- What is the "standard" way to connect multiple datacenters?
As described in GitHub issue #560 [0], I assumed there would be inconsistencies in the cluster status as seen from different datacenters, because only one server of each datacenter had joined the WAN ring.

I look forward to your answers. So far, consul gives the impression that it potentially *could* solve most of our monitoring problems.

Cheers,

Nico


Armon Dadgar

Jan 5, 2015, 1:48:18 PM
to consu...@googlegroups.com, Nico Schottelius
Hey Nico,

I think there are a couple of questions here, so I’ll try to answer them but let me know if I missed something!

WAN/LAN Separation: This split is done for a few reasons, independent of having a public or private IP. By design,
each DC operates independently of all others and can tolerate the loss of any other DC. Global visibility is managed
by forwarding RPCs instead of data replication. As part of this, each of the gossip rings is optimized for different use cases.
The LAN ring assumes LAN network timings (< 100 ms RTT) while the WAN ring is much more relaxed (< 1 s RTT).
Part of the reason is that leader election is done only within the LAN, so the timing constraints there are tighter.

What this means practically is that if you violate these assumptions, you will get unexpected failures. For example, if you treat
multiple data centers that are “far away” as part of a single DC, the latency will be higher than expected and you will get
spurious failures (false positive node failures, excessive leader elections, etc).
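
To illustrate the timing assumptions above, here is a minimal sketch that sorts measured round-trip times into "fits the LAN ring", "fits the WAN ring", or "outside both". Only the two thresholds come from the numbers quoted above; the function names, the site names, and the idea of feeding in your own ping measurements are assumptions made for the example, not something Consul provides.

```python
# Thresholds from the timings quoted above; everything else is illustrative.
LAN_MAX_RTT_MS = 100.0   # LAN gossip ring assumes < 100 ms RTT
WAN_MAX_RTT_MS = 1000.0  # WAN gossip ring assumes < 1 s RTT


def classify_link(rtt_ms: float) -> str:
    """Return which gossip ring a link with this RTT would fit."""
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    if rtt_ms < LAN_MAX_RTT_MS:
        return "lan"
    if rtt_ms < WAN_MAX_RTT_MS:
        return "wan"
    return "too-slow"


def plan_sites(rtts: dict) -> dict:
    """Classify every measured site pair; keys are (site_a, site_b) tuples."""
    return {pair: classify_link(rtt) for pair, rtt in rtts.items()}


if __name__ == "__main__":
    # Hypothetical measurements in milliseconds.
    measured = {("site-a", "site-b"): 4.0, ("site-a", "site-c"): 250.0}
    for pair, ring in plan_sites(measured).items():
        print(pair, "->", ring)
```

Site pairs that only fit the "wan" class are better modelled as separate datacenters joined over the WAN ring than as one large DC.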

WAN deployment: In terms of how to do a WAN deployment, you are right, a mesh network is required. Typically this
is done with site-to-site VPNs. The rest of it is handled internally to Consul.

Init Scripts: There are some around the internet, we use upstart and have a pretty simple script. Here is an example of

Security on public addresses: Consul can be run securely over the WAN if you enable all the encryption features.
This means `-encrypt` for gossip, and the TLS settings with `verify_incoming` and `verify_outgoing`. See this page:

Non-Authorized Nodes: If you have the encryption stuff enabled as per above, then a node without the proper
TLS/Keys cannot join the cluster and therefore cannot query anything from it. Once a node is in the cluster, the
ACL system can be used to apply finer-grained access controls.
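
As a rough sketch of the encryption settings mentioned above, the following builds an agent configuration with gossip encryption and TLS verification switched on and prints it as JSON. The helper name, the datacenter name, and the file paths are placeholders invented for the example; the gossip key is assumed to come from `consul keygen`, and the certificate files are assumed to exist already.

```python
import json


def secure_agent_config(datacenter, gossip_key, ca_file, cert_file, key_file):
    """Build a Consul agent config dict with gossip and TLS protection on."""
    return {
        "datacenter": datacenter,
        "encrypt": gossip_key,    # same effect as the -encrypt flag
        "verify_incoming": True,  # require valid TLS certs on incoming connections
        "verify_outgoing": True,  # use TLS and verify certs on outgoing connections
        "ca_file": ca_file,
        "cert_file": cert_file,
        "key_file": key_file,
    }


if __name__ == "__main__":
    # All values below are placeholders, not real keys or paths.
    cfg = secure_agent_config(
        datacenter="dc1",
        gossip_key="<output of consul keygen>",
        ca_file="/etc/consul.d/ssl/ca.pem",
        cert_file="/etc/consul.d/ssl/agent.pem",
        key_file="/etc/consul.d/ssl/agent-key.pem",
    )
    print(json.dumps(cfg, indent=2))
```

The printed JSON could be saved into the agent's configuration directory; every agent in the cluster needs the same gossip key and certificates signed by the same CA.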

Exec Security: You need to be a member of the cluster for this, or alternatively have access to the HTTP API
of a node that is (by default only available on loopback). It relies on the KV system, so you can again use the
ACL system to lock it down further. It can also be disabled entirely. Hopefully finer-grained control is coming to the
ACL system soon.
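
For the "disabled entirely" case, a minimal sketch, assuming the `disable_remote_exec` agent option is what gets used for it: the hypothetical helper below takes an existing agent config dict and returns a copy with remote exec switched off, for example for the customer nodes.

```python
import json


def without_remote_exec(base_config: dict) -> dict:
    """Return a copy of the agent config with remote exec disabled."""
    cfg = dict(base_config)            # leave the caller's dict untouched
    cfg["disable_remote_exec"] = True  # agent will not run `consul exec` jobs
    return cfg


if __name__ == "__main__":
    # Hypothetical base config for a customer node.
    customer = without_remote_exec({"datacenter": "dc-customers"})
    print(json.dumps(customer, indent=2))
```

This only stops the agent itself from running commands sent via exec; restricting who may issue them still needs the ACL rules on the KV store described above.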

LAN/WAN connect: Typically in a multidatacenter setup, at least one DC is expected to always exist, so people
just have the servers do a “consul join -wan <DC_A_1> <DC_A_2>” … or equivalent. With Consul 0.5 there are more
configuration options, such as `start_join_wan` and similar, to do this automatically on start.
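
The two approaches above can be sketched as follows: one helper builds the manual `consul join -wan` command line, the other the equivalent `start_join_wan` config fragment. The helper names and the server hostnames are made up for illustration; replace them with the servers of the datacenter that is expected to always exist.

```python
import json


def wan_join_command(servers):
    """argv for a manual WAN join against the given server addresses."""
    if not servers:
        raise ValueError("need at least one server address")
    return ["consul", "join", "-wan", *servers]


def wan_join_config(servers):
    """Config fragment that makes a server join the WAN ring on start."""
    if not servers:
        raise ValueError("need at least one server address")
    return {"start_join_wan": list(servers)}


if __name__ == "__main__":
    # Hypothetical server hostnames in the "always there" datacenter.
    dc_a = ["consul1.dc-a.example.com", "consul2.dc-a.example.com"]
    print(" ".join(wan_join_command(dc_a)))
    print(json.dumps(wan_join_config(dc_a), indent=2))
```

Listing more than one server of the target datacenter keeps the join working when a single server there is down.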

Hope that helps!

Best Regards,
Armon Dadgar
