We're actually going through this exercise right now, so I'd love to share what we're thinking for deployment, and also help anyone out who is getting stuck.
Our environment consists of the following right now: Nomad, Consul, Vault (no need to talk about this here), and linkerd. We've been heavy users of Consul for a while, and the other three are relatively new additions into our environment.
We have a lot of services we'll be pushing into our Nomad environment soon, but we'll continue to have to support applications outside of it. These services can talk gRPC (HTTP 2) and HTTP 1 (mostly JSON REST APIs). Both of these play into our design decisions for how we've tied Nomad/linkerd/Consul together:
- We need to be able to run linkerd on hosts that don't have Docker/Nomad
- Our services don't necessarily know (and shouldn't have to know) how to talk TLS, but we would like to secure transport between hosts
- HTTP 1.1 services will route traffic based on the HTTP Host header, gRPC services based on the gRPC path
- All our services (both inside and outside Nomad) register in Consul using Registrator, and we want to use this same lookup information to route via linkerd
- We don't want to limit ourselves in a way that prevents blue/green or canary deploys of services, where a small amount of traffic is sent down one path and the rest via the normal service path
- We didn't want to use a transparent proxy approach with linkerd, and instead favoured explicit configuration of services to use the linkerd endpoints
To that end (mainly to cover the first point), we deploy linkerd directly on all hosts that will communicate with our services, outside of Docker (we have it packaged as a Debian package). This is the "per-host" deployment scenario linkerd talks about here:
https://linkerd.io/in-depth/deployment/. If you run a Nomad-only environment, you could accomplish the same thing by deploying linkerd as a Nomad system job, so you end up with one instance on every Nomad host.
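For the Nomad-only case, a sketch of what that system job might look like (job name, image version, ports, and paths here are illustrative assumptions, not our actual spec):

```hcl
# Hypothetical Nomad job using the "system" scheduler, which places
# exactly one allocation on every eligible Nomad client node.
job "linkerd" {
  datacenters = ["dc1"]
  type        = "system" # one linkerd per host

  group "linkerd" {
    task "linkerd" {
      driver = "docker"

      config {
        image        = "buoyantio/linkerd:1.3.7"
        network_mode = "host" # listen on the host's linkerd ports
        volumes      = ["local/linkerd.yaml:/io.buoyant/linkerd/config.yaml"]
      }

      resources {
        network {
          port "outgoing" { static = 4140 }
          port "incoming" { static = 4141 }
        }
      }
    }
  }
}
```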
To secure communications into and between hosts, we use the "linker to linker" configuration pattern discussed on that same page. For service A to talk to service B, service A sends its request to the local linkerd (TLS is added here), which routes to the linkerd on the destination host (TLS is removed there), which then forwards to the destination service. This results in two routers in each linkerd: an outbound router for traffic leaving a service to reach another service, and an ingress router for traffic received from other linkerd instances.
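As a rough sketch (linkerd 1.x YAML syntax; the ports, cert paths, and dtab here are illustrative, not our production config), the two routers look something like:

```yaml
# Outbound router: services send plain HTTP here; linkerd adds TLS
# toward the remote linkerd. Ingress router: terminates that TLS and
# hands plain HTTP to the local service.
routers:
- protocol: http
  label: outgoing
  dtab: |
    /svc => /#/io.l5d.consul/.local;
  servers:
  - port: 4140
    ip: 0.0.0.0
  client:
    tls:
      commonName: linkerd
      caCertPath: /etc/linkerd/ca.pem

- protocol: http
  label: incoming
  dtab: |
    /svc => /#/io.l5d.consul/.local;
  servers:
  - port: 4141
    ip: 0.0.0.0
    tls:
      certPath: /etc/linkerd/cert.pem
      keyPath: /etc/linkerd/key.pem

namers:
- kind: io.l5d.consul
```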
To handle both HTTP 1.1 and gRPC services, we define routers for each. This is where the behaviour is different.
For a HTTP 1.1 service, we expect that the HTTP host changes depending upon the service you want to talk to. For example, if you wanted to talk to our auth service ("auth.service.consul") via the linkerd path, you should request it via
http://auth.linkerd:4140. To handle this DNS resolution, we run a local Unbound instance on every host in our environment. It's configured to resolve "*.linkerd" to a dummy link-local interface (169.254.1.1), so containers are able to resolve that name. When linkerd sees a request like this, it's configured to strip ".linkerd" from the hostname and ask Consul for a list of backend services. The request is then forwarded to the destination linkerd, which maps it to a locally running instance.
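The Unbound piece can be as small as a local-zone redirect; a sketch (zone name matches ours, the TTL is arbitrary):

```
server:
  # Answer every *.linkerd query with the dummy link-local address,
  # so e.g. http://auth.linkerd:4140 resolves to the interface the
  # local linkerd listens behind.
  local-zone: "linkerd." redirect
  local-data: "linkerd. 3600 IN A 169.254.1.1"
```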
For a gRPC service, we'll be handling things a little differently. Right now it's a fixed list in our prototype environment, but let me describe how we want it to work: in this instance, we'll be running linkerd's namerd, as well as an internal service one of our engineers has built called "Meshroute". Meshroute looks at our Consul services and discovers any service with a "grpc:xyz" tag. The xyz is extracted and used to build a routing table of gRPC service name/path to Consul service name. This dtab routing table is stored in Consul's K/V store, and namerd is configured to look it up. It looks something like this:
/svc/bigcommerce.rpc.storeconfig.StoreConfig => /#/io.l5d.consul/.local/storeconfig;
In this case, all our services are configured with a single RPC host:
http://linkerd:4142. The path-based routing provided by linkerd takes care of getting you to the right spot. In the future, we can use this same tag extraction process to provide advanced behaviour like traffic splitting, weighting, etc. for services that share the same gRPC tag, by building a different dtab configuration and pushing it into Consul for namerd to consume.
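To make the tag extraction concrete, here's a hypothetical sketch of the core of what a tool like Meshroute does: scan Consul services for "grpc:xyz" tags and emit dtab entries mapping the gRPC path to the Consul service. (Meshroute is internal, so the function and data shapes here are my illustration, not its real code.)

```python
def build_dtab(consul_services):
    """consul_services: mapping of Consul service name -> list of tags.

    Returns dtab entries, one per "grpc:<name>" tag found.
    """
    entries = []
    for service, tags in sorted(consul_services.items()):
        for tag in tags:
            if tag.startswith("grpc:"):
                # "grpc:bigcommerce.rpc.storeconfig.StoreConfig" -> path
                grpc_name = tag[len("grpc:"):]
                entries.append(
                    "/svc/%s => /#/io.l5d.consul/.local/%s;" % (grpc_name, service)
                )
    return "\n".join(entries)

services = {
    "storeconfig": ["grpc:bigcommerce.rpc.storeconfig.StoreConfig"],
    "auth": ["http"],  # no grpc tag, so it's skipped
}
print(build_dtab(services))
# /svc/bigcommerce.rpc.storeconfig.StoreConfig => /#/io.l5d.consul/.local/storeconfig;
```

The output would then be written to Consul's K/V store for namerd to consume.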
The important thing to understand about "linker to linker" configurations is that you need to use both the "port" and "localhost" transformers. When traffic leaves a linkerd instance, it should hit the linkerd instance running on the destination host instead of the service directly. To do this, you apply a port transformation on the looked-up hosts, which rewrites the port to the linkerd port. On the receiving side, you do the same Consul service lookup again, but you need to filter it down to the service instances running on that host. To do this, you use the "localhost" transformer, which basically strips the namer lookup result down to only the instances whose IP address exists on that host.
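In linkerd 1.x config terms, that pairing looks roughly like this (ports are illustrative; dtabs, namers, and TLS are omitted for brevity):

```yaml
routers:
- protocol: http
  label: outgoing
  interpreter:
    kind: default
    transformers:
    # Rewrite every looked-up address to the remote linkerd's
    # incoming port, so traffic goes linker-to-linker.
    - kind: io.l5d.port
      port: 4141
  servers:
  - port: 4140

- protocol: http
  label: incoming
  interpreter:
    kind: default
    transformers:
    # Keep only the instances whose address belongs to this host.
    - kind: io.l5d.localhost
  servers:
  - port: 4141
```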
Note: because of how we do the *.linkerd routing, and because our services expect an HTTP Host header that matches the Consul service (*.service.consul), you'll see two namers in there: one that overrides the HTTP Host based on the Consul service name, and one that doesn't. The "consul_to_linker" namer doesn't mangle the Host header, and is applied only on outbound requests to other linkerd instances. When we finally route to the destination service, we want the Host header overwritten with the Consul service name.
Chris