Hey,
More data points:
A critical role the agent plays is participating in the gossip layer (https://www.consul.io/docs/internals/gossip.html). The most important benefit is efficient detection of node failures, with that information disseminated quickly and cheaply (in CPU and network) at large scale. When a node is unhealthy, we implicitly mark the services on that node as unhealthy and attempt to route away from them (via service discovery). For very large deployments where failure is the norm, this sort of cheap failure detection is very valuable. To learn more, watch this awesome talk our Dir. of Research gave:
https://www.youtube.com/watch?v=u-a7rVJ6jZY
The local agent also resolves the “initial service discovery” problem: how do you discover the service discoverer? There are a lot of ways to answer this, but with Consul one answer that is almost always true is: `localhost:8500`. The local agent means Consul is always available to applications at that address, and the concern of cluster joining and membership is handled in a single place rather than by every application.
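To make the `localhost:8500` point concrete, here is a minimal sketch of an application asking its local agent for healthy instances of a service. It assumes the agent's HTTP API is on its default port and uses the `/v1/health/service/<name>?passing` endpoint; the function names and the `web` service are only illustrative, and `discover` needs a running agent to return anything.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the local agent serves its HTTP API on the default port.
LOCAL_AGENT = "http://localhost:8500"


def healthy_service_url(service, agent=LOCAL_AGENT):
    """Build the health query URL; `passing` keeps only instances whose checks pass."""
    return f"{agent}/v1/health/service/{quote(service, safe='')}?passing"


def parse_instances(entries):
    """Turn the JSON list returned by the health endpoint into (address, port) pairs."""
    instances = []
    for entry in entries:
        svc = entry["Service"]
        # Service.Address may be empty, in which case the node's address applies.
        address = svc.get("Address") or entry["Node"]["Address"]
        instances.append((address, svc["Port"]))
    return instances


def discover(service, agent=LOCAL_AGENT):
    """Ask the local agent for healthy instances (requires a running agent)."""
    with urlopen(healthy_service_url(service, agent), timeout=2) as resp:
        return parse_instances(json.load(resp))
```

The application only ever needs to know the local address; which servers exist and how to join the cluster stays the agent's problem.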
Another important role the agent plays is managing local service registrations, executing health checks, and keeping the servers up to date even in the case of failure. This is known as “Anti-Entropy” and is documented here:
https://www.consul.io/docs/internals/anti-entropy.html
But basically: services can continue to register/deregister/update and health checks continue to run while servers are down, and when the servers are available again, the client agent syncs and ensures the server state is up to date and correct. This provides nice failure tolerance, because many operations (such as service registration) rarely fail so long as the local agent is running.
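As a rough illustration of the sync step, here is a toy model in which the agent's local view is treated as authoritative and the servers' catalog is brought in line with it. This is not Consul's actual implementation; the function names and the dict-based state are assumptions made purely for the example.

```python
def plan_sync(local, remote):
    """Compare the agent's local view with the servers' catalog.

    Both arguments map service ID -> definition. The local view is treated
    as authoritative, so the plan makes the remote copy match it.
    """
    # Anything missing or different on the servers must be (re)registered.
    to_register = {sid: d for sid, d in local.items() if remote.get(sid) != d}
    # Anything the servers know about that the agent does not is removed.
    to_deregister = sorted(sid for sid in remote if sid not in local)
    return to_register, to_deregister


def apply_sync(local, remote):
    """Apply the plan to a copy of the remote catalog and return the result."""
    to_register, to_deregister = plan_sync(local, remote)
    synced = dict(remote)
    synced.update(to_register)
    for sid in to_deregister:
        del synced[sid]
    return synced
```

For example, if `db` was registered locally while the servers were unreachable and `cache` was deregistered in the meantime, the plan registers `db` and removes `cache` once the servers are back.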
More recently, with the introduction of Connect (https://www.consul.io/docs/connect/index.html), the agent plays another critical role: edge caching for performance-critical and security-critical data. For Connect, the private key for a service's leaf certificate lives directly with the client agent where the service is registered and is not replicated anywhere. Further, the CA certificates, intentions (access control rules), etc. are all cached locally so that API calls are extremely cheap, a property that is particularly important with a feature like Connect.
The client agent is therefore highly recommended. Consul does work without it (by design, to support nodes that can’t have an agent; see also https://github.com/hashicorp/consul-esm), but it is important to understand the tradeoffs.
Hope that helps.