Prometheus Agent as Sidecar with Prometheus-Operator

Arthur Silva Sens

Apr 14, 2023, 9:00:02 AM

to Prometheus Developers

Hi everybody, I'm Arthur from the Prometheus-Operator team.

We've recently added support for running Prometheus in Agent mode with Prometheus-Operator, and we've started brainstorming new deployment patterns that could be explored with the Agent, e.g. as DaemonSets or sidecars.

At this point, I'm drafting what things could look like if Prometheus Agent runs as a Pod sidecar, and I'd love to hear the community's opinion. I'm particularly interested in whether the community has an appetite for such a deployment pattern, and whether you see new failure modes with this approach.

Here is the proposal:

Agent Deployment Pattern: Sidecar Injection

Summary

Now that Prometheus-Operator finally supports running Prometheus in Agent mode, we can start thinking about different deployment patterns to explore with this minimal container. This document continues the work started by the earlier design document and focuses on how Prometheus-Operator can deploy PrometheusAgents as sidecars running alongside the pods a user wants to monitor.

Background

At the time of writing, Prometheus-Operator can deploy Prometheus in Agent mode, but only with a pattern similar to the original Prometheus Server implementation: StatefulSets. The original design document for Prometheus Agent already mentions that other deployment patterns are desirable; however, to speed up the initial implementation, it was decided to reuse the existing logic and start with the Agent running as StatefulSets.

Also to speed up implementation, this document won't cover several new deployment patterns, only one: Sidecar Injection.

In the traditional deployment model, a single Prometheus (or an HA setup) per cluster or namespace is responsible for scraping all containers within its scope. Prometheus-Operator relies on ServiceMonitor, PodMonitor, and Probe CRs to configure Prometheus, which in turn uses Kubernetes service discovery to find the endpoints that need to be scraped.

Depending on the cluster's scale and how often Prometheus hits the Kubernetes API, Prometheus service discovery can significantly increase the load on the API and affect the overall functioning of the cluster.

Another problem is that one or more containers can be updated to a faulty version that causes a cardinality spike. Depending on the size of the spike, a single container could bring down the monitoring system of the whole cluster.

Proposal

This document proposes a new deployment model in which Prometheus-Operator injects the Prometheus Agent (together with the Prometheus config reloader) as sidecar containers into the pods that need to be scraped. A sidecar tackles both problems mentioned above:

Service discovery no longer puts load on the Kubernetes API, because it is not needed anymore: Prometheus scrapes the containers of its own pod through their shared network namespace, and the scrape configuration is declared via pod annotations.
A sudden cardinality spike does not affect the whole monitoring system. In the worst case, it affects a single pod.

A common pattern with Prometheus's Kubernetes service discovery is to use annotations to declare which endpoints Prometheus should scrape. A GitHub code search for prometheus.io/scrape: "true" shows that this approach is already widely adopted. To avoid conflicting with that commonly used annotation, we can introduce our own annotations following a very similar approach.
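As a rough illustration of how the operator could read such annotations, here is a minimal Python sketch. It assumes the prometheus.operator.io/ prefix used in the example manifest of this proposal; the ScrapeTarget type, the parse_scrape_annotations function, and the default path and interval are hypothetical choices, not Prometheus-Operator's actual API.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical annotation prefix, chosen to avoid clashing with prometheus.io/*.
PREFIX = "prometheus.operator.io/"


@dataclass(frozen=True)
class ScrapeTarget:
    """Scrape settings for the containers of a single pod (hypothetical type)."""
    path: str
    port: int
    scrape_interval: str
    agent: Optional[str]  # "<namespace>/<name>" of a PrometheusAgent CR, if given


def parse_scrape_annotations(annotations: Mapping[str, str]) -> Optional[ScrapeTarget]:
    """Return scrape settings for a pod, or None if the pod did not opt in."""
    if annotations.get(PREFIX + "scrape") != "true":
        return None

    # The port is required; reject anything that is not a valid TCP port.
    port = annotations.get(PREFIX + "port", "")
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"invalid or missing {PREFIX}port: {port!r}")

    # The agent selector is optional here, but must look like <namespace>/<name>.
    agent = annotations.get(PREFIX + "agent-selector")
    if agent is not None:
        namespace, sep, name = agent.partition("/")
        if not (sep and namespace and name):
            raise ValueError(f"{PREFIX}agent-selector must be <namespace>/<name>, got {agent!r}")

    return ScrapeTarget(
        path=annotations.get(PREFIX + "path", "/metrics"),  # assumed default
        port=int(port),
        scrape_interval=annotations.get(PREFIX + "scrape-interval", "60s"),  # assumed default
        agent=agent,
    )


print(parse_scrape_annotations({
    "prometheus.operator.io/scrape": "true",
    "prometheus.operator.io/port": "8080",
    "prometheus.operator.io/agent-selector": "monitoring/agent-example",
}))
```

Pods without the scrape annotation are simply ignored, which mirrors the opt-in behaviour of the prometheus.io/scrape convention.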

The existing PrometheusAgent CRD would be extended with a new field called mode, which accepts one of two values (for now): [statefulset, sidecar], with statefulset as the default. If mode is set to sidecar, Prometheus-Operator won't deploy any Prometheus Agents up front. Instead, it will watch for Pod updates and inject the Prometheus Agent as a sidecar into pods that carry the pre-determined annotations.
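The mode field and the injection decision could look roughly like this. It is a minimal sketch over plain dicts rather than real Kubernetes objects; agent_mode and should_inject are hypothetical helpers, and matching on the agent-selector annotation follows the example manifest later in this proposal.

```python
from typing import Any, Mapping

VALID_MODES = ("statefulset", "sidecar")  # values listed in the proposal
SCRAPE = "prometheus.operator.io/scrape"
SELECTOR = "prometheus.operator.io/agent-selector"


def agent_mode(spec: Mapping[str, Any]) -> str:
    """Return the effective mode of a PrometheusAgent spec (default: statefulset)."""
    mode = spec.get("mode", "statefulset")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode {mode!r}; expected one of {VALID_MODES}")
    return mode


def should_inject(pod_annotations: Mapping[str, str], agent_ref: str,
                  agent_spec: Mapping[str, Any]) -> bool:
    """Decide whether the agent agent_ref ("<namespace>/<name>") injects into a pod."""
    if agent_mode(agent_spec) != "sidecar":
        return False  # statefulset-mode agents never inject sidecars
    if pod_annotations.get(SCRAPE) != "true":
        return False  # the pod did not opt in to scraping
    return pod_annotations.get(SELECTOR) == agent_ref


spec = {"mode": "sidecar", "remoteWrite": [{"url": "https://example.com"}]}
annotations = {SCRAPE: "true", SELECTOR: "monitoring/agent-example"}
print(should_inject(annotations, "monitoring/agent-example", spec))  # True
print(should_inject(annotations, "monitoring/other-agent", spec))    # False
```

One practical detail: Kubernetes does not allow adding containers to an existing Pod, so sidecar injectors typically run this kind of check in a mutating admission webhook when the Pod is created, rather than by patching running Pods.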

Besides specifying the deployment mode, the Agent CR will be the source of truth for the remote-write configuration, such as URL and authentication. A change to the remote-write configuration would still require hot-reloading potentially millions of agent sidecar containers, but keeping the remote-write configuration out of pod annotations at least avoids having to update the Pod manifests as well.

If different sets of pods require different remote-write configurations, then multiple PrometheusAgent CRs are needed. This means that the pod also needs to specify which Agent CR will inject the sidecar:

apiVersion: v1
kind: Pod
metadata:
  name: example
  annotations:
    prometheus.operator.io/scrape: "true"
    prometheus.operator.io/path: "/metrics"
    prometheus.operator.io/port: "8080"
    prometheus.operator.io/scrape-interval: "60s"
    prometheus.operator.io/agent-selector: "monitoring/agent-example"
spec:
  ...
---
apiVersion: monitoring.coreos.com/v1alpha1
kind: PrometheusAgent
metadata:
  name: agent-example
  namespace: monitoring
spec:
  mode: sidecar
  remoteWrite:
    - url: https://example.com
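To make the split of responsibilities concrete, the following minimal sketch renders the configuration a sidecar could receive for the example above: scrape settings from the Pod annotations, remote-write settings from the PrometheusAgent spec. The function name and the job-name choice are hypothetical, only the remote-write url is carried over (authentication and other fields are omitted), and the result is printed as JSON, which YAML parsers generally accept.

```python
import json
from typing import Any, Dict, Mapping

PREFIX = "prometheus.operator.io/"  # annotation prefix from the example above


def render_sidecar_config(pod_name: str, annotations: Mapping[str, str],
                          agent_spec: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a Prometheus Agent config for one pod's sidecar (sketch)."""
    scrape_config = {
        "job_name": pod_name,  # hypothetical naming choice
        "scrape_interval": annotations.get(PREFIX + "scrape-interval", "60s"),
        "metrics_path": annotations.get(PREFIX + "path", "/metrics"),
        # Containers in a pod share a network namespace, so the sidecar can
        # reach them on localhost without any Kubernetes service discovery.
        "static_configs": [{"targets": ["localhost:" + annotations[PREFIX + "port"]]}],
    }
    # Remote-write settings come from the PrometheusAgent CR, not from the pod.
    remote_write = [{"url": rw["url"]} for rw in agent_spec.get("remoteWrite", [])]
    return {"scrape_configs": [scrape_config], "remote_write": remote_write}


pod_annotations = {
    "prometheus.operator.io/scrape": "true",
    "prometheus.operator.io/path": "/metrics",
    "prometheus.operator.io/port": "8080",
    "prometheus.operator.io/scrape-interval": "60s",
    "prometheus.operator.io/agent-selector": "monitoring/agent-example",
}
agent_spec = {"mode": "sidecar", "remoteWrite": [{"url": "https://example.com"}]}
print(json.dumps(render_sidecar_config("example", pod_annotations, agent_spec), indent=2))
```

Because remote_write is derived from the CR, changing it means re-rendering and reloading every sidecar that selects that CR, which is the trade-off discussed under Config Hot Reload.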

With a visualization:

What to do with ServiceMonitor, PodMonitor, and Probe selectors?

With the sidecar approach, our goal is to scale Prometheus horizontally while avoiding impact on the Kubernetes API. It wouldn't make sense for a sidecar to also scrape metrics from other pods.

If mode is set to sidecar, a validating webhook would reject creating or updating PrometheusAgent CRs that set any of the following fields:

serviceMonitorSelector
serviceMonitorNamespaceSelector
podMonitorSelector
podMonitorNamespaceSelector
probeSelector
probeNamespaceSelector
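The validating-webhook rule above is simple enough to sketch directly. This is a minimal Python illustration over a plain dict spec, not the operator's real webhook code; validate_agent_spec is a hypothetical name.

```python
from typing import Any, List, Mapping

# Fields the proposal forbids when spec.mode is "sidecar".
FORBIDDEN_IN_SIDECAR_MODE = (
    "serviceMonitorSelector",
    "serviceMonitorNamespaceSelector",
    "podMonitorSelector",
    "podMonitorNamespaceSelector",
    "probeSelector",
    "probeNamespaceSelector",
)


def validate_agent_spec(spec: Mapping[str, Any]) -> List[str]:
    """Return validation errors for a PrometheusAgent spec; empty means allowed."""
    if spec.get("mode", "statefulset") != "sidecar":
        return []  # statefulset mode keeps today's behaviour
    return [
        f"spec.{field} must not be set when spec.mode is 'sidecar'"
        for field in FORBIDDEN_IN_SIDECAR_MODE
        if spec.get(field) is not None  # an empty selector {} still counts as set
    ]


print(validate_agent_spec({"mode": "sidecar", "podMonitorSelector": {}}))
```

Any present value counts as set, including an empty selector ({}), since in Kubernetes an empty label selector matches everything.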

Caveats

Config Hot Reload

There will now be two ways to change the Prometheus configuration: 1) changing annotations on the pod, and 2) changing the remote-write field in the PrometheusAgent CR. The first only triggers a hot reload of the affected pod's sidecar, while the second can trigger millions of hot reloads, depending on the scale of the cluster.
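The difference in blast radius between the two kinds of change can be sketched as a function that lists which sidecars must reload. It is a hypothetical helper that assumes pods are keyed as "<namespace>/<pod>" and that each pod names its agent through the agent-selector annotation from the proposal.

```python
from typing import List, Mapping

SELECTOR = "prometheus.operator.io/agent-selector"


def sidecars_to_reload(pods: Mapping[str, Mapping[str, str]], change: str,
                       target: str) -> List[str]:
    """Return the pods whose agent sidecar must hot-reload after a change.

    pods maps "<namespace>/<pod>" to that pod's annotations.
    change == "annotation":   target is a pod key; only that pod reloads.
    change == "remote-write": target is a PrometheusAgent "<namespace>/<name>";
                              every pod selecting that agent reloads.
    """
    if change == "annotation":
        return [target] if target in pods else []
    if change == "remote-write":
        return sorted(pod for pod, ann in pods.items() if ann.get(SELECTOR) == target)
    raise ValueError(f"unknown change type: {change!r}")


# Hypothetical cluster with 10,000 pods all selecting the same agent.
pods = {f"default/pod-{i}": {SELECTOR: "monitoring/agent-example"} for i in range(10_000)}
print(len(sidecars_to_reload(pods, "annotation", "default/pod-42")))              # 1
print(len(sidecars_to_reload(pods, "remote-write", "monitoring/agent-example")))  # 10000
```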

We have not yet researched the config-reloader's efficiency, but this container might become a problem in very large environments.

WAL not optimized for small environments

Prometheus stores its write-ahead log (WAL) as a sequence of numbered segment files of 128MiB each by default. This means that, by default, at least 128MiB is needed to run the Prometheus Agent, even ignoring every other part of Prometheus. With sidecars we're optimizing for horizontal scale, and 128MiB may be far more than necessary to store metrics from a single Pod.

Lack of High-Availability setup

Given that Prometheus is not optimized for very small environments, injecting 2 sidecars per Pod sounds like a big waste of resources. However, with only 1 sidecar, an HA Prometheus setup is not an option.

That said, HA Prometheus seems more critical in the traditional deployment pattern than in the sidecar approach: when Prometheus fails in the former, we lose monitoring for the whole cluster, while in the latter we only lose metrics from a single pod.
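The two caveats compound: the WAL floor from the previous section multiplies with the number of sidecars per pod that an HA setup would require. A back-of-the-envelope sketch, where the 10,000-pod cluster is a made-up figure and wal_floor_bytes is a hypothetical helper built on the proposal's premise of one full 128MiB segment per agent:

```python
MIB = 2**20
GIB = 2**30


def wal_floor_bytes(pods: int, sidecars_per_pod: int = 1, segment_mib: int = 128) -> int:
    """Aggregate WAL disk floor if every agent sidecar needs one full segment.

    Follows the premise that each agent needs at least one 128MiB WAL
    segment; all other memory and disk usage is ignored.
    """
    return pods * sidecars_per_pod * segment_mib * MIB


# Hypothetical cluster size, for illustration only.
for replicas in (1, 2):
    print(f"{replicas} sidecar(s) per pod: {wal_floor_bytes(10_000, replicas) // GIB} GiB")
```

With one sidecar per pod this prints 1250 GiB, and doubling to an HA pair doubles it to 2500 GiB, before counting any actual samples.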

References

Arthur Silva Sens

Apr 14, 2023, 9:09:24 AM

to Prometheus Developers

Oof, just noticed that the images do not load in some email clients 😬

The proposal can also be seen at this Pull Request, with the images :)
