The idea behind sampling is to control the spans you send to your observability backend, resulting in lower ingest costs. Different organizations have their own reasons not just for why they want to sample, but also for what they want to sample. You might want to customize your sampling strategy to:
There are also some limitations to consider that are related to OpenTelemetry. Note that some of these limitations also apply more broadly to any client-hosted tail-based sampling solution, not just OpenTelemetry.
Tail-based sampling works with Grafana Agent in Flow or static modes. Flow mode configuration files are written in River. Static mode configuration files are written in YAML. Examples in this document are for Flow mode. You can also use the Static mode Kubernetes operator.
In tail-based sampling, sampling decisions are made at the end of the workflow, allowing for a more accurate sampling decision. The Grafana Agent groups spans by trace ID and checks its data to see if it meets one of the defined policies (for example, latency or status_code). For instance, a policy can check if a trace contains an error or if it took longer than a certain duration.
To group spans by trace ID, the Agent buffers spans for a configurable amount of time, after which it considers the trace complete. Traces that run longer than this window are split into more than one trace. However, longer wait times increase the memory overhead of buffering.
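As an illustration, a minimal Flow mode sketch of a tail-sampling component with a latency policy and a status_code policy might look like the following (the component label, policy names, and the downstream exporter reference are placeholders, and exact attributes can vary by Agent version):

```river
otelcol.processor.tail_sampling "default" {
  // How long to buffer spans before considering a trace complete.
  decision_wait = "30s"

  // Sample traces that contain an error...
  policy {
    name = "sample-errors"
    type = "status_code"
    status_code {
      status_codes = ["ERROR"]
    }
  }

  // ...and traces that took longer than 5 seconds.
  policy {
    name = "sample-slow"
    type = "latency"
    latency {
      threshold_ms = 5000
    }
  }

  output {
    traces = [otelcol.exporter.otlp.default.input]
  }
}
```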
One particular challenge of grouping trace data arises in multi-instance Agent deployments, where spans that belong to the same trace can arrive at different Agents. To solve this, you can configure the Agent to load balance traces across Agent instances by exporting spans belonging to the same trace to the same instance.
This is achieved by redistributing spans by trace ID once they arrive from the application. The Agent must be able to discover and connect to other Agent instances where spans for the same trace can arrive. Kubernetes users should use a headless service.
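A hedged sketch of the first-layer routing in Flow mode, using the load-balancing exporter with trace-ID routing (the headless service hostname and port are placeholders):

```river
// First-layer component: route each span to a second-layer instance
// chosen by trace ID, so a whole trace lands on one Agent.
otelcol.exporter.loadbalancing "default" {
  routing_key = "traceID"

  resolver {
    // On Kubernetes, a headless service makes DNS return the IP of
    // every second-layer pod. The hostname here is a placeholder.
    dns {
      hostname = "agent-tail-sampling.default.svc.cluster.local"
      port     = "4317"
    }
  }

  protocol {
    otlp {
      client {
        tls {
          insecure = true
        }
      }
    }
  }
}
```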
Redistributing spans by trace ID means that spans are sent and received twice, which can cause a significant increase in CPU usage. This overhead increases with the number of Agent instances that share the same traces.
Grafana Agent Flow is a component-based revision of Grafana Agent with a focus on ease-of-use, debuggability, and ability to adapt to the needs of power users. Flow configuration files are written in River instead of YAML.
The following policy samples a trace only when all of the conditions of its sub-policies are met. It combines the prior two policies: a trace is sampled only when the span attribute http.target neither contains the value /healthcheck nor is prefixed with /metrics/, and at least one of the spans of the trace carries an OpenTelemetry Error status code.
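A sketch of how such a combined policy can be expressed in Flow mode with an "and" policy (shown in isolation from its enclosing tail-sampling component; the policy names are illustrative):

```river
policy {
  name = "and-policy"
  type = "and"
  and {
    // Exclude health-check and metrics-scrape traces by inverting a
    // regex match on the http.target attribute...
    and_sub_policy {
      name = "exclude-noise"
      type = "string_attribute"
      string_attribute {
        key                    = "http.target"
        values                 = ["/healthcheck", "/metrics/.*"]
        enabled_regex_matching = true
        invert_match           = true
      }
    }
    // ...and, of what remains, keep only traces containing an error.
    and_sub_policy {
      name = "sample-errors"
      type = "status_code"
      status_code {
        status_codes = ["ERROR"]
      }
    }
  }
}
```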
For example, the most common form of head sampling is Consistent Probability Sampling. It may also be referred to as Deterministic Sampling. In this case, a sampling decision is made based on the trace ID and a desired percentage of traces to sample. This ensures that whole traces are sampled - no missing spans - at a consistent rate, such as 5% of all traces.
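The core idea can be sketched in a few lines of Python. This is an illustration of the technique, not the exact algorithm any particular SDK implements:

```python
# Sketch of consistent probability (deterministic) head sampling:
# the decision is a pure function of the trace ID, so every service
# participating in a trace reaches the same verdict with no coordination.

def should_sample(trace_id: int, sample_rate: float) -> bool:
    """Keep the trace if its low 64 bits, treated as a uniform random
    value, fall below the threshold implied by the sampling rate."""
    bound = int(sample_rate * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

# The same trace ID always yields the same decision, so either the
# whole trace is kept or the whole trace is dropped - no missing spans.
rate = 0.05  # sample 5% of all traces
trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
decision_service_a = should_sample(trace_id, rate)
decision_service_b = should_sample(trace_id, rate)
assert decision_service_a == decision_service_b
```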
The primary downside to head sampling is that it is not possible to make a sampling decision based on data from the entire trace. This means that head sampling is effective as a blunt instrument, but is wholly insufficient for sampling strategies that must take whole-system information into account. For example, it is not possible to use head sampling to ensure that all traces with an error within them are sampled. For this, you need Tail Sampling.
As you can see, tail sampling allows for a much higher degree of sophistication.For larger systems that must sample telemetry, it is almost always necessary touse Tail Sampling to balance data volume with usefulness of that data.
Finally, for some systems, tail sampling may be used in conjunction with HeadSampling. For example, a set of services that produce an extremely high volumeof trace data may first use head sampling to only sample a small percentage oftraces, and then later in the telemetry pipeline use tail sampling to make moresophisticated sampling decisions before exporting to a backend. This is oftendone in the interest of protecting the telemetry pipeline from being overloaded.
With a tail sampling strategy, the decision to sample the trace is made considering all or most of the spans. For example, tail sampling is a good option to sample only traces that have errors or traces with long request duration.
For Application Observability, we recommend sampling at the data collector after metrics generation so that all traces are available to generate accurate metrics. If the metrics are generated from sampled traces, their values will be affected by the sampling.
The collector receives all traces, generates metrics, and sends metrics to Grafana Cloud Prometheus. In parallel, the collector applies a tail sampling strategy to the traces and sends sampled data to Grafana Cloud Tempo.
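A sketch of that split in OpenTelemetry Collector terms, using the spanmetrics connector (endpoints and pipeline names are placeholders; the real configuration has more detail):

```yaml
connectors:
  spanmetrics: {}

processors:
  tail_sampling:
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }

exporters:
  otlp/tempo:
    endpoint: tempo.example.com:4317
  prometheusremotewrite:
    endpoint: https://prometheus.example.com/api/prom/push

service:
  pipelines:
    # Every trace feeds metrics generation, so the metrics are accurate...
    traces/metrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    # ...while only sampled traces are stored.
    traces/sampled:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp/tempo]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheusremotewrite]
```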
To view the Grafana Alloy configuration for tail sampling, select the river tab below. To view the OpenTelemetry Collector configuration for tail sampling, select the yaml tab below.
The Legacy option for span metrics source in the configuration is for customers who use Grafana Alloy or OpenTelemetry Collector with metric names that match those used by the Tempo metrics generator.
To view the Grafana Alloy legacy configuration for tail sampling, select the river tab below. To view the OpenTelemetry Collector legacy configuration for tail sampling, select the yaml tab below.
We started off by setting a low sampling rate and that more or less worked for a couple of weeks. But this is not a very useful sampling strategy. Different endpoints have different characteristics, and need to be sampled differently. For example, you might have 10x-100x as many writes as reads, and you might want a super low sampling rate for the write path and much higher sampling rate for the read path.
Jaeger introduced remote sampling. This is head sampling, but supercharged. In a config file, you can specify the default sampling rate and strategy for all operations, and then specifically override it for a few operations.
But the best part about remote sampling is that the sampling strategies are refreshed every minute. So you could set a low sampling rate, but during an incident, increase it to debug certain code-paths.
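A sketch of what such a strategies file looks like in Jaeger's format; the service name, operation names, and rates below are made up for illustration:

```json
{
  "default_strategy": { "type": "probabilistic", "param": 0.001 },
  "service_strategies": [
    {
      "service": "storage",
      "type": "probabilistic",
      "param": 0.0001,
      "operation_strategies": [
        { "operation": "read", "type": "probabilistic", "param": 0.01 }
      ]
    }
  ]
}
```

Here the write-heavy service gets a very low default rate, while its read operation is sampled at a much higher rate; editing this file (or the backing store) changes the rates on the fly.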
You can specify the same rules as remote sampling, but because you have all the spans together, you can make sampling decisions based on any span in the trace (rather than just the root span). A common use case is sampling X% of the traces, but 100% of the traces that have an erroring span in them.
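In the OpenTelemetry Collector's tail_sampling processor, policies are OR-ed together, so that use case can be sketched like this (a fragment under stated assumptions, not a full configuration):

```yaml
processors:
  tail_sampling:
    decision_wait: 30s
    policies:
      # Keep every trace that contains an erroring span...
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # ...plus a 5% baseline of everything else.
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```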
But this requires all the spans of a trace to be together, which requires two deployments of the Collector: the first layer routes all the spans of a trace to the same collector in the downstream deployment (using the load-balancing exporter), and the second layer does the tail sampling.
This is expensive, and the architecture is hard to operate and scale. But if you can get it right, it can pay off heavily: because you capture all the interesting traces, you end up storing a lot less data, and the cost of the sampling infrastructure can be offset by the storage savings.
Currently, it stores all the traces for a specified interval (default 30s), and there is no way to override it per trace. For example, our writes take around 100ms-3s, and our reads take up to 50s. And there are 10x as many writes as reads. It is wasteful to buffer everything for 50s. We should be able to define a decision wait per service and operation. This way we could keep the write path traces for 5s and the read path traces for 60s without breaking the bank.
But traces span multiple operations across services and thereby contain a lot more context compared to metrics emitted by a single application. This lets us include additional data in the generated metrics, such as the upstream and downstream applications. For example, the Tempo service graph is powered by metrics generated from traces, and it's really hard to achieve the same with just metrics.
You can achieve the same with the service graph connector in the OpenTelemetry Collector.
Further, it is common to generate RED metrics on endpoints, but not very common to do RED metrics on function calls, which is something generating metrics from spans enables. You can achieve it with purely metrics, but it's less common to wrap a function call in a metric than to wrap it in a span.
NB: Autometrics is making it easier to see function performance with just metrics.
Now, if you generate metrics from sampled traces, those metrics are bound to be inaccurate. For example, if you have a sampling rate of 10% and you generate metrics from the traces, the requests/sec would be 10% of the actual value.
A solution to this is to blow up the number based on the sampling rate. For example, if you know the sampling rate is 5% and you see 5 reqs/sec in traces, you can blow it up to 100 (5 * 20) reqs/sec. But this is not very accurate and it's actually worse for errors. If you see a single error, it gets blown up into 20 errors. The duration metrics are also not super accurate. This is bad for low traffic services, but with enough throughput and scale, you might get semi-decent accuracy.
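A toy illustration of this upscaling, with made-up numbers:

```python
def upscale(sampled_count: float, sample_rate: float) -> float:
    """Estimate the true count from a sampled count by dividing by the
    sampling rate (equivalently, multiplying by 1 / rate)."""
    return sampled_count / sample_rate

# 5 reqs/sec observed in 5%-sampled traces -> estimated 100 reqs/sec.
estimated_rps = upscale(5, 0.05)

# The same factor amplifies noise: a single sampled error becomes an
# estimate of 20 errors, which is badly off for low-traffic services.
estimated_errors = upscale(1, 0.05)
```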
There is no way to generate accurate metrics based on the sampled data with tail-sampling. For example, if you have a policy to only sample error=true and user.id=1234, you cannot get the overall req/sec for the service.
The metrics need to be generated before the sampling happens. And the best place to do it is the first layer of routing collectors.