How to debug "Error on ingesting out-of-order samples" num_dropped=xx


dineshnithy...@gmail.com

Aug 27, 2020, 2:44:54 AM
to Prometheus Users
Hi Team,

We are currently facing an issue where metrics are being dropped with the error "Error on ingesting out-of-order samples".

  • This happens only for specific job configs. Under what circumstances does it happen, and are there any alternative solutions?
  • We are using consul_sd_configs to scrape the metrics. Is there an optimal way of handling this within the consul_sd job configuration or via relabelling?


Brian Candler

Aug 27, 2020, 3:37:33 AM
to Prometheus Users
On Thursday, 27 August 2020 07:44:54 UTC+1, dineshnithy...@gmail.com wrote:
> We are currently facing an issue where metrics are being dropped with the error "Error on ingesting out-of-order samples".
>
> • This happens only for specific job configs.

The error is self-explanatory: Prometheus scraped samples for a particular time series, but the timestamps were not in sequence (that is, a later scrape gave an earlier timestamp than a previous scrape).
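To make the rule concrete, here is a toy model in Python of the per-series check. It is a simplified sketch, not Prometheus's actual TSDB code: it assumes a series is identified by its full label set (metric name included) and that the store only remembers the newest sample per series. `ToyStore` and its `num_dropped` counter are hypothetical names; the counter merely mirrors the field in the log message.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy model, not Prometheus code: a series is identified by
# its full label set, including the metric name stored under "__name__".
SeriesKey = Tuple[Tuple[str, str], ...]


def series_key(labels: Dict[str, str]) -> SeriesKey:
    """Order-independent, hashable identity for a label set."""
    return tuple(sorted(labels.items()))


class ToyStore:
    """Remembers only the newest (timestamp, value) seen for each series."""

    def __init__(self) -> None:
        self.newest: Dict[SeriesKey, Tuple[int, float]] = {}
        self.num_dropped = 0  # mirrors the num_dropped field in the log line

    def append(self, labels: Dict[str, str], ts_ms: int, value: float) -> bool:
        key = series_key(labels)
        prev: Optional[Tuple[int, float]] = self.newest.get(key)
        if prev is not None and ts_ms < prev[0]:
            # Older than a sample already stored for this series:
            # out of order, so it is rejected and counted as dropped.
            # (Equal timestamps are a separate case not modelled here.)
            self.num_dropped += 1
            return False
        self.newest[key] = (ts_ms, value)
        return True


store = ToyStore()
up = {"__name__": "up", "job": "node", "instance": "host-a:9100"}
print(store.append(up, 1_000, 1.0))  # True: first sample
print(store.append(up, 2_000, 1.0))  # True: later timestamp
print(store.append(up, 1_500, 1.0))  # False: earlier than 2_000
print(store.num_dropped)             # 1
```

Note that the check is per series: a sample with an old timestamp is fine as long as no newer sample exists for that exact label set.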

Therefore, you need to show the configuration of those jobs, which exporters those jobs are scraping, and how those exporters are configured.

The most likely causes are:

1. You are using a custom exporter that sets an explicit timestamp on each metric (which is not recommended practice).

2. You are ingesting multiple metrics as if they were the same metric, e.g. you have done relabelling such that different metrics end up with the same metric name and labels. Each time series needs a unique label set; usually the metric name plus the "job" and "instance" labels are sufficient to ensure uniqueness.
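One way to check for the second cause is to look for label sets that occur more than once after relabelling. The Python sketch below assumes you already have the final label sets as plain dicts; `drop_label` and `colliding_series` are hypothetical helpers, and `drop_label` only imitates the effect of a relabelling rule that removes a label. It is not Prometheus's relabelling engine.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

SeriesKey = Tuple[Tuple[str, str], ...]


def drop_label(labels: Dict[str, str], name: str) -> Dict[str, str]:
    """Hypothetical stand-in for a relabelling rule that removes one label."""
    return {k: v for k, v in labels.items() if k != name}


def colliding_series(series: Iterable[Dict[str, str]]) -> List[SeriesKey]:
    """Return every label set that occurs more than once."""
    counts = Counter(tuple(sorted(s.items())) for s in series)
    return [key for key, n in counts.items() if n > 1]


# Two distinct series that differ only in their "instance" label.
scraped = [
    {"__name__": "http_requests_total", "job": "api", "instance": "10.0.0.1:8080"},
    {"__name__": "http_requests_total", "job": "api", "instance": "10.0.0.2:8080"},
]
print(colliding_series(scraped))  # [] - every label set is unique

# After dropping "instance", both collapse into the same label set.
relabelled = [drop_label(s, "instance") for s in scraped]
print(colliding_series(relabelled))
```

Once `instance` is gone, both targets write into one series, so whenever one target's sample carries an older timestamp than the other target's latest sample, it is rejected as out of order.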

If you can explain exactly what you're doing and what you're trying to achieve, someone may be able to suggest a better way to approach it.

> • We are using consul_sd_configs to scrape the metrics. Is there an optimal way of handling this within the consul_sd job configuration or via relabelling?

Consul configuration tells Prometheus *which* nodes to scrape, and which labels to apply. The problem *might* be that you have not set enough labels to ensure uniqueness for each time series. But without seeing any of the configs, this is just speculation.
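To illustrate the point about labels, the Python sketch below models target labels built from Consul service discovery. `__meta_consul_node` and `__meta_consul_service_id` are real Consul SD meta labels, but the relabelling is simulated with plain dicts, `target_labels` and `duplicates` are hypothetical helpers, and the values are made up. It assumes a job whose relabelling builds `instance` from the chosen meta labels; since labels starting with `__` are removed after relabelling, only `job` and `instance` reach the stored series here.

```python
from collections import Counter
from typing import Dict, List

# Discovered targets as Consul SD might describe them (illustrative values).
discovered = [
    {"__meta_consul_node": "node-1", "__meta_consul_service_id": "api-1",
     "__address__": "10.0.0.1:8080"},
    {"__meta_consul_node": "node-1", "__meta_consul_service_id": "api-2",
     "__address__": "10.0.0.1:8081"},
]


def target_labels(meta: Dict[str, str], instance_from: List[str]) -> Dict[str, str]:
    """Hypothetical relabelling: build `instance` by joining chosen meta labels."""
    return {"job": "consul-services",
            "instance": ":".join(meta[name] for name in instance_from)}


def duplicates(targets: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return each target label set that is shared by more than one target."""
    counts = Counter(tuple(sorted(t.items())) for t in targets)
    return [dict(key) for key, n in counts.items() if n > 1]


# instance = node name only: both services on node-1 get identical labels.
by_node = [target_labels(m, ["__meta_consul_node"]) for m in discovered]
print(duplicates(by_node))

# instance = node name + service ID: every target is distinguishable.
by_service = [target_labels(m, ["__meta_consul_node", "__meta_consul_service_id"])
              for m in discovered]
print(duplicates(by_service))  # []
```

The design choice is simply that whatever labels survive relabelling must still tell the targets apart; which meta labels achieve that depends on how your services are registered in Consul.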

Julien Pivotto

Aug 27, 2020, 4:11:20 AM
to Brian Candler, Prometheus Users
On 27 Aug 00:37, Brian Candler wrote:
> The most likely causes are:
>
> 1. You are using a custom exporter that sets an explicit timestamp on
> each metric (which is not recommended practice).
>
> 2. You are ingesting multiple metrics as if they were the same metric,
> e.g. you have done relabelling such that different metrics end up with the
> same metric name and labels. Each time series needs a unique label set;
> usually the metric name plus the "job" and "instance" labels are sufficient
> to ensure uniqueness.

3. Multiple recording rules produce the same output series (the same metric name and label set).
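A coarse first check for the third cause is to look for recording rules that share a `record` name and the same static `labels`. The Python sketch below assumes the rule groups have already been loaded into dicts shaped like a rules file (`groups` / `name` / `rules` / `record` / `expr` / `labels`); the group names and expressions are made up, and `duplicate_recordings` is a hypothetical helper. It ignores the labels produced by each expression, so it can both miss collisions and flag rules whose outputs never actually overlap.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Rule groups as they might look after parsing a rules file, written inline
# so the sketch has no dependencies. Values are illustrative.
rule_groups = [
    {"name": "api-fast", "rules": [
        {"record": "job:http_requests:rate5m",
         "expr": "sum by (job) (rate(http_requests_total[5m]))"},
    ]},
    {"name": "api-slow", "rules": [
        {"record": "job:http_requests:rate5m",
         "expr": "sum by (job) (rate(http_requests_total[10m]))"},
    ]},
]


def duplicate_recordings(groups: List[Dict]) -> Dict[Tuple, List[str]]:
    """Map (record name, static labels) to the groups defining it, if more than one."""
    seen: Dict[Tuple, List[str]] = defaultdict(list)
    for group in groups:
        for rule in group["rules"]:
            if "record" not in rule:
                continue  # alerting rules have no `record` key
            static = tuple(sorted(rule.get("labels", {}).items()))
            seen[(rule["record"], static)].append(group["name"])
    return {key: names for key, names in seen.items() if len(names) > 1}


print(duplicate_recordings(rule_groups))
```

Here both groups write `job:http_requests:rate5m` with the same `job` grouping, so each evaluation overwrites the other's series at a different time.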



--
Julien Pivotto
@roidelapluie