query for time series misses samples (that should be there), but not when offset is used


Christoph Anton Mitterer

Mar 22, 2024, 7:17:17 PM
to Prometheus Users
Hey.

I noticed some somewhat unexpected behaviour; perhaps someone can explain why it happens.

- on a Prometheus instance with a scrape interval of 10s
- doing the following queries via curl from the same node where Prometheus runs (so there cannot be any difference in system times)

Looking at the sample times via e.g.:
$ while true; do curl -g 'http://localhost:9090/api/v1/query?query=up[1m]' 2> /dev/null | jq '.data.result[0].values' | grep '[".]' | paste - - ; echo; sleep 1 ; done

the timings look suuuuper tight:
    1711148768.175,        "1"
    1711148778.175,        "1"
    1711148788.175,        "1"
    1711148798.175,        "1"
    1711148808.175,        "1"
    1711148818.175,        "1"

    1711148768.175,        "1"
    1711148778.175,        "1"
    1711148788.175,        "1"
    1711148798.175,        "1"
    1711148808.175,        "1"
    1711148818.175,        "1"

I.e. it's *always* .175. I guess in reality it may not actually be that tight, and Prometheus just sets the timestamps artificially... but that doesn't matter for me.

When now doing a query like the following (in a while loop with no delay):
up[1m]
and counting the number of samples, I'd expect to always get either 6 samples or perhaps 7 (if my query happened to land exactly on a .175 time).

But since the sample times are so tight, I'd not expect to ever get fewer than 6.
Yet that's exactly what happens:

1711148408.137921942
1711148408.179789148
1711148408.190407865
1711148408.239896472

    1711148358.175,        "1"
    1711148368.175,        "1"
    1711148378.175,        "1"
    1711148388.175,        "1"
    1711148398.175,        "1"

1711148408.249002352
1711148408.287031384

    1711148358.175,        "1"
    1711148368.175,        "1"
    1711148378.175,        "1"
    1711148388.175,        "1"
    1711148398.175,        "1"

1711148408.294628944
1711148408.342150984
1711148408.351871893
1711148408.405270701

Here, the non-indented times are timestamps taken before and after the whole curl .. | .. pipe.
The indented lines are the samples + timestamps in those cases where != 6 samples are returned, collected via something similarly hacky like:
$ while true; do f="$( date +%s.%N >&2; curl -g 'http://localhost:9090/api/v1/query?query=up[1m]' 2> /dev/null | jq '.data.result[0].values' | grep '[".]' | paste - - ; date +%s.%N >&2)"; if [ "$( printf '%s\n' "$f" | wc -l)" != 8 ]; then printf '\n%s\n\n' "$f"; fi ; done

One can see that both times, before and after the curl, are already past the .175, yet the most recent sample (which should already be there, and which in fact shows up at 1711148408.175 in later queries) is still missing.

Interestingly, when doing these queries with offset 10s (beware that curl requires %20 for the space)... none of this happens, and I basically always get 6 samples, as more or less expected.

[I say more or less because I wonder... whether it's possible to get 7... should it be?]
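
For reference, the offset variant of the query looks roughly like this (same pipeline as before; only the query string changes, with the space URL-encoded as %20):
$ while true; do curl -g 'http://localhost:9090/api/v1/query?query=up[1m]%20offset%2010s' 2> /dev/null | jq '.data.result[0].values' | grep '[".]' | paste - - ; echo; sleep 1 ; done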

Any ideas why this happens? And especially, why it doesn't happen with an offset?


Thanks,
Chris.

Chris Siebenmann

Mar 23, 2024, 5:24:21 PM
to Christoph Anton Mitterer, Prometheus Users, Chris Siebenmann
I think that it does potentially matter for you in this situation. My
understanding is that Prometheus freezes the timestamp for all ingested
samples from a particular scrape at the time when the scrape begins
(including the 'up{}' sample). However, I wouldn't expect the samples to
appear in the TSDB and be available for queries until after the scrape
finishes, and that might take some amount of time (you can see this in
the scrape_duration_seconds metric).
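
For example, something along these lines shows the most recent scrape duration for the first matching target (just a sketch in the style of your queries; with several targets you'd want a label selector):
$ curl -g 'http://localhost:9090/api/v1/query?query=scrape_duration_seconds' 2> /dev/null | jq '.data.result[0].value'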

(The timestamps are so exact because it is ten seconds from the start of
the previous scrape, not its end. As long as scrapes take less than ten
seconds, you'll see them start on this extremely even timeline.)

So if you query for live data as of 'right now' and do so often enough
and fast enough, I believe that you may miss a scrape that is currently
in progress and being ingested into the TSDB. If you offset your query
enough into the past that the scrape and ingestion will have finished,
you will reliably get the results you expect.

- cks

Ben Kochie

Mar 24, 2024, 3:16:14 AM
to Chris Siebenmann, Christoph Anton Mitterer, Prometheus Users
Yup, this is correct. Prometheus sets the timestamp of the sample at the start of the scrape. But since it's an ACID compliant database, the data is not queryable until after it's been fully ingested.

This is intentional, because the idea is that whatever atomicity is desired by the target is handled by the target. Any locks taken are done when the target receives the GET /metrics. The exposition formatting, compression, and wire transfer time should not impact the "time when the sample was gathered".

And yes, the timing is a tiny bit faked. There are some hidden flags that control this behavior.

--scrape.adjust-timestamps
--scrape.timestamp-tolerance 

The default allows up to 2ms (+-0.002) of timing jitter to be ignored. This was added in 2020 due to a regression in the accuracy of the Go internal timer functions.
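
If you ever want to tweak this, the invocation would look roughly like one of the following (a sketch only: the flags are hidden, so they don't show up in --help, and the --no- prefix is the usual kingpin-style boolean negation; double-check the names against your Prometheus version):
$ prometheus --config.file=prometheus.yml --scrape.timestamp-tolerance=5ms
$ prometheus --config.file=prometheus.yml --no-scrape.adjust-timestamps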



Christoph Anton Mitterer

Apr 4, 2024, 1:03:23 PM
to Prometheus Users
Hey Chris, Brian.

Thanks for your replies/confirmations.


On Sunday, March 24, 2024 at 8:16:14 AM UTC+1 Ben Kochie wrote:
Yup, this is correct. Prometheus sets the timestamp of the sample at the start of the scrape. But since it's an ACID compliant database, the data is not queryable until after it's been fully ingested.

This is intentional, because the idea is that whatever atomicity is desired by the target is handled by the target. Any locks taken are done when the target receives the GET /metrics. The exposition formatting, compression, and wire transfer time should not impact the "time when the sample was gathered".

Does make sense, yes... was that documented somewhere? I think it would be helpful if, e.g., the page about querying basics mentioned these two properties:
- that data is only returned once it has fully arrived, and thus may be missing even if the query happens after the sample time
- that Prometheus "adjusts" the timestamps within a certain range
 
And yes, the timing is a tiny bit faked. There are some hidden flags that control this behavior.

--scrape.adjust-timestamps
--scrape.timestamp-tolerance 

The default allows up to 2ms (+-0.002) of timing jitter to be ignored. This was added in 2020 due to a regression in the accuracy of the Go internal timer functions.



Makes sense, too. And is actually vital for what I do over in https://groups.google.com/g/prometheus-users/c/BwJNsWi1LhI/m/ik2OiRa2AAAJ

Just out of curiosity, what happens if the jitter is more than the +-0.002?

Thanks,
Chris.

Ben Kochie

Apr 5, 2024, 1:10:29 AM
to Christoph Anton Mitterer, Prometheus Users
On Thu, Apr 4, 2024 at 7:03 PM Christoph Anton Mitterer <cale...@gmail.com> wrote:
Hey Chris, Brian.

Thanks for your replies/confirmations.


On Sunday, March 24, 2024 at 8:16:14 AM UTC+1 Ben Kochie wrote:
Yup, this is correct. Prometheus sets the timestamp of the sample at the start of the scrape. But since it's an ACID compliant database, the data is not queryable until after it's been fully ingested.

This is intentional, because the idea is that whatever atomicity is desired by the target is handled by the target. Any locks taken are done when the target receives the GET /metrics. The exposition formatting, compression, and wire transfer time should not impact the "time when the sample was gathered".

Does make sense, yes... was that documented somewhere? I think it would be helpful if, e.g., the page about querying basics mentioned these two properties:
- that data is only returned once it has fully arrived, and thus may be missing even if the query happens after the sample time
- that Prometheus "adjusts" the timestamps within a certain range

Good question; I'm not sure we documented this well. I think it would be nice to add a "TSDB details" page.
 
 
And yes, the timing is a tiny bit faked. There are some hidden flags that control this behavior.

--scrape.adjust-timestamps
--scrape.timestamp-tolerance 

The default allows up to 2ms (+-0.002) of timing jitter to be ignored. This was added in 2020 due to a regression in the accuracy of the Go internal timer functions.



Makes sense, too. And is actually vital for what I do over in https://groups.google.com/g/prometheus-users/c/BwJNsWi1LhI/m/ik2OiRa2AAAJ

Just out of curiosity, what happens if the jitter is more than the +-0.002?

If the jitter is > 0.002, the real value is stored.
 

Thanks,
Chris.


Christoph Anton Mitterer

Apr 5, 2024, 1:45:05 PM
to Prometheus Users
Hey.


On Friday, April 5, 2024 at 7:10:29 AM UTC+2 Ben Kochie wrote:
If the jitter is > 0.002, the real value is stored.
 
Interesting... though I guess that's bad for my solution in the other thread, where I assume it's guaranteed that samples are always exactly on point, with the same interval in between.
I haven't checked yet, but I'd guess that breaks the approach in the other thread.

Is there some metric to see whether such non-aligned samples occurred?


Also, what would happen if e.g. there was a first scrape which gets delayed > 0.002 s... and before that first scrape arrives, there's yet another (later) scrape which has no jitter and is on time?
Are they going to be properly ordered?

Cheers
Chris.

Chris Siebenmann

Apr 5, 2024, 3:12:49 PM
to Christoph Anton Mitterer, Prometheus Users, Chris Siebenmann
> Also, what would happen if e.g. there was a first scrape which gets
> delayed > 0.002 s... and before that first scrape arrives, there's yet
> another (later) scrape which has no jitter and is on time?
> Are they going to be properly ordered?

As far as I know, this can't happen. The scrape interval timer is a
ticker (specifically a Go time.Ticker[*]) and it effectively only does a
new tick and thus a new scrape of the target when the old one is done. I
believe that you might be able to get scrapes (of a given target) back
to back, but only one scrape (of a given target) is ever happening at
one time because the thing that does the scrape is driven by the ticks
of the timer.

(If Prometheus or the system it's running on is under load these ticks
and handling them may be delayed, too, but again there's only one scrape
of a given target in progress at once.)

- cks
*: https://pkg.go.dev/time#Ticker