On 31 December 2017 at 04:20, Peter Zaitsev <p...@percona.com> wrote:
>
> Now I'm trying to understand if this is documented behavior ? Bug or what ?
I'd say it's mostly "by design". A scrape happens at a time t, with
that time assigned as a timestamp to all samples in the scrape. (This
already has a few implications, as the scrape takes a finite time. Is
t the beginning or the end of the scrape? The monitored target will in
general not take a real snapshot at t but collect its metrics over a
finite time span, too. However, the only thing relevant for the issue
at hand is that t is not in the future.) Now ingestion of all those
samples will take a finite time, from arrival of the samples at the
Prometheus server to visibility for queries. If you query for "now",
samples still being ingested, but with a timestamp in the past, will
not show up.
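
To make the timing argument concrete, here is a minimal sketch in Python. It is only a toy model, not how the Prometheus server is implemented: the Sample class, the visible_samples helper and the 0.4s ingestion delay are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float   # scrape time t assigned to the sample
    visible_at: float  # point in time at which ingestion has finished
    value: float


def visible_samples(samples, now):
    """Return the samples that a query evaluated at `now` can already see."""
    return [s for s in samples if s.visible_at <= now]


INGESTION_DELAY = 0.4  # hypothetical delay in seconds, for illustration only

# Three scrapes at t = 1s, 2s, 3s; each becomes queryable a bit later.
samples = [Sample(t, t + INGESTION_DELAY, 10.0 * t) for t in (1.0, 2.0, 3.0)]

now = 3.2
latest = visible_samples(samples, now)[-1]
print(latest.timestamp)
```

In this toy model, the query at 3.2s still sees the sample from t = 2s as the most recent one, although the sample with timestamp 3s already lies in the past.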
Compared to usual scrape intervals (15s to 60s) and the resulting
typical ranges used for rates, this delay hardly matters. However,
with high frequency scrapes, perhaps even combined with a heavily
loaded Prometheus server, you run into the problems you have observed
here. In general, the Prometheus collection model should be robust
against a lost or delayed scrape. But I assume that, in your case, we
are getting into an area where the ingestion delay is comparable to
the scrape interval, which throws off the heuristic for determining
the "end" of a series within a range expression.
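
The effect on the heuristic can be sketched as follows. This is a simplified Python model of the extrapolation as I understand it, not the actual Prometheus code. It ignores counter resets and the special handling near a counter value of zero. The 1.1 threshold factor, the closed window boundaries, the function names and the example numbers are all assumptions of the sketch.

```python
def extrapolated_rate(samples, range_start, range_end):
    """Simplified rate() over sorted (timestamp, counter_value) pairs.

    Assumes no counter resets. If the first or last sample is further than
    1.1 average sample intervals from the range boundary, the series is
    assumed to start or end there, and the extrapolation only covers half
    an average interval instead of reaching the boundary.
    """
    if len(samples) < 2:
        return None
    (first_t, first_v), (last_t, last_v) = samples[0], samples[-1]
    increase = last_v - first_v
    sampled_interval = last_t - first_t
    avg_interval = sampled_interval / (len(samples) - 1)
    threshold = avg_interval * 1.1

    extrapolate_to = sampled_interval
    for gap in (first_t - range_start, range_end - last_t):
        extrapolate_to += gap if gap < threshold else avg_interval / 2
    return increase * (extrapolate_to / sampled_interval) / (range_end - range_start)


def visible_in_range(samples, start, end, ingestion_delay):
    """Samples inside [start, end] that are already ingested at query time `end`."""
    return [(t, v) for t, v in samples
            if start <= t <= end and t + ingestion_delay <= end]


# A counter growing by 10 per second, scraped once per second.
scraped = [(float(t), 10.0 * t) for t in range(11)]

# rate(...[5s]) evaluated at t = 10s, without and with a 1.5s ingestion delay.
rate_no_delay = extrapolated_rate(visible_in_range(scraped, 5.0, 10.0, 0.0), 5.0, 10.0)
rate_delayed = extrapolated_rate(visible_in_range(scraped, 5.0, 10.0, 1.5), 5.0, 10.0)
print(rate_no_delay, rate_delayed)
```

With the delay, the last visible sample is two seconds away from the end of the range. That is more than the threshold, so the sketch treats the series as having ended and reports a rate of about 7 per second instead of 10.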
Prometheus 2 doesn't change anything in principle, but ingestion is
much more streamlined, so the delay is expected to be much shorter.
A real solution would be some kind of ingestion watermark that would
restrict the most recent timestamp for which queries can be issued
(essentially putting "now" a bit into the past). However, those
watermark approaches have some problematic implications.
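
A minimal sketch of the watermark idea in Python, purely hypothetical since Prometheus has no such mechanism: the watermark is assumed to be the most recent timestamp up to which all scraped samples are known to be queryable, and clamp_to_watermark is a made-up name.

```python
def clamp_to_watermark(requested_t, watermark_t):
    """Evaluate a query no later than the ingestion watermark.

    Hypothetical helper: `watermark_t` is assumed to be the newest timestamp
    for which ingestion is known to be complete.
    """
    return min(requested_t, watermark_t)


# A query for "now" = 10s with a watermark at 8.5s is evaluated at 8.5s;
# a query for a time before the watermark is left alone.
print(clamp_to_watermark(10.0, 8.5), clamp_to_watermark(5.0, 8.5))
```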
Another solution might be to tie the range interpolation heuristics
(used for rate calculation) into staleness handling (essentially
making use of the knowledge that the series hasn't actually ended even
if the ingestion delay is approaching the scrape interval).
But perhaps it's really a non-issue in practice with Prometheus 2
because the ingestion delay will hardly ever get close to practically
relevant scrape intervals.
> Am I correct in understanding what
>
> 1) the "instant" query result should match the last data point of the "range" query when queried to the current time ?
If you are referring to "console" vs. "graph" view (or "query" vs.
"query_range" from the point of view of the API), then yes. The range
query is equivalent to issuing a series of instant queries.
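
The equivalence can be sketched like this in Python. The function names are stand-ins for illustration, not calls into a real client library, and the instant query is passed in as a plain callable.

```python
def query_range(instant_query, start, end, step):
    """Evaluate `instant_query` at start, start+step, ... up to and including end."""
    points = int((end - start) // step) + 1
    return [(start + k * step, instant_query(start + k * step))
            for k in range(points)]


def fake_instant_query(t):
    # Stand-in for a real instant query: any function of the evaluation time.
    return 2 * t


now = 60
series = query_range(fake_instant_query, start=0, end=now, step=15)

# The last point of the range query equals the instant query at "now".
print(series[-1] == (now, fake_instant_query(now)))
```

Note that the last point only lands exactly on "now" if the distance between start and end is a multiple of the step.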
> 2) The rate() for the query is considered from the current time, so if I have [5s] interval when data from current_time()-5sec to current_time() will be considered and value computed based on the data available in this interval,
> or is it suppose to be something else ?
The [5s] range for each rate calculation is taken relative to the time
reference t of the query, i.e. it spans from t-5s to t. t varies
depending on what the query is for. If you just run a console query,
t is "now". If you run a graph query with a certain resolution, t is
"now", "now" minus one resolution step, "now" minus two resolution
steps, etc. On top of that, the `offset` modifier shifts t again.
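
As a small Python illustration of how these windows are placed (rate_windows and its parameters are made-up names, and the offset is assumed to shift t into the past):

```python
def rate_windows(now, range_s, step_s, points, offset_s=0):
    """Return the (window_start, window_end) pairs used for a rate over
    `range_s` seconds, evaluated at now, now - step, now - 2*step, ..."""
    windows = []
    for k in range(points):
        t = now - k * step_s - offset_s  # time reference of this evaluation
        windows.append((t - range_s, t))
    return windows


# A [5s] rate on a graph with 15s resolution, "now" = 100s.
print(rate_windows(now=100, range_s=5, step_s=15, points=3))
# The same query with an offset of 60s.
print(rate_windows(now=100, range_s=5, step_s=15, points=3, offset_s=60))
```

A console query corresponds to the single window for points=1.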