Hi there,
We're seeing really large spikes when using the `rate()` function on some of our metrics. I've been able to isolate a single time series that displays this problem, which I'm going to call `counter`. I've redacted the actual metric labels, but all of the data below is from `counter` over the same time period.
This is the raw data, as obtained through a request to /api/v1/query:
{
  "data": {
    "result": [
      {
        "metric": {/* redacted */},
        "values": [
          [1649239253.4, "225201"],
          [1649239313.4, "225226"],
          [1649239373.4, "225249"],
          [1649239433.4, "225262"],
          [1649239493.4, "225278"],
          [1649239553.4, "225310"],
          [1649239613.4, "225329"],
          [1649239673.4, "225363"],
          [1649239733.4, "225402"],
          [1649239793.4, "225437"],
          [1649239853.4, "225466"],
          [1649239913.4, "225492"],
          [1649239973.4, "225529"],
          [1649240033.4, "225555"],
          [1649240093.4, "225595"]
        ]
      }
    ],
    "resultType": "matrix"
  },
  "status": "success"
}
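For reference, computing the per-second increase between adjacent raw samples by hand gives small, steady values (a quick Python sketch, with the timestamps and values copied from the response above):

```python
# (timestamp, value) pairs copied from the /api/v1/query response above.
samples = [
    (1649239253.4, 225201), (1649239313.4, 225226), (1649239373.4, 225249),
    (1649239433.4, 225262), (1649239493.4, 225278), (1649239553.4, 225310),
    (1649239613.4, 225329), (1649239673.4, 225363), (1649239733.4, 225402),
    (1649239793.4, 225437), (1649239853.4, 225466), (1649239913.4, 225492),
    (1649239973.4, 225529), (1649240033.4, 225555), (1649240093.4, 225595),
]

# Per-second increase between each pair of adjacent samples.
rates = [(v2 - v1) / (t2 - t1)
         for (t1, v1), (t2, v2) in zip(samples, samples[1:])]
print([round(r, 3) for r in rates])
```

Every value comes out positive and well under 1/s, so I'd expect the rate to be in that ballpark, not 0 and certainly not thousands.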
The next query is taken from the Grafana query inspector, because for reasons I don't understand I can't get Prometheus to give me any data when I issue the same query to /api/v1/query_range directly. The query is the same as the one above, but wrapped in rate(counter[1m]):
{
  "request": {
    "url": "api/datasources/proxy/1/api/v1/query_range?query=rate(counter[1m])&start=1649239200&end=1649240100&step=60",
    "method": "GET",
    "hideFromInspector": false
  },
  "response": {
    "status": "success",
    "data": {
      "resultType": "matrix",
      "result": [
        {
          "metric": {/* redacted */},
          "values": [
            [1649239200, "0"],
            [1649239260, "0"],
            [1649239320, "0"],
            [1649239380, "0"],
            [1649239440, "0"],
            [1649239500, "0"],
            [1649239560, "0"],
            [1649239620, "0"],
            [1649239680, "0"],
            [1649239740, "9391.766666666665"],
            [1649239800, "0"],
            [1649239860, "0"],
            [1649239920, "0"],
            [1649239980, "0"],
            [1649240040, "0.03333333333333333"],
            [1649240100, "0"]
          ]
        }
      ]
    }
  }
}
Given the gradual increase in the underlying counter, I have two questions:
1. Why is the rate 0 for all but two datapoints?
2. Why is there one enormous datapoint in the rate result that seems unexplained by the raw data?
For question 2, I've seen other threads where the explanation was an unintentional counter reset: two scrapes landing a millisecond apart can make the counter appear to go down for a single scrape interval. I don't think I see that in our raw data, though.
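As a sanity check on that theory: a reset would show up as some sample being lower than its predecessor, since rate() interprets any decrease between adjacent samples as a counter reset. A quick sketch over the values from the raw response above:

```python
# Counter values copied from the raw /api/v1/query response above.
values = [225201, 225226, 225249, 225262, 225278, 225310, 225329,
          225363, 225402, 225437, 225466, 225492, 225529, 225555, 225595]

# rate() interprets any decrease between adjacent samples as a counter reset.
resets = [(i, prev, cur)
          for i, (prev, cur) in enumerate(zip(values, values[1:]))
          if cur < prev]
print(resets)  # an empty list means the series only ever increases
```

This prints an empty list, so the series is monotonically increasing at this resolution. I realize that if the duplicate scrapes landed between these 60-second-apart samples they might not show up in this query output at all, which is partly why I'm asking.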
We're using Prometheus version 2.26.0, revision 3cafc58827d1ebd1a67749f88be4218f0bab3d8d, go version go1.16.2.