rate() doesn't work


Isabel Noronha

Apr 28, 2020, 2:43:32 PM4/28/20
to Prometheus Users
Query1. sum by (container_label_com_docker_swarm_service_name) (container_network_transmit_bytes_total{image!="",container_label_com_docker_swarm_service_name="$service"})

Query2.sort_desc(sum by (container_label_com_docker_swarm_service_name) (rate(container_network_transmit_bytes_total{image!="",container_label_com_docker_swarm_service_name="$service"}[1m] ) ))

Query1 works fine and gives the overall network_transmit_bytes for service="xyz".
Query2 doesn't return any results when I use the rate() function.

Could anyone tell me what is wrong?

Thanks,
Regards,
Isabel

Brian Candler

Apr 28, 2020, 3:24:15 PM4/28/20
to Prometheus Users
You haven't given enough details about what your metrics look like or how often you are scraping.

However, I can tell you that rate(foo[1m]) only looks at a 1-minute window of the input. If you are only scraping at 1-minute intervals, that means there will only be one data point in that window, and it cannot calculate a rate from that, so you will get no answer.

If you are scraping at 1-minute intervals then the minimum you need is rate(foo[2m]).

If you do rate(foo[5m]) this will give you an average across the first and last data points in the range, i.e. an average over a 4-minute period if you are scraping at 1-minute intervals.

If you do irate(foo[5m]) this will give you the rate between the last two data points in the range.
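The two-samples-per-window requirement can be sketched in Python. This is a simplified model, not Prometheus's actual implementation (real rate() also extrapolates to the window boundaries and handles counter resets), but it shows why a 1m window at a 1-minute scrape interval returns nothing:

```python
def simple_rate(samples, window_start, window_end):
    """Simplified rate(): samples is a list of (timestamp_seconds, value)
    for a monotonically increasing counter.  Returns the per-second
    increase across the window, or None if fewer than two samples fall
    inside it -- mirroring Prometheus returning no result."""
    in_window = [(t, v) for t, v in samples if window_start < t <= window_end]
    if len(in_window) < 2:
        return None  # only one (or zero) samples: no rate can be computed
    (t0, v0) = in_window[0]
    (tn, vn) = in_window[-1]
    return (vn - v0) / (tn - t0)

# Counter scraped every 60s, increasing by 600 per scrape:
samples = [(0, 0), (60, 600), (120, 1200), (180, 1800)]

# 1m window ending at t=180 contains only the t=180 sample:
print(simple_rate(samples, 120, 180))  # None
# 2m window contains two samples, so a rate comes out:
print(simple_rate(samples, 60, 180))   # 10.0 (per second)
```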

Isabel Noronha

Apr 28, 2020, 3:31:11 PM4/28/20
to Prometheus Users
scrape_interval is 5m
I changed the scrape_interval today but forgot to change it in the Grafana variable.
Thank you.

Stuart Clark

Apr 28, 2020, 5:08:58 PM4/28/20
to Isabel Noronha, Prometheus Users
On 28/04/2020 20:31, Isabel Noronha wrote:
> scrape_interval is 5m
> I changed the scrape_interval today forgot to change it in grafana
> variable.
> Thank you.

You are likely to have issues with a scrape interval that long.

Due to staleness the maximum scrape interval is about 2 minutes, so
you'd be best off reducing it from 5m.
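For reference, the interval lives in the global block of prometheus.yml; a sketch with illustrative values:

```yaml
# prometheus.yml sketch (illustrative values): keeping scrape_interval
# at or below ~2m leaves samples well inside the default 5m staleness
# lookback, and leaves room for rate() windows of at least 2x the interval.
global:
  scrape_interval: 1m   # was 5m; much above 2m risks staleness gaps
  scrape_timeout: 30s   # must be <= scrape_interval
```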

--
Stuart Clark

Isabel Noronha

Apr 29, 2020, 1:51:23 AM4/29/20
to Prometheus Users
Yes, I changed it to 40s.
I had increased it because one host was running 2k containers and scrapes were failing with a "context deadline exceeded" error.
So I thought the scrape_interval wasn't sufficient to scrape all the metrics.

Isabel Noronha

Apr 29, 2020, 2:00:50 AM4/29/20
to Prometheus Users
sudo sysctl fs.inotify.max_user_watches=1048576

I ran the above command and it started working fine.
But now my only concern is that I'll have 2k containers (per host) * 20 hosts.
I have done relabeling already.
Would this problem reappear if I increase the number of hosts?
Currently it is 1 host with 2k containers.

Ben Kochie

Apr 29, 2020, 3:59:14 AM4/29/20
to Isabel Noronha, Prometheus Users
On Wed, Apr 29, 2020 at 8:00 AM Isabel Noronha <isabeln...@gmail.com> wrote:
sudo sysctl fs.inotify.max_user_watches=1048576

I ran the above command and it started working fine.
But now my only concern is I'll have 2k(containers on each host) * 20(host). 
I have done relabeling already.
Would this problem reproduce if I increase no. of hosts?

Prometheus scrapes each target independently in separate threads. So adding additional targets will not be a problem.

Isabel Noronha

Apr 29, 2020, 6:59:18 AM4/29/20
to Prometheus Users
Thank you!