rate() doesn't work


Isabel Noronha

Apr 28, 2020, 2:43:32 PM4/28/20
to Prometheus Users
Query1. sum by (container_label_com_docker_swarm_service_name) (container_network_transmit_bytes_total{image!="",container_label_com_docker_swarm_service_name="$service"})

Query2.sort_desc(sum by (container_label_com_docker_swarm_service_name) (rate(container_network_transmit_bytes_total{image!="",container_label_com_docker_swarm_service_name="$service"}[1m] ) ))

Query1 works fine and gives the overall network_transmit_bytes for service="xyz".
Query2 doesn't return any results when I use the rate() function.

Could anyone tell me what is wrong?

Thanks,
Regards,
Isabel

Brian Candler

Apr 28, 2020, 3:24:15 PM4/28/20
to Prometheus Users
You haven't given enough details about what your metrics look like or how often you are scraping.

However, I can tell you that rate(foo[1m]) only looks at a 1-minute window of the input. If you are only scraping at 1-minute intervals, that means there will only be one data point in that window, and it cannot calculate a rate from that, so you will get no answer.

If you are scraping at 1-minute intervals then the minimum you need is rate(foo[2m]).

If you do rate(foo[5m]) this will give you an average across the first and last data points in the range, i.e. an average over a 4-minute period if you are scraping at 1-minute intervals.

If you do irate(foo[5m]) this will give you the rate between the last two data points in the range.
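The two-samples-per-window requirement can be sketched in Python. This is a simplified model, not Prometheus's actual implementation (real rate() also extrapolates to the window boundaries and handles counter resets), but it shows why a 1m window at a 1-minute scrape interval returns nothing:

```python
def simple_rate(samples, window_start, window_end):
    """Simplified rate(): samples is a list of (timestamp_seconds, value)
    for a monotonically increasing counter.  Returns the per-second
    increase across the window, or None if fewer than two samples fall
    inside it -- mirroring Prometheus returning no result."""
    in_window = [(t, v) for t, v in samples if window_start < t <= window_end]
    if len(in_window) < 2:
        return None  # only one (or zero) samples: no rate can be computed
    (t0, v0) = in_window[0]
    (tn, vn) = in_window[-1]
    return (vn - v0) / (tn - t0)

# Counter scraped every 60s, increasing by 600 per scrape:
samples = [(0, 0), (60, 600), (120, 1200), (180, 1800)]

# 1m window ending at t=180 contains only the t=180 sample:
print(simple_rate(samples, 120, 180))  # None
# 2m window contains two samples, so a rate comes out:
print(simple_rate(samples, 60, 180))   # 10.0 (per second)
```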

Isabel Noronha

Apr 28, 2020, 3:31:11 PM4/28/20
to Prometheus Users
scrape_interval is 5m
I changed the scrape_interval today but forgot to change it in the Grafana variable.
Thank you.

Stuart Clark

Apr 28, 2020, 5:08:58 PM4/28/20
to Isabel Noronha, Prometheus Users
On 28/04/2020 20:31, Isabel Noronha wrote:
> scrape_interval is 5m
> I changed the scrape_interval today forgot to change it in grafana
> variable.
> Thank you.

You are likely to have issues with a scrape interval that long.

Due to staleness the maximum scrape interval is about 2 minutes, so
you'd be best off reducing it from 5m.
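For reference, the interval lives in the global block of prometheus.yml; a sketch with illustrative values:

```yaml
# prometheus.yml sketch (illustrative values): keeping scrape_interval
# at or below ~2m leaves samples well inside the default 5m staleness
# lookback, and leaves room for rate() windows of at least 2x the interval.
global:
  scrape_interval: 1m   # was 5m; much above 2m risks staleness gaps
  scrape_timeout: 30s   # must be <= scrape_interval
```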

--
Stuart Clark

Isabel Noronha

Apr 29, 2020, 1:51:23 AM4/29/20
to Prometheus Users
Yes, I changed it to 40s.
I had increased it because one host was running 2k containers and scrapes were failing with a "context deadline exceeded" error.
So I thought the scrape_interval wasn't sufficient to scrape all the metrics.

Isabel Noronha

Apr 29, 2020, 2:00:50 AM4/29/20
to Prometheus Users
sudo sysctl fs.inotify.max_user_watches=1048576

I ran the above command and it started working fine.
But now my only concern is that I'll have 2k containers (per host) * 20 hosts.
I have done relabeling already.
Would this problem reappear if I increase the number of hosts?
Currently it is 1 host with 2k containers.

Ben Kochie

Apr 29, 2020, 3:59:14 AM4/29/20
to Isabel Noronha, Prometheus Users
On Wed, Apr 29, 2020 at 8:00 AM Isabel Noronha <isabeln...@gmail.com> wrote:
sudo sysctl fs.inotify.max_user_watches=1048576

I ran the above command and it started working fine.
But now my only concern is I'll have 2k(containers on each host) * 20(host). 
I have done relabeling already.
Would this problem reproduce if I increase no. of hosts?

Prometheus scrapes each target independently in separate threads. So adding additional targets will not be a problem.

Isabel Noronha

Apr 29, 2020, 6:59:18 AM4/29/20
to Prometheus Users
Thank you!