using high speed counters but getting dips and spikes

cch...@gmail.com

Jul 18, 2018, 11:45:05 AM

to Prometheus Users

I'm monitoring a Cisco ASR9000 with snmp-exporter, and I'm seeing the following oddity in my Grafana graphs, which use a Prometheus backend.

The queries aren't anything special

irate(ifHCInOctets{job="cisco",alias=~"Physical-WAN.*", instance="172.16.0.136"}[2m])*8

and

irate(ifHCOutOctets{job="cisco",alias=~"Physical-WAN.*", instance="172.16.0.136"}[2m])*8*-1

What could be causing this, and is there any way to smooth it out? (I normally have a lower resolution set, 1/4, but set it to 1/1 because then the peaks and valleys are next to each other; at 1/4 they are spaced apart.)

Brian Brazil

Jul 18, 2018, 11:52:31 AM

to Chris C., Prometheus Users

When using irate you need to be careful that you are fully zoomed in so that you don't miss data points. For the graph below I doubt this is the case, so you probably should be using rate() instead.

These are true spikes from the Prometheus standpoint, but they seem a bit odd: either you have a regular spike that aligns with the step here, so you're seeing it correctly, or the device is doing some caching. I'd suggest zooming in to see exactly what's causing it.
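To make the irate() versus rate() difference concrete, here is a minimal Python sketch. It is not Prometheus code: the sample values, the 30s scrape interval, and the helper names `simple_irate` and `simple_rate` are all assumptions for illustration, and `simple_rate` skips the extrapolation and counter-reset handling that the real rate() performs.

```python
# Hypothetical counter samples: (timestamp_seconds, counter_value).
# Assumed 30s scrape interval; the counter jumps sharply between t=30 and t=60.
SAMPLES = [(0, 0), (30, 3000), (60, 63000), (90, 66000), (120, 69000)]


def window(samples, end, range_s):
    """Samples with end - range_s <= t <= end, similar to a [2m] range selector."""
    return [(t, v) for t, v in samples if end - range_s <= t <= end]


def simple_irate(samples, end, range_s):
    """Per-second rate from the last two samples in the window only."""
    pts = window(samples, end, range_s)
    (t0, v0), (t1, v1) = pts[-2], pts[-1]
    return (v1 - v0) / (t1 - t0)


def simple_rate(samples, end, range_s):
    """Per-second rate from the first and last samples in the window.

    Simplification: no extrapolation to the window edges and no
    counter-reset handling, both of which the real rate() performs.
    """
    pts = window(samples, end, range_s)
    (t0, v0), (t1, v1) = pts[0], pts[-1]
    return (v1 - v0) / (t1 - t0)


# One graph point evaluated at t=120 with a 2m range.
irate_at_120 = simple_irate(SAMPLES, end=120, range_s=120)
rate_at_120 = simple_rate(SAMPLES, end=120, range_s=120)
print(irate_at_120)  # reflects only the 90..120 interval
print(rate_at_120)   # average over 0..120, so it includes the burst
```

With one graph point per 2 minutes, the irate-style value never sees the burst between t=30 and t=60, while the rate-style value averages it in.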

Brian

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/c352ba99-b196-4a42-a921-7db72452fb07%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Brian Brazil

www.robustperception.io

Chris C.

Jul 18, 2018, 1:12:39 PM

to brian....@robustperception.io, promethe...@googlegroups.com

Switching to rate got rid of the spikes, but I still see dips, just not as insanely big. When you say "completely zoomed in", what do you mean exactly?


Brian Brazil

Jul 18, 2018, 3:55:13 PM

to Chris C., Prometheus Users

On 18 July 2018 at 18:12, Chris C. <cch...@gmail.com> wrote:

Switching to rate got rid of the spikes, but I still see dips, just not as insanely big. When you say "completely zoomed in", what do you mean exactly?

That your step is no greater than your scrape interval.
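A rough way to see why the step matters for irate: the sketch below counts what fraction of scrape-to-scrape intervals any graph point actually uses. The function name `intervals_seen_by_irate`, the 30s scrape interval, and the perfectly regular scrape and evaluation grids are assumptions for illustration; it also assumes the irate range is wide enough to always contain two samples.

```python
def intervals_seen_by_irate(scrape_interval_s, step_s, duration_s):
    """Fraction of scrape-to-scrape intervals that at least one graph point uses.

    Assumes scrapes at 0, scrape_interval_s, 2*scrape_interval_s, ... and graph
    points evaluated at multiples of step_s. Each point uses only the interval
    ending at the most recent scrape at or before it.
    """
    total = duration_s // scrape_interval_s  # intervals available
    used = set()
    for eval_t in range(step_s, duration_s + 1, step_s):
        last_scrape = (eval_t // scrape_interval_s) * scrape_interval_s
        if last_scrape >= scrape_interval_s:
            used.add(last_scrape)  # the interval ending at this scrape
    return len(used) / total


full = intervals_seen_by_irate(30, 30, 3600)      # step == scrape interval
quarter = intervals_seen_by_irate(30, 120, 3600)  # step == 4x scrape interval
print(full, quarter)
```

Under these assumptions, a step equal to the scrape interval uses every interval, while a step four times larger uses only one interval in four and silently drops the rest.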

Brian


Alin Sînpălean

Jul 19, 2018, 5:12:16 AM

to Prometheus Users

You should only use irate when you look at the underlying data in full resolution.

At all other times you should use rate, so you're displaying all your collected data (averaged) instead of sampling it. Unfortunately Prometheus' implementation of rate goes for internal consistency rather than common sense, so it will throw away data regardless, unless you smear your data across multiple steps by computing the rate over an unnecessarily long range (and even then some samples will weigh more than others).

That being said, the reason you are seeing spikes and dips in your graph is that your counter has them to begin with. This is likely due to aliasing (i.e. your counter is not growing perfectly smoothly and you're consistently scraping it exactly after or exactly before a large increase) or jitter (your monitored device responds and/or Prometheus scrapes more slowly when under load, so those samples will either have timestamps that are off the exact interval you specified or have a "correct" timestamp but be collected earlier or later than that timestamp). These effects may, in some cases, be reduced by querying with start and end times that are offset by a few seconds (which, unfortunately, Grafana recently made impossible by aligning start and end times with a multiple of the step).
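The jitter case can be sketched in a few lines of Python. Everything here is hypothetical: a perfectly constant 1000 bytes/s of traffic, a 30s scrape interval, and a single scrape whose value is read 5 seconds late while still being stamped on the nominal grid.

```python
TRUE_RATE = 1000.0    # bytes per second, perfectly constant (assumed)
SCRAPE_INTERVAL = 30  # seconds (assumed)

# Delay in seconds between the recorded timestamp and the moment the device
# actually read its counter. Only the third scrape is late.
delays = [0, 0, 5, 0, 0]

# Each sample is stamped on the nominal grid but carries a value read later.
samples = [
    (i * SCRAPE_INTERVAL, TRUE_RATE * (i * SCRAPE_INTERVAL + d))
    for i, d in enumerate(delays)
]

# Per-second rate between each pair of consecutive samples.
rates = [
    (v1 - v0) / (t1 - t0)
    for (t0, v0), (t1, v1) in zip(samples, samples[1:])
]
print([round(r, 1) for r in rates])
```

Even though the underlying traffic never changes, the late sample produces a spike immediately followed by a dip of the same size, which is the adjacent peak-and-valley pattern such jitter would be expected to leave in a graph.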

And finally, it could be that your network traffic actually has those spikes: if you have 10 devices that each request a large amount of data every 10 minutes (which is the period your graph seems to show) and each of them does it in a different minute, except 2 of them that do it within the same minute, this is what that would look like.
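A minimal sketch of that scenario, assuming the 10-minute period is divided into ten one-minute slots. The transfer size and the particular minute assignments are made up for illustration.

```python
from collections import Counter

BURST_BYTES = 600_000_000  # hypothetical size of each device's transfer
PERIOD_MIN = 10            # each device repeats every 10 minutes

# Minute (within the 10-minute period) in which each of 10 devices transfers.
# Eight devices have a minute to themselves; two share minute 7, which
# leaves minute 8 empty.
device_minutes = [0, 1, 2, 3, 4, 5, 6, 7, 7, 9]

per_minute = Counter(device_minutes)

# Average bits per second seen in each minute of the period.
bps = [per_minute[m] * BURST_BYTES * 8 / 60 for m in range(PERIOD_MIN)]
for minute, value in enumerate(bps):
    print(minute, f"{value / 1e6:.0f} Mbit/s")
```

In this toy setup most minutes sit at the same level, the shared minute shows double that, and the minute left empty drops to zero, so one spike and one dip repeat every 10 minutes.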

Cheers,

Alin.
