Traffic quota rule alert node_exporter


Roman Melnyk

Oct 20, 2023, 4:49:55 AM
to Prometheus Users
Hello.
Can someone help me write an alert rule for an exhausted traffic quota: over the last 30 days, the server has used more than 95% of its allowed monthly traffic volume?
Thank you!

Brian Candler

Oct 20, 2023, 5:17:27 AM
to Prometheus Users
If it's really an "allowed monthly traffic volume" you mean then you just compare the amount consumed over that time with the threshold:  increase(foo[30d]) > some_threshold_value

However, I suspect you're actually talking about traffic *rates* (i.e. volume per unit time). A typical question is, "was the 5-minute average rate less than X Gbps for 95% of the time?"

The pieces you need are:
* rate(foo[5m]) to calculate the 5-minute average rate
* a subquery to evaluate that expression multiple times over 30 days
* quantile_over_time to pick the 95th percentile value

Something like this:
quantile_over_time(0.95, rate(ifHCInOctets{ifDescr="pppoe-out2"}[5m])[30d:5m]) * 8

(The *8 is to convert bytes per second into bits per second)

Once you're happy with the results of this expression, to make an alerting rule you'd add a filter on the end as "> some_threshold_value"

Roman Melnyk

Oct 20, 2023, 1:58:29 PM
to Prometheus Users
I just need to resolve a billing problem.
I rent a number of VMs, and each VM gets a fixed amount of allowed traffic per month.
So I need some kind of rule that alerts when the traffic quota is exceeded in the current month, with the billing day being, for example, the 1st.

Brian Candler

Oct 20, 2023, 5:10:17 PM
to Prometheus Users
"traffic quota exceeded in current month"

That's different to what you asked for the first time ("for last 30 days"). PromQL by itself isn't very good for things like "current calendar month".

You could write an external program which talks to the API: it can calculate the timestamp that it wants the query to be evaluated at, and specify that in the API call (or using the @ modifier)

Your program could:
- find the current value of the traffic volume counter
- find the traffic volume counter's value at the 1st of the month
- subtract them
- compare to the expected value at this point in the month: (day of month / days in month * monthly quota)
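
The steps above could be sketched roughly as follows. This is only an illustration, not a finished tool: the function names are made up, the metric name passed in is up to you (node_network_transmit_bytes_total from node_exporter would be one candidate), and actually sending the queries to the Prometheus HTTP API is left out. It also assumes the billing period starts at midnight UTC on the 1st, and that the counter did not reset (e.g. on reboot) during the month, since a plain subtraction does not handle resets.

```python
import calendar
from datetime import datetime, timezone


def month_start(now: datetime) -> datetime:
    """Midnight on the 1st of the month containing `now` (same timezone)."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def query_at(metric: str, when: datetime) -> str:
    """PromQL selector pinned to a fixed time with the @ modifier."""
    return f"{metric} @ {int(when.timestamp())}"


def expected_usage(now: datetime, monthly_quota: float) -> float:
    """Pro-rata budget: day of month / days in month * monthly quota."""
    days_in_month = calendar.monthrange(now.year, now.month)[1]
    return now.day / days_in_month * monthly_quota


def over_budget(current: float, at_month_start: float,
                now: datetime, monthly_quota: float) -> bool:
    """True if usage since the 1st exceeds the pro-rata budget.

    Assumes no counter reset between the two samples.
    """
    used = current - at_month_start
    return used > expected_usage(now, monthly_quota)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # The two queries the program would send to the API:
    print(query_at("node_network_transmit_bytes_total", month_start(now)))
    print(query_at("node_network_transmit_bytes_total", now))
```

The two counter values returned by those queries would then be passed to over_budget() together with the quota in bytes.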

But TBH, I think it would be both easier and more useful to alert on the most recent 24 hours' usage, i.e. the rate of consumption. If you are burning more than 1/30th of the monthly quota every day then you need to find out why, and keep a careful eye on it.

increase(foo[24h]) > (1000000000000 / 30)
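
For reference, the threshold in that expression is a 1 TB (10^12 byte) quota spread over 30 days, i.e. about 33.3 GB per day. If the quotas differ per VM, a small helper along these lines could generate the expression; the function names are hypothetical and `foo` is still a placeholder for the real counter:

```python
def daily_budget_gb(monthly_quota_bytes: int, days: int = 30) -> float:
    """Daily share of the monthly quota, in gigabytes (10^9 bytes)."""
    return monthly_quota_bytes / days / 1e9


def daily_budget_rule(metric: str, monthly_quota_bytes: int, days: int = 30) -> str:
    """Build the 24-hour consumption check as a PromQL expression string."""
    return f"increase({metric}[24h]) > ({monthly_quota_bytes} / {days})"


if __name__ == "__main__":
    print(daily_budget_rule("foo", 1_000_000_000_000))
    print(round(daily_budget_gb(1_000_000_000_000), 1), "GB per day")
```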