Monotonically increasing cost calculation over a selected duration of time in grafana/prometheus


touseef yousuf

Jun 2, 2020, 10:00:10 AM
to Prometheus Users

Hi, I want to calculate the cost of memory/CPU usage for different teams (on underlying Kubernetes infrastructure) monotonically over a period of time (let's say 3 months). The formula I am currently using is


sum(container_cpu_usage_seconds_total{namespace=~"$namespace",pod=~"$pod"}) *  $costcpu +  sum(container_memory_usage_bytes{namespace=~"$namespace",pod=~"$pod"})/1024/1024/1024 *  $costram


However, this does not give me the monotonically increasing cost over a selected duration of time. It just gives me the sum at that time instant, so the graph obtained goes up and down based on usage at that instant.

Can you please suggest an alternative way/function/formula so that I can achieve a monotonically increasing cost in Grafana over a selected time duration?

Brian Candler

Jun 2, 2020, 11:40:11 AM
to Prometheus Users
sum() works across multiple timeseries at a given instant.

However, the problem seems to be that your metrics are different.  "container_cpu_usage_seconds_total" is almost certainly a counter which increases monotonically, but "container_memory_usage_bytes" is almost certainly a gauge which goes up and down.

I suggest you start by graphing the two separately, to get a feel for how they look.

The question most people want to answer is "how much resource did I use over the previous 3 months?"

For the counter, to get the increase over 3 months you can use the increase() function with a range vector:

    increase(container_cpu_usage_seconds_total[90d])

Or more simply, just subtract the values now and 90 days previously: e.g.

    container_cpu_usage_seconds_total - container_cpu_usage_seconds_total offset 90d

- but that will give wrong results if the counter has reset to zero during that time, so increase() is strongly recommended.
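As a hypothetical illustration of the reset problem: say that within the window the counter rises from 100 to 500, resets to 0, and ends at 200. Then:

    naive subtraction:  200 - 100 = 100                     (the reset swallows most of the usage)
    increase():         (500 - 100) + (200 - 0) = 600       (positive deltas summed across the reset)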

In the Prometheus UI, the "exec" view shows the value for right now (i.e. how now compares with 90 days ago).  If you graph this expression, the 90-day window is swept over time.  So the value shown for 7 days ago won't be how much you were using 7 days ago, but how much you used over the period from 97 days ago to 7 days ago.

If you want the graph to show how much resource you were using *at that instant*, then you use rate() on the counter. 
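For example (a 5-minute range is a common choice; it should be at least a few scrape intervals):

    rate(container_cpu_usage_seconds_total[5m])

Since this is CPU-seconds per second, the result is effectively the number of cores in use at that point in time; multiplying by a per-core-second price gives a spend *rate*, not a cumulative cost.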

For the container memory usage you probably want to use avg_over_time() with a range vector, e.g.

    avg_over_time(container_memory_usage_bytes[90d])

Again, at a given point in time, this will show the average memory usage over the previous 90 day period.  If you want the instantaneous usage, then the bare metric (container_memory_usage_bytes) is what you want.

If you want the graph to show the *cumulative* usage of resource up to and including that time, then container_cpu_usage_seconds_total is already cumulative - although it starts from an arbitrary offset, and it may reset to zero at inopportune times.
*Cumulative* usage of memory is more awkward - are you saying you want metered usage measured in GB-seconds?  You would need to integrate the value, i.e. the inverse of deriv(), and I don't know how to do that with Prometheus.
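One rough workaround, since the average of a gauge over a window is its integral divided by the window length: multiply avg_over_time() by the window length in seconds to approximate the integral.  For GB-seconds over 90 days that would be something like

    avg_over_time(container_memory_usage_bytes[90d]) * 90 * 24 * 3600 / 1024 / 1024 / 1024

i.e. average bytes over 90 days, times the number of seconds in 90 days, converted to GB.  This is only an approximation, and only as good as the scrape coverage of the window.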

Note: I have not bothered summing across series - with the examples above you'll get as many result timeseries as you have input timeseries.  You can add sum(...) or sum by (labels) (...) across those expressions as required.  e.g.

    sum by (namespace,pod) (...)

Note that for cpu_usage_seconds_total you may want to filter out "idle" CPU seconds if they are given as a separate metric, otherwise summing across all the dimensions will always add up to 100%.  Ditto for memory and "free" or "buffer/cache".
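Putting the pieces together, a sketch of a cumulative 90-day cost per namespace/pod might look like this (assuming, as in the original expression, that $costcpu is the price per CPU-second and $costram the price per GB held for the whole period):

    sum by (namespace, pod) (increase(container_cpu_usage_seconds_total{namespace=~"$namespace",pod=~"$pod"}[90d])) * $costcpu
      + sum by (namespace, pod) (avg_over_time(container_memory_usage_bytes{namespace=~"$namespace",pod=~"$pod"}[90d])) / 1024 / 1024 / 1024 * $costram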