Prometheus memory usage keeps increasing over time.


Sohaib Omar

May 22, 2020, 6:38:17 AM
to Prometheus Users
Hi all, I am running Prometheus 2.17.1 as a StatefulSet inside a Kubernetes cluster. I want to debug what's causing Prometheus's memory usage to increase over time even when the cardinality (the number of unique time series) is constant. How can I make sure that Prometheus is not leaking memory?
Below is some of my Prometheus config:

storage.tsdb.max-block-duration 12m
storage.tsdb.min-block-duration 2h
storage.tsdb.retention.time 2h
query.lookback-delta 5m
query.max-concurrency 20


Unique series over time:

unique series over time.png



Memory usage over time:

image (3).png

In a larger cluster, which has a cardinality of around 80k (constant over time), memory usage increased to 5GB over two days before the pod was OOMKilled. I am assuming Prometheus is not freeing up its cache. How can I debug this better or find out what's causing it?
Thanks

Julien Pivotto

May 22, 2020, 7:04:12 AM
to Sohaib Omar, Prometheus Users
On 22 May 03:38, Sohaib Omar wrote:
> Hi all, I am running Prometheus *2.17.1 *as a stateful set inside the
> Kubernetes cluster. I want to debug what's causing Prometheu's memory usage
> to increase over time even when cardinality or unique time series are
> constant over time. How can I make sure that Prometheus is not leaking
> memory?
> Below is some of my Prometheus config:


Prometheus 2.17.1 has a known memory leak. Please upgrade to at least
2.17.2 :)


>
> storage.tsdb.max-block-duration 12m
> storage.tsdb.min-block-duration 2h
> storage.tsdb.retention.time 2h
> query.lookback-delta 5m
> query.max-concurrency 20
>
>
> *Unique series over time:*
>
> [image: unique series over time.png]
>
>
> *Memory usage over time:*
>
> [image: image (3).png]
> In a larger cluster, which has around 80k cardinality(constant over time),
> the memory usage increased up to *5GB *in a span of two days before being
> OOMKilled. I am assuming Prometheus is not freeing up its cache. How can I
> debug it better or know what's causing this?
> Thanks
>
> --
> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/c982234b-27b8-4b2c-a6d3-4be5bb11b72e%40googlegroups.com.




--
Julien Pivotto
@roidelapluie

Sohaib Omar

May 22, 2020, 7:34:37 AM
to Prometheus Users
Thanks for the quick response, Julien. I will update to 2.17.2 and report back.

Cheers!

Sohaib Omar

May 22, 2020, 11:22:45 AM
to Prometheus Users
Hi Julien and all, I updated Prometheus to 2.17.2 and let it run for a few hours. I still see a similar trend: memory usage keeps increasing over time, whereas cardinality stays the same.

Memory over time:

image (4).png


Cardinality over time:

image (5).png

Scrape/sec:

image (6).png


Thanks
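
The comparison in the message above (memory rising while cardinality stays flat) can be checked numerically instead of being read off the graphs. The following is a minimal sketch, assuming the two series have been exported as (unix timestamp, value) pairs, for example from the `process_resident_memory_bytes` and `prometheus_tsdb_head_series` metrics that Prometheus exposes about itself. The function names and the sample numbers are illustrative and do not come from the thread.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def slope_per_hour(samples: Sequence[Sample]) -> float:
    """Least-squares slope of value against time, in value units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("all samples share the same timestamp")
    return cov / var * 3600.0


def relative_growth_per_hour(samples: Sequence[Sample]) -> float:
    """Slope divided by the mean value, so different metrics are comparable."""
    mean_v = sum(v for _, v in samples) / len(samples)
    if mean_v == 0:
        raise ValueError("mean value is zero; relative growth is undefined")
    return slope_per_hour(samples) / mean_v


if __name__ == "__main__":
    # Made-up data: memory grows by 100 MiB every hour, series count is flat.
    memory = [(i * 3600.0, 2**30 + i * 100 * 2**20) for i in range(5)]
    series = [(i * 3600.0, 80_000.0) for i in range(5)]
    print("memory growth per hour (relative):", relative_growth_per_hour(memory))
    print("series growth per hour (relative):", relative_growth_per_hour(series))
```

A clearly positive relative growth for memory alongside a value near zero for the series count would support the observation that memory is not tracking cardinality. It would not by itself prove a leak, since a few hours of data can also reflect normal head-block growth between compactions.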