Debugging OOM issue.

yagyans...@gmail.com

Nov 9, 2020, 4:56:50 AM
to Prometheus Users

Hi. I am using Prometheus v2.20.1, and my Prometheus suddenly crashed because of a memory overshoot. How can I pinpoint what caused Prometheus to go OOM, or which query caused it?

Thanks in advance!

Christian Hoffmann

Nov 9, 2020, 7:01:18 AM
to yagyans...@gmail.com, Prometheus Users
Hi,
Prometheus writes the currently active queries to a file which is read
upon restart. Prometheus will print all unfinished queries, see here:

https://www.robustperception.io/what-queries-were-running-when-prometheus-died

This should help pinpoint the relevant queries.

Often it's some combination of querying long time ranges and/or
high-cardinality metrics.

Kind regards,
Christian

yagyans...@gmail.com

Nov 25, 2020, 12:49:14 AM
to Prometheus Users
Thanks, Christian.

Today I noticed something totally new to me. Prometheus went down, and I found the query that caused it. Strangely, when I checked, the server had not gone OOM: memory dropped straight from a constant 77% usage to zero. Usually, when a query takes a long time, memory usage spikes and Prometheus crashes because of OOM. This time there was no sudden spike in either CPU or memory utilization.

Any thoughts on this?

Ben Kochie

Nov 25, 2020, 2:37:32 AM
to yagyans...@gmail.com, Prometheus Users
Maybe set a lower value for the `--query.max-samples` flag. The default is 50 million samples. I typically lower this to 20 million to avoid too-heavy queries. You can also lower the default `--query.max-concurrency=20` to avoid overloading.

Likely, if you need to make large queries, you should allocate more memory for Prometheus.
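
As a rough sketch of how those two flags might be passed on the command line: the config file path and the concurrency value of 10 are placeholders chosen for illustration, and 20000000 is simply the 20 million figure mentioned above. Adjust both to your own setup.

```shell
# Sketch only: the config path and the concurrency value are assumptions.
# --query.max-samples caps how many samples a single query may load into memory.
# --query.max-concurrency caps how many queries run at the same time.
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --query.max-samples=20000000 \
  --query.max-concurrency=10
```

A query that would exceed the sample limit fails with an error instead of continuing to consume memory.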

Yagyansh S. Kumar

Nov 25, 2020, 2:45:31 AM
to Ben Kochie, Prometheus Users
Thanks, Ben. Was thinking of doing the same because a single query is causing my Prometheus to go down occasionally.
One question, though: will limiting the concurrency slow down the overall evaluation process?

Ben Kochie

Nov 25, 2020, 2:48:50 AM
to Yagyansh S. Kumar, Prometheus Users
No, concurrency only affects how many queries are running at the same time. 

Yagyansh S. Kumar

Nov 25, 2020, 3:58:59 AM
to Ben Kochie, Prometheus Users
Cool, thanks for the quick help.