Prometheus crash due to OOM

pin...@hioscar.com

Jan 5, 2022, 1:25:13 PM
to Prometheus Users
Hi,

We are running Prometheus 2.25.0.

We have been running into issues with expensive queries causing the Prometheus service to crash. We are giving it 64 GB of RAM. We have aggressively limited the query timeout to 1m and query.max-samples to 10,000,000 (20% of the default value), which, based on my reading (https://www.robustperception.io/why-does-prometheus-use-so-much-ram), should take up to 20MB, which seems totally reasonable to handle.
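As a rough sanity check on that estimate, here is a minimal sketch of the arithmetic. It assumes 16 bytes per sample (an 8-byte timestamp plus an 8-byte float value) and 20 queries running at once; both numbers are assumptions, not figures from this thread, so adjust them for your setup.

```python
# Back-of-the-envelope bound on memory held by raw query samples.
# Assumptions (not from the thread): 16 bytes per sample, 20 concurrent queries.

BYTES_PER_SAMPLE = 16  # assumed: 8-byte timestamp + 8-byte float value


def sample_bytes(max_samples: int, bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Upper bound on bytes of raw samples a single query may hold."""
    return max_samples * bytes_per_sample


def worst_case_bytes(max_samples: int, concurrency: int,
                     bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Same bound if `concurrency` queries all reach the limit at once."""
    return concurrency * sample_bytes(max_samples, bytes_per_sample)


def to_mb(n_bytes: int) -> float:
    """Convert bytes to decimal megabytes."""
    return n_bytes / 1_000_000


per_query = sample_bytes(10_000_000)
all_queries = worst_case_bytes(10_000_000, concurrency=20)
print(f"per query: {to_mb(per_query):.0f} MB")
print(f"20 concurrent queries: {to_mb(all_queries):.0f} MB")
```

Under those assumptions the limit works out to roughly 160 MB per query rather than 20 MB, and the bound applies per query, so concurrent queries multiply it. Even so, that is far below 64 GB, so the sample limit alone would not explain the crash.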

Yet our Prometheus service still crashes. In the query log, we see a few occurrences of
> "error": "query processing would load too many samples into memory in query execution",
and then, minutes later, we see a lot of I/O operations followed by an OOM, and the Prometheus service crashes.

It doesn't seem that query.max-samples does anything to prevent Prometheus from crashing.
It is almost as if the bad queries went on and kept loading data.

Please advise. Thanks!

Matthias Rampke

Jan 6, 2022, 4:28:21 PM
to pin...@hioscar.com, Prometheus Users
This is odd indeed. The only thing I can think of is that, aside from the samples loaded into the query engine, a good amount of data may need to be paged in. The TSDB engine makes heavy use of mmap, so the actual data from disk is not accounted for as process memory. In some circumstances, especially when querying very long time ranges, this can temporarily need much more memory than the samples being read.
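One way to check whether paged-in file data or ordinary heap memory is growing is to compare the anonymous and file-backed parts of the resident set. The sketch below is Linux-only and assumes the kernel exposes the VmRSS, RssAnon and RssFile fields in /proc/<pid>/status; the helper names are made up for illustration.

```python
import re

# Fields of interest in /proc/<pid>/status (values are reported in kB).
_FIELD_RE = re.compile(r"^(VmRSS|RssAnon|RssFile):\s+(\d+)\s+kB$")


def parse_status_kb(status_text: str) -> dict:
    """Extract the kB-valued memory fields from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        match = _FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_status_kb(pid: int) -> dict:
    """Read and parse /proc/<pid>/status for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_kb(f.read())


def file_backed_share(fields: dict) -> float:
    """Fraction of resident memory that is file-backed (e.g. mmap'd files)."""
    rss = fields.get("VmRSS", 0)
    return fields.get("RssFile", 0) / rss if rss else 0.0
```

Calling read_status_kb() with the Prometheus process ID every few seconds around the time of the query should show which part grows. If RssAnon climbs towards the limit, the heap is the more likely culprit; if RssFile dominates, it points more towards mapped block data.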

I also find it curious that it takes a while to crash after the query is already cancelled.

Can you try getting a memory profile from before the query, and between the query and crash? The pprof utility should work out of the box for that: https://github.com/google/pprof
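If it helps, here is a minimal sketch for saving heap profiles to disk so that the "before" and "after" snapshots can be compared later. It assumes Prometheus is reachable at a base URL such as http://localhost:9090 and serves the standard Go /debug/pprof/heap endpoint; the function names and file naming scheme are made up for illustration.

```python
import time
import urllib.request


def heap_profile_url(base_url: str) -> str:
    """Build the heap profile URL from the server's base URL."""
    return base_url.rstrip("/") + "/debug/pprof/heap"


def snapshot_filename(label: str, unix_ts: int) -> str:
    """File name for one snapshot, e.g. heap-before-1641400000.pb.gz."""
    return f"heap-{label}-{unix_ts}.pb.gz"


def save_heap_profile(base_url: str, label: str, timeout_s: float = 30.0) -> str:
    """Download the current heap profile and write it to the working directory."""
    path = snapshot_filename(label, int(time.time()))
    with urllib.request.urlopen(heap_profile_url(base_url), timeout=timeout_s) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return path
```

For example, call save_heap_profile("http://localhost:9090", "before") ahead of the expensive query and save_heap_profile("http://localhost:9090", "after") once the "too many samples" error shows up in the query log. The two files can then be opened with pprof, for instance with its -diff_base option to see what grew in between.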

/MR

