Best practices for Prometheus 2 on Kubernetes


Khusro Jaleel

Dec 20, 2017, 2:06:11 PM
to Prometheus Users
Hello, I'm planning on setting up Prometheus 2.0 on a GKE Kubernetes 1.8 cluster and I wanted to know what the best practices are for tuning such a setup, and what to look for.

For example, with an "n1-highcpu-8" node, what are the best settings for requests / limits for the Prometheus pod? And what about target heap usage, is that still necessary? If I'm using an external disk on GKE, what sort of GCE disk class is best for performance?

I noticed that this page exists for 1.8 with a lot of useful things to watch out for, is there something similar planned for 2.0?

Thanks!

Ben Kochie

Dec 20, 2017, 6:05:38 PM
to Khusro Jaleel, Prometheus Users
Prometheus 2.0 requires 1/10 the CPU of Prometheus 1.x, so you will need a lot less of that.  n1-standard-X should be fine.

Target heap is obsolete; 2.0 automatically allocates (and frees) memory as needed. But you will want to dedicate some memory to page cache, as 2.0 now relies on the normal filesystem page cache for query performance.

Disk IO is also 100x better from 1.x to 2.0.  Standard disk should be ok now.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/55141b86-89a6-44f9-b386-d2a67cae5120%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Khusro Jaleel

Dec 21, 2017, 7:00:42 AM
to Prometheus Users
Thanks Ben, is dedicating memory to page cache a command-line option or is that something it manages automatically? If I set the "request/limits" on the Kubernetes pod to 5000Mi for example, what Prometheus parameters should I adjust, if any?



Ben Kochie

Dec 21, 2017, 9:54:30 AM
to Khusro Jaleel, Prometheus Users
That's one of the downsides of using page cache: there's no explicit way to control it. The only way to handle it in K8s is to adjust the pod resource request/limit.
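
To make that concrete, here is a minimal sketch of sizing the pod's memory request/limit so that it covers both the Prometheus process and some page cache. The helper name `memory_resources`, the 3000Mi/2000Mi split, and the choice to set the request equal to the limit are all assumptions for illustration, not recommendations from this thread; only the shape of the `resources` stanza follows the Kubernetes container spec.

```python
import json


def memory_resources(process_mib, page_cache_mib):
    """Build a container `resources` stanza whose memory request and limit
    cover both process memory and the headroom left for page cache.

    Hypothetical helper: Prometheus 2.0 has no flag for page cache, so the
    only knob is the total memory given to the pod.
    """
    total = f"{process_mib + page_cache_mib}Mi"
    # Assumption: request == limit, so the pod is scheduled with the same
    # amount of memory it is allowed to use.
    return {"requests": {"memory": total}, "limits": {"memory": total}}


if __name__ == "__main__":
    # Illustrative split only: 3000Mi for the process plus 2000Mi of headroom
    # adds up to the 5000Mi figure asked about earlier in the thread.
    print(json.dumps({"resources": memory_resources(3000, 2000)}, indent=2))
```

The printed structure can be copied into the container spec of the Prometheus pod; the actual numbers should come from observing real usage.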


Khusro Jaleel

Jan 3, 2018, 6:40:45 AM
to Prometheus Users
Thank you, I had a couple of other questions about memory usage. Is it possible to figure out how much memory I *should* give the Prometheus pod in Kubernetes by looking at the number of samples, for example? A rough figure would help me judge whether I am starting out with enough memory.

These are some of the values I have from Prometheus 1.x on my K8s cluster so far:
rate(prometheus_local_storage_ingested_samples_total[5m]): 7735.679166666667
prometheus_local_storage_memory_series{instance="localhost:9090",job="prometheus"}: 235357
prometheus_local_storage_memory_chunks{instance="localhost:9090",job="prometheus"}: 1193648





Ben Kochie

Jan 3, 2018, 7:21:09 AM
to Khusro Jaleel, Prometheus Users
I think the number was about 8kB of memory per series. This is for process memory only. It can be more depending on the query patterns, recording rules, etc.

Your example above has 230k series, so it would be ~1.75G memory.  I would start with 4G memory just to be safe and adjust from there.
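
The arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope helper, not an official sizing formula: the function name is hypothetical, and reading "8kB" as 8 KiB and "G" as GiB is an assumption (it is the reading that reproduces the ~1.75G figure for 230k series).

```python
# Rough sizing sketch based on the ~8 kB-per-series figure quoted above.
BYTES_PER_SERIES = 8 * 1024  # assumption: "8kB" read as 8 KiB


def estimate_process_memory_gib(series_count, bytes_per_series=BYTES_PER_SERIES):
    """Return the estimated Prometheus process memory in GiB.

    Covers process memory only; query load, recording rules, and page cache
    come on top of this.
    """
    return series_count * bytes_per_series / 1024**3


if __name__ == "__main__":
    # 230k is the rounded figure; 235357 is the exact value of
    # prometheus_local_storage_memory_series reported earlier in the thread.
    for series in (230_000, 235_357):
        print(f"{series} series -> ~{estimate_process_memory_gib(series):.2f} GiB")
```

Starting at 4G leaves roughly twice the estimated process memory as headroom for queries and page cache.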


Khusro Jaleel

Jan 4, 2018, 6:47:36 AM
to Prometheus Users
Thanks Ben, I am moving to Prometheus 2.0, so I was actually asking about 2.x, not 1.x. Is that 8kB of memory per series for 2.x?

Ben Kochie

Jan 4, 2018, 10:24:03 AM
to Khusro Jaleel, Prometheus Users
Yes, I think it's mostly the same for both. But 1.x needs extra memory for caching additional series.
