How to get a functional HA Prometheus setup?

Fisher Joe

May 30, 2021, 5:04:54 AM

to Prometheus Users

Hello,

Let's say I am bumping all my Prometheus instances to at least 2 replicas to achieve HA. I add Thanos, I have a global view, metrics are deduplicated, all is fine.

Suppose that one day a huge spike in high-cardinality metrics kills one of my Prometheus instances. Wouldn't it kill all of my replicas, since each of them scrapes the very same targets (which is the best practice, right)?

So I guess all of the replicas will be OOM-killed, and not even Thanos can really help me in this case. Or am I missing something?

Thanks in advance!

Another quick question: let's say I am using Istio. Would it make sense to add Thanos to an Istio-injected namespace with Envoy proxies, or is the overhead/complexity unnecessary, with no real advantage to doing so?

Julien Pivotto

May 30, 2021, 9:46:20 AM

to Fisher Joe, Prometheus Users

On 30 May 02:04, Fisher Joe wrote:
>
>
> Hello,
>
> Let's say I am bumping all my prometheus instances to at least 2 replicas
> to achieve HA. I add Thanos, I have a global view, metrics are
> deduplicated, all is fine.
>
> One day, I have a huge spike because of high cardinality metrics, and this
> will kill my prometheus instance. But wouldn't this kill all of my replicas
> as I am scraping the very same targets with each of those which is the best
> practice, right?
>
> So I guess, all of the replicas will be killed by oomkill, and not even
> Thanos can really help me in this case. Or am I missing something?

Hello,

First of all, Prometheus provides multiple ways of protecting you: we
have limits on the number of targets, the number of samples per scrape,
and the number and length of labels.
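
As a rough sketch of what such limits can look like in a scrape
configuration: the job name, target, and numbers below are made up for
illustration, and which limit fields are available depends on your
Prometheus version, so check them against the configuration docs for
the release you run.

```yaml
# prometheus.yml fragment (illustrative values, hypothetical job and target)
scrape_configs:
  - job_name: "example-app"                     # hypothetical job name
    static_configs:
      - targets: ["app.example.internal:8080"]  # hypothetical target
    # Fail the whole scrape if it returns more samples than this
    # (counted after metric relabeling).
    sample_limit: 50000
    # Mark the job's targets as failed if there are more unique targets
    # than this (counted after target relabeling).
    target_limit: 500
    # Fail the scrape if a sample carries more labels than this.
    label_limit: 64
    # Fail the scrape if a label name or value is longer than this.
    label_name_length_limit: 128
    label_value_length_limit: 1024
```

A scrape that exceeds a limit is treated as failed, so the offending
target shows up as down instead of filling the TSDB with new series.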

Second, in my environment, my 2 Prometheus servers have slightly
different sizes. One of them has slightly more RAM and more vCPUs. This
means that if a target is doing something bad, the larger Prometheus
should stay alive a bit longer to alert us.

Also, we have meta-monitoring in place: we have other means to verify
that Prometheus is running. In the field, you will see people using dead
man's switch features, or monitoring Prometheus with a "meta" Prometheus
server.
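
A minimal sketch of both ideas as alerting rules, assuming the "meta"
server scrapes the main servers under a job called "prometheus" (the
group name, job name, severities, and durations are placeholders):

```yaml
# rules file fragment (names and thresholds are illustrative)
groups:
  - name: meta-monitoring
    rules:
      # Dead man's switch: this alert always fires. Route it to an
      # external service that raises an alarm when it STOPS arriving.
      - alert: Watchdog
        expr: vector(1)
        labels:
          severity: none
        annotations:
          summary: Always-firing alert used to verify the alerting pipeline.
      # Evaluated on the "meta" Prometheus that scrapes the main servers.
      - alert: PrometheusDown
        expr: up{job="prometheus"} == 0   # assumes this job name
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus {{ $labels.instance }} is not being scraped successfully."
```

The first rule only helps if something outside the monitored Prometheus
notices the missing notification; the second only helps if the "meta"
server does not share the failure mode of the servers it watches.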

Regards,


--
Julien Pivotto
@roidelapluie

Fisher Joe

May 30, 2021, 11:25:39 AM

to Prometheus Users

Yes, that might work, but how would you configure this efficiently via prometheus-operator configs/Helm charts? Those aim for identical configs across replicas.

Would sharding help, or not?
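
For context, a rough sketch of where these knobs sit in a prometheus-operator Prometheus resource, assuming a reasonably recent operator version (the resource name, namespace, numbers, and selector are hypothetical, and the shards field is marked experimental):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example            # hypothetical name
  namespace: monitoring    # hypothetical namespace
spec:
  replicas: 2              # identical replicas per shard (the HA pair)
  shards: 2                # experimental: targets are split across shards
  enforcedSampleLimit: 50000   # per-scrape sample cap applied to generated scrape configs
  enforcedTargetLimit: 500     # per-scrape-config target cap
  resources:
    requests:
      memory: 4Gi
    limits:
      memory: 8Gi
  serviceMonitorSelector: {}   # match all ServiceMonitors in the selected namespaces
```

All replicas created from one such resource share the same pod template, so this sketch cannot give the two replicas different sizes; doing that would presumably mean running two separate resources, which is an assumption here and not something confirmed in this thread.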
