On 30 May 02:04, Fisher Joe wrote:
>
>
> Hello,
>
> Let's say I am bumping all my Prometheus instances to at least 2 replicas
> to achieve HA. I add Thanos, I have a global view, metrics are
> deduplicated, all is fine.
>
> One day, I have a huge spike because of high-cardinality metrics, and this
> kills my Prometheus instance. But wouldn't this kill all of my replicas,
> since each of them scrapes the very same targets (which is the best
> practice, right)?
>
> So I guess all of the replicas will be OOM-killed, and not even
> Thanos can really help me in this case. Or am I missing something?
Hello,
First of all, Prometheus provides multiple ways of protecting you: there
are limits on the number of targets, the number of samples per scrape,
and the number and length of labels.
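As a rough sketch, this is what those limits can look like in a scrape
configuration. The job name, the target address and every number below
are made up for illustration, and some of these options only exist in
more recent Prometheus releases, so check the configuration docs for
your version:

```yaml
scrape_configs:
  - job_name: "example-app"                      # hypothetical job name
    static_configs:
      - targets: ["app.example.internal:9100"]   # hypothetical target
    # Treat the scrape as failed if it returns more samples than this
    # after metric relabeling (0 means no limit).
    sample_limit: 10000
    # Mark the targets of this job as failed if there are more unique
    # targets than this after target relabeling.
    target_limit: 50
    # Treat the scrape as failed if a sample carries more labels than this.
    label_limit: 30
    # Treat the scrape as failed if a label name or value is longer
    # than these lengths.
    label_name_length_limit: 100
    label_value_length_limit: 300
```

When one of the scrape limits is hit, the whole scrape is rejected
rather than ingested, so a target that suddenly explodes in cardinality
shows up as a failed scrape instead of filling the memory of every
replica.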
Second, in my environment, my two Prometheus servers are sized slightly
differently: one of them has a bit more RAM and more vCPUs. That way,
if a target does something bad, the bigger Prometheus should stay alive
a bit longer and alert us.
Also, we have meta-monitoring in place: we have other means to verify
that Prometheus is running. In the field, you will see people using dead
man's switch features, or monitoring Prometheus with a "meta" Prometheus
server.
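For illustration only, here is a minimal sketch of both ideas as
Prometheus alerting rules. The group name, the alert names, the
job="prometheus" label and the 5m duration are assumptions, not what we
actually run. The first rule fires all the time, so an external service
can raise the alarm when the notifications stop arriving. The second one
is meant to be loaded on the "meta" server that scrapes the other
Prometheus servers:

```yaml
groups:
  - name: meta-monitoring            # hypothetical group name
    rules:
      # Dead man's switch: vector(1) always returns a value, so this
      # alert is always firing. Route it to a receiver that expects a
      # regular heartbeat and complains when it goes silent.
      - alert: Watchdog
        expr: vector(1)
        labels:
          severity: none
        annotations:
          summary: "Always-firing alert used as a heartbeat."
      # For the meta server: fires when a scraped Prometheus has been
      # unreachable for 5 minutes. The job label value is an assumption.
      - alert: PrometheusDown
        expr: up{job="prometheus"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus {{ $labels.instance }} is down."
```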
Regards,
>
> Thanks in advance!
>
> Another quick question: let's say I am using Istio. Would it make sense to
> add Thanos to an Istio-injected namespace with Envoy proxies, or is the
> overhead/complexity unnecessary, with no real advantage in doing so?
--
Julien Pivotto
@roidelapluie