Hello team,
We are currently evaluating Thanos as a solution for horizontally scaling our Prometheus setup.
For global rule evaluation with Thanos Ruler, one has to make a tradeoff between availability and accuracy. For our use case, we favor accuracy over availability, but we are wondering whether the availability side of that tradeoff can be improved.
Thanos Querier declares a response partial when at least one instance exposing the Store API is down. Systems that prefer accuracy will “abort” rule evaluations on partial responses. But since a typical Prometheus HA setup runs several replicas of each Prometheus instance, it is very inconvenient to abort alert rule evaluations every time a single replica is down. Any one instance could be down for various reasons (scheduled maintenance, patching, deployment, etc.).
Is there any way to improve the availability of global alert rules?
Does it make sense to enhance the Store API to be replica-aware? For a partial response, could the querier indicate whether the error affected all replicas of a given Prometheus instance, or only a subset of them?
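To make the idea concrete, here is a rough sketch of the decision I have in mind. It is not existing Thanos code or API; `StoreResult`, `group`, `replica`, and `classify` are hypothetical names, and I am assuming the querier could tell which HA group each store endpoint belongs to (for example, from its external labels without the replica label).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreResult:
    group: str    # hypothetical: HA group the endpoint belongs to
    replica: str  # hypothetical: replica name within that group
    ok: bool      # True if this endpoint answered the query


def classify(results):
    """Classify a fan-out query as 'complete', 'degraded' or 'partial'.

    complete: every endpoint answered.
    degraded: some endpoint failed, but every group still had
              at least one healthy replica.
    partial:  at least one group had no healthy replica at all.
    """
    by_group = defaultdict(list)
    for r in results:
        by_group[r.group].append(r.ok)

    if all(all(oks) for oks in by_group.values()):
        return "complete"
    if all(any(oks) for oks in by_group.values()):
        return "degraded"
    return "partial"
```

With something like this, a ruler that prefers accuracy would only need to abort on `partial`, and could keep evaluating on `degraded` because the surviving replica should still hold the data.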
Thanks