Alert Manager in a Prometheus/Thanos deployment


George

Aug 28, 2020, 3:20:18 AM
to Prometheus Users
Hi all

I have been reading around, mostly on the Thanos web site. Please confirm my understanding.

In a multi-Prometheus-server environment, with Thanos deployed for federation:

Option #1
I have a single Alertmanager, and that Alertmanager is configured with (told about) every Prometheus server (data store) in my environment.

Option #2
Configure the Alertmanager with a Thanos Ruler + Query (module), which then virtualizes the entire Prometheus stack (i.e. I hide my Prometheus servers behind Thanos Ruler/Query).

Comments?

G

--
You have the obligation to inform one honestly of the risk, and as a person
you are committed to educate yourself to the total risk in any activity!

Once informed & totally aware of the risk,
every fool has the right to kill or injure themselves as they see fit!

David Leadbeater

Aug 28, 2020, 3:39:31 AM
to George, Prometheus Users
First, Prometheus pushes alerts to Alertmanager (i.e. each Prometheus server sends its alerts to Alertmanager). So for #1 it is the other way around: every Prometheus is told about your Alertmanager.
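
As a rough illustration of that push model, the sketch below renders the `alerting` section that each Prometheus server would carry in its `prometheus.yml`. The helper name and the Alertmanager addresses are made up for the example; only the `alerting`, `alertmanagers`, `static_configs` and `targets` keys come from the Prometheus configuration format.

```python
def render_alerting_section(alertmanager_targets):
    """Return the `alerting` block of prometheus.yml as YAML text.

    Every Prometheus server gets the same block, so all of them push
    their firing alerts to the same Alertmanager cluster.
    """
    lines = [
        "alerting:",
        "  alertmanagers:",
        "    - static_configs:",
        "        - targets:",
    ]
    for target in alertmanager_targets:
        lines.append(f"            - '{target}'")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical addresses of a two-node Alertmanager cluster.
    print(render_alerting_section(["alertmanager-0:9093", "alertmanager-1:9093"]))
```

Nothing has to be configured on the Alertmanager side for it to learn about the Prometheus servers; it simply accepts whatever alerts are pushed to it.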

For #2 you are concentrating your alerts into one place. If the data for those alerts is available on the Prometheus instances, you can make a more reliable setup by raising the alert locally, where the data is first available. See in particular the “risk” section of the Thanos ruler docs.

The ideal solution is likely a combination of #1 and #2: alerts are raised by Prometheus when the data is available on Prometheus, and by Thanos ruler for alerts about aggregated data (the ruler can send to the same Alertmanager as the Prometheus instances do).

The key thing to think about is: “if I have a network partition or other failure, will I still get alerts?” This is easily true for Prometheus (as rules run on local data); for Thanos it depends a lot on your architecture.
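
That question can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: a rule can be evaluated if its evaluator can reach every data source the rule reads, and the alert is delivered if the evaluator can also reach Alertmanager. All component names are hypothetical, and the real Thanos query path (Ruler to Query to sidecars/stores) is collapsed into a single hop.

```python
def alert_still_delivered(evaluator, sources, reachable):
    """Return True if a rule run by `evaluator` can still fire an alert.

    evaluator: component running the rule, e.g. "prom-a" or "thanos-rule".
    sources:   Prometheus servers whose data the rule reads.
    reachable: set of (from, to) pairs that are currently connected.
    """
    # Data held by the evaluator itself needs no network hop.
    data_ok = all(
        src == evaluator or (evaluator, src) in reachable for src in sources
    )
    return data_ok and (evaluator, "alertmanager") in reachable


# Partition: thanos-rule has lost contact with prom-a, everything else is up.
links = {
    ("prom-a", "alertmanager"),
    ("thanos-rule", "alertmanager"),
    ("thanos-rule", "prom-b"),
}
print(alert_still_delivered("prom-a", ["prom-a"], links))       # rule run locally
print(alert_still_delivered("thanos-rule", ["prom-a"], links))  # rule run centrally
```

In this model the rule evaluated on prom-a keeps working through the partition, while the same rule evaluated centrally goes quiet because its data is out of reach.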

David  


Ben Kochie

Aug 28, 2020, 4:42:22 AM
to George, Prometheus Users
You always want "one" alertmanager (usually in a cluster of 2 or more). The alertmanager is meant to be a single clearing house per organization. A place where you can see, silence, and route all alerts.

This is a separate concern from where you _generate_ alerts.

In a multi-Prometheus environment, the best practice is to generate alerts as close to the data as possible.
* If the data you have is in Prometheus, run the alert rules in Prometheus and have it generate alerts.
* If the data you need is cross-Prometheus and available only in Thanos, run the alert rules in Thanos Rule.
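
The two bullets above amount to a simple placement rule. A minimal sketch, assuming each alert rule can be tagged with the set of Prometheus servers whose data it reads (the function and server names are hypothetical, not part of Prometheus or Thanos):

```python
def choose_rule_evaluator(servers_needed):
    """Decide where an alert rule should run.

    servers_needed: set of Prometheus server names whose data the rule reads.
    Returns the name of the single Prometheus server if one suffices,
    otherwise "thanos-rule" for rules spanning several servers.
    """
    if not servers_needed:
        raise ValueError("a rule must read data from at least one server")
    if len(servers_needed) == 1:
        # All data is local to one Prometheus: evaluate the rule there.
        return next(iter(servers_needed))
    # The data only exists as a cross-Prometheus view: evaluate in Thanos Rule.
    return "thanos-rule"


print(choose_rule_evaluator({"prom-eu"}))             # single-server rule
print(choose_rule_evaluator({"prom-eu", "prom-us"}))  # cross-server rule
```

Wherever a rule ends up being evaluated, the resulting alerts still go to the same Alertmanager cluster.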
