Hi,

It seems AlertManager supports HA by creating a mesh of AlertManagers [1]. The Prometheus server can then point to each instance in the mesh by listing their URLs in the -alertmanager.url flag, as follows:

./prometheus -config.file=prometheus.yml -alertmanager.url http://localhost:9095,http://localhost:9094,http://localhost:9093

However, when the config file for one of the AlertManagers in the mesh is changed, the change does not seem to be replicated to the other instances in the mesh.

I would appreciate it if someone could highlight the advantages of using a mesh of AlertManagers.
--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/96eaf9e3-9e4e-41d2-a1e1-746add555b83%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
On 8 May 2017 at 10:25, Rahul Srivastava <norm...@gmail.com> wrote:
> However, when the config file for one of the AlertManagers in the mesh is changed, it seems the change is not replicated across other instances in the mesh?

That's correct. You need to update all of them.

> Would appreciate if someone could highlight the advantages of using a mesh of AlertManager?

If one AM goes down or there's a network partition, notifications will still happen.

Brian Brazil
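Since configuration is not replicated between mesh members, it can help to verify that every instance is running the same file. The following is a minimal sketch, assuming you have a local copy of each instance's config; the function name configs_match and the file names are hypothetical and not part of Alertmanager.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if all given config files are identical.
# It compares checksums, so it works for any number of files.
configs_match() {
    first_sum=""
    for f in "$@"; do
        # With input on stdin, cksum prints "<crc> <size>" without a file name.
        sum=$(cksum < "$f") || return 2
        if [ -z "$first_sum" ]; then
            first_sum=$sum
        elif [ "$sum" != "$first_sum" ]; then
            echo "config differs: $f" >&2
            return 1
        fi
    done
    return 0
}

# Example call (file names are assumptions):
#   configs_match am1/alertmanager.yml am2/alertmanager.yml am3/alertmanager.yml
```

A non-zero exit status means at least one copy has drifted (1) or could not be read (2), which is the situation that leads to the duplicate notifications discussed later in the thread.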
On Monday, 8 May 2017 15:06:43 UTC+5:30, Brian Brazil wrote:
> That's correct. You need to update all of them.

So it seems they are just a set of individual AlertManagers running.

> If one AM goes down or there's a network partition, notifications will still happen.

Well, IIUC, the AlertManagers need not be in a mesh for that. Prometheus is configured with the URLs of all the AlertManagers, so if one AlertManager goes down, Prometheus can still send alerts to the others. How does it matter whether those AlertManagers are part of a mesh or not? A notification would be sent out as long as at least one AlertManager is alive. So that brings us back to the question: what advantage does a *mesh* offer over individual AlertManagers?

Thanks,
Rahul.
On 8 May 2017 at 11:03, Rahul Srivastava <norm...@gmail.com> wrote:
> So it seems they are just a set of individual AlertManagers running.

Largely.

> So that brings us back to the question: what advantage does a *mesh* offer over individual AlertManagers?

In normal operations, you only get one notification no matter how many AMs are in the mesh. With n separate AMs, you'd get n notifications.

Brian
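To make the "one notification versus n notifications" point concrete, here is a toy model only. It assumes that meshed instances share a record of which notifications have already been sent; a plain file stands in for that shared state. The function notify_once, the instance names and the alert name are all hypothetical, and this is not how Alertmanager is implemented internally.

```shell
#!/bin/sh
# Toy model: an instance sends a notification only if the alert is not
# yet recorded in the log file it was given.
notify_once() {
    log=$1
    instance=$2
    alert=$3
    if grep -qxF -- "$alert" "$log" 2>/dev/null; then
        return 0    # already recorded as sent, so stay silent
    fi
    echo "$alert" >> "$log"
    echo "$instance sends notification for $alert"
}

tmp=$(mktemp -d)

echo "meshed (one shared log):"
for am in am1 am2 am3; do
    notify_once "$tmp/shared.log" "$am" HighLatency
done

echo "separate (one log per instance):"
for am in am1 am2 am3; do
    notify_once "$tmp/$am.log" "$am" HighLatency
done
```

With the shared log only the first instance reports sending; with one log per instance every instance does, which mirrors the n notifications described above.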
See how this works in Fabian's 5-minute lightning talk from PromCon 2016: https://www.youtube.com/watch?v=XvqaYbiTOMg
On Mon, May 8, 2017 at 3:36 PM, Brian Brazil <brian.brazil@robustperception.io> wrote:
> Largely.

So it seems the participants in an AlertManager mesh can all have different configs? If so, how does the aggregation of alerts work? For example, group_wait etc. may have different values in each AlertManager in the mesh.

> In normal operations, you only get one notification no matter how many AMs are in the mesh. With n separate AMs, you'd get n notifications.

That makes a lot of sense. Btw, is there a primary node in the AlertManager mesh? If so, what happens when that primary goes down?

Primary in mesh: ./alertmanager ... -mesh.listen-address=:8001
Participant in mesh: ./alertmanager ... -mesh.peer=127.0.0.1:8001

Thanks,
Rahul.
On 8 May 2017 at 11:46, Rahul Srivastava (र।हुल श्रीवास्तव) <norm...@gmail.com> wrote:
> So it seems the participants in an AlertManager mesh can all have different configs?

Yes, don't do that.

> If so, how does the aggregation of alerts work? For example, group_wait etc. may have different values in each AlertManager in the mesh.

You may get duplicate alerts.

> Btw, is there a primary node in the AlertManager mesh? If so, what happens when that primary goes down?

Kinda. That's handled gracefully: another AM will get first shot at sending notifications.

--
Brian Brazil
On Mon, May 8, 2017 at 4:27 PM, Brian Brazil <brian.brazil@robustperception.io> wrote:
> Kinda. That's handled gracefully, there'll be a new AM that gets first shot at sending notifications.

Interesting. So the mesh is not broken when the (so-called) primary goes down. If that's the case, then the primary isn't really a primary (I guess that's what you meant by "kinda" :-)).

Sorry, but I didn't understand this completely. When the primary (which was listening on 8001, say) goes down, nothing is listening on port 8001 any more. So how come the other participants are still gossiping on 8001 in the mesh?
See the help text description (./alertmanager -h) of the -mesh.peer flag: "initial peers (may be repeated)".

You will need to list that flag three times, once for each of the Alertmanagers. Otherwise, indeed, everything is only connected to the first one.
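As an illustration of repeating the flag, the sketch below prints (but does not run) a command line for each of three local instances, with every instance listed as an initial peer. The -mesh.listen-address and -mesh.peer flags are the ones mentioned in this thread; the helper name mesh_peer_flags, the ports 8001 to 8003 and the 127.0.0.1 addresses are assumptions, and "..." stands for whatever other flags you normally pass.

```shell
#!/bin/sh
# Build one "-mesh.peer=<addr>" flag per address given as an argument.
mesh_peer_flags() {
    flags=""
    for peer in "$@"; do
        flags="$flags -mesh.peer=$peer"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Assumed layout: three instances on one host, one mesh port each.
ports="8001 8002 8003"
peers=""
for port in $ports; do
    peers="$peers 127.0.0.1:$port"
done

# $peers is left unquoted on purpose so that it splits into one argument per peer.
for port in $ports; do
    echo "./alertmanager ... -mesh.listen-address=:$port $(mesh_peer_flags $peers)"
done
```

Generating the flags from a single list keeps the peer list identical on every instance, so no instance depends on one particular peer being up at start time.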
On Tue, May 9, 2017 at 4:12 PM, Julius Volz <juliu...@gmail.com> wrote:
> You will need to list that flag three times, once for each of the Alertmanagers. Otherwise indeed everything is only connected to the first one.

So far I was under the impression that when a new participant joins an existing mesh, it automatically gets connected to every member of the mesh. But it seems that is not the case. If every alertmanager must explicitly specify *all* the alertmanagers it should connect to, then any new participant joining the mesh must somehow discover all the running alertmanagers, and the mesh.listen-address of each, so that it can pass multiple mesh.peer flags on startup? This may get tricky when alertmanagers in a mesh are scaled dynamically based on load.
After some debugging and circling back with folks from Weave, the problem simply seems to be running all instances on the same host/IP.

The general assumption is that all peers use the same mesh port or all possible ports are part of the initial peer list. Three instances running on the same host on three different ports, where just one of them is provided in the initial peer list, don't fulfil that. Hence B and C never initiate a connection between themselves and depend on A.

Summed up, this problem should never really occur in a valid setup across multiple nodes. I'm still not 100% sure how this technical limitation comes to be, but I don't think it affects any production deployments.
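The rule of thumb above (same mesh port everywhere, or every address in the initial peer list) can be written as a small pre-flight check. This is only a sketch of that stated assumption; it is not something Alertmanager or the mesh library performs, and the function name mesh_setup_ok and the example addresses are hypothetical.

```shell
#!/bin/sh
# Usage: mesh_setup_ok "<initial peers>" "<all instance addresses>"
# Both arguments are space-separated host:port lists.
mesh_setup_ok() {
    initial=$1
    all=$2
    # Case 1: every instance uses the same mesh port.
    distinct_ports=$(for a in $all; do echo "${a##*:}"; done | sort -u | wc -l)
    if [ "$distinct_ports" -eq 1 ]; then
        return 0
    fi
    # Case 2: every instance address appears in the initial peer list.
    for a in $all; do
        case " $initial " in
            *" $a "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

# The setup from this thread: three ports on one host, only A listed as a peer.
if mesh_setup_ok "127.0.0.1:8001" "127.0.0.1:8001 127.0.0.1:8002 127.0.0.1:8003"; then
    echo "setup satisfies the assumption"
else
    echo "B and C would depend on A"
fi
```

Listing all three addresses as initial peers, or running the instances on separate hosts with a common mesh port, makes the check pass.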
On Tue, May 9, 2017 at 2:35 PM Julius Volz <juliu...@gmail.com> wrote:
> On Tue, May 9, 2017 at 2:00 PM, Rahul Srivastava (र।हुल श्रीवास्तव) <norm...@gmail.com> wrote:
> > So far I was under the impression that when a new participant joins an existing mesh, it gets connected to every single member in the mesh, automatically. But seems like that is not the case. [...] This may get tricky when alertmanagers in a mesh are scaled dynamically based on load.
>
> Actually it seems I was misinformed on this. I had assumed that the Mesh library we're using only ensures that gossip messages make it from one node to all other nodes as long as there is a statically configured path between the nodes (not even necessarily a full mesh). I thought that would be ok, as AM would normally be operated as a relatively static cluster.
>
> However, I've just learned from Peter (author of the Weave Mesh lib) and Fabian that memberships should also be gossiped, as you expected. So the question is why this is not working here.
Agreed.