Acting on an alert automatically


Konstantin Ivanov

Jun 15, 2016, 11:33:36 AM
to Prometheus Developers
Is there a way to execute a script in response to an alert from Alertmanager? This would be helpful for, say, restarting a process.

Matthias Rampke

Jun 15, 2016, 11:36:47 AM
to Konstantin Ivanov, Prometheus Developers
Hey,

the way to do this is to wrap your script behind something that can
receive a webhook and act on it. This way, the process lifecycle is
decoupled. You can probably repurpose some web server's CGI handler
for this if you need to … or depending on your case it may be easier
to implement the logic straight in a custom service.

/MR

--
Matthias Rampke
Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany | +49
173 6395215

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B
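
A minimal sketch of the kind of webhook receiver described above, written in Go since a Go server comes up later in the thread. It assumes the JSON body that Alertmanager's webhook notifier POSTs (a top-level `alerts` array whose entries carry `status` and `labels`). The `/alerts` path, the `localhost:9095` address, the `ProcessDown` alert name, and the `systemctl restart myservice` command are all hypothetical placeholders, not anything from the thread.

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
	"os/exec"
)

// alert and webhookPayload model only the fields this sketch reads from the
// JSON body that Alertmanager POSTs to a webhook receiver.
type alert struct {
	Status string            `json:"status"` // "firing" or "resolved"
	Labels map[string]string `json:"labels"`
}

type webhookPayload struct {
	Alerts []alert `json:"alerts"`
}

// actions maps an alertname label to the command to run for it.
// The alert name and the command are hypothetical placeholders.
var actions = map[string][]string{
	"ProcessDown": {"systemctl", "restart", "myservice"},
}

// parsePayload decodes a webhook request body.
func parsePayload(body []byte) (webhookPayload, error) {
	var p webhookPayload
	err := json.Unmarshal(body, &p)
	return p, err
}

// commandsFor returns the configured commands for the firing alerts in p.
// Resolved alerts and alerts without a configured action are skipped.
func commandsFor(p webhookPayload) [][]string {
	var cmds [][]string
	for _, a := range p.Alerts {
		if a.Status != "firing" {
			continue
		}
		if cmd, ok := actions[a.Labels["alertname"]]; ok {
			cmds = append(cmds, cmd)
		}
	}
	return cmds
}

// handleAlerts reads the webhook body, runs the matching commands one after
// another, and logs each command's error and combined output.
func handleAlerts(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	p, err := parsePayload(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	for _, cmd := range commandsFor(p) {
		out, err := exec.Command(cmd[0], cmd[1:]...).CombinedOutput()
		log.Printf("ran %v: err=%v output=%q", cmd, err, out)
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/alerts", handleAlerts)
	log.Fatal(http.ListenAndServe("localhost:9095", nil))
}
```

With this running, an Alertmanager webhook receiver pointed at `http://localhost:9095/alerts` would trigger the mapped command for each firing alert. The commands run synchronously inside the handler, so a slow fix would probably be better handed off to a background worker.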

Konstantin Ivanov

Jun 15, 2016, 12:15:15 PM
to Prometheus Developers, kaiv...@gmail.com
So Alertmanager cannot be leveraged for this? Does it need to be a separate service that pulls from Prometheus and executes the script?


Matthias Rampke

Jun 15, 2016, 12:25:29 PM
to Konstantin Ivanov, prometheus-developers

No, Alertmanager itself can send the webhooks; they are one of the available notifiers. You would not need to poll, only handle these hooks when they are called.

/MR


Daniel Barker

Jun 15, 2016, 9:59:53 PM
to Prometheus Developers
I have a Go server set up to receive the webhook push. It currently uses some conditional logic to send me a page, but I'll eventually break it up into separate services.

ucn...@gmail.com

Jun 28, 2016, 4:02:43 PM
to Prometheus Developers

This is along the lines of what I am thinking of doing as well. Basically, the first alert only calls the webhook; the webhook then either fixes the issue or creates an alert that does get escalated to whatever the paging mechanism is. I'm also planning on each service running in its own Go process and exporting "/metrics" for Prometheus to then pull in. Meta-metrics, in a sense. ;)

Cheers,
Bill
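
A rough sketch of the fix-or-escalate step described above, in Go. The function names and the failing example fix are hypothetical; how the escalation actually reaches a pager (a new alert, a direct API call, something else) is left open here, so it is passed in as a callback.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// remediate runs fix once. If fix returns an error, it calls escalate with
// the reason, so the resulting page says why the correction failed.
// It reports whether the fix succeeded.
func remediate(fix func() error, escalate func(reason string)) bool {
	if err := fix(); err != nil {
		escalate(fmt.Sprintf("automatic fix failed: %v", err))
		return false
	}
	return true
}

// restartFailed is a hypothetical fix that always fails, used to show the
// escalation path.
func restartFailed() error {
	return errors.New("restart exited with status 1")
}

func main() {
	// page stands in for whatever raises the escalated alert.
	page := func(reason string) { log.Printf("PAGE: %s", reason) }

	ok := remediate(restartFailed, page)
	log.Printf("fixed=%v", ok)
}
```

Keeping the fix and the escalation as separate callbacks keeps each component a discrete step, and the reason string carries the failure detail through to whoever gets paged.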

Fabian Reinartz

Jun 28, 2016, 5:18:09 PM
to ucn...@gmail.com, Prometheus Developers

Just a random thought: you could also set up routing for the same alerts twice, once to your auto-fixing webhook and once to something that actually pages.
You make the wait interval for the latter reasonably long, and it won't fire if the webhook fixed the issue and the alerts got resolved in the meantime.

Could be a semantically simpler alternative to creating extra alerts.


Bill Anderson

Jun 29, 2016, 2:45:17 PM
to Fabian Reinartz, Prometheus Developers
That would certainly be one way to do it. They both have their merits. In my case, however, I want to know specifically that the correction failed, and why. Plus each component is a discrete step.

joshua...@pearson.com

May 29, 2018, 9:24:54 AM
to Prometheus Developers
Hello. Can anyone please point me to any examples of an application that receives an alert from Alertmanager and then runs an automated task to resolve the problem?

That is, if the steps to resolve a given alert are the same every time, I wouldn't need to be paged when it fires in the middle of the night. For example, for K8SControllerManagerDown (https://github.com/coreos/prometheus-operator/blob/v0.19.0/contrib/kube-prometheus/assets/prometheus/rules/kube-controller-manager.rules.yaml), the automated action could be to run `kubectl delete` on the `kube-controller-manager` pod (and then page me if the pod doesn't come up in the "Running" status).

I Googled and the only thing I found is the following repo: https://github.com/imgix/prometheus-am-executor

Please let me know if I should instead/also post in: https://groups.google.com/forum/#!forum/prometheus-users

Any suggestions would be greatly appreciated. Thanks!