Alerts for all containers using cadvisor

thiago.t...@bravi.com.br

Nov 16, 2015, 1:41:49 PM
to Prometheus Developers
Hello :)

I've set up a Prometheus instance using Docker, and I also set up a cAdvisor instance to export data to Prometheus.

I'm trying to set up an alert rule that fires when any of my running containers stops, but it only works when I specify the container name.
The problem is that I have a lot of running containers, and it would take too long to set up a rule for each of them. I've already tried a regex filter in the query, like this:

absent(((time() - container_last_seen{name=~"dockito"}) < 5))

My current, working rule looks like this:

ALERT dockit
  IF absent(((time() - container_last_seen{name="dockito-vault"}) < 5))
  FOR 15s
  SUMMARY "Instance {{$labels.instance}} down"
  DESCRIPTION "{{$labels.instance}} of job {{$labels.job}} has been down for more than 15 seconds."

Can anyone help me?

Thanks in advance.

Matthias Rampke

Nov 16, 2015, 1:53:47 PM
to thiago.t...@bravi.com.br, Prometheus Developers
How long after a container is gone does cAdvisor keep reporting
`container_last_seen`? If the series stays around for a while (and I
think it will anyway, because Prometheus also holds on to the metric
for some time), you could just change the rule to

ALERT ContainerGoneFishing
IF time() - container_last_seen{name=~"dockito"} > 5
FOR 15s


Two general remarks, though. First, FOR is optional: since the alert is
already expressed as a time difference, why not fold the delay into the
threshold?
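
As a rough sketch of folding the delay into the threshold: the 20 seconds below is an assumption (the 5 second threshold plus the 15 second FOR), not a tested value. Newer Prometheus versions anchor regular expressions in label matchers, so the pattern is written as "dockito.*" here to match every name that starts with dockito. SUMMARY and DESCRIPTION are left out, as in the rule above.

```
# Sketch only: fires for each container whose last-seen timestamp is
# more than 20 seconds old (assumed: 5s threshold + 15s FOR).
ALERT ContainerGoneFishing
  IF time() - container_last_seen{name=~"dockito.*"} > 20
```

Because the comparison is applied per series, each stale container produces its own alert with its own name label.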

Second, the fact that the container exists doesn't mean your
application is healthy. I would just set a restart policy and stop
worrying about the container exiting unexpectedly. Instead, use
Prometheus to monitor the application itself (if you can add a metrics
endpoint to it), or an exporter that gets the information out of it in
whatever way it can.

When Prometheus scrapes the service directly, you can use the "up"
metric as a proxy for "it's there", which is not the full story but is
much closer than "there is a container of that name".
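
A minimal sketch of such a rule, assuming the service is scraped under a job called "myapp" (a made-up name) and that one minute of failed scrapes is worth alerting on:

```
# "myapp" is a hypothetical job name; use the job your service is scraped under.
ALERT ServiceDown
  IF up{job="myapp"} == 0
  FOR 1m
  SUMMARY "Instance {{$labels.instance}} down"
  DESCRIPTION "{{$labels.instance}} of job {{$labels.job}} has failed its scrapes for more than 1 minute."
```

Since "up" carries the instance and job labels of each scrape target, one rule like this covers every instance of the job.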

/MR
> --
> You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
Matthias Rampke
Engineer

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany | +49
173 6395215

Managing Director: Alexander Ljung | Incorporated in England & Wales
with Company No. 6343600 | Local Branch Office | AG Charlottenburg |
HRB 110657B

Thiago Trennepohl

Nov 16, 2015, 2:09:58 PM
to Matthias Rampke, Prometheus Developers
You are right. I'll look for ways of checking the application itself; that's really better than "there is a container of that name".

Thank you ;)

Paul Letski

Mar 28, 2016, 6:02:18 AM
to Prometheus Developers, m...@soundcloud.com, thiago.t...@bravi.com.br
Hi all! 

I have exactly the same question, so I chose not to start a new thread :)
I have several containers with a common part in their names, running on different instances, and I still need to know if any of these containers is down.

I created this alert rule:

ALERT ContainerIsDown
    IF absent(((time() - container_last_seen{name=~".*commonpart.*"}) < 5))
    FOR 15s
    LABELS {
        instance = "vm1"
    }
    ANNOTATIONS {
        summary = "Container {{$labels.name}} is down",
        description = "You should do something"
    }

Here is my question: how can I get the alert to tell me which container on which instance is down? Is it possible without labels? Or can I use variables in labels?
For now I'm getting messages like this: "[FIRING:1] ContainerIsDown (vm1)", but I want them to look like this: "[FIRING:1] ContainerIsDown (vm1 container_name)". Is that possible?

Thank you in advance!!

Brian Brazil

Mar 28, 2016, 6:32:29 AM
to Paul Letski, Prometheus Developers, Matthias Rampke, thiago.t...@bravi.com.br
The default alert notification shows the values of all common labels, so I'm guessing you have more than one alert firing here. To get the container name in there, you'll need a custom notification template in the Alertmanager. See http://prometheus.io/blog/2016/03/03/custom-alertmanager-templates/
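
On the rule side, note that absent() returns a single series that does not carry the name or instance labels of the containers it matched, so {{$labels.name}} has nothing to fill in, and the rule only fires once every matching container is gone. Here is a sketch of a per-container variant along the lines Matthias suggested earlier in this thread, assuming container_last_seen keeps being reported for a while after a container stops (which is not confirmed here):

```
# Sketch only: one alert per container whose last-seen timestamp is too old.
# Each alert keeps the name and instance labels of its series, so the
# hard-coded instance = "vm1" label is dropped.
ALERT ContainerIsDown
    IF time() - container_last_seen{name=~".*commonpart.*"} > 5
    FOR 15s
    ANNOTATIONS {
        summary = "Container {{$labels.name}} on {{$labels.instance}} is down",
        description = "You should do something"
    }
```

With one alert per container, the name label becomes available to the annotations and to any notification template.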

Brian
 


Paul Letski

Mar 28, 2016, 7:15:40 AM
to Prometheus Developers, let...@gmail.com, m...@soundcloud.com, thiago.t...@bravi.com.br

Thank you! I don't know how I missed it :)
