I am trying to use the up metric to determine the number of times the service was down for less than a minute (potentially a network hiccup) during a time range (or per hour).
The best I have so far is up == 0, which gives me a series with points only when the service was down, but I am not sure what to do next.
Any help with this type of query would be greatly appreciated.
Thanks.
--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsub...@googlegroups.com.
To post to this group, send email to prometheus-developers@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/5d88134d-7099-404b-9b1b-2bbd338b3f04%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Thanks Ben for your quick reply.
However, I am not sure I get the full picture of your suggestion. I need to count the number of times the up metric was == 0 for less than a minute, and then group those counts into hourly bins.
On Thursday, July 6, 2017 at 1:11:57 PM UTC-4, Ben Kochie wrote:
> There are a number of functions that will help with this.
>
> My favorite is simply doing:
>
> avg_over_time(up[1h])
>
> which will give you the uptime as a float between 0 and 1 (multiply by 100 for a percentage).
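To make the suggestion above concrete: up is 1 for a successful scrape and 0 for a failed one, so averaging it over a window yields the fraction of scrapes that succeeded. Below is a minimal Python sketch of that arithmetic only. The helper name uptime_fraction, the 15s scrape interval, and the sample values are assumptions made for illustration; none of this is a Prometheus API.

```python
from typing import Sequence


def uptime_fraction(samples: Sequence[int]) -> float:
    """Mean of 0/1 up samples, i.e. the value avg_over_time(up[window]) would report."""
    if not samples:
        raise ValueError("no samples in window")
    return sum(samples) / len(samples)


# One hour at an assumed 15s scrape interval is 240 samples; two scrapes failed here.
hour = [1] * 238 + [0] * 2
fraction = uptime_fraction(hour)
print(f"{fraction:.4f}")  # 0.9917
```

Note that the average says how much of the hour was lost, not how many separate outages there were: two isolated blips and one 30s outage produce the same value.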
Ah, now I understand your point. Yes, your approach is perfect for alerting, and I will end up using it very shortly.
Unfortunately, I also need to create a diagram for my network administrator showing how many micro-downtimes occurred.
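For the diagram, the hourly count of short outages can also be computed outside Prometheus once the raw up samples have been exported (for example through the HTTP API). The following is a rough Python sketch, not a PromQL answer. The function name short_outages_per_hour, the (timestamp, value) sample format, and the synthetic data are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[int, int]  # (unix timestamp in seconds, up value: 0 or 1)


def short_outages_per_hour(samples: List[Sample], max_seconds: int = 60) -> Dict[int, int]:
    """Count outages shorter than max_seconds, keyed by the start of the hour they began in.

    An outage runs from the first 0 sample of a run of zeros to the next 1 sample.
    Outages still open at the end of the data are ignored: their length is unknown.
    """
    counts = Counter()
    down_since = None  # timestamp of the first failed scrape of the current outage
    for ts, up in sorted(samples):
        if up == 0 and down_since is None:
            down_since = ts
        elif up == 1 and down_since is not None:
            if ts - down_since < max_seconds:
                counts[down_since // 3600 * 3600] += 1
            down_since = None
    return dict(counts)


# Synthetic two hours of data at an assumed 15s scrape interval.
up_by_ts = {t: 1 for t in range(0, 7200, 15)}
for t in (300, 315):              # 30s blip in the first hour
    up_by_ts[t] = 0
for t in range(1800, 1920, 15):   # 2 minute outage, too long to count
    up_by_ts[t] = 0
up_by_ts[3900] = 0                # single missed scrape in the second hour

print(short_outages_per_hour(list(up_by_ts.items())))  # {0: 1, 3600: 1}
```

The measured duration can only be as precise as the scrape interval, so with 15s scrapes "less than a minute" effectively means at most three consecutive failed scrapes.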
On Thursday, July 6, 2017 at 2:23:51 PM UTC-4, Ben Kochie wrote:
> My idea is that instead of thinking about specific buckets, you can simplify things based on an SLO/SLA metric.
>
> Say you have a 15s scrape interval; that's 240 samples per hour.
>
> If one lost sample per hour was OK (99.58%), you could set an alert for an uptime average below 99.5%.
>
> This is much easier to deal with than trying to line up buckets in the way you are trying to do.
>
> However, what you're asking for is possible.
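The arithmetic behind the numbers in the message above can be checked quickly: a 15s scrape interval gives 3600 / 15 = 240 samples per hour, and losing one of them leaves 239 / 240, or about 99.58%. Here is a small Python sketch of that calculation. The helper breaches_slo is a hypothetical name used only for illustration; in Prometheus itself the corresponding condition would presumably look like avg_over_time(up[1h]) < 0.995.

```python
SCRAPE_INTERVAL_SECONDS = 15  # scrape interval from the message above
WINDOW_SECONDS = 3600         # one hour

samples_per_hour = WINDOW_SECONDS // SCRAPE_INTERVAL_SECONDS
uptime_with_one_miss = (samples_per_hour - 1) / samples_per_hour


def breaches_slo(failed_scrapes: int, threshold: float = 0.995) -> bool:
    """True when the hourly uptime average falls below the alert threshold."""
    uptime = (samples_per_hour - failed_scrapes) / samples_per_hour
    return uptime < threshold


print(samples_per_hour)                  # 240
print(f"{uptime_with_one_miss:.2%}")     # 99.58%
print(breaches_slo(1), breaches_slo(2))  # False True
```

With a 99.5% threshold, a single missed scrape per hour is tolerated and a second one trips the alert.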