Alerts need to be triggered only on weekends

Kishore kumar saha

Aug 5, 2021, 12:22:47 AM
to Prometheus Users
Hello All,

I want to set up alerting for a job/service that runs only on weekends.
Alerts should trigger during the weekend if the service is not running or goes down.

I tried the query below (just for testing; in the actual query the value would be 0). Please find the screenshot attached.

d********_status{job="targets",name="d*-*******ise-*****re-coord"} == 1 and on() day_of_week() == 6 or day_of_week() == 0 

I also tried adding an "if" condition to the alert rule:
      - alert: "[EMR]: Java coordinator is down"
        if: day_of_week() == 0 OR day_of_week() == 6

But neither of them works.

Thanks,
Kishore
Screen Shot 2021-08-05 at 9.33.55 AM.png

Brian Candler

Aug 5, 2021, 4:34:22 AM
to Prometheus Users
Please don't just say "it doesn't work".  Say what happened.  Did you see an error message?  Did the alert fire when you were not expecting it, or not fire when you were expecting it?

Adding "if: ..." to an alerting rule shouldn't work - I cannot see that syntax anywhere in the documentation.  You need to put the logic into the expr: and you can debug your expr using the prometheus query GUI (*).
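
As a rough illustration of putting the logic into expr:, a rule file might look like the sketch below. The alert name is the one from the original post; the group name, the "for" duration, the severity label and the use of the generic up metric with job="targets" are assumptions made only for this example, so substitute your own metric and labels.

```yaml
groups:
  - name: weekend-alerts            # hypothetical group name
    rules:
      - alert: "[EMR]: Java coordinator is down"
        # All of the weekend logic lives in expr; there is no "if:" key.
        # day_of_week() is evaluated in UTC: 0 = Sunday, 6 = Saturday.
        expr: >-
          up{job="targets"} == 0
          and on() (day_of_week() == 0 or day_of_week() == 6)
        for: 5m                     # assumed grace period
        labels:
          severity: warning         # assumed label
```

The parentheses around the two day_of_week() comparisons matter, because "and" binds more tightly than "or".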

Adding parentheses might help.  I suspect that
foo and on() bar or baz
will be parsed as
(foo and (on() bar)) or baz
which is not what you require.

Because you're using on() I think you're already aware that "and", "or" and "==" don't work the way that newcomers expect.  To summarize:

foo : is a set of timeseries.
foo{bar="baz"} : is the same set of timeseries, filtered down to only those with label bar="baz".
foo{bar="baz"} == 1 : is the same set of timeseries, filtered down to only those with label bar="baz" and whose value is 1.

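foo and bar : is the set of timeseries from foo, filtered to only those where bar exists with the exact same set of labels (but any value). day_of_week() == 6 returns a single timeseries with value 6 and no labels (or nothing at all when it is not Saturday in UTC), so it never matches the labels on foo unless you add on().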

(*) You can test your expression in the PromQL browser like this: try selecting the graph view and scrolling out so you display 2 weeks.

up == 1  # ok
up == 1 and day_of_week() == 4  # no results
up == 1 and on() day_of_week() == 4  # works
up == 1 and on () day_of_week() == 4 or day_of_week() == 5  # unexpected: gives value 1 or value 5
up == 1 and on () (day_of_week() == 4 or day_of_week() == 5)  # correct

I think this confirms that your problem is lack of parentheses.

Brian Candler

Aug 5, 2021, 4:35:32 AM
to Prometheus Users
Correction:

I suspect that
foo and on() bar or baz
will be parsed as
(foo and on() bar) or baz
which is not what you require.

Kishore kumar saha

Aug 6, 2021, 7:58:54 AM
to Brian Candler, Prometheus Users
Hello Brian, 

Thank you so much for your input. I have already tried the option below, but I am not sure whether the output is what I should expect.
> up == 1 and on () (day_of_week() == 4 or day_of_week() == 5)  # correct

Please find the attachment. 

Thanks,
Kishore

Screen Shot 2021-08-06 at 5.18.48 PM.png

Brian Candler

Aug 6, 2021, 8:10:57 AM
to Prometheus Users
That looks correct to me.  What were you expecting to see instead?

Alerts trigger when the expression returns one or more timeseries, with *any* value.  Even an expression with value 0 will trigger an alert.  So to prevent triggering the alert, the expression has to return an empty set of timeseries (i.e. no result).
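
To make the "any value triggers" point concrete, here is a small sketch contrasting two rules. The group and alert names are hypothetical and up is used only as a familiar stand-in metric. The first rule uses the bool modifier, which turns the comparison into a 0/1 result instead of a filter, so a series is returned for every target and the alert fires all the time. The second rule filters, so it returns nothing unless a target's value really is 0.

```yaml
groups:
  - name: value-vs-presence         # hypothetical group name
    rules:
      # Fires constantly for every target: "== bool 0" yields a series
      # with value 0 or 1 for each target, and any returned series
      # (even one with value 0) triggers the alert.
      - alert: AlwaysFires          # hypothetical name
        expr: up == bool 0

      # Fires only for targets whose value is 0: the plain "== 0" filter
      # drops every other series, leaving an empty result when all is well.
      - alert: FiresOnlyWhenDown    # hypothetical name
        expr: up == 0
```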

You should find that
foo and on () (day_of_week() == 0 or day_of_week() == 6)
masks the expression foo so that the result is empty except for those two days of the week.

Try comparing the graphs of:

datapipeline_coordinator_status{...blah...} == 1
datapipeline_coordinator_status{...blah...} == 1 and on () (day_of_week() == 0 or day_of_week() == 6)

Does it make sense now?

(You didn't describe the meaning of the datapipeline_coordinator_status metric, but I'm presuming from that expression that you want it to alert when the value is 1)

Kishore kumar saha

Aug 6, 2021, 9:03:11 AM
to Brian Candler, Prometheus Users
Hello Brian,

Thanks once again.
I want it to alert me when the value is 0. The job runs only on weekends, so if it goes down during that time it should trigger an alert.
So the actual expression is the one below:

datapipeline_coordinator_status{job="targets",name="********-coord"} == 0 and on() (day_of_week() == 6 or day_of_week() == 0)

Regards,
Kishore

Brian Candler

Aug 6, 2021, 9:05:04 AM
to Prometheus Users
That looks good to me.  Entering that into the PromQL browser will show values (of 0) when the alert would have fired, and gaps where it wouldn't have fired.
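
The same check can also be scripted rather than eyeballed on a graph. The following is a sketch of a unit test file for "promtool test rules", under some assumptions: it uses the generic up metric with job="targets" in place of the real (masked) metric, and the file name you save it under is up to you. Test time in promtool starts at the Unix epoch, which was a Thursday, so 48h in is a Saturday and 96h in is a Monday.

```yaml
evaluation_interval: 1m

tests:
  - interval: 1h
    input_series:
      # The stand-in service reports 0 (down) for the whole test window:
      # 121 hourly samples covering 0h..120h.
      - series: 'up{job="targets"}'
        values: '0x120'

    promql_expr_test:
      # Saturday (epoch + 48h): the expression returns the series,
      # so an alert built on it would fire.
      - expr: up{job="targets"} == 0 and on() (day_of_week() == 0 or day_of_week() == 6)
        eval_time: 48h
        exp_samples:
          - labels: 'up{job="targets"}'
            value: 0

      # Monday (epoch + 96h): the weekend mask removes the series,
      # so no samples are expected and no alert would fire.
      - expr: up{job="targets"} == 0 and on() (day_of_week() == 0 or day_of_week() == 6)
        eval_time: 96h
```

Running it would be something like "promtool test rules weekend_test.yml", where weekend_test.yml is a hypothetical file name.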

Benjamin Ridley

Aug 6, 2021, 10:36:20 PM
to Prometheus Users
Hi Kishore,

You could also use Alertmanager's time-based muting to set up an alert schedule for this. This is a different approach from doing it in Prometheus: the alert will still fire in Prometheus, but Alertmanager will prevent you from being notified about it.
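
For reference, a minimal sketch of what that could look like in alertmanager.yml is shown below. The receiver name and the interval name are hypothetical, and the matcher uses the alert name from the original post. It assumes a reasonably recent Alertmanager where the top-level key is time_intervals; as far as I know, older releases used a top-level mute_time_intervals key instead, so check the documentation for your version. Intervals are interpreted in UTC unless a location is configured.

```yaml
route:
  receiver: default                 # hypothetical receiver name
  routes:
    - matchers:
        - 'alertname = "[EMR]: Java coordinator is down"'
      # Suppress notifications for this alert Monday through Friday,
      # so only weekend occurrences are delivered.
      mute_time_intervals:
        - weekdays

time_intervals:
  - name: weekdays                  # hypothetical interval name
    time_intervals:
      - weekdays: ['monday:friday']

receivers:
  - name: default
```

With this approach the Prometheus rule itself would not need the day_of_week() condition.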

Just in case you weren't aware of all the options.

Cheers,
Ben