Hi,
Essentially, an automated workflow is triggered via webhook when I receive the firing alert above.
In that workflow, I stop the app and restart the whole EC2 instance. When I stop the app, up becomes 0
and the http_server_requests_seconds_bucket metric is no longer available, so the alert expression returns no data and the alert is resolved (this is the expected behaviour of how Prometheus resolves alerts).
So I am planning to use the following rules
(assuming A1 fires first and A2 later in the sequence):
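groups:
  - name: <name>
    rules:
      - alert: A1
        # Plain ">" (no "bool" modifier): with "bool" the comparison returns 0 or 1
        # for every series, so the alert would fire whenever the metric exists.
        expr: histogram_quantile(0.99, sum(rate(http_server_requests_seconds_bucket{uri="<uri>",status="200",appName="app"}[5m])) by (le,appName)) > 0.01
        labels:
          app_name: app
      - alert: A2
        expr: up{job="X"} == 0
        labels:
          app_name: app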
1. A1 is evaluated and fires an alert when its condition is met. This calls automated_workflow_1, which stops the app.
2. Once the app is stopped, the A2 condition is met and A2 fires an alert that calls automated_workflow_2, which restarts the instance.
3. Once the instance is up, the app is started (this results in a Resolved alert for A2).
(In between, a Resolved alert can be sent for A1, because its metric is not available while the app is stopped.)
So, is there a way to silence or ignore the Resolved alert for A1 only, in Prometheus or Alertmanager, so that I get a Resolved alert for A2 alone once the app is back up?
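To make the question concrete, one option that might fit is the send_resolved setting on Alertmanager webhook receivers, with A1 and A2 routed to separate receivers. This is only a minimal sketch under assumptions: the receiver names and webhook URLs below are hypothetical placeholders and not my real setup, and I have not verified that this covers every case.

```yaml
# Alertmanager configuration sketch (receiver names and URLs are hypothetical).
route:
  receiver: default
  routes:
    # A1 goes to a receiver that does not send Resolved notifications.
    - matchers:
        - alertname="A1"
      receiver: workflow_1
    # A2 goes to a receiver that does send Resolved notifications.
    - matchers:
        - alertname="A2"
      receiver: workflow_2

receivers:
  - name: default
  - name: workflow_1
    webhook_configs:
      - url: http://localhost:9000/automated_workflow_1  # hypothetical endpoint
        send_resolved: false
  - name: workflow_2
    webhook_configs:
      - url: http://localhost:9000/automated_workflow_2  # hypothetical endpoint
        send_resolved: true
```

With routing like this, only the receiver for A2 would be notified when its alert resolves, while the firing notifications for both alerts would still reach their workflows.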
Thanks,
Rashmitha