Typical use case for the "amtool alert add" command


Al

Dec 14, 2020, 12:05:43 PM
to Prometheus Users
Hi 

I realize alert conditions in a Prometheus ecosystem should be triggered from a Prometheus instance itself, although there is the "amtool alert add" command that can be used to trigger an alert manually. Is this something that is commonly used in production use cases? I can see a benefit to using this command, as I could still allow users to trigger alerts in a standardized way, but without having to have specific predefined alerting conditions. There may also be situations where no metric is collected and an alert simply needs to be triggered when a specific event occurs.
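For context, `amtool alert add` is, as far as I know, a thin client over Alertmanager's HTTP API: it posts a JSON array of alerts to `/api/v2/alerts`, each alert carrying `labels`, `annotations` and optional `startsAt`/`endsAt` timestamps. The Python sketch below only builds such a payload (it makes no network calls); the helper name and the example labels are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional

RFC3339 = "%Y-%m-%dT%H:%M:%SZ"


def build_postable_alert(labels: Dict[str, str],
                         annotations: Optional[Dict[str, str]] = None,
                         starts_at: Optional[datetime] = None,
                         ends_at: Optional[datetime] = None) -> dict:
    """Build one alert object in the shape POST /api/v2/alerts expects."""
    alert = {"labels": dict(labels), "annotations": dict(annotations or {})}
    # Timestamps are optional; when given they are sent as UTC RFC 3339 strings.
    if starts_at is not None:
        alert["startsAt"] = starts_at.astimezone(timezone.utc).strftime(RFC3339)
    if ends_at is not None:
        alert["endsAt"] = ends_at.astimezone(timezone.utc).strftime(RFC3339)
    return alert


# The API takes a JSON array, so several alerts can go in one request.
payload = json.dumps([
    build_postable_alert({"alertname": "ManualTest", "severity": "info"},
                         {"summary": "triggered by hand"}),
])
print(payload)
```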

From my understanding, when Prometheus fires an alert, it sends the payload to all Alertmanager instances within the cluster, and they then decide which instance actually routes the alert to the final destination (e.g. VictorOps, email, webhook, etc.). If this is correct, does it mean that amtool should also send the alert to all Alertmanager instances within the cluster?
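Alertmanager's HA setup expects clients to send every alert to every instance; the instances then gossip notification state so that, normally, only one of them actually notifies. Prometheus does this fan-out itself for each Alertmanager it is configured with, whereas a one-off client pointed at a single URL (amtool's `--alertmanager.url`) does not. Mimicking Prometheus therefore means posting to each instance. A minimal stdlib sketch, assuming the v2 endpoint path; the instance URLs are hypothetical and the sender is injectable so the logic can be exercised without a network:

```python
import json
import urllib.request
from typing import Callable, Dict, List

# A sender takes (url, body) and returns an HTTP status code.
Sender = Callable[[str, bytes], int]


def http_post(url: str, body: bytes) -> int:
    """POST a JSON body and return the HTTP status code."""
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def send_to_all(alertmanagers: List[str], alerts: List[dict],
                send: Sender = http_post) -> Dict[str, object]:
    """Post the same alerts to every Alertmanager; record a status or an error per instance."""
    body = json.dumps(alerts).encode()
    results: Dict[str, object] = {}
    for base in alertmanagers:
        url = base.rstrip("/") + "/api/v2/alerts"
        try:
            results[base] = send(url, body)
        except Exception as exc:  # one unreachable peer must not stop the others
            results[base] = exc
    return results
```

Deduplication of the resulting notifications is left to the Alertmanager cluster, exactly as it is for alerts coming from Prometheus.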

I appreciate any clarification you can provide me with.


Thanks

Stuart Clark

Dec 14, 2020, 12:52:53 PM
to Al, Prometheus Users
That command is only intended for testing. Alerts aren't a one-off API
call from Prometheus to Alertmanager. Instead Prometheus will repeatedly
call every single Alertmanager periodically until the alert is cleared.
If Alertmanager stops receiving these updates it will mark the alert as
resolved.
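A rough model of this refresh cycle, under stated assumptions: each time Prometheus re-sends a firing alert it stamps it with an `endsAt` some way into the future (in the Prometheus versions I am aware of, four times the larger of the rule evaluation interval and `--rules.alert.resend-delay`; treat that factor as an assumption), and Alertmanager considers the alert resolved once that `endsAt` has passed without a newer update. The timeline below is hypothetical.

```python
from datetime import datetime, timedelta


def valid_until(sent_at: datetime, eval_interval: timedelta,
                resend_delay: timedelta, factor: int = 4) -> datetime:
    """endsAt attached to a still-firing alert (the factor is an assumption)."""
    return sent_at + factor * max(eval_interval, resend_delay)


def is_resolved(ends_at: datetime, now: datetime) -> bool:
    """Alertmanager treats an alert as resolved once its endsAt is not in the future."""
    return ends_at <= now


# Hypothetical timeline: the rule is evaluated every minute and the alert is
# re-sent at 12:00, 12:01 and 12:02; then Prometheus goes away and no further
# updates arrive.
interval = resend = timedelta(minutes=1)
start = datetime(2020, 12, 14, 12, 0)
ends_at = start
for i in range(3):
    sent_at = start + i * interval
    ends_at = valid_until(sent_at, interval, resend)  # each update pushes endsAt forward

print(ends_at)                                             # 2020-12-14 12:06:00
print(is_resolved(ends_at, start + timedelta(minutes=5)))  # False: still firing
print(is_resolved(ends_at, start + timedelta(minutes=6)))  # True: updates stopped
```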

Alerts in the Prometheus world are triggered based on the evaluation of
alerting rules, which themselves are queries which interrogate metrics.
Therefore every alert would be based on some sort of source metric
(there are a few exceptions, such as having an alert which always fires
to check the alerting pipeline for example).
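The rule-to-alert step can be sketched like this (a simplification that ignores `for:` durations, templating and alert state): every series returned by the rule's query becomes one alert, labelled with the series' labels (minus the metric name) plus the rule's own labels and `alertname`. An expression such as `vector(1)` always returns one series, which is how an always-firing "pipeline check" alert works. Rule names and labels here are hypothetical.

```python
from typing import Dict, List, Tuple

# One query result row: (label set, sample value).
Series = Tuple[Dict[str, str], float]


def evaluate_rule(alertname: str, query_result: List[Series],
                  rule_labels: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn an alerting rule's query output into active alerts, one per series."""
    alerts = []
    for series_labels, _value in query_result:
        # Drop the metric name, then let rule labels and alertname take precedence.
        labels = {k: v for k, v in series_labels.items() if k != "__name__"}
        labels.update(rule_labels)
        labels["alertname"] = alertname
        alerts.append(labels)
    return alerts


# vector(1) always yields exactly one label-less series, so this rule always fires.
watchdog = evaluate_rule("Watchdog", [({}, 1.0)], {"severity": "none"})
# An empty query result means no alerts at all.
quiet = evaluate_rule("HighErrorRate", [], {"severity": "page"})
```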

For one of the example use cases you gave, you said an alert should be
triggered if an event happens. Prometheus itself isn't an event system,
but you can create metrics from events. So in that case you'd have a
metric that could be a counter of the number of events that have
happened. Then your alert would fire when that value increases (for
example).
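To illustrate the counter approach: with a hypothetical counter such as `support_escalations_total`, the rule expression might be `increase(support_escalations_total[5m]) > 0`. The function below approximates `increase()` over a window of samples, treating any drop in value as a counter reset. Prometheus's real `increase()` also extrapolates to the window boundaries, so it can return non-integer values; this sketch deliberately omits that.

```python
from typing import List, Tuple

# One scraped sample: (unix timestamp, counter value).
Sample = Tuple[float, float]


def simple_increase(samples: List[Sample]) -> float:
    """Sum of increases between consecutive samples; a drop is treated as a reset to 0."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from 0, so it grew by `cur`.
        total += cur - prev if cur >= prev else cur
    return total


def should_fire(samples_in_window: List[Sample]) -> bool:
    """Equivalent of `increase(counter[window]) > 0` for this simplified model."""
    return simple_increase(samples_in_window) > 0
```

A side effect that fits the thread: once the window slides past the last increase, the expression returns nothing and the alert resolves on its own.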

Are you able to give some more information on what use cases you are
trying to handle?

--
Stuart Clark

Al

Dec 16, 2020, 11:21:45 AM
to Prometheus Users
Thanks for the quick response, Stuart. One of our specific use cases (although there will be more over time) is one where a first- or second-level support team escalates an issue they can't solve to the engineers responsible for the product. In this case there would be no metric, as this is an event that could happen at any time and for which we don't really want a metric. Triggering the alert via Alertmanager seemed a logical choice, as it already handles the logic of routing to the necessary destinations (email, webhook, VictorOps, etc.). All the user would have to do is run the amtool command with the necessary labels, without having to worry about any other specifics.
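If this route were taken anyway, one way to keep it standardized is a small wrapper that refuses to send without an agreed set of labels and sets `endsAt` explicitly, because an alert posted once without `endsAt` is marked resolved by Alertmanager after its `resolve_timeout` when no further updates arrive (the behaviour Stuart describes above). The required label names, the four-hour lifetime and the function name are hypothetical, and the result would still need posting to every Alertmanager instance.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

RFC3339 = "%Y-%m-%dT%H:%M:%SZ"
REQUIRED_LABELS = ("alertname", "team", "severity")  # hypothetical labelling convention


def escalation_alert(labels: Dict[str, str], summary: str, now: datetime,
                     lifetime: timedelta = timedelta(hours=4)) -> dict:
    """Build a v2-style alert for a manual escalation, enforcing required labels."""
    missing = [name for name in REQUIRED_LABELS if not labels.get(name)]
    if missing:
        raise ValueError(f"missing required labels: {', '.join(missing)}")
    return {
        "labels": dict(labels),
        "annotations": {"summary": summary},
        "startsAt": now.astimezone(timezone.utc).strftime(RFC3339),
        # An explicit endsAt keeps the alert active for `lifetime` instead of
        # letting Alertmanager fill in now + resolve_timeout.
        "endsAt": (now + lifetime).astimezone(timezone.utc).strftime(RFC3339),
    }
```

Bear in mind Laurent's point in the thread: Alertmanager has no acknowledgement or escalation concept, so such an alert simply resolves at `endsAt` whether or not anyone has acted on it.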

Based on your explanation, I now understand Alertmanager can't really be used that way. Could you show me where in the AM source code it resolves an alert unless it is continuously notified by Prometheus? I'd like to know for my own personal knowledge.
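For reference, here is a Python paraphrase of what, to my understanding, Alertmanager's handler for POSTed alerts does (check the API code of the version you run for the exact details): a missing `startsAt` is filled in, and a missing `endsAt` is set to now plus the global `resolve_timeout` (5m by default), with the alert flagged as timing out. An alert counts as resolved once its `endsAt` is not in the future. The class and function names are invented for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class ReceivedAlert:
    labels: Dict[str, str]
    starts_at: Optional[datetime] = None
    ends_at: Optional[datetime] = None
    timeout: bool = False  # True when endsAt was filled in by Alertmanager


def accept(alert: ReceivedAlert, now: datetime,
           resolve_timeout: timedelta = timedelta(minutes=5)) -> ReceivedAlert:
    """Normalise an incoming alert the way the POST handler is understood to."""
    if alert.starts_at is None:
        alert.starts_at = now if alert.ends_at is None else alert.ends_at
    if alert.ends_at is None:
        # No end time given: resolve automatically unless an update arrives in time.
        alert.timeout = True
        alert.ends_at = now + resolve_timeout
    return alert


def resolved(alert: ReceivedAlert, now: datetime) -> bool:
    """An alert is resolved once its endsAt is not in the future."""
    return alert.ends_at is not None and alert.ends_at <= now
```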

Now, having considered these facts, do you have any suggestions based on this example? Is this just something we should handle separately with another custom application? If so, it's a bit discouraging, as it means we have to handle the alert-routing logic in more than one place.



Al

Laurent Dumont

Dec 16, 2020, 6:36:20 PM
to Al, Prometheus Users
I don't think your use case is something that AM or Prometheus is looking to solve.

The way I see it:
  • Prometheus has metrics and alarm patterns.
  • It triggers an alarm and sends it to AM.
  • AM receives the alarm and does some basic routing based on labels.
  • Once the Prometheus pattern becomes false, a recovery is sent.
That's pretty much it. There is no concept of escalation, end-to-end service recovery, or service mapping inside AM or Prometheus.
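The "basic routing based on labels" step can be sketched as a walk down the routing tree: a node applies only if all its matchers hold, the first matching child wins, and if no child matches, the node's own receiver is used. This is a simplification that assumes equality matchers only and ignores `continue`, regex matchers, grouping and timing; the receiver names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    receiver: str
    match: Dict[str, str] = field(default_factory=dict)  # equality matchers only
    routes: List["Route"] = field(default_factory=list)


def find_receiver(route: Route, labels: Dict[str, str]) -> Optional[str]:
    """Return the receiver for an alert's labels, or None if this subtree does not match."""
    if any(labels.get(k) != v for k, v in route.match.items()):
        return None
    # The first matching child wins (no `continue` in this sketch).
    for child in route.routes:
        found = find_receiver(child, labels)
        if found is not None:
            return found
    # No child matched: this node handles the alert itself.
    return route.receiver


tree = Route("default-email", routes=[
    Route("victorops-payments", match={"team": "payments"}),
    Route("escalation-webhook", match={"alertname": "SupportEscalation"}),
])
```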

In theory, you could have a "fake" alarm, where you send some JSON to AM with specific flags that trigger a specific route to send the SMS/email to the appropriate recipients. But I don't think it's really part of the core purpose of AM.

It's one of the values of a service like PagerDuty. But that still relies on Prometheus metric --> Prometheus alert triggered --> AM alert received --> AM sends the alert somewhere.

Just my 2 cents :)
