On Thu, Jul 7, 2016 at 10:51 AM Matt Bostock <ma...@mattbostock.com> wrote:

Hello,

I'm interested in anecdotes or suggestions for best practices for alert naming. Specifically, while Prometheus allows multiple alerting rules with the same name, have people found this useful, and what side effects should I be aware of?

I saw some discussion on GitHub:

At this point I'm mostly interested in the practical impact of having multiple alert rules with the same name.

For example, if I have multiple alerts called 'DiskUsage', each targeting different mountpoints on different machines, the benefits are:

- the alert name is easier to read
- inhibition rules are easier to configure in Alertmanager
- I can still distinguish the original alerting rules in the ALERTS metric using labels

I guess the problem comes if I have two alerting rules for disk usage, one targeting all nodes and another targeting a specific node, with different thresholds. What happens in that case? I guess the order in which the alerting rules appear in the rules files is significant in that case?

Thanks,
Matt
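For concreteness, two same-named rules along the lines described above might look like this in the 1.x rule syntax (metric names, mountpoints, and thresholds here are purely illustrative):

    # Two rules sharing the name DiskUsage, scoped to different mountpoints:
    ALERT DiskUsage
      IF 100 * node_filesystem_avail{mountpoint="/data"}
           / node_filesystem_size{mountpoint="/data"} < 20
      FOR 10m
      LABELS { severity = "warning" }

    ALERT DiskUsage
      IF 100 * node_filesystem_avail{mountpoint="/"}
           / node_filesystem_size{mountpoint="/"} < 10
      FOR 10m
      LABELS { severity = "warning" }

    # Both fire as ALERTS{alertname="DiskUsage", ...} and remain
    # distinguishable by their mountpoint/instance labels.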
Another alerting pattern we (sparingly) use to deal with differences in thresholds is threshold metrics (sometimes exported by the service itself, sometimes just constant rules). These are then used in a single all-encompassing alert expression.
With some generous application of OR you can even have default thresholds.
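A rough sketch of that pattern (the threshold metric name and the 10% default are invented for illustration):

    # A per-target threshold, declared as a constant rule
    # (or exported by the service itself):
    disk_free_threshold_percent{job="node", instance="db1:9100"} = 5

    # One all-encompassing alert; targets without their own threshold
    # fall back to a default of 10 (up * 0 + 10 manufactures a
    # per-target series with value 10, and OR prefers the left side):
    ALERT DiskUsage
      IF 100 * node_filesystem_avail / node_filesystem_size
         < on(instance) group_left()
           (disk_free_threshold_percent or up{job="node"} * 0 + 10)
      FOR 15m
      LABELS { severity = "warning" }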
We usually declare warning and critical alerts with the same name, distinguished by a "severity" label. We almost always ensure a "service" label is set, either explicitly in the alert rule or through relabelling/label_replace. Together, these two labels are the base case of our alert routing.
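For example (the alert name, metric, thresholds, and service value are all made up):

    ALERT HighErrorRate
      IF rate(http_requests_total{job="myservice", status=~"5.."}[5m]) > 0.1
      FOR 5m
      LABELS { severity = "warning", service = "myservice" }

    ALERT HighErrorRate
      IF rate(http_requests_total{job="myservice", status=~"5.."}[5m]) > 1
      FOR 5m
      LABELS { severity = "critical", service = "myservice" }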
/MR