On 07.02.23 05:57, 'George Robinson' via Prometheus Developers wrote:
>
> While I appreciate the responsibility of writing correct templates is on
> the user, I have also been considering whether Alertmanager should be more
> tolerant of template errors, and attempt to send some kind of notification
> when this happens. For example, falling back to the default template that
> we have high confidence of being correct.
I think that makes sense. The fall-back template could call out very
explicitly that the intended template failed to expand and therefore
you get a replacement, maybe even with the error message of the
attempt to expand the original template.
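A rough sketch of how such a fall-back could work, written against Go's
standard text/template package. This is not Alertmanager's actual code:
the names renderWithFallback, expand, defaultTmpl and the alert type are
all made up for illustration, and the fall-back text is only a guess at
what a "known good" default might say.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// defaultTmpl is a hypothetical known-good fall-back template. It states
// explicitly that the configured template failed and includes the error.
const defaultTmpl = `[FALLBACK] Alert {{ .Name }} is firing.
The configured template failed to expand: {{ .Err }}`

// alert is a simplified stand-in for the data passed to a template.
type alert struct {
	Name   string
	Labels map[string]string
}

// expand parses and executes a template, returning any parse or
// execution error instead of partial output.
func expand(text string, data interface{}) (string, error) {
	t, err := template.New("msg").Option("missingkey=error").Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// renderWithFallback tries the user's template first. If that fails, it
// renders the default template with the original error attached, so a
// notification is still produced.
func renderWithFallback(custom string, a alert) string {
	out, err := expand(custom, a)
	if err == nil {
		return out
	}
	fb, fbErr := expand(defaultTmpl, struct {
		Name string
		Err  error
	}{a.Name, err})
	if fbErr != nil {
		// Last resort: plain text that cannot fail to render.
		return fmt.Sprintf("alert %s is firing; template errors: %v; %v", a.Name, err, fbErr)
	}
	return fb
}

func main() {
	a := alert{Name: "HighLatency", Labels: map[string]string{"severity": "page"}}
	// A working template is used as-is.
	fmt.Println(renderWithFallback("{{ .Name }} ({{ .Labels.severity }})", a))
	// A template referencing a missing field triggers the fall-back.
	fmt.Println(renderWithFallback("{{ .Nope }}", a))
}
```

The point of the design is that the error path still ends in a message
being produced, and that the message says why it looks different from
what the user configured.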
But I'm not really an Alertmanager expert. And despite having a lot
of historical context about Prometheus in general, I don't remember
anything specific about error handling in alert templates.
I only remember that trying out an alert "in production" is really
hard since you need to trigger it. And if the moment you notice that
your template doesn't work is also the moment when your alert is
supposed to fire, that's really bad.
So better test tooling might help here, but even if we had that, I
think there should be a safe fall-back so that no alert is ever
swallowed because of a templating error.
--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email]
bjo...@rabenste.in