> If something was worth alerting on then it's worth investigating: even if
> the alert condition is no longer present, it clearly was earlier. Just
> saying "oh look, it's gone away, never mind" is not helping to understand
> or fix the problem (with the system and/or with the alert itself).
> Seriously: turning off resolved messages is great. At very least, it
> reduces your notification volume by 50%.

As a counterpoint: we send resolved alerts so that we know when a
problem stopped as well as when it started (which helps with diagnosis),
and so that we know whether a problem is still happening *right now*.
An ongoing problem is more urgent in our environment and changes our
response.
If a machine is down right now, we need to go get it back up. If a
machine went down and then came back up, we need to investigate why,
which involves a fairly different set of activities.
(But we're not a 24/7 operation where people are paged if something is
down; we're a university department running physical servers on a more
or less 8/5 basis.)
- cks