Alert history in Alert Manager


Paul Traylor

Aug 20, 2017, 11:15:25 PM
to Prometheus Developers
I am investigating the different ways (and the future direction) of keeping a log of alerts from Prometheus/Alertmanager.

It seems that keeping an alert history may have been discussed during the dev summit [1].


My initial thought was to route all messages from Alertmanager through its default webhook functionality:

Prometheus -> AlertManager -(web hook)-> Alert Log

I wanted to avoid duplicates, and at the time I don’t think IDs showed up in the JSON data, so I created my own ID as a fingerprint: a hash of all the labels plus startsAt and generatorURL.


Django code:

import copy
import hashlib
import json

from django.db import models


# Alert model
class Alert(models.Model):
    id = models.CharField(max_length=64, primary_key=True)
    startsAt = models.DateTimeField()
    endsAt = models.DateTimeField()
    status = models.CharField(max_length=32)
    raw = models.TextField()


# Fingerprint
# Calculate a signature primarily based on the labels, but add some
# additional data to help with the uniqueness.
# `body` is assumed to be a single alert object from the decoded webhook JSON.
signature = copy.deepcopy(body['labels'])
signature['startsAt'] = body['startsAt']
signature['generatorURL'] = body['generatorURL']
# Sort the keys so the same alert always serializes (and hashes) identically
alert_id = hashlib.sha1(json.dumps(signature, sort_keys=True).encode('utf8')).hexdigest()
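To illustrate how the fingerprint avoids duplicates, here is a minimal sketch with Django left out so it runs on its own. It assumes the webhook payload carries an "alerts" list whose entries have labels, startsAt, endsAt, status and generatorURL. The names fingerprint, record_webhook and the plain dict standing in for the Alert table are hypothetical, not part of my actual code.

```python
import copy
import hashlib
import json


def fingerprint(alert):
    """Hash labels + startsAt + generatorURL into a stable ID."""
    signature = copy.deepcopy(alert["labels"])
    signature["startsAt"] = alert["startsAt"]
    signature["generatorURL"] = alert["generatorURL"]
    encoded = json.dumps(signature, sort_keys=True).encode("utf8")
    return hashlib.sha1(encoded).hexdigest()


def record_webhook(payload, log):
    """Upsert every alert in a webhook payload into `log`.

    `log` is a plain dict standing in for the Alert table (an assumption
    made to keep the sketch self-contained). Returns the IDs touched.
    """
    touched = []
    for alert in payload.get("alerts", []):
        alert_id = fingerprint(alert)
        # Same fingerprint means same alert: overwrite instead of appending
        log[alert_id] = {
            "startsAt": alert["startsAt"],
            "endsAt": alert.get("endsAt"),
            "status": alert.get("status", "firing"),
            "raw": json.dumps(alert, sort_keys=True),
        }
        touched.append(alert_id)
    return touched
```

Because startsAt is part of the hash, a firing notification and its later resolved notification land on the same row, while a new occurrence of the same alert gets a new ID.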



This mostly worked OK, but I realized that I would never get an endsAt for alerts that were silenced, so I have since changed the setup so that the alert log accepts events directly from Prometheus:

            |-> Alert Log
Prometheus -+-> AlertManager

This seems to work well, and as a bonus, it means I do not have to add my “AlertLog” webhook to every receiver in my Alertmanager configuration. I did have to add some logic to keep track of firing/resolved, since the Prometheus -> Alertmanager API does not have a status field; I only know an alert is resolved when I get an updated endsAt value.
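A rough sketch of that firing/resolved logic follows. It is not my exact code, and it rests on an assumption: an alert counts as resolved once its endsAt is set and no longer in the future, while a missing or zero-valued endsAt means it is still firing. The helper names parse_ts and derive_status are made up for this example.

```python
import re
from datetime import datetime, timezone

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?([+-]\d{2}:\d{2})$"
)


def parse_ts(value):
    """Parse an RFC 3339 timestamp such as '2017-08-20T23:15:25.5Z'."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    match = _RFC3339.match(value)
    if match is None:
        raise ValueError("unsupported timestamp: %r" % value)
    base, frac, tz = match.groups()
    # strptime's %f accepts at most 6 digits, so drop anything finer
    frac = (frac or ".0")[:7]
    return datetime.strptime(
        base + frac + tz.replace(":", ""), "%Y-%m-%dT%H:%M:%S.%f%z"
    )


def derive_status(alert, now=None):
    """Return 'firing' or 'resolved' for an alert that has no status field."""
    now = now or datetime.now(timezone.utc)
    ends_at = alert.get("endsAt")
    # Assumption: a missing or zero timestamp means "no end known yet"
    if not ends_at or ends_at.startswith("0001-01-01"):
        return "firing"
    return "resolved" if parse_ts(ends_at) <= now else "firing"
```

Passing now explicitly keeps the function easy to test; in a real receiver it would default to the current time when the event arrives.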


I’m submitting this to the developers’ mailing list as an example to start a discussion: would some kind of AlertLog functionality be on the roadmap for Alertmanager, or some kind of shortcut logic for logging (so you do not have to add continue: true to each route)?

I notice that Cloudflare has an AM -> ES [2] plugin that does something along similar lines, and they specifically note the effect that silences have.

