I've noticed this a couple of times, and the last time it happened I made sure to upgrade Alertmanager to 0.1.1. Alertmanager is using 80% - 125% of CPU and is very sluggish to recognize silences/changes (slow AJAX responses, I believe). It dumps the following to the log, but not often:
{"level":"error","msg":"Error on notify: context deadline exceeded","source":"notify.go:152","time":"2016-04-28T17:03:03Z"}
{"level":"error","msg":"Notify for 2011 alerts failed: context deadline exceeded","source":"dispatch.go:238","time":"2016-04-28T17:03:03Z"}
{"level":"error","msg":"Error on notify: context deadline exceeded","source":"notify.go:152","time":"2016-04-28T18:11:31Z"}
{"level":"error","msg":"Notify for 2011 alerts failed: context deadline exceeded","source":"dispatch.go:238","time":"2016-04-28T18:11:31Z"}
A restart used to get things back to normal, but now the behavior seems to return quickly after restarting. Any clues here?
--
Jack Neely
Operations Engineer
42 Lines, Inc.