Silences disappear after restarts, despite PersistentVolumeClaim


Flavio Deroo

Dec 2, 2020, 8:10:07 AM
to Prometheus Users
Hello all,

We use Prometheus + Alertmanager inside an EKS cluster. The whole setup works fine, except that when we shut the cluster down at night, we lose the silences: they are not persisted.
Prometheus and Alertmanager are both deployed as StatefulSets, each with volumeClaimTemplates.

Prometheus data is persisted fine in EBS using a StorageClass item in Kubernetes + volumeClaimTemplates in the StatefulSet + mountPath: /prometheus

But for Alertmanager, it does not work. The EBS volume is mounted and attached to the pod, but the data doesn't seem to get persisted.

I have no clue why, and I have no idea how to debug this, since nothing relevant shows up in the logs.

Version / image  : prom/alertmanager:v0.20.0
Arguments :
- --config.file=/etc/alertmanager/config.yml
- --storage.path=/alertmanager

Mounted volume:
volumeMounts:
  - name: alertmanager
    mountPath: /alertmanager

Volume claim:
volumeClaimTemplates:
  - metadata:
      name: alertmanager
      annotations:
        volume.beta.kubernetes.io/storage-class: encrypted-alertmanager-ebs
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi

And finally, the storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-alertmanager-ebs
  namespace: monitoring
provisioner: kubernetes.io/aws-ebs

I understand this question touches both Alertmanager configuration and Kubernetes, but I could not find a single definitive guide or doc on Alertmanager's storage persistence options, so I think it would be valuable for the community to solve this here.

Flavio Deroo

Dec 2, 2020, 12:40:52 PM
to Prometheus Users
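I found the issue, and I will share it here in case anyone else comes here with the same problem:

The problem is file permissions on the mounted volume. Make sure you add this to the pod spec:

securityContext:
  fsGroup: 65534

65534 is the ID of the "nobody" user/group that the prom/alertmanager image runs as under the hood. Setting it as fsGroup makes the mounted volume writable by that group, so the process can write its files. See https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ for more details.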

One thing that threw me off is that Alertmanager does not persist silences immediately: it first keeps them in memory, and only some time later runs its maintenance, which writes them to a file.
So the "Permission Denied" error only appeared in the logs some 15 minutes after restarting my pod (that is why I missed it the first time).
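
For anyone unsure where the securityContext goes, here is a rough sketch of the StatefulSet with the pieces from this thread put together. The image, arguments, volume mount, and volume claim come from the posts above. The StatefulSet name, serviceName, labels, and replica count are assumptions made up for illustration, and the volume that provides /etc/alertmanager/config.yml is left out. Treat it as a starting point, not a tested manifest.

```yaml
# Sketch only: fields marked "assumed" are not from the original posts.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager              # assumed
  namespace: monitoring           # assumed
spec:
  serviceName: alertmanager       # assumed
  replicas: 1                     # assumed
  selector:
    matchLabels:
      app: alertmanager           # assumed
  template:
    metadata:
      labels:
        app: alertmanager         # assumed
    spec:
      # Pod-level security context: fsGroup applies to the mounted volumes,
      # so the EBS volume becomes writable by group 65534.
      securityContext:
        fsGroup: 65534
      containers:
        - name: alertmanager
          image: prom/alertmanager:v0.20.0
          args:
            - --config.file=/etc/alertmanager/config.yml
            - --storage.path=/alertmanager
          volumeMounts:
            - name: alertmanager
              mountPath: /alertmanager
  volumeClaimTemplates:
    - metadata:
        name: alertmanager
        annotations:
          volume.beta.kubernetes.io/storage-class: encrypted-alertmanager-ebs
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

Note that fsGroup sits under the pod's spec (spec.template.spec.securityContext), not under the container, because it controls ownership of the volumes mounted into the pod.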
