More than one replica corrupts the WAL directory


Mohan Nagandlla

Dec 21, 2020, 11:52:58 PM
to Prometheus Users
Hi team, I am running a Prometheus instance with more than one replica. With one replica there is no WAL corruption in the data directory. To get zero-downtime updates I increased the replica count to 2. The instance is up, but WAL corruption is now occurring. The logs are below:
level=error ts=2020-12-22T04:36:00.860Z caller=scrape.go:1076 component="scrape manager" scrape_pool=depl/node-exporter/0 target=http://x.x.x.x:9100/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000003: stale NFS file handle"
level=error ts=2020-12-22T04:36:00.862Z caller=scrape.go:1076 component="scrape manager" scrape_pool=depl/prometheus-kubelet/0 target=https://x.x.x.x:10250/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000003: stale NFS file handle"
level=error ts=2020-12-22T04:36:00.881Z caller=scrape.go:1076 component="scrape manager" scrape_pool=depl/prometheus-kubelet/0 target=https://x.x.x.x:10250/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000003: stale NFS file handle"
level=error ts=2020-12-22T04:36:00.898Z caller=scrape.go:1076 component="scrape manager" scrape_pool=depl/prometheus-kubelet/0 target=https://x.x.x.x:10250/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000003: stale NFS file handle"
level=error ts=2020-12-22T04:36:00.970Z caller=scrape.go:1076 component="scrape manager" scrape_pool=depl/node-exporter/0 target=http://x.x.x.x:9100/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000003: stale NFS file handle"

I am getting many more log lines like these. With one replica there are no errors, but with more than one replica I get the errors above.

Is there another way to get zero-downtime updates for Prometheus? And why do I get these errors only with more than one replica, when a single replica produces no errors in the data directory?

Thank you
mohan nagandlla

Stuart Clark

Dec 22, 2020, 3:36:51 AM
to Mohan Nagandlla, Prometheus Users

Prometheus must not share a data directory with another running instance, as you will see data corruption. Each Prometheus instance must have a unique data directory. Additionally, NFS isn't supported, so you should use a local disk or an EBS volume.
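
As a rough illustration of the "one data directory per instance" rule, here is a minimal Python sketch that flags any storage volume mounted by more than one replica. The replica names, volume names, and the `find_shared_volumes` helper are all hypothetical, and the sketch assumes you can already list which volume each replica uses for its data directory; it is not part of Prometheus.

```python
from collections import defaultdict


def find_shared_volumes(replica_volumes):
    """Return {volume: [replicas]} for every volume used by more than one replica.

    replica_volumes maps a replica name to the name of the volume that
    backs its data directory (both are hypothetical labels).
    """
    users = defaultdict(list)
    for replica, volume in replica_volumes.items():
        users[volume].append(replica)
    # Only volumes with two or more users are a problem.
    return {vol: sorted(names) for vol, names in users.items() if len(names) > 1}


# Hypothetical layout resembling the one in the question:
# both replicas write to the same volume.
shared = find_shared_volumes({
    "prometheus-0": "data-nfs",
    "prometheus-1": "data-nfs",
})
for volume, replicas in shared.items():
    print(f"volume {volume} is shared by {', '.join(replicas)}")
```

An empty result means every replica has its own volume. The check only covers the sharing rule; it says nothing about whether the volume is on NFS.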
