opening storage failed: read WAL: repair corrupted WAL: cannot handle error


Pete Leese

Sep 18, 2018, 4:14:07 AM
to Prometheus Users
Hi Guys,

It looks like one of my Prometheus servers ran out of storage over the weekend. I've extended the volume; however, Prometheus now fails to start:

caller=head.go:415 component=tsdb msg="encountered WAL error, attempting repair" err="read records: corruption in segment 573 at 64205313: unexpected checksum 5327a8, expected 970642dd"

caller=main.go:617 err="opening storage failed: read WAL: repair corrupted WAL: cannot handle error"

How do I handle this issue? 

Cheers

Pete


Chris Marchbanks

Sep 18, 2018, 6:49:59 PM
to petel...@googlemail.com, Prometheus Users
Hi Pete,

The easiest fix is to delete the WAL files in your data directory. They are located in <data-directory>/wal/.
You could try deleting only segments >= 573 to keep as much of the WAL as possible, but I am not sure whether that would work.
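The partial-deletion idea (drop segment 573 and everything after it) can be sketched as a small Python helper. This is a hypothetical script, not part of Prometheus. It assumes Prometheus is stopped, the data directory has been backed up, and segment files in <data-directory>/wal/ have purely numeric names (e.g. 00000573). It defaults to a dry run that only lists what it would delete. Keep in mind that deleting WAL segments discards any samples in them that have not yet been compacted into a block.

```python
import os


def remove_wal_segments_from(wal_dir: str, first_bad: int, dry_run: bool = True) -> list:
    """Select (and optionally delete) WAL segment files numbered >= first_bad.

    Assumption: segment files have purely numeric names such as "00000573".
    Anything else in the directory (e.g. checkpoint directories) is ignored.
    Returns the selected file names in ascending segment order.
    """
    # Pair each numeric file name with its integer value so ordering is numeric,
    # not lexicographic, even if names are not zero-padded.
    segments = sorted(
        (int(name), name)
        for name in os.listdir(wal_dir)
        if name.isascii()
        and name.isdigit()
        and os.path.isfile(os.path.join(wal_dir, name))
    )
    # Keep only the corrupted segment and every segment written after it.
    selected = [name for number, name in segments if number >= first_bad]
    if not dry_run:
        for name in selected:
            os.remove(os.path.join(wal_dir, name))
    return selected
```

For example, `remove_wal_segments_from("/path/to/data/wal", 573)` (path hypothetical) prints nothing and deletes nothing; it returns the names it would remove. Re-run with `dry_run=False` only after checking that list.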

Let me know if that doesn't work,

Chris

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To post to this group, send email to promethe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/4bafc1cf-7500-434d-be71-ae9101b8752c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Chris Marchbanks | Engineer
FreshTracks.io - Intelligent Alerting for Kubernetes and Prometheus

Pete Leese

Sep 19, 2018, 5:41:21 AM
to Prometheus Users
This appears to be a known bug that will be fixed in 2.4.1, which is expected to be released today.