Deleting all segments newer than corrupted segment


Jérôme Loyet

May 12, 2023, 6:14:59 AM
to Prometheus Users
Hi,

This morning we noticed that a Prometheus server holding 3.3TB of metrics had stopped returning metrics older than ~2h30. The disk still held the full 3.3TB of data.

When I restarted the Prometheus server, it started replaying the WAL and found a corrupted segment. It then deleted all segments after the corrupted one. In the end, the 3.3TB of data had shrunk to 48GB.
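To make the behavior described above concrete, here is a minimal sketch of a segmented write-ahead log whose repair step truncates at the first corrupted record and deletes every later segment. The `Record` layout, the per-record CRC32 check and the `repair` function are hypothetical simplifications chosen for illustration. They are not Prometheus's actual code, and the sketch only mirrors the outcome reported here.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass


@dataclass
class Record:
    payload: bytes
    crc: int  # hypothetical: CRC32 of the payload, stored when the record was written


def make_record(payload: bytes) -> Record:
    """Build a record with a valid checksum."""
    return Record(payload, zlib.crc32(payload))


def repair(segments: list[list[Record]]) -> tuple[list[list[Record]], int]:
    """Truncate the log at the first record whose checksum does not match.

    Keeps every record before the corruption. Discards the corrupted record,
    the rest of its segment, and every later segment. Returns the repaired
    segments and the number of whole segments that were deleted.
    """
    for seg_idx, segment in enumerate(segments):
        for rec_idx, rec in enumerate(segment):
            if zlib.crc32(rec.payload) != rec.crc:
                kept = segments[:seg_idx] + [segment[:rec_idx]]
                deleted = len(segments) - seg_idx - 1
                return kept, deleted
    return segments, 0  # no corruption found: nothing to drop


# Usage: corrupt one byte of a record in the second of four segments.
good = make_record(b"sample a=3")
bad = Record(b"sample a=X", good.crc)  # payload changed after the checksum was taken
segments = [
    [make_record(b"series a"), make_record(b"sample a=1")],
    [make_record(b"sample a=2"), bad],
    [make_record(b"sample a=4")],
    [make_record(b"sample a=5")],
]
kept, deleted = repair(segments)
print(deleted, [[r.payload for r in s] for s in kept])
```

In this sketch, stopping at the first bad record avoids replaying records whose predecessors may have been lost (for example, a sample whose series definition sat in the damaged part). The cost is the data loss described in this post: the two intact segments after the corruption are thrown away along with it.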

I don't understand why a corrupted segment implies deleting all newer segments. To me this makes no sense and makes the Prometheus TSDB unreliable. I would have expected the TSDB to be rock solid and able to recover from segment corruption, or in the worst case to lose just that one segment, not every segment newer than the corrupted one.

What is the technical reason behind this?

Thank you

Regards
++ Jerome