Prometheus issue

Xavier Mayol

Aug 18, 2022, 1:29:39 AM

to Prometheus Users

Hi,

My Prometheus environment was working fine until the Prometheus service started restarting automatically and then stopped working.

The error in my log file is:

ts=2022-08-12T17:06:30.837Z caller=repair.go:57 level=info component=tsdb msg="Found healthy block" mint=1660291200228 maxt=1660298400000 ulid=01GA8VC04PF4Z14NY2KWNJENET
ts=2022-08-12T17:06:30.839Z caller=repair.go:57 level=info component=tsdb msg="Found healthy block" mint=1660262400229 maxt=1660284000000 ulid=01GA8VCCZVVXCW0JKVVZZM5GPE
ts=2022-08-12T17:06:30.841Z caller=repair.go:57 level=info component=tsdb msg="Found healthy block" mint=1660298400000 maxt=1660305600000 ulid=01GA948VJW25YBS9PQJX0027ND
ts=2022-08-12T17:06:30.851Z caller=db.go:777 level=info component=tsdb msg="Found and deleted tmp block dir" dir=data/01GA9GAZ99N1KF0SGEMT46RK2M.tmp-for-creation
ts=2022-08-12T17:06:30.851Z caller=dir_locker.go:77 level=warn component=tsdb msg="A lockfile from a previous execution already existed. It was replaced" file=/data/lock
ts=2022-08-12T17:06:31.451Z caller=head.go:493 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2022-08-12T17:06:32.017Z caller=head.go:520 level=error component=tsdb msg="Loading on-disk chunks failed" err="iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 2821831, last chunk: [1660312732989, 1660312792989], new: [1660312732989, 1660312792989]"
ts=2022-08-12T17:06:32.019Z caller=head.go:689 level=info component=tsdb msg="Deleting mmapped chunk files"
ts=2022-08-12T17:06:32.019Z caller=head.go:692 level=info component=tsdb msg="Deletion of corrupted mmap chunk files failed, discarding chunk files completely" err="cannot handle error: iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 2821831, last chunk: [1660312732989, 1660312792989], new: [1660312732989, 1660312792989]"
ts=2022-08-12T17:06:32.027Z caller=head.go:536 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=575.898846ms
ts=2022-08-12T17:06:32.027Z caller=head.go:542 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2022-08-12T17:09:59.062Z caller=head.go:578 level=info component=tsdb msg="WAL checkpoint loaded"
ts=2022-08-12T17:09:59.062Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=4645 maxSegment=4694
ts=2022-08-12T17:09:59.088Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=4646 maxSegment=4694

Any idea what the problem is?

Thanks

Brian Candler

Aug 18, 2022, 4:48:27 AM

to Prometheus Users

You'll need to show what happens in logs *after* that point (i.e. the reloading of the WAL) - or does it freeze completely?

Otherwise, what you're showing is normal error recovery. However, it also suggests that there is something suspect about your storage. What sort of storage are you using? Is it a remote NAS filesystem such as NFS? If so, that's not recommended.
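
One way to answer the storage question on Linux is to look up which mount the data directory lives on. The following is a minimal sketch, assuming the `/proc/mounts` text format (device, mount point, filesystem type, ...); the function name `fs_type_for`, the `NETWORK_FS` set, and the sample mount table are made up for illustration and are not part of Prometheus.

```python
import os

# Filesystem types usually backed by network storage.
# Illustrative assumption, not an exhaustive or official list.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "glusterfs", "ceph"}


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the deepest mount containing path.

    mounts_text is expected to look like the contents of /proc/mounts.
    Mount points with escaped spaces (\\040) are not handled in this sketch.
    """
    path = os.path.abspath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # >= lets a later mount on the same mount point override an earlier one
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


# Hypothetical mount table, only for demonstration.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
nas01:/export/prom /data nfs4 rw,relatime 0 0
"""

mount_point, fs_type = fs_type_for("/data/wal", SAMPLE_MOUNTS)
is_network = fs_type in NETWORK_FS
```

On a real host you would pass the contents of `/proc/mounts` and the directory given to `--storage.tsdb.path` (the logs above suggest `/data`) instead of the sample text.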

Dinesh Koritela

Mar 3, 2023, 8:59:26 AM

to Prometheus Users

Hello @Brian

I have a similar issue and I am seeing a WAL error. Please take a look at the logs below. Can you suggest a solution?

ts=2023-03-02T11:39:25.119Z caller=db.go:772 level=info component=tsdb msg="Found and deleted tmp block dir" dir=/data/01GTH1E28STFEJDTCCQVA8R813.tmp-for-creation
ts=2023-03-02T11:39:25.120Z caller=dir_locker.go:77 level=warn component=tsdb msg="A lockfile from a previous execution already existed. It was replaced" file=/data/lock
ts=2023-03-02T11:39:32.713Z caller=head.go:493 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2023-03-02T11:39:38.194Z caller=head.go:527 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=5.480307055s
ts=2023-03-02T11:39:38.194Z caller=head.go:533 level=info component=tsdb msg="Replaying WAL, this may take a while"

Brian Candler

Mar 3, 2023, 11:03:52 AM

to Prometheus Users

I don't see any error there. Did you restart Prometheus? In that case, replaying the WAL and tidying up leftover files is normal.
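
A quick way to check whether a startup log contains anything above `info` is to parse the `key=value` (logfmt) lines and filter on the `level` field. This is a minimal sketch that uses `shlex` as a stand-in for a proper logfmt parser, which is an assumption that works for lines shaped like the ones quoted in this thread; the helper names `parse_logfmt` and `problems` are hypothetical.

```python
import shlex


def parse_logfmt(line):
    """Parse one key=value log line into a dict.

    shlex.split keeps quoted values such as msg="..." together as one token.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def problems(lines, levels=("warn", "error")):
    """Return the parsed entries whose level is one of the given levels."""
    return [f for f in map(parse_logfmt, lines) if f.get("level") in levels]


# Two lines copied from the log quoted above.
LOG = [
    'ts=2023-03-02T11:39:25.120Z caller=dir_locker.go:77 level=warn component=tsdb '
    'msg="A lockfile from a previous execution already existed. It was replaced" file=/data/lock',
    'ts=2023-03-02T11:39:38.194Z caller=head.go:533 level=info component=tsdb '
    'msg="Replaying WAL, this may take a while"',
]

for entry in problems(LOG):
    print(entry["level"], entry["caller"], entry["msg"])
```

For these two lines only the lockfile warning is reported, and nothing at `level=error`.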
