Prometheus Invalid Checksum Error


keshav19...@gmail.com

Sep 3, 2018, 11:31:36 PM
to Prometheus Users
Hi team,

I am using the Prometheus Docker image (https://hub.docker.com/r/prom/prometheus/) with PostgreSQL as remote storage in a Kubernetes cluster.
Because of a faulty data block, I am getting the error below:

level=error ts=2018-09-02T06:58:52.144782486Z caller=main.go:596 err="Opening storage failed open block /prometheus/persisted_storage/data/01CP9ADK94EPBC2RQZDFAK2HAP: read postings table: invalid checksum"


Also, because I am using a Persistent Volume for the TSDB path, the restarted Pod picks up the same corrupt block and fails again.

Prometheus version: 2.3.2

Could you please let me know how I can resolve this?
I believe that if I am using the Prometheus binary, it should handle this error by itself.

Any help or suggestions would be appreciated.

Regards,
Keshav Sharma


 

Simon Pasquier

Sep 4, 2018, 4:26:00 AM
to keshav19...@gmail.com, Prometheus Users
The "invalid checksum" error means that the block's index file is corrupted, and you have no choice other than to remove the whole directory "/prometheus/persisted_storage/data/01CP9ADK94EPBC2RQZDFAK2HAP".

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/d830be49-507c-4b9d-a315-5693a4c4e440%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

keshav19...@gmail.com

Sep 4, 2018, 4:27:20 PM
to Prometheus Users
Hi Simon,

Thanks for the reply.
One more question: since I have both the local TSDB and remote storage configured in Prometheus, how does Prometheus keep the data in sync between the local disk and the remote storage?

Regards,
Keshav Sharma

Simon Pasquier

Sep 5, 2018, 3:43:52 AM
to keshav19...@gmail.com, Prometheus Users
The way Prometheus writes to the remote storage is described in the documentation here:

Prometheus forwards the scraped samples to the remote system in real time. In case the remote is down, Prometheus buffers data up to a fixed limit.
If the samples from the corrupted block had already been forwarded to the remote storage and you have configured remote_read, you will still be able to query them through the remote storage.


keshav19...@gmail.com

Sep 6, 2018, 4:41:26 PM
to Prometheus Users
Hi Simon,

I tried deleting the block, but now I am getting an altogether different error, a time range overlap:

level=error ts=2018-09-06T20:36:35.097896513Z caller=main.go:596 err="Opening storage failed invalid block sequence: block time ranges overlap: [mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s, blocks: 71]: <ulid: 01CPPM1HXD8HCFDJGMXTV4JAHX, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPM1MFQK501XP3FWS0S23N3, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPM1RVV3N76EY0EEMVQ23XV, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPM21803XRYH4STPKYTCWZ5, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPM2HBV0TPHQ8CRXRAXCKCD, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPM3H74K8M7JD2VX9TFT31Y, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPM5CJMRT52Z21X01NE2A62, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPM77REDS8MJ88B7RRAABWK, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPM92Y3DQV8CTM6CVEBT58Q, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMAY46B1Y3XMMAP59C7RAZ, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMCS9NZ6EAJ7TC31F63DWQ, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMEMF259YHQZA533B17BS1, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMGFN4G7BD1DT8DQBZXX61, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMJAV009N85PMDTWHZ6ZWF, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMM607J9Q3QF44M9N5VMT8, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMP165NJ8NYYQNZR8BWRK6, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMQWC22533T3Z2YHCGZGBD, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMSQEYKN4DYQEYKC4WP4JQ, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMVJMWFDH155TD06HG5H3R, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 
01CPPMXDT2PDF60NVNZ7JRZ02D, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPMZ908501BQY2AKTQEEGGP, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPN142SES04HKG5QZEQA81Q, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPN2Z586QX5JPP621VS2048, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPN4TAVPV7KTGMVKG35HJNZ, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPN6NPWPMCGQ0NB6XHK2TQV, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPN8GZQ39REY7VN9J0N2YWQ, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNAC580ZRNVK15WBS6K3SY, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNC7BB0D0X1RKT5MAK5CZQ, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNE2H7RB1DWCM310E7CE0P, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNFXPXF07ZHKZMRG14B3VA, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNHRSPQ0J1NYG481X1YX3F, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNKM03Q978HGHFX3SC2T29, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNNF546D7EHFA6CA00NZE3, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNQAAVAN2JN0B60B87JQMC, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNS5GQ4W07FZ1DYGC82CRG, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNV0P630AJQTC4ZT9YERD6, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNWVW1VEN5FPZD3HAVWCTX, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPNYQ2AYCR8PD2RWHCCJZG0, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPP0J7GDH6CW6S7A380RM66, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPP2DG1ZX294QDSD5GQQ8YM, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPP48RXGFJ7RPGRY9KKCKE8, mint: 
1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPP63YNSBG5Y3AKGGE2YSCA, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPP7Z4DXH6F8W3Q3H344N5M, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPP9TAJ8SJEC4XHVMH2230A, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPBNG7RK719BK04PDWYZTV, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPDGP4RGJ55TCESZ2XEEGR, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPFBRSSHV5WPDKHTNV6G1E, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPH6VD7R6GDXVXVM3NY643, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPK214HJ9475TGGYXFQ9TQ, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPMX6TW079187Z4J99K0Q3, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPPRCM48ZJ2TZZ2WMAD2GM, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPRKF605REXCNX4Z6XS5DP, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPTEN1ZY6SVKX7KX544S3Z, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPW9Z89702D7N385V0G8PX, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPPY56PWWEWBTTHZ6QM3F8S, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQ00DQDX7KZ6BH2CD28HJS, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQ1VP34B6MW7967YVKZP7Q, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQ3PW2E17N5B79ZNWMP44P, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQ5J2091EN2TEJBYYA6H72, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQ7D87N8JDYF7QXEQV1E8R, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQ98EA5Z3TFWKKZK9T7W1D, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQB3PXEB4DVMKFSPSXF0AZ, mint: 1536199200000, maxt: 1536206400000, range: 
2h0m0s>, <ulid: 01CPPQCYTPPPFNMYMW29XXRK7N, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQET525AZTB100SN5W93KQ, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQGN8EBDZZ80BQ989MTPMD, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQJGMRH8H0B6YP3TRBH9T1, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQMBWGY752WQ4N4BR9DA7R, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQP73WG0KK87DETPMDE62B, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQR2BHG55NJ2A8MP0M3M23, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQSXHW3S17SBDWZTPNX7H4, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>, <ulid: 01CPPQVRYXC7R6PCGN6SWMAWS4, mint: 1536199200000, maxt: 1536206400000, range: 2h0m0s>"


Could you please provide your input on this?

Regards,
Keshav Sharma

Simon Pasquier

Sep 7, 2018, 4:05:37 AM
to Keshav Sharma, Prometheus Users
Hmm, it seems that your Prometheus failed to truncate the head block properly, which can generate many overlapping blocks at a high rate (one every few seconds in your case).
You can try removing all the offending directories except the most recent one and then starting Prometheus again.
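
A minimal sketch of how the "all except the most recent" list could be derived from the error message itself. It relies on the ULID property that identifiers sort lexicographically by creation time, so the largest ULID is the newest block; the names `ULID_RE` and `blocks_to_remove` are hypothetical, and the sample message is shortened to three of the ULIDs quoted in this thread.

```python
import re

# A ULID is 26 characters of Crockford base32 (no I, L, O or U).
# \s* tolerates a log line that was wrapped right after "ulid:".
ULID_RE = re.compile(r"ulid:\s*([0-9A-HJKMNP-TV-Z]{26})")


def blocks_to_remove(error_message: str) -> list:
    """Return every overlapping block ULID except the most recent one."""
    ulids = sorted(set(ULID_RE.findall(error_message)))
    # ULIDs sort by creation time, so the last element is the newest block.
    return ulids[:-1]


SAMPLE = (
    "block time ranges overlap: "
    "<ulid: 01CPPM1HXD8HCFDJGMXTV4JAHX, mint: 1536199200000, maxt: 1536206400000>, "
    "<ulid: 01CPPQVRYXC7R6PCGN6SWMAWS4, mint: 1536199200000, maxt: 1536206400000>, "
    "<ulid: 01CPPM1MFQK501XP3FWS0S23N3, mint: 1536199200000, maxt: 1536206400000>"
)

if __name__ == "__main__":
    for ulid in blocks_to_remove(SAMPLE):
        print(ulid)
```

The function only produces a list of directory names. Reviewing that list by hand before moving or deleting anything is the safer design, since the error message is the only input and a partially copied log would give a partial list.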


keshav19...@gmail.com

Sep 14, 2018, 5:35:09 PM
to Prometheus Users
Thanks Simon,

Could you please let me know why the invalid checksum error occurs, that is, what its root cause is, or point me to any documentation?

Regards,
Keshav Sharma


Simon Pasquier

Sep 17, 2018, 5:07:52 PM
to Keshav Sharma, Prometheus Users
Basically, Prometheus adds checksum information to the data it stores on disk. If the checksum doesn't match when the data is read back, it means that the data has been corrupted.
This may happen after a disk-full event, for instance.
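
The general idea can be sketched as follows: a checksum is appended when data is written and recomputed when it is read back. This is an illustration only. As far as I know, the Prometheus TSDB uses a CRC32 variant with the Castagnoli polynomial, whereas Python's `zlib.crc32` uses a different polynomial, so this code cannot validate a real index file. The function names are hypothetical.

```python
import zlib


def append_checksum(payload: bytes) -> bytes:
    """Return the payload followed by its 4-byte big-endian CRC32."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_checksum(record: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored one."""
    if len(record) < 4:
        return False
    payload, stored = record[:-4], record[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


if __name__ == "__main__":
    record = append_checksum(b"postings table")
    # Flip a single bit to simulate on-disk corruption.
    corrupted = bytes([record[0] ^ 0x01]) + record[1:]
    print(verify_checksum(record), verify_checksum(corrupted))
```

A reader that finds a mismatch has no way to tell which bytes are wrong, which is why the only remedy offered earlier in the thread is to drop the whole block.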
