Prometheus fails to start with error "opening storage failed: found unsequential head chunk files"

neel patel

Jun 11, 2020, 2:50:46 AM

to Prometheus Users

Prometheus was working fine two days ago. Today, when I stopped and restarted it, it showed the error below and failed to start.

I built Prometheus from the master branch and have been using that binary for the last 2 days. Below are the logs from startup.

[neel@localhost prometheus]$ ./prometheus --config.file=prometheus.yml --storage.tsdb.path=data/

level=info ts=2020-06-11T06:25:10.967Z caller=main.go:302 msg="No time or size retention was set so using the default time retention" duration=15d

level=info ts=2020-06-11T06:25:10.967Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.18.1, branch=master, revision=18d9ebf0ffc26b8bd0e136f552c8e9886d29ade4)"

level=info ts=2020-06-11T06:25:10.967Z caller=main.go:338 build_context="(go=go1.14.3, user=ne...@localhost.localdomain, date=20200604-05:51:34)"

level=info ts=2020-06-11T06:25:10.967Z caller=main.go:339 host_details="(Linux 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 localhost.localdomain (none))"

level=info ts=2020-06-11T06:25:10.967Z caller=main.go:340 fd_limits="(soft=1024, hard=4096)"

level=info ts=2020-06-11T06:25:10.967Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"

level=info ts=2020-06-11T06:25:10.973Z caller=main.go:678 msg="Starting TSDB ..."

level=info ts=2020-06-11T06:25:10.973Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090

level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591250134254 maxt=1591257600000 ulid=01EACBSE6H4EP5CS2K2G6KWJ0W

level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591682400000 maxt=1591747200000 ulid=01EAE4FPDZ932CDRBAGH019TK7

level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591747200000 maxt=1591768800000 ulid=01EAEQAB8X70893C6ZC2T2ZK38

level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591790400000 maxt=1591797600000 ulid=01EAGR8C1PMZZACQMN185NC254

level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591797600000 maxt=1591804800000 ulid=01EAGR8D3YVDV517XCMBMMKCS7

level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591768800000 maxt=1591790400000 ulid=01EAGR8ETQQC8MZ09SSC62N1D3

level=info ts=2020-06-11T06:25:10.982Z caller=main.go:547 msg="Stopping scrape discovery manager..."

level=info ts=2020-06-11T06:25:10.983Z caller=main.go:561 msg="Stopping notify discovery manager..."

level=info ts=2020-06-11T06:25:10.983Z caller=main.go:583 msg="Stopping scrape manager..."

level=info ts=2020-06-11T06:25:10.983Z caller=main.go:557 msg="Notify discovery manager stopped"

level=info ts=2020-06-11T06:25:10.983Z caller=main.go:543 msg="Scrape discovery manager stopped"

level=info ts=2020-06-11T06:25:10.983Z caller=manager.go:882 component="rule manager" msg="Stopping rule manager..."

level=info ts=2020-06-11T06:25:10.983Z caller=manager.go:892 component="rule manager" msg="Rule manager stopped"

level=info ts=2020-06-11T06:25:10.983Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."

level=info ts=2020-06-11T06:25:10.983Z caller=main.go:749 msg="Notifier manager stopped"

level=info ts=2020-06-11T06:25:10.983Z caller=main.go:577 msg="Scrape manager stopped"

level=error ts=2020-06-11T06:25:10.983Z caller=main.go:758 err="opening storage failed: found unsequential head chunk files 144 and 148"
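
To make the "Found healthy block" lines above easier to read, here is a minimal sketch that converts their mint/maxt values to UTC times. It assumes mint and maxt are Unix timestamps in milliseconds; the BLOCKS list and the ms_to_utc helper are names made up for this illustration, with the values copied from the log.

```python
from datetime import datetime, timezone

# (mint, maxt, ulid) copied from the "Found healthy block" log lines.
BLOCKS = [
    (1591250134254, 1591257600000, "01EACBSE6H4EP5CS2K2G6KWJ0W"),
    (1591682400000, 1591747200000, "01EAE4FPDZ932CDRBAGH019TK7"),
    (1591747200000, 1591768800000, "01EAEQAB8X70893C6ZC2T2ZK38"),
    (1591790400000, 1591797600000, "01EAGR8C1PMZZACQMN185NC254"),
    (1591797600000, 1591804800000, "01EAGR8D3YVDV517XCMBMMKCS7"),
    (1591768800000, 1591790400000, "01EAGR8ETQQC8MZ09SSC62N1D3"),
]


def ms_to_utc(ms):
    """Convert a Unix timestamp in milliseconds to an ISO 8601 UTC string."""
    # Assumption: the value is milliseconds since the Unix epoch.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


# Print the blocks in time order so gaps between them are easy to spot.
for mint, maxt, ulid in sorted(BLOCKS):
    print(f"{ulid}: {ms_to_utc(mint)} -> {ms_to_utc(maxt)}")
```

Sorting by mint puts the blocks in chronological order, which makes it easier to see which time ranges are covered by persisted blocks and which would have to come from the head and WAL.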

#########################

The data directory contents are as follows:

01EACBSE6H4EP5CS2K2G6KWJ0W 01EAEQAB8X70893C6ZC2T2ZK38 01EAGR8D3YVDV517XCMBMMKCS7 chunks_head queries.active

01EAE4FPDZ932CDRBAGH019TK7 01EAGR8C1PMZZACQMN185NC254 01EAGR8ETQQC8MZ09SSC62N1D3 lock wal

##########################
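
Since the error names head chunk files 144 and 148, one way to see what is actually in chunks_head is to list the numeric file names and report where the sequence skips. This is only a sketch: it assumes the files in data/chunks_head are named with plain (possibly zero-padded) sequence numbers, and find_sequence_gaps is a hypothetical helper, not part of Prometheus.

```python
from pathlib import Path


def find_sequence_gaps(chunks_dir):
    """Return (previous, current) pairs where numeric file names skip a number."""
    # Assumption: chunk files are named with digits only (e.g. 000144);
    # any other file in the directory is ignored.
    numbers = sorted(
        int(p.name)
        for p in Path(chunks_dir).iterdir()
        if p.is_file() and p.name.isdigit()
    )
    # Compare each number with its successor and keep the non-consecutive pairs.
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]
```

Calling find_sequence_gaps("data/chunks_head") with Prometheus stopped would list the places where the numbering jumps, which can be compared against the pair reported in the error message. The function only reads the directory and changes nothing.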

The Prometheus config file is as follows:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'postgres-exporter'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 15s

    static_configs:
      - targets: ['localhost:9187']

#############################################

Please let me know whether the TSDB is corrupted. If so, is there any way to recover it?

Thanks in advance.

Julien Pivotto

Jun 11, 2020, 3:14:40 AM

to neel patel, Prometheus Users

That could have been triggered by something else (OOM, disk full, ...). Do
you have the logs from before that point?

> --
> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/a7b51fdd-1b14-4c15-8342-ea113ab24315o%40googlegroups.com.

--
Julien Pivotto
@roidelapluie

neel patel

Jun 11, 2020, 3:26:06 AM

to Prometheus Users

Hi Julien,

Unfortunately, I don't have the previous logs. I also don't think it is related to disk space, as I have ~28GB of storage available. Below is the output for my root file system.

/dev/sda3 96G 69G 28G 72% /

/dev/sda1 297M 207M 91M 70% /boot

tmpfs 378M 0 378M 0% /run/user/26

tmpfs 378M 24K 378M 1% /run/user/1000

Let me know if you need any more information.


Murali Krishna Kanagala

Jun 11, 2020, 8:50:13 PM

to Prometheus Users

Try deleting the contents of the WAL folder and then start the service. Ideally, you would delete only the last WAL segment, the one at which the service fails to start. It is shown in the log just before Prometheus stops.
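
Before deleting anything under the data directory, it may be worth taking a copy first so the step can be undone. The sketch below copies a directory to a timestamped backup; backup_dir and the backup location are hypothetical names chosen for this example, and it assumes Prometheus is stopped while the copy runs.

```python
import shutil
import time
from pathlib import Path


def backup_dir(src, backup_root):
    """Copy src into backup_root under a timestamped name and return the new path."""
    src = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree creates dest (and missing parents) and fails if dest already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

For the layout in this thread, a call such as backup_dir("data/wal", "backups") would copy the WAL folder before its contents are removed. The same call works for data/chunks_head, or for the whole data directory if there is enough free space.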
