Prometheus was working fine two days ago, but today, when I stopped and started it again, it showed the error below and would not start.
I built Prometheus from the master branch and have been using that binary for two days. Below are the logs from starting Prometheus:
[neel@localhost prometheus]$ ./prometheus --config.file=prometheus.yml --storage.tsdb.path=data/
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:302 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.18.1, branch=master, revision=18d9ebf0ffc26b8bd0e136f552c8e9886d29ade4)"
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:338 build_context="(go=go1.14.3, user=ne...@localhost.localdomain, date=20200604-05:51:34)"
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:339 host_details="(Linux 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 localhost.localdomain (none))"
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:340 fd_limits="(soft=1024, hard=4096)"
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-06-11T06:25:10.973Z caller=main.go:678 msg="Starting TSDB ..."
level=info ts=2020-06-11T06:25:10.973Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591250134254 maxt=1591257600000 ulid=01EACBSE6H4EP5CS2K2G6KWJ0W
level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591682400000 maxt=1591747200000 ulid=01EAE4FPDZ932CDRBAGH019TK7
level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591747200000 maxt=1591768800000 ulid=01EAEQAB8X70893C6ZC2T2ZK38
level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591790400000 maxt=1591797600000 ulid=01EAGR8C1PMZZACQMN185NC254
level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591797600000 maxt=1591804800000 ulid=01EAGR8D3YVDV517XCMBMMKCS7
level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591768800000 maxt=1591790400000 ulid=01EAGR8ETQQC8MZ09SSC62N1D3
level=info ts=2020-06-11T06:25:10.982Z caller=main.go:547 msg="Stopping scrape discovery manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:561 msg="Stopping notify discovery manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:583 msg="Stopping scrape manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:557 msg="Notify discovery manager stopped"
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:543 msg="Scrape discovery manager stopped"
level=info ts=2020-06-11T06:25:10.983Z caller=manager.go:882 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=manager.go:892 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-06-11T06:25:10.983Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:749 msg="Notifier manager stopped"
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:577 msg="Scrape manager stopped"
level=error ts=2020-06-11T06:25:10.983Z caller=main.go:758 err="opening storage failed: found unsequential head chunk files 144 and 148"
#########################
Data directory contents:
01EACBSE6H4EP5CS2K2G6KWJ0W 01EAEQAB8X70893C6ZC2T2ZK38 01EAGR8D3YVDV517XCMBMMKCS7 chunks_head queries.active
01EAE4FPDZ932CDRBAGH019TK7 01EAGR8C1PMZZACQMN185NC254 01EAGR8ETQQC8MZ09SSC62N1D3 lock wal
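To see which files are missing from chunks_head, a small standalone Go program can list that directory and report gaps in the numbering. This is a hypothetical helper, not part of Prometheus. Based only on the error message, it assumes the files in chunks_head are named with decimal sequence numbers that must be contiguous. The name findGaps and the default path data/chunks_head (derived from --storage.tsdb.path=data/ above) are illustrative assumptions.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"sort"
	"strconv"
)

// findGaps parses file names as decimal sequence numbers (non-numeric names
// are skipped), sorts them, and returns every pair of neighbouring numbers
// that are not consecutive, e.g. {144, 148}.
func findGaps(names []string) [][2]int {
	seqs := make([]int, 0, len(names))
	for _, name := range names {
		seq, err := strconv.ParseUint(name, 10, 64)
		if err != nil {
			continue // not a numbered chunk file
		}
		seqs = append(seqs, int(seq))
	}
	sort.Ints(seqs)

	var gaps [][2]int
	for i := 1; i < len(seqs); i++ {
		if seqs[i] != seqs[i-1]+1 {
			gaps = append(gaps, [2]int{seqs[i-1], seqs[i]})
		}
	}
	return gaps
}

func main() {
	// Hypothetical default path; pass another directory as the first argument.
	dir := "data/chunks_head"
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	// ioutil.ReadDir is used so this also builds with the go1.14 toolchain from the log.
	infos, err := ioutil.ReadDir(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	names := make([]string, 0, len(infos))
	for _, fi := range infos {
		if !fi.IsDir() {
			names = append(names, fi.Name())
		}
	}
	for _, g := range findGaps(names) {
		fmt.Printf("gap between head chunk files %d and %d\n", g[0], g[1])
	}
}
```

Run it with `go run main.go data/chunks_head`. If the error message is accurate, it should report at least the gap between 144 and 148.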
##########################
Prometheus config file:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
    - targets: ['localhost:9090']

  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'postgres-exporter'
    # Scrape targets from this job every 15 seconds.
    scrape_interval: 15s
    static_configs:
    - targets: ['localhost:9187']
#############################################
Is the TSDB corrupted? If so, is there any way to recover it?
Thanks in advance.