Prometheus fails to start with error "opening storage failed: found unsequential head chunk files"


neel patel

Jun 11, 2020, 2:50:46 AM
to Prometheus Users
Prometheus was working fine until two days ago. Today, when I stopped it and started it again, it showed the error below and would not start.

I built Prometheus from the master branch and have been using that binary for the last two days. Below are the logs from startup.


[neel@localhost prometheus]$ ./prometheus --config.file=prometheus.yml --storage.tsdb.path=data/
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:302 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:337 msg="Starting Prometheus" version="(version=2.18.1, branch=master, revision=18d9ebf0ffc26b8bd0e136f552c8e9886d29ade4)"
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:338 build_context="(go=go1.14.3, user=ne...@localhost.localdomain, date=20200604-05:51:34)"
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:339 host_details="(Linux 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 localhost.localdomain (none))"
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:340 fd_limits="(soft=1024, hard=4096)"
level=info ts=2020-06-11T06:25:10.967Z caller=main.go:341 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-06-11T06:25:10.973Z caller=main.go:678 msg="Starting TSDB ..."
level=info ts=2020-06-11T06:25:10.973Z caller=web.go:524 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591250134254 maxt=1591257600000 ulid=01EACBSE6H4EP5CS2K2G6KWJ0W
level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591682400000 maxt=1591747200000 ulid=01EAE4FPDZ932CDRBAGH019TK7
level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591747200000 maxt=1591768800000 ulid=01EAEQAB8X70893C6ZC2T2ZK38
level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591790400000 maxt=1591797600000 ulid=01EAGR8C1PMZZACQMN185NC254
level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591797600000 maxt=1591804800000 ulid=01EAGR8D3YVDV517XCMBMMKCS7
level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb msg="Found healthy block" mint=1591768800000 maxt=1591790400000 ulid=01EAGR8ETQQC8MZ09SSC62N1D3
level=info ts=2020-06-11T06:25:10.982Z caller=main.go:547 msg="Stopping scrape discovery manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:561 msg="Stopping notify discovery manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:583 msg="Stopping scrape manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:557 msg="Notify discovery manager stopped"
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:543 msg="Scrape discovery manager stopped"
level=info ts=2020-06-11T06:25:10.983Z caller=manager.go:882 component="rule manager" msg="Stopping rule manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=manager.go:892 component="rule manager" msg="Rule manager stopped"
level=info ts=2020-06-11T06:25:10.983Z caller=notifier.go:601 component=notifier msg="Stopping notification manager..."
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:749 msg="Notifier manager stopped"
level=info ts=2020-06-11T06:25:10.983Z caller=main.go:577 msg="Scrape manager stopped"
level=error ts=2020-06-11T06:25:10.983Z caller=main.go:758 err="opening storage failed: found unsequential head chunk files 144 and 148"

#########################

The contents of the data directory are as follows:

01EACBSE6H4EP5CS2K2G6KWJ0W  01EAEQAB8X70893C6ZC2T2ZK38  01EAGR8D3YVDV517XCMBMMKCS7  chunks_head  queries.active
01EAE4FPDZ932CDRBAGH019TK7  01EAGR8C1PMZZACQMN185NC254  01EAGR8ETQQC8MZ09SSC62N1D3  lock         wal

##########################
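The error message names head chunk files 144 and 148, which suggests a gap in the numbering of the files under chunks_head. The following is a minimal sketch for listing such gaps. It assumes the files in chunks_head are named with plain decimal sequence numbers (for example 000144); the function names are hypothetical and are not part of Prometheus.

```python
import re
from pathlib import Path


def find_sequence_gaps(names):
    """Return (previous, current) pairs of file numbers that are not consecutive."""
    # Keep only names made entirely of digits (assumed naming scheme).
    numbers = sorted(int(n) for n in names if re.fullmatch(r"\d+", n))
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]


def head_chunk_gaps(data_dir):
    """List numbering gaps among the files in <data_dir>/chunks_head."""
    chunk_dir = Path(data_dir) / "chunks_head"
    return find_sequence_gaps(p.name for p in chunk_dir.iterdir() if p.is_file())
```

Calling head_chunk_gaps("data") against the directory passed to --storage.tsdb.path would return one pair for each gap found. The sketch only reports gaps; it does not repair anything.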



The Prometheus config file is as follows:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'postgres-exporter'

    # Scrape targets from this job every 15 seconds (same as the global setting).
    scrape_interval: 15s

    static_configs:
      - targets: ['localhost:9187']
#############################################

Is the TSDB corrupted? If so, is there any way to recover it?

Thanks in advance

Julien Pivotto

Jun 11, 2020, 3:14:40 AM
to neel patel, Prometheus Users

That could have been triggered by something else (OOM, disk full, ...). Do
you have the logs from before that point?
> --
> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/a7b51fdd-1b14-4c15-8342-ea113ab24315o%40googlegroups.com.


--
Julien Pivotto
@roidelapluie

neel patel

Jun 11, 2020, 3:26:06 AM
to Prometheus Users
Hi Julien,

Unfortunately, I don't have the earlier logs. I don't think it is related to disk space, as I have ~28 GB of storage available. Below is the output for my root file system.

/dev/sda3        96G   69G   28G  72% /
/dev/sda1       297M  207M   91M  70% /boot
tmpfs           378M     0  378M   0% /run/user/26
tmpfs           378M   24K  378M   1% /run/user/1000


Let me know if you need any more information.
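For reference, the same free-space check can be scripted against the directory that holds the TSDB. This is a minimal sketch using only the Python standard library; the function name is hypothetical. It reports the current state only, so it cannot show whether the disk was full at some earlier point.

```python
import shutil


def free_space_report(path):
    """Return (free_gib, free_fraction) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30, usage.free / usage.total
```

For example, free_space_report("data") would describe the filesystem behind the path given to --storage.tsdb.path.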


Murali Krishna Kanagala

Jun 11, 2020, 8:50:13 PM
to Prometheus Users
Try deleting the contents of the WAL folder and then starting the service. Ideally, you can delete just the last WAL file, the one where the service fails to start. It is shown in the log just before Prometheus stops.
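A more cautious variant of this suggestion is to move the WAL contents aside rather than delete them, so they can be put back if needed. The following is a minimal sketch under stated assumptions: Prometheus is stopped, the WAL lives in a wal directory under the data directory (as in the listing earlier in the thread), and the function name and backup location are hypothetical. Samples held only in the WAL are lost to Prometheus once it starts without them, and the thread does not confirm that this resolves the head chunk error.

```python
import shutil
import time
from pathlib import Path


def move_wal_aside(data_dir, backup_root=None):
    """Move every entry in <data_dir>/wal into a new timestamped backup directory.

    Returns the backup directory. Run this only while Prometheus is stopped.
    """
    wal_dir = Path(data_dir) / "wal"
    # By default, place the backup next to the data directory (assumption).
    backup_root = Path(backup_root) if backup_root else Path(data_dir).parent
    backup_dir = backup_root / f"wal-backup-{time.strftime('%Y%m%d-%H%M%S')}"
    backup_dir.mkdir(parents=True, exist_ok=False)
    for entry in sorted(wal_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
    return backup_dir
```

After move_wal_aside("data"), the wal directory is left empty and its previous contents sit in the returned backup directory.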
