Prometheus stopped scraping


for...@google.com

Aug 28, 2018, 1:13:58 PM

to Prometheus Users

I've been using Prometheus for a couple of weeks, but suddenly it stopped scraping. I have no idea how to fix it.

The screenshot below shows two things:

Scraping collected data fine for some time, then stopped
The query engine is still working

Screenshot from 2018-08-28 13-02-06.png

The metrics are being reported fine by the two jobs I'm tracking:

$ grep ^process_start_time_seconds <(curl -s localhost:9090/metrics) <(curl -s localhost:9091/metrics)

/dev/fd/63:process_start_time_seconds 1.53547379296e+09

/dev/fd/62:process_start_time_seconds 1.53547379297e+09

I see that no new data is being generated in the data directory (today is August 28):

$ ls -lh data

total 56K

drwxr-xr-x 3 root root 4.0K Aug 5 05:00 01CM479SDFCJH9H365T0X84P5J

drwxr-xr-x 3 root root 4.0K Aug 7 09:00 01CM9STMSARF5WWS54J6J6XS6G

drwxr-xr-x 3 root root 4.0K Aug 9 15:00 01CMFK779AZZ4EWN95GQ137DPM

drwxr-xr-x 3 root root 4.0K Aug 11 21:00 01CMNCKX5QFWCETZZJW9445K6X

drwxr-xr-x 3 root root 4.0K Aug 14 03:00 01CMV60GXGD6WAW646E0Z24BF7

drwxr-xr-x 3 root root 4.0K Aug 16 09:00 01CN0ZD7GBWJGX1B7N0VJD5KFC

drwxr-xr-x 3 root root 4.0K Aug 18 15:00 01CN6RSZR2XEBXDKHRS1X2QSXB

drwxr-xr-x 3 root root 4.0K Aug 20 21:00 01CNCJ6KEV5W0GEWK3YHFMB9S1

drwxr-xr-x 3 root root 4.0K Aug 23 03:00 01CNJBK37VGEM8SN22M3TXMWA9

drwxr-xr-x 3 root root 4.0K Aug 25 09:00 01CNR4ZMMP2SVG984YRSWMJM58

drwxr-xr-x 3 root root 4.0K Aug 27 15:00 01CNXYC8CJ3DZP997YYSC7030Q

drwxr-xr-x 3 root root 4.0K Aug 27 21:00 01CNYJZDXF027QZPSRB8D4YCQX

drwxr-xr-x 3 root root 4.0K Aug 27 21:00 01CNYJZDZM347C8TF0CFMM8NRH

-rw-r--r-- 1 root root 0 Aug 2 22:44 lock

drwxr-xr-x 3 root root 4.0K Aug 27 04:26 wal

Why is the scraping not working?

How can I fix it?

I have no idea how to investigate. I tried restarting several times and it didn't help. The log messages don't give me much of a clue:

level=info ts=2018-08-28T16:29:53.261349989Z caller=main.go:235 msg="Starting Prometheus" version="(version=, branch=, revision=)"

level=info ts=2018-08-28T16:29:53.26172932Z caller=main.go:236 build_context="(go=go1.11, user=, date=)"

level=info ts=2018-08-28T16:29:53.26187424Z caller=main.go:237 host_details="(Linux 4.4.0-64-generic #85~14.04.1-Ubuntu SMP Mon Feb 20 12:10:54 UTC 2017 x86_64 OutlineServerLondon (none))"

level=info ts=2018-08-28T16:29:53.262057888Z caller=main.go:238 fd_limits="(soft=32768, hard=32768)"

level=info ts=2018-08-28T16:29:53.262217347Z caller=main.go:239 vm_limits="(soft=unlimited, hard=unlimited)"

level=info ts=2018-08-28T16:29:53.262861046Z caller=main.go:551 msg="Starting TSDB ..."

level=info ts=2018-08-28T16:29:53.26313949Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1533247200000 maxt=1533427200000 ulid=01CM479SDFCJH9H365T0X84P5J

level=info ts=2018-08-28T16:29:53.263225604Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1533427200000 maxt=1533621600000 ulid=01CM9STMSARF5WWS54J6J6XS6G

level=info ts=2018-08-28T16:29:53.263313514Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1533621600000 maxt=1533816000000 ulid=01CMFK779AZZ4EWN95GQ137DPM

level=info ts=2018-08-28T16:29:53.263385221Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1533816000000 maxt=1534010400000 ulid=01CMNCKX5QFWCETZZJW9445K6X

level=info ts=2018-08-28T16:29:53.263450771Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534010400000 maxt=1534204800000 ulid=01CMV60GXGD6WAW646E0Z24BF7

level=info ts=2018-08-28T16:29:53.263514619Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534204800000 maxt=1534399200000 ulid=01CN0ZD7GBWJGX1B7N0VJD5KFC

level=info ts=2018-08-28T16:29:53.263580807Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534399200000 maxt=1534593600000 ulid=01CN6RSZR2XEBXDKHRS1X2QSXB

level=info ts=2018-08-28T16:29:53.263646985Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534593600000 maxt=1534788000000 ulid=01CNCJ6KEV5W0GEWK3YHFMB9S1

level=info ts=2018-08-28T16:29:53.263710946Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534788000000 maxt=1534982400000 ulid=01CNJBK37VGEM8SN22M3TXMWA9

level=info ts=2018-08-28T16:29:53.263798776Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534982400000 maxt=1535176800000 ulid=01CNR4ZMMP2SVG984YRSWMJM58

level=info ts=2018-08-28T16:29:53.263867286Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1535176800000 maxt=1535371200000 ulid=01CNXYC8CJ3DZP997YYSC7030Q

level=info ts=2018-08-28T16:29:53.263911328Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1535392800000 maxt=1535400000000 ulid=01CNYJZDXF027QZPSRB8D4YCQX

level=info ts=2018-08-28T16:29:53.263963225Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1535371200000 maxt=1535392800000 ulid=01CNYJZDZM347C8TF0CFMM8NRH

level=info ts=2018-08-28T16:29:53.27777935Z caller=web.go:395 component=web msg="Start listening for connections" address=localhost:9090

level=warn ts=2018-08-28T16:29:59.956269599Z caller=head.go:371 component=tsdb msg="unknown series references" count=30264

level=info ts=2018-08-28T16:29:59.957119786Z caller=main.go:561 msg="TSDB started"

level=info ts=2018-08-28T16:29:59.957318863Z caller=main.go:621 msg="Loading configuration file" filename=/root/shadowbox/persisted-state/prometheus/config.yml

level=info ts=2018-08-28T16:29:59.95790212Z caller=main.go:647 msg="Completed loading of configuration file" filename=/root/shadowbox/persisted-state/prometheus/config.yml

level=info ts=2018-08-28T16:29:59.958042283Z caller=main.go:520 msg="Server is ready to receive web requests."

level=debug ts=2018-08-28T16:29:59.958303627Z caller=manager.go:151 component="discovery manager scrape" msg="discovery receiver's channel was full"

level=debug ts=2018-08-28T16:29:59.958504519Z caller=manager.go:151 component="discovery manager scrape" msg="discovery receiver's channel was full"
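
One way to read the "found healthy block" lines above: `mint` and `maxt` are the block's time bounds as Unix timestamps in milliseconds. The minimal sketch below converts the bounds of the two newest blocks (values copied from the log) to UTC, which shows where the persisted data ends. The helper name `ms_to_utc` is only illustrative.

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> str:
    """Format a Unix timestamp given in milliseconds as an ISO 8601 UTC string."""
    dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


# (mint, maxt) of the two newest blocks, copied from the startup log above.
newest_blocks = [
    (1535371200000, 1535392800000),  # ulid=01CNYJZDZM347C8TF0CFMM8NRH
    (1535392800000, 1535400000000),  # ulid=01CNYJZDXF027QZPSRB8D4YCQX
]

for mint, maxt in newest_blocks:
    print(ms_to_utc(mint), "->", ms_to_utc(maxt))
# prints:
# 2018-08-27T12:00:00Z -> 2018-08-27T18:00:00Z
# 2018-08-27T18:00:00Z -> 2018-08-27T20:00:00Z
```

The newest block ends at 20:00 UTC on August 27, so nothing scraped after that point has been written out as a block, which is consistent with the directory listing.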

This is my config file:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
scrape_configs:
- job_name: prometheus
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090
- job_name: outline-ss-server
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9091
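
For a config like this, a possible next step is to ask Prometheus itself which targets it believes it is scraping, using the `/api/v1/targets` HTTP API endpoint. The sketch below is only an illustration: it assumes the server is reachable at `http://localhost:9090` (as in the config above), and the function names `fetch_targets` and `unhealthy_targets` are made up for this example.

```python
import json
from urllib.request import urlopen


def unhealthy_targets(targets_response: dict) -> list:
    """Return (scrapeUrl, health, lastError) for each active target whose health is not 'up'."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl"), t.get("health"), t.get("lastError"))
        for t in active
        if t.get("health") != "up"
    ]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch and decode the JSON document served at /api/v1/targets (base_url is an assumption)."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Example usage against a running server:
#   response = fetch_targets()
#   print(len(response["data"]["activeTargets"]), "active targets")
#   for url, health, error in unhealthy_targets(response):
#       print(url, health, error)
```

If the list of active targets comes back empty even though two jobs are configured, that would suggest the targets never reached the scrape manager, rather than the scrapes themselves failing.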

Any help is greatly appreciated.

Thank you!

Vinicius Fortuna

Christian Hoffmann

Aug 29, 2018, 2:04:33 AM

to for...@google.com, Prometheus Users

On August 28, 2018 at 19:13:58 CEST, fortuna via Prometheus Users <promethe...@googlegroups.com> wrote:
>level=debug ts=2018-08-28T16:29:59.958504519Z caller=manager.go:151 component="discovery manager scrape" msg="discovery receiver's channel was full"

Could this be related to the issue below? Are you on master? If so, can you try a fresh build? The PR seems to have been merged yesterday.

https://github.com/prometheus/prometheus/pull/4523
https://github.com/prometheus/prometheus/issues/4551

Vinicius Fortuna [vee-NEE-see.oos]

Aug 29, 2018, 12:09:01 PM

to ma...@hoffmann-christian.info, promethe...@googlegroups.com

Thanks for the pointers. I think you identified the issue.

I was running a build installed with `go get` rather than a released version. Switching to the released version solved the problem.

The absence of the "discovery receiver's channel was full" message seems to indicate that you are on a good version.
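
As a rough way to apply that check, the sketch below counts how often the message appears in a set of log lines. It is only a heuristic, and it assumes the server runs with debug logging enabled (for example with `--log.level=debug`), since the message is emitted at `level=debug` in the log shown earlier in this thread. The names `MARKER` and `count_marker` are illustrative.

```python
MARKER = "discovery receiver's channel was full"


def count_marker(log_lines, marker: str = MARKER) -> int:
    """Count the log lines that contain the given marker text."""
    return sum(1 for line in log_lines if marker in line)


# Two debug lines copied from the startup log earlier in this thread.
sample_log = [
    'level=debug ts=2018-08-28T16:29:59.958303627Z caller=manager.go:151 '
    'component="discovery manager scrape" msg="discovery receiver\'s channel was full"',
    'level=debug ts=2018-08-28T16:29:59.958504519Z caller=manager.go:151 '
    'component="discovery manager scrape" msg="discovery receiver\'s channel was full"',
]

print(count_marker(sample_log))  # prints: 2
```

A count of zero over a log that does contain other debug lines would match the behavior seen after switching to the released version.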
