Prometheus stopped scraping

for...@google.com

Aug 28, 2018, 1:13:58 PM
to Prometheus Users
I've been using Prometheus for a couple of weeks, but suddenly it stopped scraping. I have no idea how to fix it.

Here is an illustration that shows that:
  • Scraping collected data fine for some time, then stopped
  • The query engine is working

[Image: Screenshot from 2018-08-28 13-02-06.png]

The metrics are being reported fine by the two jobs I'm tracking:

$ grep ^process_start_time_seconds <(curl -s localhost:9090/metrics) <(curl -s localhost:9091/metrics)
/dev/fd/63:process_start_time_seconds 1.53547379296e+09
/dev/fd/62:process_start_time_seconds 1.53547379297e+09


I see that no new data is being generated in the data directory (today is August 28):
$ ls -lh data
total 56K
drwxr-xr-x 3 root root 4.0K Aug  5 05:00 01CM479SDFCJH9H365T0X84P5J
drwxr-xr-x 3 root root 4.0K Aug  7 09:00 01CM9STMSARF5WWS54J6J6XS6G
drwxr-xr-x 3 root root 4.0K Aug  9 15:00 01CMFK779AZZ4EWN95GQ137DPM
drwxr-xr-x 3 root root 4.0K Aug 11 21:00 01CMNCKX5QFWCETZZJW9445K6X
drwxr-xr-x 3 root root 4.0K Aug 14 03:00 01CMV60GXGD6WAW646E0Z24BF7
drwxr-xr-x 3 root root 4.0K Aug 16 09:00 01CN0ZD7GBWJGX1B7N0VJD5KFC
drwxr-xr-x 3 root root 4.0K Aug 18 15:00 01CN6RSZR2XEBXDKHRS1X2QSXB
drwxr-xr-x 3 root root 4.0K Aug 20 21:00 01CNCJ6KEV5W0GEWK3YHFMB9S1
drwxr-xr-x 3 root root 4.0K Aug 23 03:00 01CNJBK37VGEM8SN22M3TXMWA9
drwxr-xr-x 3 root root 4.0K Aug 25 09:00 01CNR4ZMMP2SVG984YRSWMJM58
drwxr-xr-x 3 root root 4.0K Aug 27 15:00 01CNXYC8CJ3DZP997YYSC7030Q
drwxr-xr-x 3 root root 4.0K Aug 27 21:00 01CNYJZDXF027QZPSRB8D4YCQX
drwxr-xr-x 3 root root 4.0K Aug 27 21:00 01CNYJZDZM347C8TF0CFMM8NRH
-rw-r--r-- 1 root root    0 Aug  2 22:44 lock
drwxr-xr-x 3 root root 4.0K Aug 27 04:26 wal


Why is the scraping not working?
How can I fix it?

I don't know how to investigate. I tried restarting several times, and it didn't help. The log messages don't give me much of a clue:

level=info ts=2018-08-28T16:29:53.261349989Z caller=main.go:235 msg="Starting Prometheus" version="(version=, branch=, revision=)"
level=info ts=2018-08-28T16:29:53.26172932Z caller=main.go:236 build_context="(go=go1.11, user=, date=)"
level=info ts=2018-08-28T16:29:53.26187424Z caller=main.go:237 host_details="(Linux 4.4.0-64-generic #85~14.04.1-Ubuntu SMP Mon Feb 20 12:10:54 UTC 2017 x86_64 OutlineServerLondon (none))"
level=info ts=2018-08-28T16:29:53.262057888Z caller=main.go:238 fd_limits="(soft=32768, hard=32768)"
level=info ts=2018-08-28T16:29:53.262217347Z caller=main.go:239 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-08-28T16:29:53.262861046Z caller=main.go:551 msg="Starting TSDB ..."
level=info ts=2018-08-28T16:29:53.26313949Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1533247200000 maxt=1533427200000 ulid=01CM479SDFCJH9H365T0X84P5J
level=info ts=2018-08-28T16:29:53.263225604Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1533427200000 maxt=1533621600000 ulid=01CM9STMSARF5WWS54J6J6XS6G
level=info ts=2018-08-28T16:29:53.263313514Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1533621600000 maxt=1533816000000 ulid=01CMFK779AZZ4EWN95GQ137DPM
level=info ts=2018-08-28T16:29:53.263385221Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1533816000000 maxt=1534010400000 ulid=01CMNCKX5QFWCETZZJW9445K6X
level=info ts=2018-08-28T16:29:53.263450771Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534010400000 maxt=1534204800000 ulid=01CMV60GXGD6WAW646E0Z24BF7
level=info ts=2018-08-28T16:29:53.263514619Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534204800000 maxt=1534399200000 ulid=01CN0ZD7GBWJGX1B7N0VJD5KFC
level=info ts=2018-08-28T16:29:53.263580807Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534399200000 maxt=1534593600000 ulid=01CN6RSZR2XEBXDKHRS1X2QSXB
level=info ts=2018-08-28T16:29:53.263646985Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534593600000 maxt=1534788000000 ulid=01CNCJ6KEV5W0GEWK3YHFMB9S1
level=info ts=2018-08-28T16:29:53.263710946Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534788000000 maxt=1534982400000 ulid=01CNJBK37VGEM8SN22M3TXMWA9
level=info ts=2018-08-28T16:29:53.263798776Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1534982400000 maxt=1535176800000 ulid=01CNR4ZMMP2SVG984YRSWMJM58
level=info ts=2018-08-28T16:29:53.263867286Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1535176800000 maxt=1535371200000 ulid=01CNXYC8CJ3DZP997YYSC7030Q
level=info ts=2018-08-28T16:29:53.263911328Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1535392800000 maxt=1535400000000 ulid=01CNYJZDXF027QZPSRB8D4YCQX
level=info ts=2018-08-28T16:29:53.263963225Z caller=repair.go:39 component=tsdb msg="found healthy block" mint=1535371200000 maxt=1535392800000 ulid=01CNYJZDZM347C8TF0CFMM8NRH
level=info ts=2018-08-28T16:29:53.27777935Z caller=web.go:395 component=web msg="Start listening for connections" address=localhost:9090
level=warn ts=2018-08-28T16:29:59.956269599Z caller=head.go:371 component=tsdb msg="unknown series references" count=30264
level=info ts=2018-08-28T16:29:59.957119786Z caller=main.go:561 msg="TSDB started"
level=info ts=2018-08-28T16:29:59.957318863Z caller=main.go:621 msg="Loading configuration file" filename=/root/shadowbox/persisted-state/prometheus/config.yml
level=info ts=2018-08-28T16:29:59.95790212Z caller=main.go:647 msg="Completed loading of configuration file" filename=/root/shadowbox/persisted-state/prometheus/config.yml
level=info ts=2018-08-28T16:29:59.958042283Z caller=main.go:520 msg="Server is ready to receive web requests."
level=debug ts=2018-08-28T16:29:59.958303627Z caller=manager.go:151 component="discovery manager scrape" msg="discovery receiver's channel was full"
level=debug ts=2018-08-28T16:29:59.958504519Z caller=manager.go:151 component="discovery manager scrape" msg="discovery receiver's channel was full"
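
The `mint` and `maxt` values in the "found healthy block" lines are Unix timestamps in milliseconds, so they can be decoded to see where the persisted data ends. Below is a minimal Python sketch that does this for the three newest blocks; the `ms_to_utc` helper and the `blocks` dictionary are illustrative names, and the numbers are copied from the log above.

```python
from datetime import datetime, timezone

# (mint, maxt) pairs copied from the three newest "found healthy block" lines.
blocks = {
    "01CNXYC8CJ3DZP997YYSC7030Q": (1535176800000, 1535371200000),
    "01CNYJZDZM347C8TF0CFMM8NRH": (1535371200000, 1535392800000),
    "01CNYJZDXF027QZPSRB8D4YCQX": (1535392800000, 1535400000000),
}


def ms_to_utc(ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Print each block's time range, oldest first.
for ulid, (mint, maxt) in sorted(blocks.items(), key=lambda item: item[1][0]):
    print(ulid, ms_to_utc(mint).isoformat(), "->", ms_to_utc(maxt).isoformat())

# The largest maxt marks the end of the data that has been written to blocks.
newest_end = max(maxt for _, maxt in blocks.values())
print("persisted data ends at", ms_to_utc(newest_end).isoformat())
```

The newest block ends at 2018-08-27T20:00:00 UTC, which is consistent with the directory listing showing nothing written on August 28.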

This is my config file:
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 1m
scrape_configs:
- job_name: prometheus
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090
- job_name: outline-ss-server
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9091

Any help is greatly appreciated.

Thank you!

Vinicius Fortuna

Christian Hoffmann

Aug 29, 2018, 2:04:33 AM
to for...@google.com, Prometheus Users
On August 28, 2018 at 19:13:58 CEST, fortuna via Prometheus Users <promethe...@googlegroups.com> wrote:
>level=debug ts=2018-08-28T16:29:59.958504519Z caller=manager.go:151 component="discovery manager scrape" msg="discovery receiver's channel was full"

Could this be related to the issue below? Are you on master? If so, can you try a fresh build? The PR seems to have been merged yesterday.

https://github.com/prometheus/prometheus/pull/4523
https://github.com/prometheus/prometheus/issues/4551

Vinicius Fortuna [vee-NEE-see.oos]

Aug 29, 2018, 12:09:01 PM
to ma...@hoffmann-christian.info, promethe...@googlegroups.com
Thanks for the pointers. I think you identified the issue.

I was running a build installed with `go get` instead of the released version. Switching to the released version solved the problem.
The absence of the "discovery receiver's channel was full" message seems to indicate that you are on a good version.
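
For anyone who wants to check their own logs for that message, here is a minimal Python sketch. It only counts log lines containing the message text; the `count_marker` function and the `MARKER` constant are names made up for this example, and a count of zero is only a hint, not proof that a build is unaffected.

```python
# Message text taken from the debug log lines earlier in this thread.
MARKER = "discovery receiver's channel was full"


def count_marker(lines, marker=MARKER):
    """Return how many of the given log lines contain the marker text."""
    return sum(1 for line in lines if marker in line)


# Two sample lines copied from the startup log in the first post.
sample = [
    'level=info ts=2018-08-28T16:29:59.958042283Z caller=main.go:520 '
    'msg="Server is ready to receive web requests."',
    'level=debug ts=2018-08-28T16:29:59.958303627Z caller=manager.go:151 '
    'component="discovery manager scrape" '
    'msg="discovery receiver\'s channel was full"',
]

print(count_marker(sample))
```

To scan a saved log instead, pass an open file object, for example `count_marker(open("prometheus.log"))`, where `prometheus.log` is a hypothetical file name. The message is logged at debug level, so it only shows up when debug logging is enabled.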