Scrape failed with context deadline exceeded, but host is reachable

michael...@gmail.com

Apr 30, 2019, 5:43:10 AM
to Prometheus Users
I'm running node_exporter 0.17 on a few Debian hosts and scraping them with Prometheus 2.9.2 in a Docker container. These should all be the latest versions.

All hosts but one can be scraped fine; one host (a Proxmox server) is having issues. The evaluation interval is set to 1m, but I've also tried 30s.

When I run wget http://proxmox.local:9100/metrics from inside the Prometheus container, I always get a result within 200ms. It may be worth mentioning that Prometheus runs in a Docker container inside an LXC container on this host, but since wget works fine, I don't think this is related.

I suspect it has something to do with the keep-alive of the connections to the exporters, but I'm not really sure how to troubleshoot this further.


prometheus configuration:

global:
  scrape_interval: 30s
  evaluation_interval: 1m

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['proxmox.local:9100'] # proxmox

Prometheus log:

prometheus    | level=debug ts=2019-04-30T09:12:17.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:13:47.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:15:17.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:16:17.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:17:47.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:19:17.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:20:47.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:22:17.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:23:47.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
prometheus    | level=debug ts=2019-04-30T09:25:17.713Z caller=scrape.go:927 component="scrape manager" scrape_pool=node target=http://proxmox.local:9100/metrics msg="Scrape failed" err="Get http://proxmox.local:9100/metrics: context deadline exceeded"
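One way to read a pattern out of the log above is to compute the time between consecutive failures. The following is a minimal Python sketch, not something Prometheus provides: the helper name `failure_gaps` is made up for illustration, and the input lines are abbreviated to the `ts` and `msg` fields, with the timestamps copied from the log.

```python
import re
from datetime import datetime

# Matches the logfmt "ts=..." field of a Prometheus log line.
TS_RE = re.compile(r"\bts=(\S+)")


def failure_gaps(lines):
    """Return the gaps, in whole seconds, between consecutive 'Scrape failed' lines."""
    stamps = []
    for line in lines:
        if 'msg="Scrape failed"' not in line:
            continue
        match = TS_RE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ"))
    return [round((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Timestamps copied from the log above; the rest of each line is abbreviated.
TIMESTAMPS = [
    "2019-04-30T09:12:17.713Z",
    "2019-04-30T09:13:47.713Z",
    "2019-04-30T09:15:17.713Z",
    "2019-04-30T09:16:17.713Z",
    "2019-04-30T09:17:47.713Z",
    "2019-04-30T09:19:17.713Z",
    "2019-04-30T09:20:47.713Z",
    "2019-04-30T09:22:17.713Z",
    "2019-04-30T09:23:47.713Z",
    "2019-04-30T09:25:17.713Z",
]
lines = [f'level=debug ts={ts} msg="Scrape failed"' for ts in TIMESTAMPS]

print(failure_gaps(lines))  # [90, 90, 60, 90, 90, 90, 90, 90, 90]
```

Most failures are 90 seconds apart, which is three scrape intervals at the configured 30s. That would be consistent with roughly every third scrape failing rather than the target being down the whole time.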


This is how my graphs look:


Screenshot from 2019-04-30 11-24-41.png


Simon Pasquier

Apr 30, 2019, 11:31:31 AM
to michael...@gmail.com, Prometheus Users
I would try tcpdump/wireshark to see what happens on the wire.
Did it work properly with versions older than Prometheus v2.9.2?

Michaël Arnauts

Apr 30, 2019, 2:29:58 PM
to Simon Pasquier, Prometheus Users
I made a pcap and could see that Prometheus opens one connection and sends 3 requests (separated by my interval time); after the 3rd request, node_exporter doesn't respond anymore.

I've also tried 2.9.1, and that gave the same results. I didn't try another node_exporter version, but it looks more like the issue lies there, unless the HTTP connection isn't supposed to be kept open that long?

I'm not sure what the policy is regarding attaching pcaps in this group, but here is a screenshot (captured on proxmox):

image.png
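The behaviour seen in the pcap (several requests over one long-lived connection) differs from separate wget runs, which open a new connection each time, so it may help to reproduce it outside Prometheus. Below is a rough Python sketch using only the standard library. The function name `probe_keepalive` and its parameters are made up for illustration; the 30s interval matches the scrape_interval in the config above, and the 10s timeout assumes Prometheus's default scrape_timeout.

```python
import http.client
import time


def probe_keepalive(host, port, path="/metrics", interval=30.0, count=5, timeout=10.0):
    """Send `count` GET requests over one HTTP connection, `interval` seconds apart.

    Returns a list of (request_number, local_port, outcome) tuples. `outcome` is
    the HTTP status code on success or the exception class name on failure.
    An unchanged local_port across requests shows the TCP connection was reused.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    results = []
    try:
        for n in range(1, count + 1):
            local_port = None
            try:
                conn.request("GET", path)
                local_port = conn.sock.getsockname()[1]
                resp = conn.getresponse()
                resp.read()  # drain the body so the connection can be reused
                results.append((n, local_port, resp.status))
            except (OSError, http.client.HTTPException) as exc:
                # Timeouts and resets end up here; stop, the connection is unusable.
                results.append((n, local_port, type(exc).__name__))
                break
            if n < count:
                time.sleep(interval)
    finally:
        conn.close()
    return results


# Example call (takes about two minutes against a real target):
# for row in probe_keepalive("proxmox.local", 9100):
#     print(row)
```

If the same local port shows up for the first requests and a later one ends in a timeout, the stall happens on the reused connection, which points at something between the two endpoints (or the exporter) rather than at name resolution or basic reachability.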

Michaël Arnauts

May 1, 2019, 12:23:07 PM
to Simon Pasquier, Prometheus Users
This seems to be caused by a Proxmox virtual switch. It disconnects a TCP connection after 30 seconds.

I've changed the network config, and it seems to be working fine now.