Fresh Install of Node Exporter on CentOS 7 Hangs When Clicking on the Metrics Link


Daniel Needles

unread,
Oct 17, 2017, 7:44:36 PM10/17/17
to Prometheus Users
Hello,
  I tried a fresh install of the current Linux binaries from Prometheus.io on a CentOS Linux release 7.2.1511 (Core) guest system running under Workstation on Windows 10.
  When I access the web page via Firefox and click the Metrics link, the browser hangs. Everything works fine on CentOS 6.

  Is there a known issue on CentOS 7?

  If not, is there a nifty troubleshooting link for node_exporter somewhere?

Thanks,
Daniel

Daniel Needles

unread,
Oct 17, 2017, 8:56:39 PM10/17/17
to Prometheus Users
I see in the strace output the receipt of the command from the browser:
    read(5, "GET /metrics HTTP/1.1\r\nHost: 192.168.237.159:9100\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nReferer: http://192.168.237.159:9100/\r\nDNT: 1\r\nConnection: keep-alive\r\nUpgrade-Insecure-Requests: 1\r\n\r\n", 4096) = 384

node_exporter then collects the vmstat responses, and then it sits and waits without returning the data. I did see an earlier log message:
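For reference, a trace like this can be captured by attaching strace to the running exporter. A minimal helper, assuming `pgrep` is available and you have ptrace rights over the process:

```shell
# trace_exporter: attach strace to the most recently started node_exporter.
# -f follows threads, -s 400 prints strings long enough to show the full
# HTTP request line, -e limits the noise to the syscalls a scrape exercises.
trace_exporter() {
  pid=$(pgrep -n node_exporter) || { echo "node_exporter not running" >&2; return 1; }
  strace -f -s 400 -e trace=read,write -p "$pid"
}
```

Run `trace_exporter`, then click the Metrics link; the last syscall before the stall points at whatever the exporter is blocking on.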

    write(2, "time=\"2017-10-17T17:36:15-07:00\" level=info msg=\"No directory specified, see --collector.textfile.directory\" source=\"textfile.go:57\"\n", 133)

Which is in this code:
   https://github.com/prometheus/node_exporter/blob/master/collector/textfile.go

Could that be related? I used the same YAML file for both CentOS 6 and CentOS 7, but only CentOS 7 failed.


Here are the two Prometheus YAML configs I tried:
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]


And
scrape_configs:
  - job_name: 'node resources'
    scrape_interval: 15s
    static_configs:
      - targets:
        - '192.168.1.2:9100'
    params:
      collect[]:
        - cpu
        - meminfo
        - diskstats
        - netdev
        - netstat
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
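Incidentally, the collect[] filter that second job sends can be reproduced by hand; a small sketch (the helper name is mine, and the target is assumed reachable):

```shell
# scrape_filtered: issue the same filtered scrape the second job would send.
# $1 is host:port, remaining args are collector names. Quoting the URL keeps
# the shell from globbing the [] in collect[], and --max-time makes a hang
# return with an error instead of blocking forever.
scrape_filtered() {
  target=$1; shift
  qs=""
  for c in "$@"; do qs="${qs}&collect[]=${c}"; done
  curl --max-time 10 -s "http://${target}/metrics?${qs#&}"
}
```

For example, `scrape_filtered 192.168.1.2:9100 cpu meminfo` mirrors the config above.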

Nicholas Capo

unread,
Oct 17, 2017, 9:00:20 PM10/17/17
to Daniel Needles, Prometheus Users

Does it also hang if you try it with curl/wget?

You could try process of elimination on the enabled collectors. Disable them all, test scrape, then enable them one at a time, scraping again at each step.
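A sketch of that elimination loop, assuming the exporter is already listening and honors the collect[] URL parameter:

```shell
# probe_collectors: scrape each collector on its own so a hang can be pinned
# to a single collector. --max-time turns a hang into a reported failure
# instead of blocking the loop forever. $1 is host:port.
probe_collectors() {
  target=$1
  for c in cpu meminfo diskstats netdev netstat; do
    if curl --max-time 10 -s "http://${target}/metrics?collect[]=${c}" > /dev/null; then
      echo "OK:   ${c}"
    else
      echo "HANG: ${c}"
    fi
  done
}
```

Running `probe_collectors localhost:9100` prints one OK/HANG line per collector.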

Nicholas



Daniel Needles

unread,
Oct 17, 2017, 10:18:33 PM10/17/17
to Prometheus Users
It will not work with just cpu or just meminfo either. I did a snapshot rollback and reinstalled everything. Something is broken on CentOS 7 versus CentOS 6.

Daniel Needles

unread,
Oct 17, 2017, 10:30:54 PM10/17/17
to Prometheus Users
As a workaround I reverted to node_exporter-0.14.0.linux-amd64, which I downloaded last week. That was the version running on CentOS 6. It looks like something broke in the latest node_exporter update in the context of my environment.
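For anyone else rolling back, the fetch looks roughly like this; the URL follows the project's release naming, so verify it against the GitHub releases page before relying on it:

```shell
# Construct the 0.14.0 release tarball name and URL. The actual download
# and run are left commented so you can check the URL first.
VER=0.14.0
TARBALL="node_exporter-${VER}.linux-amd64.tar.gz"
URL="https://github.com/prometheus/node_exporter/releases/download/v${VER}/${TARBALL}"
# curl -LO "$URL"
# tar -xzf "$TARBALL"
# ./node_exporter-${VER}.linux-amd64/node_exporter
```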



Daniel Needles

unread,
Oct 17, 2017, 11:25:05 PM10/17/17
to Prometheus Users
Hmm, so I noticed a difference between the GET for the working 0.14 version and the broken 0.15 version:

WORKING 0.14.0
read(6, "GET /metrics?collect%5B%5D=meminfo HTTP/1.1\r\nHost: 127.0.0.1:9100\r\nUser-Agent: Prometheus/1.8.0\r\nAccept: application/vnd.google.protobuf;proto=io.prometheus.client.MetricFamily;encoding=delimited;q=0.7,text/plain;version=0.0.4;q=0.3,*/*;q=0.1\r\nX-Prometheus-Scrape-Timeout-Seconds: 10.000000\r\nAccept-Encoding: gzip\r\nConnection: close\r\n\r\n", 4096) = 336

BROKEN 0.15.0
read(5, "GET /metrics HTTP/1.1\r\nHost: 127.0.0.1:9100\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\nUpgrade-Insecure-Requests: 1\r\nCache-Control: max-age=0\r\n\r\n", 4096) = 347

When I refreshed the browser, the collect[] parameter somehow dropped off the request and the user agent changed from Prometheus to the Firefox browser. However, I simply killed node_exporter 0.14 and Prometheus, ran node_exporter 0.15 and Prometheus, and finally refreshed the browser link to http://((HOST)):9100/metrics. Hmm.

Daniel Needles

unread,
Oct 18, 2017, 1:18:46 AM10/18/17
to Prometheus Users
FOUND THE ANSWER. THIS IS A KNOWN PROBLEM, CURRENTLY TRACKED HERE:

      https://github.com/prometheus/node_exporter/issues/697