Get http://hostname:9090/metrics: dial tcp IP_ADDRESS:9090: i/o timeout
I don't see any kind of error message printed in the logs related to this. Any pointers on how to troubleshoot/resolve this?
Thanks,
Jarod
It is not. The weird thing is I can curl the metrics endpoint from another host but not from the Prometheus machine. Also, if I attempt to ping a machine that gives me that error, or even localhost, I get the following:
$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument

Total number of TCP connections on the box is low (~150) at any given time. Open file handles appear to be sane as well. iptables is enabled with conntrack, but again they are within the configured limits.

If I reduce the number of hosts I monitor, the issue appears to go away. Are there any kernel parameters I should tune when I am monitoring thousands of hosts?

Thanks,
Jarod

On Mar 24, 2016, at 4:52 PM, Brian Brazil <brian....@robustperception.io> wrote:

> On 24 March 2016 at 20:46, <ja...@42lines.net> wrote:
>
>> Hello,
>>
>> I have a Prometheus instance scraping ~2500 nodes and a few of them (including the local Prometheus instance) will show the following Error on /status:
>>
>> Get http://hostname:9090/metrics: dial tcp IP_ADDRESS:9090: i/o timeout
>>
>> I don't see any kind of error message printed in the logs related to this. Any pointers on how to troubleshoot/resolve this?
>
> The host on the other end didn't respond to the http connection, is it down?
>
> --
> Brian Brazil
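
One way to answer the "is it down?" question from the Prometheus machine itself is to attempt a plain TCP connect with a timeout, which is roughly the step that fails in a "dial tcp ... i/o timeout" error. The following is only a minimal sketch: the function names probe and probe_all are made up for illustration, the 5 second timeout is an arbitrary assumption, and port 9090 is taken from the error message above.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Attempt one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and connects, honoring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers timeouts, refused connections and name resolution failures.
        return False, time.monotonic() - start, str(exc) or type(exc).__name__


def probe_all(hosts, port=9090, timeout=5.0):
    """Probe each host in turn and return one printable line per host."""
    lines = []
    for host in hosts:
        ok, secs, err = probe(host, port, timeout)
        status = "ok" if ok else "FAIL"
        lines.append(f"{host}:{port} {status} {secs:.2f}s {err}".rstrip())
    return lines
```

Running probe_all over the affected targets on the Prometheus machine and again on another host would show whether the failure is specific to the Prometheus machine, as the curl comparison suggests.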
--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi,
I followed the instructions to set sysctl -w net.ipv4.neigh.default.gc_thresh3=4096, but I still get an error similar to the one mentioned above: a read tcp i/o timeout. I can curl the metrics from the host machine where Prometheus is installed, but on the status page the target shows as 'Down' with a read tcp timeout error.
Any clue on this?
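
To confirm that the sysctl change took effect and to see how full the neighbor (ARP) table is, the values can be read back from /proc on Linux. This is a rough sketch under some assumptions: the function names are hypothetical, /proc/net/arp is treated as an approximate count of IPv4 neighbor entries, and gc_thresh3 is taken as the hard upper limit on table size. A read timeout that persists with plenty of headroom would point to a different cause than the neighbor table.

```python
from pathlib import Path


def count_arp_entries(arp_path="/proc/net/arp"):
    """Count IPv4 neighbor entries; the first line of the file is a header."""
    lines = Path(arp_path).read_text().splitlines()
    return sum(1 for line in lines[1:] if line.strip())


def read_thresholds(base="/proc/sys/net/ipv4/neigh/default"):
    """Read gc_thresh1..gc_thresh3 as integers from the sysctl directory."""
    names = ("gc_thresh1", "gc_thresh2", "gc_thresh3")
    return {name: int((Path(base) / name).read_text()) for name in names}


def headroom(entries, thresholds):
    """Entries still available before the hard limit (gc_thresh3) is hit."""
    return thresholds["gc_thresh3"] - entries


def report(arp_path="/proc/net/arp", base="/proc/sys/net/ipv4/neigh/default"):
    """Return a one-line summary of table usage versus the thresholds."""
    entries = count_arp_entries(arp_path)
    t = read_thresholds(base)
    return (
        f"{entries} neighbor entries; gc_thresh1/2/3 = "
        f"{t['gc_thresh1']}/{t['gc_thresh2']}/{t['gc_thresh3']}; "
        f"headroom {headroom(entries, t)}"
    )
```

Calling print(report()) on the Prometheus machine reads the live values. Settings applied with sysctl -w do not survive a reboot unless they are also written to the sysctl configuration.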
> --
> Jack Neely
> Operations Engineer
> 42 Lines, Inc.
Thanks!!