consul_exporter - network connections


Dennis Kelly

Jun 17, 2020, 12:37:28 AM
to Prometheus Users
We have a consul cluster of 3 members and about 1k services. consul_exporter has been using significantly more CPU and is also logging this:

level=error ts=2020-06-16T23:56:46.593Z caller=consul_exporter.go:400 msg="Failed to query service health" err="Get \"http://consul.service:8500/v1/health/service/[service name]?stale=\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

It is running as a Docker container in Nomad. I bumped the CPU resource from the default to 900 MHz and also raised consul.timeout to 2s. This has improved things, but we still sporadically receive this error. I haven't had a chance to dig through the entire source yet, but I am also wondering why consul_exporter has so many open connections to the same 3 Consul servers:

$ netstat | grep :8500 | wc -l

13653

Why would the connections remain and, if they do remain, why are they not reused? I suspect we may be hitting this issue, but I am hoping for further clarification:

https://github.com/prometheus/consul_exporter/issues/102

Thanks!

Dennis


Ben Kochie

Jun 17, 2020, 2:39:05 AM
to Dennis Kelly, Prometheus Users
Have you tried setting the --consul.request-limit to limit the number of concurrent connections?


Dennis Kelly

Jun 17, 2020, 5:48:02 PM
to Prometheus Users
I have not yet; that was next on my list. I was more curious why consul_exporter would need so many connections to the same three servers for only 1,000 services (i.e. why not reuse a connection? Why don't they close when done? Most are in TIME_WAIT).

Ben Kochie

Jun 20, 2020, 7:57:03 AM
to Dennis Kelly, Prometheus Users
I honestly have no idea how Consul works internally.
