tcp dial & dns lookup errors


Ano

Feb 27, 2014, 7:32:31 AM
to golan...@googlegroups.com
Hello,

I have suddenly started getting lots of the following errors from a long-running daemon that fires concurrent HTTP requests:
* "lookup example.com: Non-recoverable failure in name resolution "
* "lookup example.com: no such host"
* "GET <url> dial tcp xxx.xxx.xxx.xxx:80: operation not permitted"
* "dial tcp 127.0.0.1:5432: operation not permitted"

Apart from a few connections to PostgreSQL and similar services, there are 12 worker goroutines. Each of them calls a method that fires at most 3 HTTP requests at a time. These methods all use "defer resp.Body.Close()", and they do return, so the deferred call runs.

The system is FreeBSD 9.2.
"kern.openfiles" stays around 850 all the time, "kern.maxfiles" is 250000, and "ulimit -n" is ~11500.
Additionally, I checked the open files of the process with "lsof" and "fstat": there are no more than expected, so it shouldn't have any problem establishing new connections.


I really don't know what the problem is - it does not seem to be related to my code. Any help appreciated. 

Benjamin Measures

Feb 28, 2014, 3:59:50 AM
to golan...@googlegroups.com
Each TCP connection uses a local port, which must remain in the TIME_WAIT state for a while after the local side closes the connection.
If you're making lots of outbound TCP connections rapidly, you may be running out of ephemeral ports. I recall there are one or two threads about this topic in this group.

Benjamin Measures

Feb 28, 2014, 4:45:15 AM
to golan...@googlegroups.com
On Friday, 28 February 2014 08:59:50 UTC, Benjamin Measures wrote:
If you're making lots of outbound TCP connections rapidly, you may be running out of ephemeral ports. I recall there are one or two threads about this topic in this group.

I'd just like to add that this isn't specific to Go but a TCP "thing to know".

umg...@gmail.com

Feb 28, 2014, 7:03:34 AM
to golan...@googlegroups.com
Thank you for your answer!

At first I thought this couldn't be the reason, because I'm only using around 50 connections at once.
Nearly all of them connect to the same host, and my HTTP client has MaxIdleConnsPerHost set to 256, so it should just reuse them.

But "netstat -an" is indeed showing loads of TIME_WAITs. Investigating further.