HTTP client appears to hang forever


ch...@500px.com

Feb 12, 2015, 9:09:30 PM
to golan...@googlegroups.com
We have a Go application with a pool of worker goroutines.  Over time, we've observed that the number of "busy" workers slowly grows until the pool is exhausted.  After a lot of debugging, it appears that the "busy" workers are all becoming stuck during a read on an HTTP client (to an S3 bucket, but I'm not sure that's relevant other than that it causes the problem to happen fairly often).

I set up net/http/pprof and, after letting the process run for a day or so, captured the stacks of all the goroutines.  This is what stood out:

goroutine 414 [IO wait, 933 minutes]:
net.(*pollDesc).Wait(0xc208582140, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc208582140, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2085820e0, 0xc219af5219, 0xf8be7, 0xf8be7, 0x0, 0x7fbc7a68adc8, 0xc20c4fb790)
/usr/local/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20875f108, 0xc219af5219, 0xf8be7, 0xf8be7, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:121 +0xdc
net/http.noteEOFReader.Read(0x7fbc7a68c988, 0xc20875f108, 0xc20868b7b8, 0xc219af5219, 0xf8be7, 0xf8be7, 0xc20870bc50, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1270 +0x6e
net/http.(*noteEOFReader).Read(0xc20c74a800, 0xc219af5219, 0xf8be7, 0xf8be7, 0xc2086666e0, 0x0, 0x0)
<autogenerated>:125 +0xd4
bufio.(*Reader).Read(0xc20c702840, 0xc219af5219, 0xf8be7, 0xf8be7, 0x1, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:164 +0x13a
io.(*LimitedReader).Read(0xc20c73e8e0, 0xc219af5219, 0xf8be7, 0xf8be7, 0x41599a, 0x0, 0x0)
/usr/local/go/src/io/io.go:408 +0xce
net/http.(*body).readLocked(0xc20e9ff0c0, 0xc219af5219, 0xf8be7, 0xf8be7, 0xffffffff, 0x0, 0x0)
/usr/local/go/src/net/http/transfer.go:584 +0x7a
net/http.(*body).Read(0xc20e9ff0c0, 0xc219af5219, 0xf8be7, 0xf8be7, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transfer.go:579 +0x115
net/http.(*bodyEOFSignal).Read(0xc20e9ff180, 0xc219af5219, 0xf8be7, 0xf8be7, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1193 +0x285
github.com/500px/s3/s3util.(*metricsReadCloserDecorator).Read(0xc20c73e960, 0xc219af5219, 0xf8be7, 0xf8be7, 0x5b4, 0x0, 0x0)
/home/travis/gopath/src/github.com/500px/s3/s3util/open.go:45 +0xae

933 minutes is a rather long time to be waiting on a socket read.  This is using Go 1.4 on Ubuntu precise.  The function we're calling is seen here: https://github.com/500px/s3/blob/master/s3util/open.go#L45

It seems pretty strange to me that any network request could be stuck this long.  I'm pretty stumped, though. 

Any suggestions?  Is this a bug in the http client?

Dave Cheney

Feb 12, 2015, 10:36:13 PM
to golan...@googlegroups.com, ch...@500px.com
I had a quick look and couldn't see anywhere in the code where you set a timeout. Without one, the http client will wait forever for the remote side to respond.
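
For example, a minimal sketch of setting a deadline on the client (the URL and timeout value here are placeholders, not the original code):

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"time"
)

func main() {
	// Client.Timeout bounds the whole request: dialing, redirects,
	// and reading the response body. The zero value means no limit,
	// which is what lets a read block for hours.
	client := &http.Client{Timeout: 60 * time.Second}

	resp, err := client.Get("https://example-bucket.s3.amazonaws.com/some-key")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		// This fires if the timeout expires while reading the body.
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("read", len(body), "bytes")
}
```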

Brad Fitzpatrick

Feb 13, 2015, 12:50:39 AM
to ch...@500px.com, golang-nuts
This is why the DefaultClient's underlying DefaultTransport sets a 30-second TCP-level keepalive.

I imagine you're using your own Transport without setting that.  TCP is defined to wait forever unless you say otherwise.  I used to ssh in the morning, go to class all day with my laptop without network (but still using my laptop), and then go home in the evening and my ssh connections (pre-screen, pre-mosh) would still be alive, because both the client & server obeyed TCP.  Nowadays there's often lots of spec-violating NAT crap in the middle with timeouts, or you have to use mosh.
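
A sketch of a custom Transport that mirrors what DefaultTransport does in Go 1.4 (the dial timeout, keepalive, and TLS handshake values below match the stdlib defaults; everything else, including the per-request timeout and the URL, is an assumption about how the poster wires up their client):

```go
package main

import (
	"log"
	"net"
	"net/http"
	"time"
)

// newClient builds a client whose Transport, like http.DefaultTransport,
// uses a 30s dial timeout and a 30s TCP-level keepalive, so a silently
// dropped connection is eventually noticed instead of blocking a read
// indefinitely.
func newClient() *http.Client {
	transport := &http.Transport{
		Dial: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second,
		}).Dial,
		TLSHandshakeTimeout: 10 * time.Second,
	}
	return &http.Client{
		Transport: transport,
		// Optional overall per-request deadline, as Dave suggested.
		Timeout: 2 * time.Minute,
	}
}

func main() {
	client := newClient()
	resp, err := client.Get("https://example-bucket.s3.amazonaws.com/some-key")
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
}
```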




ch...@500px.com

Feb 17, 2015, 10:12:23 AM
to golan...@googlegroups.com, ch...@500px.com
Thanks Brad and Dave, that was indeed the problem.

Chris