We have a Go application with a pool of worker goroutines. Over time, the number of "busy" workers slowly grows until the pool is exhausted. After a lot of debugging, it appears that the "busy" workers are all getting stuck in a read on an HTTP client response body (the requests go to an S3 bucket, but I'm not sure that's relevant other than that it makes the problem happen fairly often).
I set up net/http/pprof and, after letting the process run for a day or so, captured the stacks of all the goroutines. This is what stood out:
goroutine 414 [IO wait, 933 minutes]:
net.(*pollDesc).Wait(0xc208582140, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc208582140, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2085820e0, 0xc219af5219, 0xf8be7, 0xf8be7, 0x0, 0x7fbc7a68adc8, 0xc20c4fb790)
/usr/local/go/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc20875f108, 0xc219af5219, 0xf8be7, 0xf8be7, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:121 +0xdc
net/http.noteEOFReader.Read(0x7fbc7a68c988, 0xc20875f108, 0xc20868b7b8, 0xc219af5219, 0xf8be7, 0xf8be7, 0xc20870bc50, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1270 +0x6e
net/http.(*noteEOFReader).Read(0xc20c74a800, 0xc219af5219, 0xf8be7, 0xf8be7, 0xc2086666e0, 0x0, 0x0)
<autogenerated>:125 +0xd4
bufio.(*Reader).Read(0xc20c702840, 0xc219af5219, 0xf8be7, 0xf8be7, 0x1, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:164 +0x13a
io.(*LimitedReader).Read(0xc20c73e8e0, 0xc219af5219, 0xf8be7, 0xf8be7, 0x41599a, 0x0, 0x0)
/usr/local/go/src/io/io.go:408 +0xce
net/http.(*body).readLocked(0xc20e9ff0c0, 0xc219af5219, 0xf8be7, 0xf8be7, 0xffffffff, 0x0, 0x0)
/usr/local/go/src/net/http/transfer.go:584 +0x7a
net/http.(*body).Read(0xc20e9ff0c0, 0xc219af5219, 0xf8be7, 0xf8be7, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transfer.go:579 +0x115
net/http.(*bodyEOFSignal).Read(0xc20e9ff180, 0xc219af5219, 0xf8be7, 0xf8be7, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1193 +0x285
It seems pretty strange to me that any network read could be stuck this long (933 minutes is over 15 hours). I'm pretty stumped.
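For anyone who wants to collect the same kind of dump, here is a minimal sketch of one way to do it, not our actual code. The listen address localhost:6060 and the helper name goroutineDump are placeholders chosen for illustration. Importing net/http/pprof registers the /debug/pprof/ handlers on the default mux, and the goroutine profile written with debug level 2 includes the wait state and duration seen in the trace above.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/ handlers on http.DefaultServeMux
	"runtime/pprof"
)

// goroutineDump (hypothetical helper) returns the stacks of all goroutines as text.
// Debug level 2 produces the same format as an unrecovered panic, including
// the wait reason and how long each goroutine has been blocked.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Serve the pprof endpoints in the background.
	// The address is an assumption; pick whatever suits the deployment.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// Print a dump once from inside the process.
	dump, err := goroutineDump()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(dump)
}
```

In a long-running process, the same text is available over HTTP at /debug/pprof/goroutine?debug=2 on the address the pprof server listens on.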