Possible race caused by http persistent connections

635 views
Skip to first unread message

Nick Craig-Wood

unread,
Jun 8, 2015, 6:26:59 PM6/8/15
to golang-dev
I've been having a problem with strange panics in my code. These happen
when Transport.CancelRequest is called.

I've managed to make a reproducer here.

http://play.golang.org/p/vB-a96PX1O

If you run this without the race detector, you'll get a panic like this
most of the time (this was the original symptom)

ncw@dogger:~$ go build pconn-race.go
ncw@dogger:~$ ./pconn-race
2015/06/08 23:09:13 Cancelling request
panic: d.nx != 0

goroutine 1 [running]:
crypto/md5.(*digest).checkSum(0xc208082ca0, 0x0, 0x0)
/usr/local/go/src/crypto/md5/md5.go:116 +0x1da
crypto/md5.(*digest).Sum(0xc208064060, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/crypto/md5/md5.go:93 +0x73
main.(*devZero).Close(0xc20801e400, 0x0, 0x0)
/home/ncw/pconn-race.go:60 +0x84
main.doRequest(0x7fa68742d900, 0xc2080f5bd0)
/home/ncw/pconn-race.go:84 +0x535
main.main()
/home/ncw/pconn-race.go:92 +0x3d

goroutine 5 [IO wait]:
net.(*pollDesc).Wait(0xc2080a8060, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080a8060, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc2080a8000, 0x0, 0x7fa687431b70, 0xc208092078)
/usr/local/go/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc2080b8000, 0x474b5e, 0x0, 0x0)
/usr/local/go/src/net/tcpsock_posix.go:234 +0x4e
net/http.tcpKeepAliveListener.Accept(0xc2080b8000, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/server.go:1976 +0x4c
net/http.(*Server).Serve(0xc20808e000, 0x7fa687432fe0, 0xc2080b8000,
0x0, 0x0)
/usr/local/go/src/net/http/server.go:1728 +0x92
net/http.(*Server).ListenAndServe(0xc20808e000, 0x0, 0x0)
/usr/local/go/src/net/http/server.go:1718 +0x154
net/http.ListenAndServe(0x6afd30, 0xe, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/server.go:1808 +0xba
main.startServer()
/home/ncw/pconn-race.go:38 +0x6e
created by main.main
/home/ncw/pconn-race.go:91 +0x36

goroutine 33 [runnable]:
fmt.(*fmt).init(0xc208106058, 0xc208106010)
/usr/local/go/src/fmt/format.go:71
fmt.newPrinter(0xe128001)
/usr/local/go/src/fmt/print.go:136 +0x96
fmt.Sprintf(0x6c5ab0, 0x11, 0xc20807fc08, 0x1, 0x1, 0x0, 0x0)
/usr/local/go/src/fmt/print.go:202 +0x34
log.Printf(0x6c5ab0, 0x11, 0xc20807fc08, 0x1, 0x1)
/usr/local/go/src/log/log.go:276 +0x50
main.func·001(0x7fa6874364e0, 0xc2080c60a0, 0xc2080da000)
/home/ncw/pconn-race.go:31 +0x174
net/http.HandlerFunc.ServeHTTP(0x71f238, 0x7fa6874364e0, 0xc2080c60a0,
0xc2080da000)
/usr/local/go/src/net/http/server.go:1265 +0x41
net/http.(*ServeMux).ServeHTTP(0xc20803a6c0, 0x7fa6874364e0,
0xc2080c60a0, 0xc2080da000)
/usr/local/go/src/net/http/server.go:1541 +0x17d
net/http.serverHandler.ServeHTTP(0xc20808e000, 0x7fa6874364e0,
0xc2080c60a0, 0xc2080da000)
/usr/local/go/src/net/http/server.go:1703 +0x19a
net/http.(*conn).serve(0xc2080c6000)
/usr/local/go/src/net/http/server.go:1204 +0xb57
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:1751 +0x35e

goroutine 9 [runnable]:
net/http/internal.(*chunkedWriter).Write(0xc2080a60a0, 0xc2080be000,
0x8000, 0x8000, 0x8000, 0x0, 0x0)
/usr/local/go/src/net/http/internal/chunked.go:159
io.Copy(0x7fa687436300, 0xc2080a60a0, 0x7fa687436220, 0xc2080a4060,
0xe128001, 0x0, 0x0)
/usr/local/go/src/io/io.go:364 +0x278
net/http.(*transferWriter).WriteBody(0xc2080a80e0, 0x7fa687436180,
0xc208054280, 0x0, 0x0)
/usr/local/go/src/net/http/transfer.go:207 +0x226
net/http.(*Request).write(0xc208032270, 0x7fa687436180, 0xc208054280,
0x7fa687436100, 0xc20803ab40, 0x0, 0x0)
/usr/local/go/src/net/http/request.go:448 +0xada
net/http.(*persistConn).writeLoop(0xc2080aa000)
/usr/local/go/src/net/http/transport.go:951 +0x28d
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:661 +0xcbc
ncw@dogger:~$


And if you run it with the race detector you get 3 races detected

ncw@dogger:~$ ./pconn-race
2015/06/08 23:13:51 Cancelling request
2015/06/08 23:13:51 Err from Copy: http: unexpected EOF reading trailer
2015/06/08 23:13:51 Close says we read 22609921 bytes, sum
"02aa75f767b2725af6cd7de227e55c27"
2015/06/08 23:13:51 error Do error :Get http://127.0.0.1:9543/: read tcp
127.0.0.1:9543: use of closed network connection
==================
WARNING: DATA RACE
Write by goroutine 10:
crypto/md5.(*digest).Write()
/usr/local/go/src/crypto/md5/md5.go:63 +0x8b
main.(*devZero).Read()
/home/ncw/pconn-race.go:53 +0x112
io/ioutil.(*nopCloser).Read()
<autogenerated>:4 +0xa7
io.(*multiReader).Read()
/usr/local/go/src/io/multi.go:13 +0x125
io.Copy()
/usr/local/go/src/io/io.go:362 +0x230
net/http.(*transferWriter).WriteBody()
/usr/local/go/src/net/http/transfer.go:207 +0x2be
net/http.(*Request).write()
/usr/local/go/src/net/http/request.go:448 +0xd8d
net/http.(*persistConn).writeLoop()
/usr/local/go/src/net/http/transport.go:951 +0x327

Previous read by main goroutine:
crypto/md5.(*digest).Sum()
/usr/local/go/src/crypto/md5/md5.go:92 +0x74
main.(*devZero).Close()
/home/ncw/pconn-race.go:60 +0xaa
main.doRequest()
/home/ncw/pconn-race.go:84 +0x634
main.main()
/home/ncw/pconn-race.go:92 +0x4a

Goroutine 10 (running) created at:
net/http.(*Transport).dialConn()
/usr/local/go/src/net/http/transport.go:661 +0x10f1
net/http.func·019()
/usr/local/go/src/net/http/transport.go:520 +0x8d
==================
==================
WARNING: DATA RACE
Write by goroutine 10:
crypto/md5.(*digest).Write()
/usr/local/go/src/crypto/md5/md5.go:70 +0x19a
main.(*devZero).Read()
/home/ncw/pconn-race.go:53 +0x112
io/ioutil.(*nopCloser).Read()
<autogenerated>:4 +0xa7
io.(*multiReader).Read()
/usr/local/go/src/io/multi.go:13 +0x125
io.Copy()
/usr/local/go/src/io/io.go:362 +0x230
net/http.(*transferWriter).WriteBody()
/usr/local/go/src/net/http/transfer.go:207 +0x2be
net/http.(*Request).write()
/usr/local/go/src/net/http/request.go:448 +0xd8d
net/http.(*persistConn).writeLoop()
/usr/local/go/src/net/http/transport.go:951 +0x327

Previous read by main goroutine:
crypto/md5.(*digest).Sum()
/usr/local/go/src/crypto/md5/md5.go:92 +0x74
main.(*devZero).Close()
/home/ncw/pconn-race.go:60 +0xaa
main.doRequest()
/home/ncw/pconn-race.go:84 +0x634
main.main()
/home/ncw/pconn-race.go:92 +0x4a

Goroutine 10 (running) created at:
net/http.(*Transport).dialConn()
/usr/local/go/src/net/http/transport.go:661 +0x10f1
net/http.func·019()
/usr/local/go/src/net/http/transport.go:520 +0x8d
==================
==================
WARNING: DATA RACE
Write by goroutine 10:
crypto/md5.(*digest).Write()
/usr/local/go/src/crypto/md5/md5.go:72 +0x25c
main.(*devZero).Read()
/home/ncw/pconn-race.go:53 +0x112
io/ioutil.(*nopCloser).Read()
<autogenerated>:4 +0xa7
io.(*multiReader).Read()
/usr/local/go/src/io/multi.go:13 +0x125
io.Copy()
/usr/local/go/src/io/io.go:362 +0x230
net/http.(*transferWriter).WriteBody()
/usr/local/go/src/net/http/transfer.go:207 +0x2be
net/http.(*Request).write()
/usr/local/go/src/net/http/request.go:448 +0xd8d
net/http.(*persistConn).writeLoop()
/usr/local/go/src/net/http/transport.go:951 +0x327

Previous read by main goroutine:
crypto/md5.(*digest).Sum()
/usr/local/go/src/crypto/md5/md5.go:92 +0x74
main.(*devZero).Close()
/home/ncw/pconn-race.go:60 +0xaa
main.doRequest()
/home/ncw/pconn-race.go:84 +0x634
main.main()
/home/ncw/pconn-race.go:92 +0x4a

Goroutine 10 (running) created at:
net/http.(*Transport).dialConn()
/usr/local/go/src/net/http/transport.go:661 +0x10f1
net/http.func·019()
/usr/local/go/src/net/http/transport.go:520 +0x8d
==================
Found 3 data race(s)
ncw@dogger:~$

The 3 races are essentially all the same. They all include the go
routine started by the http transport layer - persistConn.writeLoop.
This go routine isn't under the user's control.

I think there might need to be a bit of synchronisation to make sure
this loop finishes before the request returns to avoid the race above.

Is that right? Or is my code violating some calling condition I'm not
aware of?

Thanks

Nick

PS It is late here so I won't look at replies until tomorrow!
--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

285...@wessie.info

unread,
Jun 9, 2015, 9:55:41 AM6/9/15
to golan...@googlegroups.com
What it looks like to me is that your (*devZero).Read and (*devZero).Close are racing on fields n and hash, protecting the fields with a mutex cures the problem on my end.

Nick Craig-Wood

unread,
Jun 9, 2015, 10:22:52 AM6/9/15
to golan...@googlegroups.com
That is perfectly true, but it isn't my code doing the racing - it is
the persistent connection loop in the http transport layer.

As far as my code is concerned when the Do() call returns it is safe to
do operations on the io.ReadCloser. However in a background loop, the
http transport layer continues to do Read() operations which causes the
problem when my code calls Close() concurrently.

In general io.ReadCloser isn't expected to be safe for concurrent use
and no-where in net/http does it say it has to be.

I think the fact that net/http is using an io.ReadCloser passed in to it
in a concurrent fashion is a bug either in the docs or in net/http.
> 127.0.0.1:9543 <http://127.0.0.1:9543>: use of closed network
> Nick Craig-Wood <ni...@craig-wood.com <javascript:>> --
> http://www.craig-wood.com/nick
>
> --
> You received this message because you are subscribed to the Google
> Groups "golang-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to golang-dev+...@googlegroups.com
> <mailto:golang-dev+...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Peter Waller

unread,
Jun 9, 2015, 10:48:24 AM6/9/15
to Nick Craig-Wood, golang-dev
I don't know if this is related to your problems, but note that you
can't cancel an outgoing http request until after it is in flight. The
documentation string sort of says this for the function, and it's
clear(ish) from the CancelRequest

http://golang.org/src/net/http/transport.go?s=8815:8862#L269

This is annoying, in my opinion, and makes CancelRequest a little
awkward to use.

As an example of how to get around this, the ReverseProxy only tries
to cancel the request after the first Body.Read():

https://github.com/golang/go/blob/master/src/net/http/httputil/reverseproxy.go#L137

Peter Waller

unread,
Jun 9, 2015, 11:08:59 AM6/9/15
to Nick Craig-Wood, golang-dev
Sorry, I missed that you have a time.Sleep in there which presumably
does the trick :)

Nick Craig-Wood

unread,
Jun 9, 2015, 11:40:52 AM6/9/15
to Peter Waller, golang-dev
Yes I'm sure that the request has started, but I wasn't aware of the
cancel problem - thanks for the headsup!

I'm my original problem I'm dealing with data timeouts after the connect
phase.

Daniel Morsing

unread,
Jun 9, 2015, 12:29:21 PM6/9/15
to golan...@googlegroups.com, Brad Fitzpatrick, pe...@scraperwiki.com
The design of the net/http package makes fixing this race a bit awkward. If you issue CancelRequest while blocked on the Reader, you can either return from the RoundTrip immediately and have the read be concurrent with the caller (what we do now), or you can wait for the read to finish before returning from RoundTrip, in which case you introduce a lot of opportunities to shoot yourself in the foot and leak persistent connections. Consider the case where you cancel a request, then stop putting data into the write side of a pipe connected to the reader. In that case you'll end up waiting indefinitely.

I think the best solution might be a documentation change saying that the state of the reader is unpredictable after CancelRequest has been called and it's up to the caller to synchronize.

Nick Craig-Wood

unread,
Jun 9, 2015, 12:58:16 PM6/9/15
to Daniel Morsing, golan...@googlegroups.com, Brad Fitzpatrick, pe...@scraperwiki.com
On 09/06/15 17:29, Daniel Morsing wrote:
> The design of the net/http package makes fixing this race a bit awkward.
> If you issue CancelRequest while blocked on the Reader, you can either
> return from the RoundTrip immediately and have the read be concurrent
> with the caller (what we do now), or you can wait for the read to finish
> before returning from RoundTrip, in which case you introduce a lot of
> opportunities to shoot yourself in the foot and leak persistent
> connections. Consider the case where you cancel a request, then stop
> putting data into the write side of a pipe connected to the reader. In
> that case you'll end up waiting indefinitely.

I see!

> I think the best solution might be a documentation change saying that
> the state of the reader is unpredictable after CancelRequest has been
> called and it's up to the caller to synchronize.

Or maybe state that io.ReadClosers used by the http transport should be
safe for concurrent use in the presence of CancelRequest. It is quite
difficult to synchronise with a go routine you didn't start (the one in
the transport layer) and making your ReadCloser safe for concurrency is
easier.

I wonder whether this race is triggerable with a standard io.ReadCloser,
eg an os.File...

Wesley Bitter

unread,
Jun 9, 2015, 1:42:32 PM6/9/15
to Nick Craig-Wood, Daniel Morsing, golan...@googlegroups.com, Brad Fitzpatrick, pe...@scraperwiki.com
An os.File and a net.Conn are both (documented) safe for concurrent method access. Due to this fact I originally thought this requirement extended to all the io interfaces used in net and its sub-packages. Finding this was not the case is actually a surprise to me due to the concurrent nature of net/http.

A documentation update to require a concurrent safe implementation seems to be the most sane fix for the issue, especially with the potential problems Daniel listed on handling this inside RoundTrip instead.

--
You received this message because you are subscribed to a topic in the Google Groups "golang-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-dev/0Nl6k5Sj6UU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-dev+...@googlegroups.com.

Brad Fitzpatrick

unread,
Jun 9, 2015, 1:45:04 PM6/9/15
to Wesley Bitter, Nick Craig-Wood, Daniel Morsing, golang-dev, Peter Waller
On Tue, Jun 9, 2015 at 10:42 AM, Wesley Bitter <285...@wessie.info> wrote:
An os.File and a net.Conn are both (documented) safe for concurrent method access.

No such blanket documentation guarantee exists.

Perhaps certain methods, but definitely not all.

 

Wesley Bitter

unread,
Jun 9, 2015, 1:59:23 PM6/9/15
to Brad Fitzpatrick, Nick Craig-Wood, Daniel Morsing, golang-dev, Peter Waller
an os.File indeed does not seem to be documented like that, I'm not sure why I keep thinking these things then. A net.Conn is though, as given "Multiple goroutines may invoke methods on a Conn simultaneously." on the interface documentation.

Excuses for the confusion, I've made too much assumptions from memory it seems.

Caleb Spare

unread,
Jun 9, 2015, 2:18:09 PM6/9/15
to Brad Fitzpatrick, golang-dev
> No such blanket documentation guarantee exists.

It does for net.Conn.

Brad Fitzpatrick

unread,
Jun 9, 2015, 2:23:15 PM6/9/15
to Caleb Spare, golang-dev
On Tue, Jun 9, 2015 at 10:52 AM, Caleb Spare <ces...@gmail.com> wrote:
> No such blanket documentation guarantee exists.

It does for net.Conn.

Ugh. I see.

I will clarify the docs. It means that (for example) a Read and Write may both be happening. But not (for example) two concurrent Reads.

 

Nick Craig-Wood

unread,
Jun 10, 2015, 11:29:25 AM6/10/15
to Brad Fitzpatrick, Caleb Spare, golang-dev
Shall I make an issue out of the original post?

roger peppe

unread,
Jun 13, 2015, 8:06:40 AM6/13/15
to Brad Fitzpatrick, Caleb Spare, golang-dev

I have always assumed that the blanket guarantee is true for net.Conn. One data point: io.Pipe certainly seems to take steps to ensure that behaviour with multiple concurrent readers and writers is sane.

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages