[go] net/http: fix Transport.MaxConnsPerHost limits & idle pool races


Russ Cox (Gerrit)

Jun 29, 2019, 9:37:09 AM
to Ian Lance Taylor, goph...@pubsubhelper.golang.org, Russ Cox, golang-co...@googlegroups.com

Russ Cox has uploaded this change for review.


net/http: fix Transport.MaxConnsPerHost limits & idle pool races

There were at least three races in the implementation of the pool of
idle HTTP connections before this CL.

The first race is that HTTP/2 connections can be shared by many
requests, but each requesting goroutine would take the connection out
of the pool and then immediately return it before using it; this
created tiny, unnecessary race windows during which another goroutine
might dial a second connection instead of reusing the first.
This CL changes the idle pool to leave the HTTP/2 connection in
the pool permanently (until there is reason to close it), instead of
doing the racy take-it-out-put-it-back dance.
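
The shape of that fix appears in queueForIdleConn in the diff below;
condensed to a sketch (a fragment, not the complete code):

	delivered = w.canDeliver(pconn, nil)
	if delivered {
		if pconn.alt != nil {
			// HTTP/2: multiple clients can share pconn.
			// Leave it in the idle list.
		} else {
			// HTTP/1: only one client can use pconn.
			// Remove it from the idle list.
			t.idleLRU.remove(pconn)
			list = list[:len(list)-1]
		}
	}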

The second race is that “is there an idle connection?” and
“register to wait for an idle connection” were implemented as two
separate steps, in different critical sections. So a client could end
up registered to wait for an idle connection and be waiting or perhaps
dialing, not having noticed the idle connection sitting in the pool
that arrived between the two steps.
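
The fix merges the two steps into a single critical section
(queueForIdleConn in the diff below). A simplified sketch, in which
mostRecentIdle is hypothetical shorthand for the idle-list
manipulation the real code does inline:

	t.idleMu.Lock()
	defer t.idleMu.Unlock()
	if pconn := mostRecentIdle(t, w.key); pconn != nil {
		// Deliver the idle connection now, inside the same
		// critical section that would otherwise register w.
		return w.canDeliver(pconn, nil)
	}
	// No idle connection: register to receive the next one,
	// without dropping the lock between the check and the wait.
	q := t.idleConnWait[w.key]
	q.pushBack(w)
	t.idleConnWait[w.key] = q
	return false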

The third race is that t.getIdleConnCh assumes that the inability to
send on the channel means the client doesn't need the result, when it
could mean that the client has not yet entered the select.
That is, the main dial does:

	idleConnCh := t.getIdleConnCh(cm)
	select {
	case v := <-dialc:
		...
	case pc := <-idleConnCh:
		...
	...
	}

But then tryPutIdleConn does:

	waitingDialer := t.idleConnCh[key] // what getIdleConnCh(cm) returned
	select {
	case waitingDialer <- pconn:
		// We're done ...
		return nil
	default:
		if waitingDialer != nil {
			// They had populated this, but their dial won
			// first, so we can clean up this map entry.
			delete(t.idleConnCh, key)
		}
	}

If the client has returned from getIdleConnCh but not yet reached the
select, tryPutIdleConn will be unable to do the send, incorrectly
conclude that the client does not care anymore, and put the connection
in the idle pool instead, again leaving the client dialing unnecessarily
while a connection sits in the idle pool.

(It's also odd that the success case does not clean up the map entry,
and also that the map has room for only a single waiting goroutine for
a given host.)

None of these races mattered too much before Go 1.11: at most they
meant that connections were not reused quite as promptly as possible,
or a few more than necessary would be created. But Go 1.11 added
Transport.MaxConnsPerHost, which limited the number of connections
created for a given host. The default is 0 (unlimited), but if a user
did explicitly impose a low limit (2 is common), all these misplaced
conns could easily add up to the entire limit, causing a deadlock.
This was causing intermittent timeouts in TestTransportMaxConnsPerHost.

The addition of the MaxConnsPerHost support added its own races.

For example, here t.incHostConnCount could increment the count
and return a channel ready for receiving, and then the client would
not receive from it nor ever issue the decrement, because the select
need not evaluate these two cases in order:

	select {
	case <-t.incHostConnCount(cmKey):
		// count below conn per host limit; proceed
	case pc := <-t.getIdleConnCh(cm):
		if trace != nil && trace.GotConn != nil {
			trace.GotConn(httptrace.GotConnInfo{Conn: pc.conn, Reused: pc.isReused()})
		}
		return pc, nil
	...
	}
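
Go evaluates the channel operands of every case in a select exactly
once, in source order, before choosing among the ready cases, so the
increment inside incHostConnCount happens even when the
idle-connection case wins. A small self-contained illustration of
that semantics:

	package main

	import "fmt"

	// incr mimics the shape of incHostConnCount: it performs its side
	// effect and returns an already-closed channel.
	func incr(n *int) chan struct{} {
		*n++ // runs during case evaluation, before any case is chosen
		ch := make(chan struct{})
		close(ch)
		return ch
	}

	func main() {
		n := 0
		other := make(chan struct{})
		close(other) // both cases are immediately ready
		select {
		case <-incr(&n): // this case may lose the random choice...
		case <-other: // ...but the increment has already happened
		}
		fmt.Println(n) // always prints 1
	}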

Obviously, unmatched increments are another way to get to a deadlock.
TestTransportMaxConnsPerHost deadlocked approximately 100% of
the time with a small random sleep added between incHostConnCount
and the select:

	ch := t.incHostConnCount(cmKey)
	time.Sleep(time.Duration(rand.Intn(10)) * time.Millisecond)
	select {
	case <-ch:
		// count below conn per host limit; proceed
	case pc := <-t.getIdleConnCh(cm):
		...
	}

The limit also did not properly apply to HTTP/2, because of the
decrement being attached to the underlying net.Conn.Close
and net/http not having access to the underlying HTTP/2 conn.
The alternate decrements for HTTP/2 may also have introduced
spurious decrements (discussion in #29889). Perhaps those
spurious decrements or other races caused the other intermittent
non-deadlock failures in TestTransportMaxConnsPerHost,
in which the HTTP/2 phase created too many connections (#31982).

This CL replaces the buggy, racy code with new code that is hopefully
neither buggy nor racy.

Fixes #29889.
Fixes #31982.
Fixes #32336.

Change-Id: I0dfac3a6fe8a6cdf5f0853722781fe2ec071ac97
---
M src/net/http/export_test.go
M src/net/http/serve_test.go
M src/net/http/transport.go
M src/net/http/transport_test.go
4 files changed, 389 insertions(+), 281 deletions(-)

diff --git a/src/net/http/export_test.go b/src/net/http/export_test.go
index f0dfa8c..bdb91db 100644
--- a/src/net/http/export_test.go
+++ b/src/net/http/export_test.go
@@ -166,30 +166,35 @@
return 0
}

-func (t *Transport) IdleConnChMapSizeForTesting() int {
+func (t *Transport) IdleConnWaitMapSizeForTesting() int {
t.idleMu.Lock()
defer t.idleMu.Unlock()
- return len(t.idleConnCh)
+ return len(t.idleConnWait)
}

func (t *Transport) IsIdleForTesting() bool {
t.idleMu.Lock()
defer t.idleMu.Unlock()
- return t.wantIdle
+ return t.closeIdle
}

-func (t *Transport) RequestIdleConnChForTesting() {
- t.getIdleConnCh(connectMethod{nil, "http", "example.com", false})
+func (t *Transport) QueueForIdleConnForTesting() {
+ t.queueForIdleConn(nil)
}

func (t *Transport) PutIdleTestConn(scheme, addr string) bool {
c, _ := net.Pipe()
key := connectMethodKey{"", scheme, addr, false}
- select {
- case <-t.incHostConnCount(key):
- default:
- return false
+
+ if t.MaxConnsPerHost > 0 {
+ t.connsPerHostMu.Lock()
+ if t.connsPerHost == nil {
+ t.connsPerHost = make(map[connectMethodKey]int)
+ }
+ t.connsPerHost[key]++
+ t.connsPerHostMu.Unlock()
}
+
return t.tryPutIdleConn(&persistConn{
t: t,
conn: c, // dummy
diff --git a/src/net/http/serve_test.go b/src/net/http/serve_test.go
index e7ed15c..1de255a 100644
--- a/src/net/http/serve_test.go
+++ b/src/net/http/serve_test.go
@@ -2407,6 +2407,7 @@
}

// See issues 8209 and 8414.
+// Both issues involved panics in the implementation of TimeoutHandler.
func TestTimeoutHandlerRaceHeader(t *testing.T) {
setParallel(t)
defer afterTest(t)
@@ -2434,7 +2435,9 @@
defer func() { <-gate }()
res, err := c.Get(ts.URL)
if err != nil {
- t.Error(err)
+ // We see ECONNRESET from the connection occasionally,
+ // and that's OK: this test is checking that the server does not panic.
+ t.Log(err)
return
}
defer res.Body.Close()
@@ -5507,9 +5510,18 @@
if a1 != a2 {
t.Fatal("expected first two requests on same connection")
}
+ addr := strings.TrimPrefix(ts.URL, "http://")
+ n := tr.IdleConnCountForTesting("http", addr)
+ if n != 1 {
+ t.Fatalf("idle count for %q after 2 gets = %d, want 1", addr, n)
+ }
var idle0 int
if !waitCondition(2*time.Second, 10*time.Millisecond, func() bool {
idle0 = tr.IdleConnKeyCountForTesting()
+ n := tr.IdleConnCountForTesting("http", addr)
+ if n > 1 {
+ t.Fatalf("idle count for %q after 2 gets = %d, want ≤ 1", addr, n)
+ }
return idle0 == 1
}) {
t.Fatalf("idle count before SetKeepAlivesEnabled called = %v; want 1", idle0)
@@ -5520,6 +5532,10 @@
var idle1 int
if !waitCondition(2*time.Second, 10*time.Millisecond, func() bool {
idle1 = tr.IdleConnKeyCountForTesting()
+ n := tr.IdleConnCountForTesting("http", addr)
+ if n > 1 {
+ t.Fatalf("idle count for %q after 2 gets = %d, want ≤ 1", addr, n)
+ }
return idle1 == 0
}) {
t.Fatalf("idle count after SetKeepAlivesEnabled called = %v; want 0", idle1)
diff --git a/src/net/http/transport.go b/src/net/http/transport.go
index 26f642a..fd0651e 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -57,15 +57,6 @@
// MaxIdleConnsPerHost.
const DefaultMaxIdleConnsPerHost = 2

-// connsPerHostClosedCh is a closed channel used by MaxConnsPerHost
-// for the property that receives from a closed channel return the
-// zero value.
-var connsPerHostClosedCh = make(chan struct{})
-
-func init() {
- close(connsPerHostClosedCh)
-}
-
// Transport is an implementation of RoundTripper that supports HTTP,
// HTTPS, and HTTP proxies (for either HTTP or HTTPS with CONNECT).
//
@@ -102,11 +93,11 @@
// request is treated as idempotent but the header is not sent on the
// wire.
type Transport struct {
- idleMu sync.Mutex
- wantIdle bool // user has requested to close all idle conns
- idleConn map[connectMethodKey][]*persistConn // most recently used at end
- idleConnCh map[connectMethodKey]chan *persistConn
- idleLRU connLRU
+ idleMu sync.Mutex
+ closeIdle bool // user has requested to close all idle conns
+ idleConn map[connectMethodKey][]*persistConn // most recently used at end
+ idleConnWait map[connectMethodKey]wantConnQueue // waiting getConns
+ idleLRU connLRU

reqMu sync.Mutex
reqCanceler map[*Request]func(error)
@@ -114,9 +105,9 @@
altMu sync.Mutex // guards changing altProto only
altProto atomic.Value // of nil or map[string]RoundTripper, key is URI scheme

- connCountMu sync.Mutex
- connPerHostCount map[connectMethodKey]int
- connPerHostAvailable map[connectMethodKey]chan struct{}
+ connsPerHostMu sync.Mutex
+ connsPerHost map[connectMethodKey]int
+ connsPerHostWait map[connectMethodKey]wantConnQueue // waiting getConns

// Proxy specifies a function to return a proxy for a given
// Request. If the function returns a non-nil error, the
@@ -203,11 +194,6 @@
// active, and idle states. On limit violation, dials will block.
//
// Zero means no limit.
- //
- // For HTTP/2, this currently only controls the number of new
- // connections being created at a time, instead of the total
- // number. In practice, hosts using HTTP/2 only have about one
- // idle connection, though.
MaxConnsPerHost int

// IdleConnTimeout is the maximum amount of time an idle
@@ -543,7 +529,6 @@
var resp *Response
if pconn.alt != nil {
// HTTP/2 path.
- t.putOrCloseIdleConn(pconn)
t.setReqCanceler(req, nil) // not cancelable with CancelRequest
resp, err = pconn.alt.RoundTrip(req)
} else {
@@ -554,7 +539,6 @@
}
if http2isNoCachedConnError(err) {
t.removeIdleConn(pconn)
- t.decHostConnCount(cm.key()) // clean up the persistent connection
} else if !pconn.shouldRetryRequest(req, err) {
// Issue 16465: return underlying net.Conn.Read error from peek,
// as we've historically done.
@@ -665,8 +649,7 @@
t.idleMu.Lock()
m := t.idleConn
t.idleConn = nil
- t.idleConnCh = nil
- t.wantIdle = true
+ t.closeIdle = true
t.idleLRU = connLRU{}
t.idleMu.Unlock()
for _, conns := range m {
@@ -762,7 +745,7 @@
var (
errKeepAlivesDisabled = errors.New("http: putIdleConn: keep alives disabled")
errConnBroken = errors.New("http: putIdleConn: connection is in bad state")
- errWantIdle = errors.New("http: putIdleConn: CloseIdleConnections was called")
+ errCloseIdle = errors.New("http: putIdleConn: CloseIdleConnections was called")
errTooManyIdle = errors.New("http: putIdleConn: too many idle connections")
errTooManyIdleHost = errors.New("http: putIdleConn: too many idle connections for host")
errCloseIdleConns = errors.New("http: CloseIdleConnections called")
@@ -821,29 +804,53 @@
return errConnBroken
}
pconn.markReused()
- key := pconn.cacheKey
+
+ // HTTP/2 (pconn.alt != nil) connections do not come out of the idle list,
+ // because multiple goroutines can use them simultaneously.
+ // If this is an HTTP/2 connection being “returned,” we're done.
+ if pconn.alt != nil && t.idleLRU.m[pconn] != nil {
+ return nil
+ }

t.idleMu.Lock()
defer t.idleMu.Unlock()

- waitingDialer := t.idleConnCh[key]
- select {
- case waitingDialer <- pconn:
- // We're done with this pconn and somebody else is
- // currently waiting for a conn of this type (they're
- // actively dialing, but this conn is ready
- // first). Chrome calls this socket late binding. See
- // https://insouciant.org/tech/connection-management-in-chromium/
- return nil
- default:
- if waitingDialer != nil {
- // They had populated this, but their dial won
- // first, so we can clean up this map entry.
- delete(t.idleConnCh, key)
+ // Deliver pconn to goroutine waiting for idle connection, if any.
+ key := pconn.cacheKey
+ if q, ok := t.idleConnWait[key]; ok {
+ done := false
+ if pconn.alt == nil {
+ // HTTP/1.
+ // Loop over the waiting list until we find a w that isn't done already, and hand it pconn.
+ for q.len() > 0 {
+ w := q.popFront()
+ if w.canDeliver(pconn, nil) {
+ done = true
+ break
+ }
+ }
+ } else {
+ // HTTP/2.
+ // Can hand the same pconn to everyone in the waiting list,
+ // and we still won't be done: we want to put it in the idle
+ // list unconditionally, for any future clients too.
+ for q.len() > 0 {
+ w := q.popFront()
+ w.canDeliver(pconn, nil)
+ }
+ }
+ if q.len() == 0 {
+ delete(t.idleConnWait, key)
+ } else {
+ t.idleConnWait[key] = q
+ }
+ if done {
+ return nil
}
}
- if t.wantIdle {
- return errWantIdle
+
+ if t.closeIdle {
+ return errCloseIdle
}
if t.idleConn == nil {
t.idleConn = make(map[connectMethodKey][]*persistConn)
@@ -864,71 +871,81 @@
oldest.close(errTooManyIdle)
t.removeIdleConnLocked(oldest)
}
- if t.IdleConnTimeout > 0 {
+
+ // Set idle timer, but only for HTTP/1 (pconn.alt == nil).
+ if t.IdleConnTimeout > 0 && pconn.alt == nil {
if pconn.idleTimer != nil {
pconn.idleTimer.Reset(t.IdleConnTimeout)
} else {
- // idleTimer does not apply to HTTP/2
- if pconn.alt == nil {
- pconn.idleTimer = time.AfterFunc(t.IdleConnTimeout, pconn.closeConnIfStillIdle)
- }
+ pconn.idleTimer = time.AfterFunc(t.IdleConnTimeout, pconn.closeConnIfStillIdle)
}
}
pconn.idleAt = time.Now()
return nil
}

-// getIdleConnCh returns a channel to receive and return idle
-// persistent connection for the given connectMethod.
-// It may return nil, if persistent connections are not being used.
-func (t *Transport) getIdleConnCh(cm connectMethod) chan *persistConn {
+// queueForIdleConn queues w to receive the next idle connection for w.cm.
+// As an optimization hint to the caller, queueForIdleConn reports whether
+// it successfully delivered an already-idle connection.
+func (t *Transport) queueForIdleConn(w *wantConn) bool {
if t.DisableKeepAlives {
- return nil
+ return false
}
- key := cm.key()
- t.idleMu.Lock()
- defer t.idleMu.Unlock()
- t.wantIdle = false
- if t.idleConnCh == nil {
- t.idleConnCh = make(map[connectMethodKey]chan *persistConn)
- }
- ch, ok := t.idleConnCh[key]
- if !ok {
- ch = make(chan *persistConn)
- t.idleConnCh[key] = ch
- }
- return ch
-}

-func (t *Transport) getIdleConn(cm connectMethod) (pconn *persistConn, idleSince time.Time) {
- key := cm.key()
t.idleMu.Lock()
defer t.idleMu.Unlock()
- for {
- pconns, ok := t.idleConn[key]
- if !ok {
- return nil, time.Time{}
- }
- if len(pconns) == 1 {
- pconn = pconns[0]
- delete(t.idleConn, key)
- } else {
- // 2 or more cached connections; use the most
- // recently used one at the end.
- pconn = pconns[len(pconns)-1]
- t.idleConn[key] = pconns[:len(pconns)-1]
- }
- t.idleLRU.remove(pconn)
- if pconn.isBroken() {
- // There is a tiny window where this is
- // possible, between the connecting dying and
- // the persistConn readLoop calling
- // Transport.removeIdleConn. Just skip it and
- // carry on.
- continue
- }
- return pconn, pconn.idleAt
+ t.closeIdle = false
+
+ if w == nil {
+ // Happens in test hook.
+ return false
}
+
+ // Look for most recently-used idle connection.
+ if list, ok := t.idleConn[w.key]; ok {
+ stop := false
+ delivered := false
+ for len(list) > 0 && !stop {
+ pconn := list[len(list)-1]
+ if pconn.isBroken() {
+ // persistConn.readLoop has marked the connection broken,
+ // but Transport.removeIdleConn has not yet removed it from the idle list.
+ // Drop on floor on behalf of Transport.removeIdleConn.
+ list = list[:len(list)-1]
+ continue
+ }
+ delivered = w.canDeliver(pconn, nil)
+ if delivered {
+ if pconn.alt != nil {
+ // HTTP/2: multiple clients can share pconn.
+ // Leave it in the list.
+ } else {
+ // HTTP/1: only one client can use pconn.
+ // Remove it from the list.
+ t.idleLRU.remove(pconn)
+ list = list[:len(list)-1]
+ }
+ }
+ stop = true
+ }
+ if len(list) > 0 {
+ t.idleConn[w.key] = list
+ } else {
+ delete(t.idleConn, w.key)
+ }
+ if stop {
+ return delivered
+ }
+ }
+
+ // Register to receive next connection that becomes idle.
+ if t.idleConnWait == nil {
+ t.idleConnWait = make(map[connectMethodKey]wantConnQueue)
+ }
+ q := t.idleConnWait[w.key]
+ q.pushBack(w)
+ t.idleConnWait[w.key] = q
+ return false
}

// removeIdleConn marks pconn as dead.
@@ -1015,20 +1032,135 @@
return zeroDialer.DialContext(ctx, network, addr)
}

+// A wantConn records state about a wanted connection
+// (that is, an active call to getConn).
+// The conn may be gotten by dialing or by finding an idle connection,
+// or a cancellation may make the conn no longer wanted.
+// These three options are racing against each other and use
+// wantConn to coordinate and agree about the winning outcome.
+type wantConn struct {
+ cm connectMethod
+ key connectMethodKey // cm.key()
+ ctx context.Context // context for dial
+ ready chan struct{} // closed when pc, err pair is delivered
+
+ // hooks for testing to know when dials are done
+ // beforeDial is called in the getConn goroutine when the dial is queued.
+ // afterDial is called when the dial is completed or cancelled.
+ beforeDial func()
+ afterDial func()
+
+ mu sync.Mutex // protects pc, err, close(ready)
+ pc *persistConn
+ err error
+}
+
+// waiting reports whether w is still waiting for an answer (connection or error).
+func (w *wantConn) waiting() bool {
+ w.mu.Lock()
+ waiting := w.pc == nil && w.err == nil
+ w.mu.Unlock()
+ return waiting
+}
+
+// canDeliver attempts to deliver pc, err to w and reports whether it succeeded.
+func (w *wantConn) canDeliver(pc *persistConn, err error) bool {
+ w.mu.Lock()
+ defer w.mu.Unlock()
+
+ if w.pc != nil || w.err != nil {
+ return false
+ }
+
+ w.pc = pc
+ w.err = err
+ if w.pc == nil && w.err == nil {
+ panic("net/http: internal error: misuse of canDeliver")
+ }
+ close(w.ready)
+ return true
+}
+
+// cancel marks w as no longer wanting a result (for example, due to cancellation).
+// If a connection has been delivered already, cancel returns it with t.putOrCloseIdleConn.
+func (w *wantConn) cancel(t *Transport, err error) {
+ w.mu.Lock()
+ if w.pc == nil && w.err == nil {
+ close(w.ready) // catch misbehavior in future delivery
+ }
+ pc := w.pc
+ w.err = err
+ w.mu.Unlock()
+
+ if pc != nil {
+ t.putOrCloseIdleConn(pc)
+ }
+}
+
+// A wantConnQueue is a queue of wantConns.
+type wantConnQueue struct {
+ // head[headPos:] is the first section of the queue, where popFront happens.
+ head []*wantConn
+ headPos int
+ // tail is the second section of the queue, where pushBack happens
+ tail []*wantConn
+}
+
+// len returns the number of items in the queue.
+func (q *wantConnQueue) len() int {
+ return len(q.head) - q.headPos + len(q.tail)
+}
+
+// pushBack adds w to the back of the queue.
+func (q *wantConnQueue) pushBack(w *wantConn) {
+ q.tail = append(q.tail, w)
+}
+
+// popFront removes and returns the w at the front of the queue.
+func (q *wantConnQueue) popFront() *wantConn {
+ if q.headPos >= len(q.head) {
+ if len(q.tail) == 0 {
+ return nil
+ }
+ // Pick up tail as new head, clear tail.
+ q.head, q.headPos, q.tail = q.tail, 0, q.head[:0]
+ }
+ w := q.head[q.headPos]
+ q.headPos++
+ return w
+}
+
// getConn dials and creates a new persistConn to the target as
// specified in the connectMethod. This includes doing a proxy CONNECT
// and/or setting up TLS. If this doesn't return an error, the persistConn
// is ready to write requests to.
-func (t *Transport) getConn(treq *transportRequest, cm connectMethod) (*persistConn, error) {
+func (t *Transport) getConn(treq *transportRequest, cm connectMethod) (pc *persistConn, err error) {
req := treq.Request
trace := treq.trace
ctx := req.Context()
if trace != nil && trace.GetConn != nil {
trace.GetConn(cm.addr())
}
- if pc, idleSince := t.getIdleConn(cm); pc != nil {
+
+ w := &wantConn{
+ cm: cm,
+ key: cm.key(),
+ ctx: ctx,
+ ready: make(chan struct{}, 1),
+ beforeDial: testHookPrePendingDial,
+ afterDial: testHookPostPendingDial,
+ }
+ defer func() {
+ if err != nil {
+ w.cancel(t, err)
+ }
+ }()
+
+ // Queue for idle connection.
+ if delivered := t.queueForIdleConn(w); delivered {
+ pc := w.pc
if trace != nil && trace.GotConn != nil {
- trace.GotConn(pc.gotIdleConnTrace(idleSince))
+ trace.GotConn(pc.gotIdleConnTrace(pc.idleAt))
}
// set request canceler to some non-nil function so we
// can detect whether it was cleared between now and when
@@ -1037,108 +1169,44 @@
return pc, nil
}

- type dialRes struct {
- pc *persistConn
- err error
- }
- dialc := make(chan dialRes)
- cmKey := cm.key()
-
- // Copy these hooks so we don't race on the postPendingDial in
- // the goroutine we launch. Issue 11136.
- testHookPrePendingDial := testHookPrePendingDial
- testHookPostPendingDial := testHookPostPendingDial
-
- handlePendingDial := func() {
- testHookPrePendingDial()
- go func() {
- if v := <-dialc; v.err == nil {
- t.putOrCloseIdleConn(v.pc)
- } else {
- t.decHostConnCount(cmKey)
- }
- testHookPostPendingDial()
- }()
- }
-
cancelc := make(chan error, 1)
t.setReqCanceler(req, func(err error) { cancelc <- err })

- if t.MaxConnsPerHost > 0 {
- select {
- case <-t.incHostConnCount(cmKey):
- // count below conn per host limit; proceed
- case pc := <-t.getIdleConnCh(cm):
- if trace != nil && trace.GotConn != nil {
- trace.GotConn(httptrace.GotConnInfo{Conn: pc.conn, Reused: pc.isReused()})
- }
- return pc, nil
- case <-req.Cancel:
- return nil, errRequestCanceledConn
- case <-req.Context().Done():
- return nil, req.Context().Err()
- case err := <-cancelc:
- if err == errRequestCanceled {
- err = errRequestCanceledConn
- }
- return nil, err
- }
- }
+ // Queue for permission to dial.
+ t.queueForDial(w)

- go func() {
- pc, err := t.dialConn(ctx, cm)
- dialc <- dialRes{pc, err}
- }()
-
- idleConnCh := t.getIdleConnCh(cm)
+ // Wait for completion or cancellation.
select {
- case v := <-dialc:
- // Our dial finished.
- if v.pc != nil {
- if trace != nil && trace.GotConn != nil && v.pc.alt == nil {
- trace.GotConn(httptrace.GotConnInfo{Conn: v.pc.conn})
+ case <-w.ready:
+ // Trace success but only for HTTP/1.
+ // HTTP/2 calls trace.GotConn itself.
+ if w.pc != nil && w.pc.alt == nil && trace != nil && trace.GotConn != nil {
+ trace.GotConn(httptrace.GotConnInfo{Conn: w.pc.conn, Reused: w.pc.isReused()})
+ }
+ if w.err != nil {
+ // If the request has been cancelled, that's probably
+ // what caused w.err; if so, prefer to return the
+ // cancellation error (see golang.org/issue/16049).
+ select {
+ case <-req.Cancel:
+ return nil, errRequestCanceledConn
+ case <-req.Context().Done():
+ return nil, req.Context().Err()
+ case err := <-cancelc:
+ if err == errRequestCanceled {
+ err = errRequestCanceledConn
+ }
+ return nil, err
+ default:
+ // return below
}
- return v.pc, nil
}
- // Our dial failed. See why to return a nicer error
- // value.
- t.decHostConnCount(cmKey)
- select {
- case <-req.Cancel:
- // It was an error due to cancellation, so prioritize that
- // error value. (Issue 16049)
- return nil, errRequestCanceledConn
- case <-req.Context().Done():
- return nil, req.Context().Err()
- case err := <-cancelc:
- if err == errRequestCanceled {
- err = errRequestCanceledConn
- }
- return nil, err
- default:
- // It wasn't an error due to cancellation, so
- // return the original error message:
- return nil, v.err
- }
- case pc := <-idleConnCh:
- // Another request finished first and its net.Conn
- // became available before our dial. Or somebody
- // else's dial that they didn't use.
- // But our dial is still going, so give it away
- // when it finishes:
- handlePendingDial()
- if trace != nil && trace.GotConn != nil {
- trace.GotConn(httptrace.GotConnInfo{Conn: pc.conn, Reused: pc.isReused()})
- }
- return pc, nil
+ return w.pc, w.err
case <-req.Cancel:
- handlePendingDial()
return nil, errRequestCanceledConn
case <-req.Context().Done():
- handlePendingDial()
return nil, req.Context().Err()
case err := <-cancelc:
- handlePendingDial()
if err == errRequestCanceled {
err = errRequestCanceledConn
}
@@ -1146,81 +1214,97 @@
}
}

-// incHostConnCount increments the count of connections for a
-// given host. It returns an already-closed channel if the count
-// is not at its limit; otherwise it returns a channel which is
-// notified when the count is below the limit.
-func (t *Transport) incHostConnCount(cmKey connectMethodKey) <-chan struct{} {
+// queueForDial queues w to wait for permission to begin dialing.
+// Once w receives permission to dial, it will do so in a separate goroutine.
+func (t *Transport) queueForDial(w *wantConn) {
+ w.beforeDial()
if t.MaxConnsPerHost <= 0 {
- return connsPerHostClosedCh
+ go t.dialConnFor(w)
+ return
}
- t.connCountMu.Lock()
- defer t.connCountMu.Unlock()
- if t.connPerHostCount[cmKey] == t.MaxConnsPerHost {
- if t.connPerHostAvailable == nil {
- t.connPerHostAvailable = make(map[connectMethodKey]chan struct{})
+
+ t.connsPerHostMu.Lock()
+ defer t.connsPerHostMu.Unlock()
+
+ if n := t.connsPerHost[w.key]; n < t.MaxConnsPerHost {
+ if t.connsPerHost == nil {
+ t.connsPerHost = make(map[connectMethodKey]int)
}
- ch, ok := t.connPerHostAvailable[cmKey]
- if !ok {
- ch = make(chan struct{})
- t.connPerHostAvailable[cmKey] = ch
- }
- return ch
+ t.connsPerHost[w.key] = n + 1
+ go t.dialConnFor(w)
+ return
}
- if t.connPerHostCount == nil {
- t.connPerHostCount = make(map[connectMethodKey]int)
+
+ if t.connsPerHostWait == nil {
+ t.connsPerHostWait = make(map[connectMethodKey]wantConnQueue)
}
- t.connPerHostCount[cmKey]++
- // return a closed channel to avoid race: if decHostConnCount is called
- // after incHostConnCount and during the nil check, decHostConnCount
- // will delete the channel since it's not being listened on yet.
- return connsPerHostClosedCh
+ q := t.connsPerHostWait[w.key]
+ q.pushBack(w)
+ t.connsPerHostWait[w.key] = q
}

-// decHostConnCount decrements the count of connections
-// for a given host.
-// See Transport.MaxConnsPerHost.
-func (t *Transport) decHostConnCount(cmKey connectMethodKey) {
+// dialConnFor dials on behalf of w and delivers the result to w.
+// dialConnFor has received permission to dial w.cm and is counted in t.connCount[w.cm.key()].
+// If the dial is cancelled or unsuccessful, dialConnFor decrements t.connCount[w.cm.key()].
+func (t *Transport) dialConnFor(w *wantConn) {
+ defer w.afterDial()
+
+ pc, err := t.dialConn(w.ctx, w.cm)
+ delivered := w.canDeliver(pc, err)
+ if err == nil && (!delivered || pc.alt != nil) {
+ // pconn was not passed to w,
+ // or it is HTTP/2 and can be shared.
+ // Add to the idle connection pool.
+ t.putOrCloseIdleConn(pc)
+ }
+ if err != nil {
+ t.decConnsPerHost(w.key)
+ }
+}
+
+// decConnsPerHost decrements the per-host connection count for key,
+// which may in turn give a different waiting goroutine permission to dial.
+func (t *Transport) decConnsPerHost(key connectMethodKey) {
if t.MaxConnsPerHost <= 0 {
return
}
- t.connCountMu.Lock()
- defer t.connCountMu.Unlock()
- t.connPerHostCount[cmKey]--
- select {
- case t.connPerHostAvailable[cmKey] <- struct{}{}:
- default:
- // close channel before deleting avoids getConn waiting forever in
- // case getConn has reference to channel but hasn't started waiting.
- // This could lead to more than MaxConnsPerHost in the unlikely case
- // that > 1 go routine has fetched the channel but none started waiting.
- if t.connPerHostAvailable[cmKey] != nil {
- close(t.connPerHostAvailable[cmKey])
+
+ t.connsPerHostMu.Lock()
+ defer t.connsPerHostMu.Unlock()
+ n := t.connsPerHost[key]
+ if n == 0 {
+ // Shouldn't happen, but if it does, the counting is buggy and could
+ // easily lead to a silent deadlock, so report the problem loudly.
+ panic("net/http: internal error: connCount underflow")
+ }
+
+ // Can we hand this count to a goroutine waiting to dial?
+ if q := t.connsPerHostWait[key]; q.len() > 0 {
+ done := false
+ for q.len() > 0 {
+ w := q.popFront()
+ if w.waiting() {
+ go t.dialConnFor(w)
+ done = true
+ break
+ }
}
- delete(t.connPerHostAvailable, cmKey)
+ if q.len() == 0 {
+ delete(t.connsPerHostWait, key)
+ } else {
+ t.connsPerHostWait[key] = q
+ }
+ if done {
+ return
+ }
}
- if t.connPerHostCount[cmKey] == 0 {
- delete(t.connPerHostCount, cmKey)
- }
-}

-// connCloseListener wraps a connection, the transport that dialed it
-// and the connected-to host key so the host connection count can be
-// transparently decremented by whatever closes the embedded connection.
-type connCloseListener struct {
- net.Conn
- t *Transport
- cmKey connectMethodKey
- didClose int32
-}
-
-func (c *connCloseListener) Close() error {
- if atomic.AddInt32(&c.didClose, 1) != 1 {
- return nil
+ // Otherwise, decrement the recorded count.
+ if n--; n == 0 {
+ delete(t.connsPerHost, key)
+ } else {
+ t.connsPerHost[key] = n
}
- err := c.Conn.Close()
- c.t.decHostConnCount(c.cmKey)
- return err
}

// The connect method and the transport can both specify a TLS
@@ -1283,8 +1367,8 @@
return nil
}

-func (t *Transport) dialConn(ctx context.Context, cm connectMethod) (*persistConn, error) {
- pconn := &persistConn{
+func (t *Transport) dialConn(ctx context.Context, cm connectMethod) (pconn *persistConn, err error) {
+ pconn = &persistConn{
t: t,
cacheKey: cm.key(),
reqch: make(chan requestAndChan, 1),
@@ -1423,9 +1507,6 @@
}
}

- if t.MaxConnsPerHost > 0 {
- pconn.conn = &connCloseListener{Conn: pconn.conn, t: t, cmKey: pconn.cacheKey}
- }
pconn.br = bufio.NewReaderSize(pconn, t.readBufferSize())
pconn.bw = bufio.NewWriterSize(persistConnWriter{pconn}, t.writeBufferSize())

@@ -1631,7 +1712,7 @@
return pc.canceledErr
}

-// isReused reports whether this connection is in a known broken state.
+// isReused reports whether this connection has been used before.
func (pc *persistConn) isReused() bool {
pc.mu.Lock()
r := pc.reused
@@ -2119,10 +2200,12 @@
// but the server has already replied. In this case, we don't
// want to wait too long, and we want to return false so this
// connection isn't re-used.
+ t := time.NewTimer(maxWriteWaitBeforeConnReuse)
+ defer t.Stop()
select {
case err := <-pc.writeErrCh:
return err == nil
- case <-time.After(maxWriteWaitBeforeConnReuse):
+ case <-t.C:
return false
}
}
@@ -2374,10 +2457,10 @@
pc.broken = true
if pc.closed == nil {
pc.closed = err
- if pc.alt != nil {
- // Clean up any host connection counting.
- pc.t.decHostConnCount(pc.cacheKey)
- } else {
+ pc.t.decConnsPerHost(pc.cacheKey)
+ // Close HTTP/1 (pc.alt == nil) connection.
+ // HTTP/2 closes its connection itself.
+ if pc.alt == nil {
if err != errCallerOwnsConn {
pc.conn.Close()
}
diff --git a/src/net/http/transport_test.go b/src/net/http/transport_test.go
index 2b58e1d..ef89c88 100644
--- a/src/net/http/transport_test.go
+++ b/src/net/http/transport_test.go
@@ -655,13 +655,17 @@

expected := int32(tr.MaxConnsPerHost)
if dialCnt != expected {
- t.Errorf("Too many dials (%s): %d", scheme, dialCnt)
+ t.Errorf("round 1: too many dials (%s): %d != %d", scheme, dialCnt, expected)
}
if gotConnCnt != expected {
- t.Errorf("Too many get connections (%s): %d", scheme, gotConnCnt)
+ t.Errorf("round 1: too many get connections (%s): %d != %d", scheme, gotConnCnt, expected)
}
if ts.TLS != nil && tlsHandshakeCnt != expected {
- t.Errorf("Too many tls handshakes (%s): %d", scheme, tlsHandshakeCnt)
+ t.Errorf("round 1: too many tls handshakes (%s): %d != %d", scheme, tlsHandshakeCnt, expected)
+ }
+
+ if t.Failed() {
+ t.FailNow()
}

(<-connCh).Close()
@@ -670,13 +674,13 @@
doReq()
expected++
if dialCnt != expected {
- t.Errorf("Too many dials (%s): %d", scheme, dialCnt)
+ t.Errorf("round 2: too many dials (%s): %d", scheme, dialCnt)
}
if gotConnCnt != expected {
- t.Errorf("Too many get connections (%s): %d", scheme, gotConnCnt)
+ t.Errorf("round 2: too many get connections (%s): %d != %d", scheme, gotConnCnt, expected)
}
if ts.TLS != nil && tlsHandshakeCnt != expected {
- t.Errorf("Too many tls handshakes (%s): %d", scheme, tlsHandshakeCnt)
+ t.Errorf("round 2: too many tls handshakes (%s): %d != %d", scheme, tlsHandshakeCnt, expected)
}
}

@@ -2795,8 +2799,8 @@
<-didRead
}

- if got := tr.IdleConnChMapSizeForTesting(); got != 0 {
- t.Fatalf("ForDisableKeepAlives = %v, map size = %d; want 0", disableKeep, got)
+ if got := tr.IdleConnWaitMapSizeForTesting(); got != 0 {
+ t.Fatalf("for DisableKeepAlives = %v, map size = %d; want 0", disableKeep, got)
}
}
}
@@ -3378,7 +3382,7 @@
}
wantIdle("after second put", 0)

- tr.RequestIdleConnChForTesting() // should toggle the transport out of idle mode
+ tr.QueueForIdleConnForTesting() // should toggle the transport out of idle mode
if tr.IsIdleForTesting() {
t.Error("shouldn't be idle after RequestIdleConnChForTesting")
}
@@ -3802,8 +3806,8 @@
ln := newLocalListener(t)
defer ln.Close()

- handledPendingDial := make(chan bool, 1)
- SetPendingDialHooks(nil, func() { handledPendingDial <- true })
+ var wg sync.WaitGroup
+ SetPendingDialHooks(func() { wg.Add(1) }, wg.Done)
defer SetPendingDialHooks(nil, nil)

testDone := make(chan struct{})
@@ -3873,7 +3877,7 @@

doReturned <- true
<-madeRoundTripper
- <-handledPendingDial
+ wg.Wait()
}

func TestTransportReuseConnection_Gzip_Chunked(t *testing.T) {


Gobot Gobot (Gerrit)

Jun 29, 2019, 9:37:23 AM
to Russ Cox, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

TryBots beginning. Status page: https://farmer.golang.org/try?commit=909c6df1


Russ Cox (Gerrit)

Jun 29, 2019, 9:40:06 AM
to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com

R=bcmills
I know you're not familiar with net/http but you're good at reviewing concurrency. :-)


Gobot Gobot (Gerrit)

Jun 29, 2019, 9:43:58 AM
to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, golang-co...@googlegroups.com

Build is still in progress...
This change failed on linux-amd64-race:
See https://storage.googleapis.com/go-build-log/909c6df1/linux-amd64-race_ed89b901.log

Other builds still in progress; subsequent failure notices suppressed until final report. Consult https://build.golang.org/ to see whether they are new failures. Keep in mind that TryBots currently test *exactly* your git commit, without rebasing. If your commit's git parent is old, the failure might've already been fixed.


Gobot Gobot (Gerrit)

Jun 29, 2019, 9:47:25 AM
to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, golang-co...@googlegroups.com

1 of 21 TryBots failed:
Failed on linux-amd64-race: https://storage.googleapis.com/go-build-log/909c6df1/linux-amd64-race_ed89b901.log

Consult https://build.golang.org/ to see whether they are new failures. Keep in mind that TryBots currently test *exactly* your git commit, without rebasing. If your commit's git parent is old, the failure might've already been fixed.

Patch set 1: TryBot-Result -1


Russ Cox (Gerrit)

Jul 1, 2019, 10:46:27 AM
to Russ Cox, goph...@pubsubhelper.golang.org, Gobot Gobot, Bryan C. Mills, golang-co...@googlegroups.com

Uploaded patch set 2.


Russ Cox (Gerrit)

Jul 1, 2019, 10:46:28 AM
to Russ Cox, Bryan C. Mills, Gobot Gobot, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

Russ Cox uploaded patch set #2 to this change.

4 files changed, 388 insertions(+), 280 deletions(-)


Gobot Gobot (Gerrit)

Jul 1, 2019, 10:46:42 AM
to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, golang-co...@googlegroups.com

TryBots beginning. Status page: https://farmer.golang.org/try?commit=7796fba9


Gobot Gobot (Gerrit)

Jul 1, 2019, 10:56:43 AM
to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, golang-co...@googlegroups.com

TryBots are happy.

Patch set 2: TryBot-Result +1


Bryan C. Mills (Gerrit)

Jul 1, 2019, 6:39:42 PM
to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com

16 comments:

• File src/net/http/serve_test.go:

		n := tr.IdleConnCountForTesting("http", addr)
		if n > 1 {
			t.Fatalf("idle count for %q after 2 gets = %d, want ≤ 1", addr, n)
		}

  A comment might be helpful here. It looks like the waitCondition here is waiting for the key count to drop to 1, but we're expecting the number of idle connections overall to be exactly 1 right away — are those assumptions really compatible?

                • File src/net/http/transport.go:

                  • Patch Set #2, Line 843: delete(t.idleConnWait, key)

                    I notice that the wantConnQueue implementation is free of steady-state allocations due to the use of double-buffering, but when we delete the wantConnQueue we end up discarding those buffers.

                    Would it make sense to retain empty queues, or is the optimization unnecessary for queues that become idle?

                  • Patch Set #2, Line 875: only for HTTP/1 (pconn.alt == nil)

                    Why only for HTTP/1? (Presumably because the connection may not actually be idle?)

                    I don't see anything about HTTP/2 in the documentation for IdleConnTimeout. Should this behavior be mentioned there? (Or should we track how long the connection is actually-idle and close it even for HTTP/2?)

• Patch Set #2, Line 888: queueForIdleConn reports whether it successfully delivered an already-idle connection.

  A bit more structure in the return value would be helpful: the boolean on the caller side is easy to misread as “did I put the wantConn in the queue?” rather than “did I already send the wantConn a connection?”.

  I notice that the case where t.queueForIdleConn returns true corresponds to a true return from w.canDeliver, so w.ready should be closed. Could we omit the boolean return and instead use a select-with-default on w.ready on the caller side? That would also emphasize the similarity to the `case <-w.ready:` block at line 1180.

                  • Patch Set #2, Line 897: t.closeIdle = false

                    I think I may be missing something here. t.closeIdle is set to true in CloseIdleConnections, which (if I understand correctly) is intended to also shut down active connections as they become idle.

                    Assigning it here implies that connections that become idle in the future will not be closed, even if they are connections to different hosts — and even if the number of connections to a given host remains higher than the high-water mark of active requests to that host into the future.

                    It may be that I'm inferring more behavior for CloseIdleConnections than what it is actually intended to do, but more commentary about this mode switch would be helpful either way.

• Patch Set #2, Line 899:

		if w == nil {
			// Happens in test hook.
			return false
		}

  This special case is surprising.

  It looks like the test only calls QueueForIdleConnForTesting to verify the side-effect of setting closeIdle to false, but then verifies it using another test hook (IsIdleForTesting) rather than any user-observable effect. That part of the test seems like merely a change-detector: we could just as easily replace `QueueForIdleConnForTesting` with a `SetIdleForTesting` and gain pretty much the same information.

  A more interesting test might be to have the test hook pass a non-nil *wantConn for one host, followed by a PutIdleTestConn for the same or a different host. That could at least exercise the idleConnWait registration logic.

• Patch Set #2, Line 1058:

		// waiting reports whether w is still waiting for an answer (connection or error).
		func (w *wantConn) waiting() bool {
			w.mu.Lock()
			waiting := w.pc == nil && w.err == nil
			w.mu.Unlock()
			return waiting
		}

                    (nit)

                    The `ready` channel is closed when w.pc and w.err are set, so it seems like a select-with-default would provide the same benefit as the `waiting` method with a bit less code overall.

                  • Patch Set #2, Line 1066: attempts to deliver pc,

                    `canDeliver` to me suggests a function that queries conditions without actually performing the delivery. A name like `tryDeliver` or just `deliver` would help me more in understanding the code.

                    (I would probably choose `tryDeliver`, since it's too easy to miss the return-value check otherwise — more evidence for #20803?)

                  • Patch Set #2, Line 1091: pc := w.pc

                    (nit)

                    I don't think there is an actual bug here because t.getConn only calls w.cancel after the last read to w.pc, but I would feel better about it if we nilled out w.pc here to emphasize that a use-after-put cannot occur.

                  • Patch Set #2, Line 1100: // A wantConnQueue is a queue of wantConns.

                    This is a nice queue, but it took me a minute to realize that it was double-buffered rather than a deque, which made the headPos variable a bit of a mystery until I read through the methods.

                    Perhaps mention the double-buffering in the doc comment?

                  • Patch Set #2, Line 1128: w := q.head[q.headPos]

                    I'm a little worried about the queue dragging garbage along if the caller ends up closing connections somewhat frequently. Would it be reasonable to zero out q.head[q.headPos] after reading it?

                  • Patch Set #2, Line 1252: pc, err := t.dialConn(w.ctx, w.cm)

                    Probably doesn't need to be addressed in this CL, but there may be a sort of live-lock possible here.

                    If dialing latency is higher than per-request latency and requests are made with a timeout close to that latency, then it is possible that we will have a continuous sequence of requests that each Dial and then time out, where we could instead start a Dial at the beginning and have a working connection (and make successful requests on it) after only a few failures.

                    Similarly, it seems possible for each request to start dialing a connection, receive an idle connection from the previous request, and then cancel its dial, wasting a lot of CPU/network resources in the steady state.

                  • Patch Set #2, Line 1286: if w.waiting() {

                    This call to w.waiting is just an optimization, not necessary for correctness, right? A comment might help either way.

                    (Otherwise, this seems like a race resembling the second one in the commit message.)

                    As far as I can tell, the failure mode of the race is that we dial an extra connection that we don't end up needing after all (and have to put it back), or that we cancel a dial for the first waiter when we should have kept it going for the second one (and have to restart the dial for the next connection). Both of those seem fairly benign.

                  • Patch Set #2, Line 1295: t.connsPerHostWait[key] = q

                    A comment here would be helpful. (I think this is necessary because q is a value rather than a pointer, with a pointer implicitly taken for the call at line 1285, but the fact that q is a value isn't locally obvious.)
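
    A minimal sketch of the value-semantics point being made here (identifiers from the diff; illustrative only):

		q := t.connsPerHostWait[key] // copies the queue header (a struct value, not a pointer)
		w := q.popFront()            // mutates only the copy's head/headPos
		t.connsPerHostWait[key] = q  // so the updated header must be stored back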

                • File src/net/http/transport_test.go:


Russ Cox (Gerrit)

Jul 2, 2019, 1:38:42 PM
to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com

Uploaded patch set 3.

(16 comments)

16 comments:

• A comment here would be useful: why aren't the individual test-cases responsible for satisfying this […]

  t.MaxConnsPerHost > 0 just means we are doing accounting at all. And if we are doing accounting then we have to account for the persistConn being created below. Added comment.

• File src/net/http/serve_test.go:

		n := tr.IdleConnCountForTesting("http", addr)
		if n > 1 {
			t.Fatalf("idle count for %q after 2 gets = %d, want ≤ 1", addr, n)
		}

• I notice that the wantConnQueue implementation is free of steady-state allocations due to the use of […]

  In general this code seems not to care about zero-allocation.

  I wanted to do something that would work well for an extreme case where there are many requests queued up on a small per-host limit, so I was careful in how I wrote the wantConnQueue, but I think we don't need to recycle them here, at least not without evidence.
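
  For reference, a minimal usage sketch of that wantConnQueue (illustrative only; the type is internal to net/http): pushBack appends to the tail slice, popFront drains the head, and when the head empties the old head's storage is recycled as the next tail, so repeated push/pop reuses the same backing arrays:

		var q wantConnQueue
		for i := 0; i < 4; i++ {
			q.pushBack(&wantConn{ready: make(chan struct{})})
		}
		for q.len() > 0 {
			_ = q.popFront()
		}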

                  • Why only for HTTP/1? (Presumably because the connection may not actually be idle?) […]

                    Commented. HTTP/2 is taking care of its own idle timeouts.

• queueForIdleConn reports whether it successfully delivered an already-idle connection. […] A bit more structure in the return value would be helpful: the boolean on the caller side is easy to […]

  The return value is necessary: the existing code seems to want to distinguish for tracing purposes fetching a conn from the idle queue immediately vs having one handed to you as a result of queueing for the next one to become ready. (The structs being passed to trace.GotConn are filled out differently.)

  I am not sure what you meant by "a bit more structure" but I named the return value. There is only one call site and it only needs this one-bit result, so I am not sure what more structure is needed.

                  • I think I may be missing something here. t. […]

                    CloseIdleConnections is documented as "close the currently idle connections" but also seems to mean "and close connections as they become idle, in the absence of additional uses." I'm not inclined to add that clarification to the public docs - I don't know if it is meant to be guaranteed. But given the existence of that mode bit, it's important to clear here or we'll never get an idle connection again. (There's a test specifically for this.) I added a comment.

•		if w == nil {
			// Happens in test hook.
			return false
		}

  This special case is surprising. […]

  Yes, it's a weird test. It is checking only the presence of the closeIdle = false line. But that was deemed important, and I don't feel comfortable second-guessing that today. I'm inclined to leave the tests as they are.

•		// waiting reports whether w is still waiting for an answer (connection or error).
		func (w *wantConn) waiting() bool {
			w.mu.Lock()
			waiting := w.pc == nil && w.err == nil
			w.mu.Unlock()
			return waiting
		}

  (nit) […]

  I like being able to call w.waiting() instead of working out that logic at the call site, but I replaced the implementation as you suggested.
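
  For reference, the select-with-default form adopted in patch set 3 presumably looks roughly like this (a sketch; the final wording may differ):

		// waiting reports whether w is still waiting for an answer (connection or error).
		func (w *wantConn) waiting() bool {
			select {
			case <-w.ready: // a pc/err pair was delivered, or w was cancelled
				return false
			default:
				return true
			}
		}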

                  • `canDeliver` to me suggests a function that queries conditions without actually performing the deliv […]

                    Renamed to tryDeliver.

                  • (nit) […]

                    Done

                  • This is a nice queue, but it took me a minute to realize that it was double-buffered rather than a d […]

                    Done

                  • I'm a little worried about the queue dragging garbage along if the caller ends up closing connection […]

                    Done

                  • Probably doesn't need to be addressed in this CL, but there may be a sort of live-lock possible here […]

                    You are right about the behavior, but I wouldn't characterize this as livelock. The model a client should have is that it might be the only active client and need to do everything itself, including connection setup (dial).

                    Timeouts have to be high enough to cover that baseline. If timeouts are consistently not high enough to cover that case, then operations consistently time out. This seems working as intended. Worst-case CPU/network cost is also derived from that baseline.

                    Connection pooling is an optimization that makes requests run faster, meaning in some cases you can get away with lower timeouts and use less CPU/network. But it's only an optimization, not a guarantee. So I don't think it's much of a bug that whether the program survives too-low timeouts or consumes lower-than-worst-case CPU/network depends on the exact luck of the pooling.

                    Obviously if there were a way to do better across the board, that would be nice, but the cost and complexity of doing so has to be counter-balanced by a benefit in actual practice, and I am skeptical that this comes up in practice too much.

                    I don't mean to discount the problem with predictability. That's hard in essentially every system with a cache.

                  • This call to w. […]

                    It's benign but still unnecessary and externally visible (causes network traffic) to kick off a dial when you know the operation that wanted it no longer cares. Added a comment.

                  • Done


                Russ Cox (Gerrit)

Jul 2, 2019, 1:38:42 PM
                to Russ Cox, Bryan C. Mills, Gobot Gobot, goph...@pubsubhelper.golang.org, golang-co...@googlegroups.com

                Russ Cox uploaded patch set #3 to this change.


                4 files changed, 417 insertions(+), 288 deletions(-)


                Gobot Gobot (Gerrit)

Jul 2, 2019, 1:39:55 PM
                to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, golang-co...@googlegroups.com

                TryBots beginning. Status page: https://farmer.golang.org/try?commit=b1d6afd0



                  Gobot Gobot (Gerrit)

Jul 2, 2019, 1:50:32 PM
                  to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, golang-co...@googlegroups.com

                  TryBots are happy.

Patch set 3: TryBot-Result +1



                    Bryan C. Mills (Gerrit)

Jul 2, 2019, 4:55:44 PM
                    to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com

Patch set 3: Code-Review +2


                    1 comment:


                      • // it successfully delivered an already-idle connection.

• The return value is necessary: the existing code seems to want to distinguish, for tracing purposes, fetching a conn from the idle queue immediately vs. having one handed to you as a result of queueing for the next one to become ready. (The structs being passed to trace.GotConn are filled out differently.)

                        I had noticed that, but I'm not sure that it makes sense to distinguish “reused immediately” from “reused within epsilon of immediately”. The important distinction between the two logging cases seems to be “did we even consider initiating a dial?”, and if the answer to that is “no” then I'm not sure there is any value in discriminating them further.
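
For concreteness, the bit in question surfaces to clients through httptrace.GotConnInfo. A small illustrative program (local test server assumed):

	package main

	import (
		"fmt"
		"io"
		"io/ioutil"
		"net/http"
		"net/http/httptest"
		"net/http/httptrace"
	)

	func main() {
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "ok")
		}))
		defer srv.Close()

		trace := &httptrace.ClientTrace{
			GotConn: func(info httptrace.GotConnInfo) {
				fmt.Printf("reused=%v wasIdle=%v idleTime=%v\n",
					info.Reused, info.WasIdle, info.IdleTime)
			},
		}

		for i := 0; i < 2; i++ {
			req, _ := http.NewRequest("GET", srv.URL, nil)
			req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				panic(err)
			}
			io.Copy(ioutil.Discard, resp.Body)
			resp.Body.Close()
		}
		// Expect the first request to report reused=false and the
		// second to report reused=true wasIdle=true.
	}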


                      • > I am not sure what you meant by "a bit more structure" but I named the return value.
                        > There is only one call site and it only needs this one-bit result, so I am not sure what more structure is needed.

                      • More visibility at the call site: perhaps just a more descriptive function name, or a more descriptive variable name?


                    Michael Fraenkel (Gerrit)

Jul 2, 2019, 6:42:58 PM
                    to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com



                    1 comment:


                    Michael Fraenkel (Gerrit)

Jul 5, 2019, 7:43:36 AM
                    to Russ Cox, goph...@pubsubhelper.golang.org, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com


                    1 comment:

                      • I believe this now has the potential of dialing when tryDeliver() is applying a connection but hasn' […]

                        Never mind. I see it works because of when this is called.


                    Russ Cox (Gerrit)

Jul 8, 2019, 10:31:40 AM
                    to Russ Cox, goph...@pubsubhelper.golang.org, Michael Fraenkel, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com


                    1 comment:


                      • // it successfully delivered an already-idle connection.

                      • > The return value is necessary: the existing code seems to want to distinguish for tracing purposes […]

                        It's not up to us at this moment to change what the code is tracking and reporting, which is "did this connection spend any time at all in the idle pool?".

The declared variable name at the call site should help with visibility, I think.

                    Russ Cox (Gerrit)

Jul 8, 2019, 10:31:45 AM
                    to Russ Cox, goph...@pubsubhelper.golang.org, golang-...@googlegroups.com, Michael Fraenkel, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com

                    Russ Cox merged this change.


Approvals:
  Bryan C. Mills: Looks good to me, approved
  Russ Cox: Run TryBots
  Gobot Gobot: TryBots succeeded
                    Reviewed-on: https://go-review.googlesource.com/c/go/+/184262
                    Run-TryBot: Russ Cox <r...@golang.org>
                    TryBot-Result: Gobot Gobot <go...@golang.org>
                    Reviewed-by: Bryan C. Mills <bcm...@google.com>

                    ---
                    M src/net/http/export_test.go
                    M src/net/http/serve_test.go
                    M src/net/http/transport.go
                    M src/net/http/transport_test.go
                    4 files changed, 417 insertions(+), 288 deletions(-)

                    diff --git a/src/net/http/export_test.go b/src/net/http/export_test.go
                    index f0dfa8c..d265cd3 100644
                    --- a/src/net/http/export_test.go
                    +++ b/src/net/http/export_test.go
@@ -166,30 +166,40 @@
 	return 0
 }
 
-func (t *Transport) IdleConnChMapSizeForTesting() int {
+func (t *Transport) IdleConnWaitMapSizeForTesting() int {
 	t.idleMu.Lock()
 	defer t.idleMu.Unlock()
-	return len(t.idleConnCh)
+	return len(t.idleConnWait)
 }
 
 func (t *Transport) IsIdleForTesting() bool {
 	t.idleMu.Lock()
 	defer t.idleMu.Unlock()
-	return t.wantIdle
+	return t.closeIdle
 }
 
-func (t *Transport) RequestIdleConnChForTesting() {
-	t.getIdleConnCh(connectMethod{nil, "http", "example.com", false})
+func (t *Transport) QueueForIdleConnForTesting() {
+	t.queueForIdleConn(nil)
 }
 
+// PutIdleTestConn reports whether it was able to insert a fresh
+// persistConn for scheme, addr into the idle connection pool.
 func (t *Transport) PutIdleTestConn(scheme, addr string) bool {
 	c, _ := net.Pipe()
 	key := connectMethodKey{"", scheme, addr, false}
-	select {
-	case <-t.incHostConnCount(key):
-	default:
-		return false
+
+	if t.MaxConnsPerHost > 0 {
+		// Transport is tracking conns-per-host.
+		// Increment connection count to account
+		// for new persistConn created below.
+		t.connsPerHostMu.Lock()
+		if t.connsPerHost == nil {
+			t.connsPerHost = make(map[connectMethodKey]int)
+		}
+		t.connsPerHost[key]++
+		t.connsPerHostMu.Unlock()
 	}
+
 	return t.tryPutIdleConn(&persistConn{
 		t:    t,
 		conn: c, // dummy
                    diff --git a/src/net/http/serve_test.go b/src/net/http/serve_test.go
                    index e7ed15c..61adda2 100644

--- a/src/net/http/serve_test.go
+++ b/src/net/http/serve_test.go
@@ -2407,6 +2407,7 @@
 }
 
 // See issues 8209 and 8414.
+// Both issues involved panics in the implementation of TimeoutHandler.
 func TestTimeoutHandlerRaceHeader(t *testing.T) {
 	setParallel(t)
 	defer afterTest(t)
@@ -2434,7 +2435,9 @@
 	defer func() { <-gate }()
 	res, err := c.Get(ts.URL)
 	if err != nil {
-		t.Error(err)
+		// We see ECONNRESET from the connection occasionally,
+		// and that's OK: this test is checking that the server does not panic.
+		t.Log(err)
 		return
 	}
 	defer res.Body.Close()
@@ -5507,19 +5510,23 @@
 	if a1 != a2 {
 		t.Fatal("expected first two requests on same connection")
 	}
-	var idle0 int
-	if !waitCondition(2*time.Second, 10*time.Millisecond, func() bool {
-		idle0 = tr.IdleConnKeyCountForTesting()
-		return idle0 == 1
-	}) {
-		t.Fatalf("idle count before SetKeepAlivesEnabled called = %v; want 1", idle0)
+	addr := strings.TrimPrefix(ts.URL, "http://")
+
+	// The two requests should have used the same connection,
+	// and there should not have been a second connection that
+	// was created by racing dial against reuse.
+	// (The first get was completed when the second get started.)
+	n := tr.IdleConnCountForTesting("http", addr)
+	if n != 1 {
+		t.Fatalf("idle count for %q after 2 gets = %d, want 1", addr, n)
 	}
 
+	// SetKeepAlivesEnabled should discard idle conns.
 	ts.Config.SetKeepAlivesEnabled(false)
 
 	var idle1 int
 	if !waitCondition(2*time.Second, 10*time.Millisecond, func() bool {
-		idle1 = tr.IdleConnKeyCountForTesting()
+		idle1 = tr.IdleConnCountForTesting("http", addr)
 		return idle1 == 0
 	}) {
 		t.Fatalf("idle count after SetKeepAlivesEnabled called = %v; want 0", idle1)
                    diff --git a/src/net/http/transport.go b/src/net/http/transport.go
                    index 26f642a..2f9bdc2 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
+	t.closeIdle = true // close newly idle connections
 	t.idleLRU = connLRU{}
 	t.idleMu.Unlock()
 	for _, conns := range m {
                    @@ -762,7 +745,7 @@
 var (
 	errKeepAlivesDisabled = errors.New("http: putIdleConn: keep alives disabled")
 	errConnBroken         = errors.New("http: putIdleConn: connection is in bad state")
-	errWantIdle           = errors.New("http: putIdleConn: CloseIdleConnections was called")
+	errCloseIdle          = errors.New("http: putIdleConn: CloseIdleConnections was called")
 	errTooManyIdle        = errors.New("http: putIdleConn: too many idle connections")
 	errTooManyIdleHost    = errors.New("http: putIdleConn: too many idle connections for host")
 	errCloseIdleConns     = errors.New("http: CloseIdleConnections called")
                    @@ -821,29 +804,56 @@

 		return errConnBroken
 	}
 	pconn.markReused()
-	key := pconn.cacheKey
 
 	t.idleMu.Lock()
 	defer t.idleMu.Unlock()
 
-	waitingDialer := t.idleConnCh[key]
-	select {
-	case waitingDialer <- pconn:
-		// We're done with this pconn and somebody else is
-		// currently waiting for a conn of this type (they're
-		// actively dialing, but this conn is ready
-		// first). Chrome calls this socket late binding. See
-		// https://insouciant.org/tech/connection-management-in-chromium/
+	// HTTP/2 (pconn.alt != nil) connections do not come out of the idle list,
+	// because multiple goroutines can use them simultaneously.
+	// If this is an HTTP/2 connection being “returned,” we're done.
+	if pconn.alt != nil && t.idleLRU.m[pconn] != nil {
 		return nil
-	default:
-		if waitingDialer != nil {
-			// They had populated this, but their dial won
-			// first, so we can clean up this map entry.
-			delete(t.idleConnCh, key)
+	}
+
+	// Deliver pconn to goroutine waiting for idle connection, if any.
+	// (They may be actively dialing, but this conn is ready first.
+	// Chrome calls this socket late binding.
+	// See https://insouciant.org/tech/connection-management-in-chromium/.)
+	key := pconn.cacheKey
+	if q, ok := t.idleConnWait[key]; ok {
+		done := false
+		if pconn.alt == nil {
+			// HTTP/1.
+			// Loop over the waiting list until we find a w that isn't done already, and hand it pconn.
+			for q.len() > 0 {
+				w := q.popFront()
+				if w.tryDeliver(pconn, nil) {
+					done = true
+					break
+				}
+			}
+		} else {
+			// HTTP/2.
+			// Can hand the same pconn to everyone in the waiting list,
+			// and we still won't be done: we want to put it in the idle
+			// list unconditionally, for any future clients too.
+			for q.len() > 0 {
+				w := q.popFront()
+				w.tryDeliver(pconn, nil)
+			}
+		}
+		if q.len() == 0 {
+			delete(t.idleConnWait, key)
+		} else {
+			t.idleConnWait[key] = q
+		}
+		if done {
+			return nil
 		}
 	}
-	if t.wantIdle {
-		return errWantIdle
+
+	if t.closeIdle {
+		return errCloseIdle
 	}
 	if t.idleConn == nil {
 		t.idleConn = make(map[connectMethodKey][]*persistConn)
                    @@ -864,71 +874,86 @@

 		oldest.close(errTooManyIdle)
 		t.removeIdleConnLocked(oldest)
 	}
-	if t.IdleConnTimeout > 0 {
+
+	// Set idle timer, but only for HTTP/1 (pconn.alt == nil).
+	// The HTTP/2 implementation manages the idle timer itself
+	// (see idleConnTimeout in h2_bundle.go).
+	if t.IdleConnTimeout > 0 && pconn.alt == nil {
 		if pconn.idleTimer != nil {
 			pconn.idleTimer.Reset(t.IdleConnTimeout)
 		} else {
-			// idleTimer does not apply to HTTP/2
-			if pconn.alt == nil {
-				pconn.idleTimer = time.AfterFunc(t.IdleConnTimeout, pconn.closeConnIfStillIdle)
-			}
+			pconn.idleTimer = time.AfterFunc(t.IdleConnTimeout, pconn.closeConnIfStillIdle)
 		}
 	}
 	pconn.idleAt = time.Now()
 	return nil
 }
 
-// getIdleConnCh returns a channel to receive and return idle
-// persistent connection for the given connectMethod.
-// It may return nil, if persistent connections are not being used.
-func (t *Transport) getIdleConnCh(cm connectMethod) chan *persistConn {
+// queueForIdleConn queues w to receive the next idle connection for w.cm.
+// As an optimization hint to the caller, queueForIdleConn reports whether
+// it successfully delivered an already-idle connection.
+func (t *Transport) queueForIdleConn(w *wantConn) (delivered bool) {
+	// Stop closing connections that become idle - we might want one.
+	// (That is, undo the effect of t.CloseIdleConnections.)
+	t.closeIdle = false
+
+	if w == nil {
+		// Happens in test hook.
+		return false
+	}
+
+	// Look for most recently-used idle connection.
+	if list, ok := t.idleConn[w.key]; ok {
+		stop := false
+		delivered := false
+		for len(list) > 0 && !stop {
+			pconn := list[len(list)-1]
+			if pconn.isBroken() {
+				// persistConn.readLoop has marked the connection broken,
+				// but Transport.removeIdleConn has not yet removed it from the idle list.
+				// Drop on floor on behalf of Transport.removeIdleConn.
+				list = list[:len(list)-1]
+				continue
+			}
+			delivered = w.tryDeliver(pconn, nil)
                    @@ -1015,20 +1040,147 @@
+	select {
+	case <-w.ready:
+		return false
+	default:
+		return true
+	}
+}
+
+// tryDeliver attempts to deliver pc, err to w and reports whether it succeeded.
+func (w *wantConn) tryDeliver(pc *persistConn, err error) bool {
+	w.mu.Lock()
+	defer w.mu.Unlock()
+
+	if w.pc != nil || w.err != nil {
+		return false
+	}
+
+	w.pc = pc
+	w.err = err
+	if w.pc == nil && w.err == nil {
+		panic("net/http: internal error: misuse of tryDeliver")
+	}
+	close(w.ready)
+	return true
+}
+
+// cancel marks w as no longer wanting a result (for example, due to cancellation).
+// If a connection has been delivered already, cancel returns it with t.putOrCloseIdleConn.
+func (w *wantConn) cancel(t *Transport, err error) {
+	w.mu.Lock()
+	if w.pc == nil && w.err == nil {
+		close(w.ready) // catch misbehavior in future delivery
+	}
+	pc := w.pc
+	w.pc = nil
+	w.err = err
+	w.mu.Unlock()
+
+	if pc != nil {
+		t.putOrCloseIdleConn(pc)
+	}
+}
+
+// A wantConnQueue is a queue of wantConns.
+type wantConnQueue struct {
+	// This is a queue, not a deque.
+	// It is split into two stages - head[headPos:] and tail.
+	// popFront is trivial (headPos++) on the first stage, and
+	// pushBack is trivial (append) on the second stage.
+	// If the first stage is empty, popFront can swap the
+	// first and second stages to remedy the situation.
+	//
+	// This two-stage split is analogous to the use of two lists
+	// in Okasaki's purely functional queue but without the
+	// overhead of reversing the list when swapping stages.
+	head    []*wantConn
+	headPos int
+	tail    []*wantConn
+}
+
+// len returns the number of items in the queue.
+func (q *wantConnQueue) len() int {
+	return len(q.head) - q.headPos + len(q.tail)
+}
+
+// pushBack adds w to the back of the queue.
+func (q *wantConnQueue) pushBack(w *wantConn) {
+	q.tail = append(q.tail, w)
+}
+
+// popFront removes and returns the w at the front of the queue.
+func (q *wantConnQueue) popFront() *wantConn {
+	if q.headPos >= len(q.head) {
+		if len(q.tail) == 0 {
+			return nil
+		}
+		// Pick up tail as new head, clear tail.
+		q.head, q.headPos, q.tail = q.tail, 0, q.head[:0]
+	}
+	w := q.head[q.headPos]
+	q.head[q.headPos] = nil
                    @@ -1037,108 +1189,44 @@
                    @@ -1146,81 +1234,102 @@
+	delivered := w.tryDeliver(pc, err)
+	// Can we hand this count to a goroutine still waiting to dial?
+	// (Some goroutines on the wait list may have timed out or
+	// gotten a connection another way. If they're all gone,
+	// we don't want to kick off any spurious dial operations.)
+	if q := t.connsPerHostWait[key]; q.len() > 0 {
+		done := false
+		for q.len() > 0 {
+			w := q.popFront()
+			if w.waiting() {
+				go t.dialConnFor(w)
+				done = true
+				break
+			}
+		}
-		delete(t.connPerHostAvailable, cmKey)
+		if q.len() == 0 {
+			delete(t.connsPerHostWait, key)
+		} else {
+			// q is a value (like a slice), so we have to store
+			// the updated q back into the map.
                    @@ -1283,8 +1392,8 @@

 	return nil
 }
 
-func (t *Transport) dialConn(ctx context.Context, cm connectMethod) (*persistConn, error) {
-	pconn := &persistConn{
+func (t *Transport) dialConn(ctx context.Context, cm connectMethod) (pconn *persistConn, err error) {
+	pconn = &persistConn{
 		t:        t,
 		cacheKey: cm.key(),
 		reqch:    make(chan requestAndChan, 1),
@@ -1423,9 +1532,6 @@
 		}
 	}
 
-	if t.MaxConnsPerHost > 0 {
-		pconn.conn = &connCloseListener{Conn: pconn.conn, t: t, cmKey: pconn.cacheKey}
-	}
 	pconn.br = bufio.NewReaderSize(pconn, t.readBufferSize())
 	pconn.bw = bufio.NewWriterSize(persistConnWriter{pconn}, t.writeBufferSize())

                    @@ -1631,7 +1737,7 @@

 	return pc.canceledErr
 }
 
-// isReused reports whether this connection is in a known broken state.
+// isReused reports whether this connection has been used before.
 func (pc *persistConn) isReused() bool {
 	pc.mu.Lock()
 	r := pc.reused
                    @@ -2119,10 +2225,12 @@

 	// but the server has already replied. In this case, we don't
 	// want to wait too long, and we want to return false so this
 	// connection isn't re-used.
+	t := time.NewTimer(maxWriteWaitBeforeConnReuse)
+	defer t.Stop()
 	select {
 	case err := <-pc.writeErrCh:
 		return err == nil
-	case <-time.After(maxWriteWaitBeforeConnReuse):
+	case <-t.C:
 		return false
 	}
 }
                    @@ -2374,10 +2482,10 @@

 	pc.broken = true
 	if pc.closed == nil {
 		pc.closed = err
-		if pc.alt != nil {
-			// Clean up any host connection counting.
-			pc.t.decHostConnCount(pc.cacheKey)
-		} else {
+		pc.t.decConnsPerHost(pc.cacheKey)
+		// Close HTTP/1 (pc.alt == nil) connection.
+		// HTTP/2 closes its connection itself.
+		if pc.alt == nil {
 			if err != errCallerOwnsConn {
 				pc.conn.Close()
 			}
                    diff --git a/src/net/http/transport_test.go b/src/net/http/transport_test.go
                    index 2b58e1d..ea01a20 100644
--- a/src/net/http/transport_test.go
+++ b/src/net/http/transport_test.go
@@ -3378,9 +3382,9 @@
 	}
 	wantIdle("after second put", 0)
 
-	tr.RequestIdleConnChForTesting() // should toggle the transport out of idle mode
+	tr.QueueForIdleConnForTesting() // should toggle the transport out of idle mode
 	if tr.IsIdleForTesting() {
-		t.Error("shouldn't be idle after RequestIdleConnChForTesting")
+		t.Error("shouldn't be idle after QueueForIdleConnForTesting")
 	}
 	if !tr.PutIdleTestConn("http", "example.com") {
 		t.Fatal("after re-activation")
@@ -3802,8 +3806,8 @@
 	ln := newLocalListener(t)
 	defer ln.Close()
 
-	handledPendingDial := make(chan bool, 1)
-	SetPendingDialHooks(nil, func() { handledPendingDial <- true })
+	var wg sync.WaitGroup
+	SetPendingDialHooks(func() { wg.Add(1) }, wg.Done)
 	defer SetPendingDialHooks(nil, nil)
 
 	testDone := make(chan struct{})
@@ -3873,7 +3877,7 @@
 	doReturned <- true
 	<-madeRoundTripper
-	<-handledPendingDial
+	wg.Wait()
 }
 
 func TestTransportReuseConnection_Gzip_Chunked(t *testing.T) {
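
As a quick, illustrative way to exercise the limit this change fixes (the test server and request count here are arbitrary, not from the CL; before the fix, a low MaxConnsPerHost could leak per-host counts and deadlock a workload like this):

	package main

	import (
		"fmt"
		"io"
		"io/ioutil"
		"net/http"
		"net/http/httptest"
		"sync"
	)

	func main() {
		srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "ok")
		}))
		defer srv.Close()

		// A low limit like this is what exposed the misplaced
		// connection counts before the fix.
		tr := &http.Transport{MaxConnsPerHost: 2}
		c := &http.Client{Transport: tr}

		var wg sync.WaitGroup
		for i := 0; i < 20; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				resp, err := c.Get(srv.URL)
				if err != nil {
					fmt.Println("error:", err)
					return
				}
				io.Copy(ioutil.Discard, resp.Body)
				resp.Body.Close()
			}()
		}
		wg.Wait()
		fmt.Println("all requests completed")
	}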


                    Bryan C. Mills (Gerrit)

Aug 22, 2019, 2:46:32 AM
                    to Russ Cox, goph...@pubsubhelper.golang.org, Michael Fraenkel, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com

                    I'm trying to track down an apparent memory leak reported within Google that I suspect may be related to this CL. Posted some questions for discussion.


                    5 comments:

                    • File src/net/http/transport.go:

                      • Patch Set #4, Line 921: // Drop on floor on behalf of Transport.removeIdleConn.

                        Do we need to remove pconn from idleLRU in this case too?

                      • Patch Set #4, Line 922: list = list[:len(list)-1]

                        Should we nil out the reference to the broken connection to avoid retaining it?

                      • Patch Set #4, Line 934: list = list[:len(list)-1]

                        Nil out the list entry here too?

                      • Patch Set #4, Line 954: q.pushBack(w)

                        What happens if every connection ends up broken in the steady state? (What clears the stale entries from the queue in that case?)

                      • Patch Set #4, Line 1262: q.pushBack(w)

                        If t.MaxConnsPerHost > 0 and all connections are successfully reused in the steady state, what clears entries from the connsPerHostWait queues?

                        As far as I can tell they're only emptied in decConnsPerHost, which is only called when a persistConn is closed (not reused).
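
(Sketching one possible shape for such a cleanup, purely as illustration - cleanFront and peekFront are hypothetical names, not part of this CL, and this assumes the wantConnQueue and wantConn types from the diff above:)

	// peekFront returns the w at the front of the queue without removing it.
	func (q *wantConnQueue) peekFront() *wantConn {
		if q.headPos < len(q.head) {
			return q.head[q.headPos]
		}
		if len(q.tail) > 0 {
			return q.tail[0]
		}
		return nil
	}

	// cleanFront pops any wantConns that are no longer waiting from
	// the front of the queue, so that abandoned entries (and whatever
	// they reference) are not retained forever. Callers would need to
	// hold the lock guarding the queue and could run this on each
	// pushBack, keeping the queue length proportional to live waiters.
	func (q *wantConnQueue) cleanFront() (cleaned bool) {
		for {
			w := q.peekFront()
			if w == nil || w.waiting() {
				return cleaned
			}
			q.popFront()
			cleaned = true
		}
	}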


                    Bryan C. Mills (Gerrit)

Aug 27, 2019, 1:11:21 PM
                    to Russ Cox, goph...@pubsubhelper.golang.org, Michael Fraenkel, Bryan C. Mills, Gobot Gobot, golang-co...@googlegroups.com


                    3 comments:

                      • Patch Set #4, Line 921: // Drop on floor on behalf of Transport.removeIdleConn.

                        Do we need to remove pconn from idleLRU in this case too?

The answer seems to be no, since removeIdleConn will still be called for the connection.
