net/http: no Client API to close server connection based on Response #21978

Jim Minter

unread,

Apr 2, 2024, 3:30:10 PM4/2/24

to golang-nuts

Hello,

I was wondering if anyone had any ideas about https://github.com/golang/go/issues/21978 ("net/http: no Client API to close server connection based on Response") -- it's an old issue, but it's something that's biting me currently and I can't see a neat way to solve it.

As an HTTP client, I'm hitting a case where some HTTP server instance behind a load balancer breaks and starts returning 500s (FWIW with no body) and without the "Connection: close" header. I retry, but I end up reusing the same TCP connection to the same broken HTTP instance, so I never hit a different backend server and my retry policy is basically useless.

Obviously I need to get the server owner to fix its behavior, but it would be great if, as a client, there were a way to get net/http not to reuse the connection further, in order to be less beholden to the server's behavior.

This happens with both HTTP/1.1 and HTTP/2.

If appropriate, I could live with the request to close the connection racing with other new requests to the same endpoint. Getting to the point where 2 or 3 requests fail and then the connection is closed is way better than having requests fail ad infinitum.

http.Transport.CloseIdleConnections() doesn't solve the problem well (a) because it's a big hammer, and (b) because there's no guarantee that the connection is idle when CloseIdleConnections() is called.

FWIW I can see in `func (pc *persistConn) readLoop()` there's the following test:

```go
if resp.Close || rc.req.Close || resp.StatusCode <= 199 || bodyWritable {
// Don't do keep-alive on error if either party requested a close
// or we get an unexpected informational (1xx) response.
// StatusCode 100 is already handled above.
alive = false
}
```

I imagine that extending that to `if resp.Close || rc.req.Close || resp.StatusCode <= 199 || bodyWritable || resp.StatusCode >= 500 {` might probably help this specific case, but I imagine that's an unacceptably large behavior change for the rest of the world.

I'm not sure how else this could be done. Does anyone have any thoughts?

Many thanks for the help,

Jim

Sean Liao

unread,

Apr 2, 2024, 5:01:30 PM4/2/24

to golang-nuts

since you already know the server is problematic, you could just set Close on the original request.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/34d597cf-a84c-48eb-b555-537a8768f468n%40googlegroups.com.

- sean

Jim Minter

unread,

Apr 2, 2024, 8:38:06 PM4/2/24

to golang-nuts

That's possible, but the rate of occurrence of the issue is low (but painful when it happens), and the costs of starting a new TLS connection for every HTTP request are significant. I'm looking for a better way.

Jim

Robert Engels

unread,

Apr 2, 2024, 9:34:13 PM4/2/24

to Jim Minter, golang-nuts

My guess is that if you are getting a 500 you have exhausted the server or capacity. So close the client completely and perform an exponential back off.

You can wrap all of this at a higher level to keep the synchronous behavior.

On Apr 2, 2024, at 7:37 PM, Jim Minter <j...@minter.uk> wrote:

That's possible, but the rate of occurrence of the issue is low (but painful when it happens), and the costs of starting a new TLS connection for every HTTP request are significant. I'm looking for a better way.

To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/a3d208a6-90c0-42af-9a13-a8f3f0c7b21dn%40googlegroups.com.

Eli Lindsey

unread,

Apr 3, 2024, 12:00:27 AM4/3/24

to Jim Minter, golang-nuts

There isn’t a great way to handle this currently - we maintain out of tree patches to do something similar, though ours are h2 specific. The crux of the problem is that net currently lacks a usable connection pool API (there is some slightly newer discussion here, but it’s similar to the issue you linked https://github.com/golang/go/discussions/60746).

If you want to stay in tree, one option may be using httptrace GotConnInfo and calling Close on the underlying connection (in direct violation of GotConnInfo’s doc). I would expect this to error out anything inflight, but otherwise be benign (though I have not checked :) ).

-eli

Robert Engels

unread,

Apr 3, 2024, 6:42:37 AM4/3/24

to Eli Lindsey, Jim Minter, golang-nuts

That probably wasn’t clear. Why not create a Transport per host. Then when the 500 is encountered stop using that transport completely and create a new instance. Probably want to cancel any requests currently in flight.

The connection pool is per transport.

On Apr 2, 2024, at 11:05 PM, Eli Lindsey <e...@siliconsprawl.com> wrote:

There isn’t a great way to handle this currently - we maintain out of tree patches to do something similar, though ours are h2 specific. The crux of the problem is that net currently lacks a usable connection pool API (there is some slightly newer discussion here, but it’s similar to the issue you linked https://github.com/golang/go/discussions/60746).

To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/080B6923-51DA-4DDB-9400-B1054C1DFCE4%40siliconsprawl.com.

Eli Lindsey

unread,

Apr 3, 2024, 8:48:58 AM4/3/24

to Robert Engels, Jim Minter, golang-nuts

It would work, but has potentially high cost since it also causes any healthy conns in the pool to be torn down. How useful it is in practice depends on request rate, number of backends behind the lb, and ratio of healthy to unhealthy (500’ing) connections. It’s hard to tell from the description if it would work here - retrying and reusing the same busted connection could mean that the request rate is very low and there’s only one idle conn (in which case cycling the transport is a good solution), or it could mean that the unhealthy conn is quicker to respond than the pooled healthy conns and gobbles up a disproportionate share of requests.

Tangential question, when the backend servers land in this state does the lb not detect and remove them?

-eli

Robert Engels

unread,

Apr 3, 2024, 2:47:59 PM4/3/24

to Eli Lindsey, Jim Minter, golang-nuts

Just create a recyclable transport for the bad server and put all of the rest on a single shared transport. If one connection is returning 500 for all requests I can’t see how a different connection would solve that - unless the backend is completely broken.

On Apr 3, 2024, at 7:48 AM, Eli Lindsey <e...@siliconsprawl.com> wrote:

It would work, but has potentially high cost since it also causes any healthy conns in the pool to be torn down. How useful it is in practice depends on request rate, number of backends behind the lb, and ratio of healthy to unhealthy (500’ing) connections. It’s hard to tell from the description if it would work here - retrying and reusing the same busted connection could mean that the request rate is very low and there’s only one idle conn (in which case cycling the transport is a good solution), or it could mean that the unhealthy conn is quicker to respond than the pooled healthy conns and gobbles up a disproportionate share of requests.

To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/D0363149-4F68-42A9-8B5B-DFAD8AC36B87%40siliconsprawl.com.

Jim Minter

unread,

Apr 3, 2024, 2:50:50 PM4/3/24

to Robert Engels, Eli Lindsey, golang-nuts

Yes, I agree, I think this approach makes sense (and should have been obvious to me as something to try...). It could be implementable as a wrapper transport too. I'll try it out and reply back here if it doesn't work.

Thank-you!

Jim

Eli Lindsey

unread,

Apr 3, 2024, 3:04:00 PM4/3/24

to Robert Engels, Jim Minter, golang-nuts

Just create a recyclable transport for the bad server and put all of the rest on a single shared transport. If one connection is returning 500 for all requests I can’t see how a different connection would solve that - unless the backend is completely broken.

The connection pool is keyed off of hostname. Depending on the topology it’s quite possible that a new connection will be healthy (either to a new host backing that hostname, to a new host behind the LB, etc.), and depending on how many healthy connections are pooled and actively servicing requests it may be unacceptable blast radius to nuke the transport for the sake of one bad conn. So yes, totally valid option to consider, but really dependent on the specifics.

-eli

Robert Engels

unread,

Apr 3, 2024, 11:22:25 PM4/3/24

to Jim Minter, Eli Lindsey, golang-nuts

Happy that it sparked an idea. I also don’t think Eli’s concern is valid. if there are other requests in flight (on different connections I assume) - let those continue - just put any new requests on a new transport for that host (after a 500 error is encountered) - then tear down the bad when it has no more requests being processed.

On Apr 3, 2024, at 1:50 PM, Jim Minter <j...@minter.uk> wrote:

To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAC1koBjmbiGDTDkGKE0%2BCdEroJcpma3vu128%3DQgrKDSY3N3GBw%40mail.gmail.com.

Jim Minter

unread,

Apr 4, 2024, 11:35:41 AM4/4/24

to Robert Engels, Eli Lindsey, golang-nuts

For reference, https://go.dev/play/p/ijSN9CsQFbc is the sort of thing we're going to try out. I think it might help in our somewhat exceptional circumstance (500s like this are rare, haven't yet been able to diagnose and fix the server-side root cause, transport is bound to specific API client, etc.) but others' mileage may vary. A potentially useful hack but not a general purpose kind of thing.