When does net/http's Client.Do return io.EOF?


Gregor Best

Dec 7, 2020, 5:58:49 AM
to golang-nuts
Hi!

We're using a 3rd party provider's API to handle some of our customer
requests. Interaction with their API consists of essentially POST'ing
a small XML document to them.

From time to time, `net/http`'s `Client.Do` returns an `io.EOF`
when sending the request. So far, the provider has always reported
those instances as "we didn't get your request".

A cursory search of various GitHub issues and a glance at the source
of `net/http` seem to indicate that `io.EOF` is almost always
caused by the server closing the connection, with the client not
getting the "it's now closed" signal before it tries to re-use the
connection.

FWIW, `fasthttp`'s HTTP client implementation treats `io.EOF` as
"this request needs to be retried", but I don't know how much that
knowledge transfers to `net/http`.

Is my interpretation of the situation correct? Or are there other
circumstances where the request _did_ end up at the remote end and
`io.EOF` is returned?

I guess what I'm asking is: Is it safe (as in: requests won't end
up on the remote twice or more times) to retry POST requests when
`Client.Do` returns an `io.EOF`?

Note that disabling connection reuse (as suggested in a number of
Stack Overflow posts) is an option we'd like to avoid unless there's
absolutely no other way to handle this.

--
Gregor Best
be...@pferdewetten.de

Axel Wagner

Dec 7, 2020, 6:59:17 AM
to Gregor Best, golang-nuts
We recently had the same issue.

On Mon, Dec 7, 2020 at 11:58 AM Gregor Best <be...@pferdewetten.de> wrote:
> Hi!
>
> We're using a 3rd party provider's API to handle some of our customer
> requests. Interaction with their API consists of essentially POST'ing
> a small XML document to them.
>
> From time to time, `net/http`'s `Client.Do` returns an `io.EOF`
> when sending the request. For now, the provider always reported
> those instances as "we didn't get your request".
>
> Cursory search in various Github issues and a glance at the source
> of `net/http` seems to indicate that `io.EOF` is almost always
> caused by the server closing the connection, but the client not
> getting the "it's now closed" signal before it tries to re-use the
> connection.

That was what I concluded as well. I think it could theoretically also happen if a new connection is opened and immediately closed by the server.

> FWIW, `fasthttp`'s HTTP client implementation treats `io.EOF` as
> "this request needs to be retried", but I don't know how much that
> knowledge transfers to `net/http`.

I think `fasthttp` is behaving incorrectly, particularly if it also does so for POST requests (you mention that you use them). POST requests are, in general, not idempotent, and there is a race: the client sends the request, the server receives it and starts handling it (causing some observable side effects), but then dies before it can send a response, with the connection being closed by the kernel. If the client retries that (at a different backend, or once the server is restarted), you might end up with corrupted state.

AIUI, `net/http` never assumes requests are retriable - not even GET requests - and leaves it up to the application to decide whether a request can be retried or not. Our solution was to verify that all our requests *can* be retried and then wrap the client call with retries.

> Is my interpretation of the situation correct? Or are there other
> circumstances where the request _did_ end up at the remote end and
> `io.EOF` is returned?

I think that, in general, you can't distinguish (as a client) whether or not the server received the message. For example, you can test this with a TCP server that immediately closes each connection: the `net/http` client will report an EOF. Though, to clarify: in this test it didn't return `io.EOF` itself, but a `*url.Error` wrapping `io.EOF`. So if you actually get an unwrapped `io.EOF`, the answer might be different.

> I guess what I'm asking is: Is it safe (as in: requests won't end
> up on the remote twice or more times) to retry POST requests when
> `Client.Do` returns an `io.EOF`?
>
> Note that disabling connection reuse (as was suggested by a number
> of stackoverflow posts) is an option that we'd like to avoid unless
> there's absolutely no other way to handle this.

If I'm correct in my understanding, even disabling keep-alive won't really help, though it might reduce the number of these errors significantly. It will always be possible for the server to close the connection while a request is in flight. If that is sufficient, a middle ground might be to reduce the keep-alive timeout on the client (or increase it on the server):
In our case, the server was a Java server and it used a timeout of 30s, while the Go client defaults to a 90s timeout. If you instead use, say, a 20s timeout on the Go client, you still get most of the performance benefit of keep-alive, but the client will assume the connection is useless 10s before the server actually closes it. Not ideal, but it should significantly improve the situation.

IMO the only real solution, though, is to make requests idempotent, e.g. by adding a unique request ID. It's a lot of work and only really effective if it's propagated all the way down. But it's still easier to achieve than exactly-once delivery :)
 

> --
> Gregor Best
> be...@pferdewetten.de

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/a31d42a5-6a81-0579-a380-b268d10f4eb0%40pferdewetten.de.

Robert Engels

Dec 7, 2020, 10:27:13 AM
to Axel Wagner, Gregor Best, golang-nuts
Excellent analysis. Idempotence and exactly-once delivery are often glossed over, yet they are usually critical to proper system design.

The key for me is to remember that the request can fail at ANY point in the flow.

XA transactions can solve this, but most systems these days rely on eventual consistency for scalability. 

On Dec 7, 2020, at 5:59 AM, 'Axel Wagner' via golang-nuts <golan...@googlegroups.com> wrote:



Marcin Romaszewicz

Dec 7, 2020, 11:29:31 AM
to Gregor Best, golang-nuts
It's uncommon to talk directly to a server these days; instead, there are proxies and load balancers along the way, and there are many reasons a connection could get closed and leave you with an io.EOF. It's unlikely that the server received the request in this case, but it's possible, depending on how the proxies work.

You need to design APIs with that in mind. On the server side, you must assume the client will retry, and so you must reject duplicate requests. On the client side, you should retry everything except HTTP 400-class errors. It's now a common pattern for requests to include an idempotency token of some kind to help the server reject duplicates.

-- Marcin



Gregor Best

Dec 9, 2020, 4:25:11 AM
to golang-nuts
Thanks for the replies, guys. Looks like we (and our provider) will have
to do a bit of soul-searching wrt idempotent API requests. At least it's
good to see that we're not entirely off the beaten path with what we're
doing :)

--
Gregor Best
be...@pferdewetten.de

Amit Saha

Dec 9, 2020, 5:42:06 AM
to Axel Wagner, Gregor Best, golang-nuts


> On 7 Dec 2020, at 10:58 pm, 'Axel Wagner' via golang-nuts <golan...@googlegroups.com> wrote:
>
> We recently had the same issue.
>
> On Mon, Dec 7, 2020 at 11:58 AM Gregor Best <be...@pferdewetten.de> wrote:
>
> [...]
>
> I think `fasthttp` is behaving incorrectly - in particular, if it also does so for POST requests (you mention that you use them). They are, in general, not idempotent and there is a race where the client sends the request, the server receives it and starts handling it (causing some observable side-effects) but then dies before it can send a response, with the connection being closed by the kernel. If the client retries that (at a different backend, or once the server got restarted), you might end up with corrupted state.
>
> AIUI, `net/http` never assumes requests are retriable - even GET requests - and leaves it up to the application to decide, whether a request can be retried or not. Our solution was to verify that all our requests *can* be retried and then wrapping the client call with retries.

Just wanted to point out here that the standard Transport has some retry behaviour by default (https://golang.org/pkg/net/http/#Transport):

> Transport only retries a request upon encountering a network error if the request is idempotent and either has no body or has its Request.GetBody defined. HTTP requests are considered idempotent if they have HTTP methods GET, HEAD, OPTIONS, or TRACE; or if their Header map contains an "Idempotency-Key" or "X-Idempotency-Key" entry. If the idempotency key value is a zero-length slice, the request is treated as idempotent but the header is not sent on the wire.



