I'm running a small monitoring service in Go that runs a series of HTTP requests sequentially every minute, reporting on the status of various services and comparing HTTP responses to what's expected.
We're getting a number of false alarms at the moment, which seem to come down to client.Do() returning an EOF error roughly once every 2000-2500 requests.
This is the essence of my code:
var (
    tr = &http.Transport{
        TLSClientConfig: &tls.Config{
            InsecureSkipVerify: true,
            ServerName:         HOST,
        },
        DisableKeepAlives: true,
    }
    result = newCheckResult()
    client = http.Client{Transport: tr}
    req    *http.Request
    res    *http.Response
)

if req, err = http.NewRequest(method, url, nil); err != nil {
    return nil, err
}
req.Host = HOST
req.Close = true

if res, err = client.Do(req); res != nil {
    defer res.Body.Close()
}
if err != nil {
    return nil, err
}

// Response handling logic here; does not read in the response body
The EOF is the error coming back from client.Do(), which the final err != nil check then returns. It's hard to match up precisely, but I'm fairly certain that the Apache servers I'm hitting have logged these requests with a 200 response code.
The roundabout way of creating the requests is to avoid problems with hostname verification, and to ensure that SNI works by setting the ServerName. The failing requests are all actually over plain HTTP, so I don't think the TLS config has any effect.
I've tried digging into the code but it's difficult to see where an EOF would come from in the RoundTripper. I found some old bugs and a StackOverflow thread that suggest bugs or quirks with the connection pooling, which is why I set Close to true and disable keep-alives.
Is there some way to get better visibility over why a request actually failed, rather than just EOF? It's a peculiar error to get as I don't even try to read in the response body.