I'm running into issues with a ReverseProxy that sits in front of a service serving static asset files (in this case, hundreds within a few seconds). Occasionally (about 1 in 500 files) a request will error out with:
http: proxy error: dial tcp: lookup myapp.default.svc.cluster.local on 172.18.0.16:53: dial udp 172.18.0.16:53: operation was canceled
That error occurs within a second of the request being sent, so it shouldn't trigger any timeouts.
This happens with a default ReverseProxy initialized using:
proxy := httputil.NewSingleHostReverseProxy(&url.URL{Scheme: scheme, Host: host})
In a configuration like:
browser -> proxy -> golang http service
Where the proxy and HTTP service (Go 1.7.4) are running in Docker containers (using Kubernetes), and the DNS resolver is kube-dns (Kubernetes 1.4).
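For reference, the setup above can be sketched as a self-contained program. This is only an illustration under assumptions: the helper names newProxy and fetchThroughProxy are made up for this sketch, and the backend is a local httptest server reached by IP address, so no DNS lookup happens and the error above is not expected to reproduce with it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newProxy builds a default single-host reverse proxy, as in the report.
// (Hypothetical helper name.)
func newProxy(scheme, host string) *httputil.ReverseProxy {
	return httputil.NewSingleHostReverseProxy(&url.URL{Scheme: scheme, Host: host})
}

// fetchThroughProxy starts a stand-in backend and a proxy in front of it,
// requests one asset through the proxy, and returns the status and body.
func fetchThroughProxy(path string) (int, string, error) {
	// Stand-in for the Go HTTP service that serves static assets.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "asset %s", r.URL.Path)
	}))
	defer backend.Close()

	backendURL, err := url.Parse(backend.URL)
	if err != nil {
		return 0, "", err
	}

	// The proxy uses http.DefaultTransport because no Transport is set.
	front := httptest.NewServer(newProxy(backendURL.Scheme, backendURL.Host))
	defer front.Close()

	// Stand-in for the browser.
	resp, err := http.Get(front.URL + path)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	status, body, err := fetchThroughProxy("/static/app.js")
	fmt.Println(status, body, err)
}
```

To get closer to the failing case, the host passed to newProxy would have to be a name that goes through the resolver rather than an IP address.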
Changing the proxy to use a non-default transport (identical to http.DefaultTransport, except that DialContext is replaced by Dial):
v.Proxy.Transport = &http.Transport{
	Proxy: http.ProxyFromEnvironment,
	Dial: (&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
	}).Dial,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}
solves the problem.
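One difference between the two settings that may be relevant: as far as I can tell, DialContext receives the request's context, while a plain Dial function is called without one, so a canceled request cannot abort a dial made through Dial. The sketch below only shows that a canceled context makes DialContext fail with the same "operation was canceled" text. It does not show why the context would be canceled in the proxy case. The helper names dialWithCanceledContext and looksCanceled are made up for this sketch.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// dialWithCanceledContext dials addr with a context that is already
// canceled, imitating a request that was canceled before the dial ran.
// (Hypothetical helper name.)
func dialWithCanceledContext(addr string) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel before dialing

	d := &net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
	}
	conn, err := d.DialContext(ctx, "tcp", addr)
	if err == nil {
		conn.Close()
	}
	return err
}

// looksCanceled reports whether err carries the cancellation text
// seen in the proxy error.
func looksCanceled(err error) bool {
	return err != nil && strings.Contains(err.Error(), "operation was canceled")
}

func main() {
	// An IP literal is used so that no DNS lookup is involved.
	err := dialWithCanceledContext("127.0.0.1:80")
	fmt.Println(err, looksCanceled(err))
}
```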
There are no errors in dnsmasq and kube-dns (the containers that provide DNS information).
I've stress tested the DNS setup from a Go binary (hundreds of thousands of requests, one hundred at a time) without issues.
I've also hit the backend service directly with 1000 concurrent requests from a Go binary, with no issues.
If I switch from using a hostname in NewSingleHostReverseProxy to the IP address of the backend service, the problem is also solved.
This seems to be unique to the combination of named hosts + DialContext + ReverseProxy. Any ideas as to what might be happening?