httputil.ReverseProxy + http.DefaultTransport.DialContext results in cancelled DNS requests


Matt Duch

Dec 20, 2016, 6:55:17 PM
to golang-nuts
I'm running into issues when using a ReverseProxy that sits in front of a service that serves static asset files (in this case, hundreds of them within a few seconds). Occasionally (roughly 1 in 500 files) a request fails with:
http: proxy error: dial tcp: lookup myapp.default.svc.cluster.local on 172.18.0.16:53: dial udp 172.18.0.16:53: operation was canceled
The error occurs within a second of the request being sent, so it shouldn't be caused by a timeout.

This happens with a default ReverseProxy, created with:
proxy := httputil.NewSingleHostReverseProxy(&url.URL{Scheme: scheme, Host: host})

In a configuration like:
browser -> proxy -> golang http service

The proxy and the HTTP service (Go 1.7.4) run in Docker containers on Kubernetes, and the DNS resolver is kube-dns (Kubernetes 1.4).
Switching the proxy to a custom transport that is identical to http.DefaultTransport, except that it sets Dial instead of DialContext:
v.Proxy.Transport = &http.Transport{
    Proxy: http.ProxyFromEnvironment,
    // Dial has no context parameter, so cancelling the request does not abort the dial.
    Dial: (&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
    }).Dial,
    MaxIdleConns:          100,
    IdleConnTimeout:       90 * time.Second,
    TLSHandshakeTimeout:   10 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
}
solves the problem.

There are no errors in dnsmasq or kube-dns (the containers that provide DNS).
I've stress-tested the DNS setup from a Go binary (hundreds of thousands of requests, one hundred at a time) without issues.
I've also hit the backend service directly from a Go binary with 1,000 concurrent requests, again without issues.
If I switch from using a hostname in NewSingleHostReverseProxy to the IP address of the backend service, the problem is also solved.

This seems to be unique to the combination of a hostname, DialContext, and ReverseProxy. Any ideas what might be happening?

luke....@hootsuite.com

Jun 28, 2017, 2:56:40 PM
to golang-nuts
We're seeing this issue with Go 1.8.3 in Kubernetes, but without a reverse proxy: just KubeDNS -> Kubernetes service IP -> another Go service.

When compiled with Go 1.7, we saw "dial tcp: no suitable address found" errors, so after looking at this issue we upgraded to Go 1.8.3, hoping that would fix it. Instead, that error seems to have simply been replaced with this one:

dial tcp: lookup other-service.default.svc.cluster.local on 10.244.0.10:53: dial udp 10.244.0.10:53: operation was canceled



emilien.k...@gmail.com

Jul 17, 2017, 9:27:38 AM
to golang-nuts
It looks like I have a similar issue with just a Go service trying to resolve the IP of another service through kube-dns. The error shows up when I submit a "large" number of requests within a few seconds. Has there been any update on this?

emilien.k...@gmail.com

Jul 19, 2017, 4:55:14 AM
to golang-nuts
After more investigation, we found that the kube-dns service was restarting every now and then (about every 20 minutes), which could explain why it is sometimes unable to look up names. It restarts because of concurrent map writes (see https://github.com/kubernetes/dns/issues/88), which should be fixed in newer versions of Kubernetes (it's not clear whether that means >1.6.7 or >1.7.0; see https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md). It looks like it was fixed in kube-dns 1.14.2.