docker hang with kubernetes 1.5.3


krma...@gmail.com

Jul 28, 2017, 3:10:46 PM
to Kubernetes developer/contributor discussion
Hi All

We are seeing a Docker hang when running Kubernetes 1.5.3 on CentOS 7. The command "docker ps" hangs on the machine. Sometimes it works with "docker ps -n 5" or some similar number, but hangs beyond that number. "docker version" also works.

This generally reproduces easily when we create 50-60 Deployments of a Java app. The app itself doesn't consume many resources and just sits there idling.
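For illustration, one way to generate that kind of load from Go (everything here is a stand-in: the kubeconfig path, the image, and the names are placeholders, and it assumes a client-go release where the Deployment types live in k8s.io/api/extensions/v1beta1 and Create takes no context argument):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	extv1beta1 "k8s.io/api/extensions/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	replicas := int32(1)
	for i := 1; i <= 60; i++ {
		name := fmt.Sprintf("idle-app-%d", i)
		d := &extv1beta1.Deployment{
			ObjectMeta: metav1.ObjectMeta{Name: name},
			Spec: extv1beta1.DeploymentSpec{
				Replicas: &replicas,
				Template: corev1.PodTemplateSpec{
					ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": name}},
					Spec: corev1.PodSpec{
						Containers: []corev1.Container{{
							Name:    "app",
							Image:   "openjdk:8-jre", // stand-in for the idle Java app
							Command: []string{"sleep", "infinity"},
						}},
					},
				},
			},
		}
		// Pre-0.18 client-go: Create takes only the object, no context.
		if _, err := clientset.ExtensionsV1beta1().Deployments("default").Create(d); err != nil {
			fmt.Println("create", name, "failed:", err)
		}
	}
}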

The Docker version I am using is:

Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:20:01 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:20:01 2017
 OS/Arch:      linux/amd64


Are there known issues with this version of Docker that mean we should not be using it with Kubernetes?



The strace of "docker ps" shows the following:

futex(0xc820062908, FUTEX_WAKE, 1)      = 1
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
setsockopt(4, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
connect(4, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, 23) = 0
epoll_create1(EPOLL_CLOEXEC)            = 5
epoll_ctl(5, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1170749536, u64=139639147477088}}) = 0
getsockname(4, {sa_family=AF_LOCAL, NULL}, [2]) = 0
getpeername(4, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, [23]) = 0
futex(0xc820062908, FUTEX_WAKE, 1)      = 1
read(4, 0xc82034b000, 4096)             = -1 EAGAIN (Resource temporarily unavailable)
write(4, "GET /v1.24/containers/json HTTP/"..., 95) = 95
futex(0xc820062d08, FUTEX_WAKE, 1)      = 1
futex(0x132ccc8, FUTEX_WAIT, 0, NULL^CProcess 154003 detached
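In case it is useful for others debugging this, here is a rough sketch of polling the same endpoint with a deadline, so a wedged daemon shows up as a timeout error rather than blocking the way the CLI does. It uses the Docker Go client and is purely illustrative:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	// Talks to /var/run/docker.sock, the same socket the strace above shows
	// the CLI connecting to. Set DOCKER_API_VERSION=1.24 in the environment
	// if the client library defaults to a newer API than the daemon speaks.
	cli, err := client.NewEnvClient()
	if err != nil {
		panic(err)
	}

	// Bound the call so a hung daemon surfaces as an error.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// This issues GET /containers/json, the request visible in the strace.
	containers, err := cli.ContainerList(ctx, types.ContainerListOptions{})
	if err != nil {
		fmt.Println("docker appears unresponsive:", err)
		return
	}
	fmt.Println("docker responded; containers:", len(containers))
}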



The kubelet logs show some interesting lines, which are likely a result of Docker being down:



GET /healthz: (161.703µs) 500
8160 [running]:
netes/pkg/httplog.(*respLogger).recordStatus(0xc4216650a0, 0x1f4)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:219 +0xbb
netes/pkg/httplog.(*respLogger).WriteHeader(0xc4216650a0, 0x1f4)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:198 +0x35
or(0x5c578a0, 0xc4216650a0, 0xc4210e0210, 0xa3, 0x1f4)
o/src/net/http/server.go:1738 +0xda
netes/pkg/healthz.handleRootHealthz.func1(0x5c578a0, 0xc4216650a0, 0xc421bfb680)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/healthz/healthz.go:109 +0x490
dlerFunc.ServeHTTP(0xc4211ba0a0, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
o/src/net/http/server.go:1726 +0x44
netes/vendor/github.com/emicklei/go-restful.(*Container).HandleWithFilter.func1(0x5c578a0, 0xc4216650a0, 0xc421bfb680)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:301 +0x84
dlerFunc.ServeHTTP(0xc4211ba0c0, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
o/src/net/http/server.go:1726 +0x44
erveMux).ServeHTTP(0xc4211d0000, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
o/src/net/http/server.go:2022 +0x7f
netes/vendor/github.com/emicklei/go-restful.(*Container).ServeHTTP(0xc4211d2000, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:287 +0x4d
netes/pkg/kubelet/server.(*Server).ServeHTTP(0xc4211b60a0, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/server.go:739 +0x10e
verHandler.ServeHTTP(0xc420247d00, 0x5c5d860, 0xc422f9c270, 0xc421bfb680)
o/src/net/http/server.go:2202 +0x7d
onn).serve(0xc42332ba00, 0x5c60560, 0xc420997b00)
o/src/net/http/server.go:1579 +0x4b7
et/http.(*Server).Serve
o/src/net/http/server.go:2293 +0x44d
r output: "[+]ping ok\n[+]syncloop ok\n[-]pleg failed: PLEG took longer than expected: pleg was last seen active at 2017-07-28 18:50:44.467884266 +0000 GMT\nhealthz check failed\
ient/1.1] 127.0.0.1:57126]

12.348513   97509 container_manager_linux.go:426] errors moving "docker-containerd" pid: failed to find pid namespace of process '\U0001c064'

23.223091   97509 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}

 desired_state_of_world_populator.go:182] kubeContainerRuntime.findAndRemoveDeletedPods returned error operation timeout: context deadline exceeded.
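The failing PLEG check above can also be polled directly against the kubelet. A minimal sketch, assuming the kubelet's default --healthz-port of 10248 (adjust if yours is configured differently):

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"time"
)

func main() {
	// 10248 is the kubelet's usual default --healthz-port; a failing check
	// such as the PLEG one above turns this endpoint into a 500.
	httpClient := &http.Client{Timeout: 5 * time.Second}
	resp, err := httpClient.Get("http://127.0.0.1:10248/healthz")
	if err != nil {
		fmt.Println("kubelet healthz unreachable:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, string(body))
}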


Guang Ya Liu

Jul 29, 2017, 9:55:22 PM
to krma...@gmail.com, Kubernetes developer/contributor discussion
I think this is probably caused by your app hanging. There are already some issues tracking this in the Docker community:


One suggestion for checking this problem is to follow https://docs.docker.com/engine/admin/#force-a-stack-trace-to-be-logged to see what is wrong with Docker.
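For reference, that page boils down to sending SIGUSR1 to the daemon so it dumps its goroutine stacks, roughly the equivalent of "kill -USR1 $(pidof dockerd)". A small illustrative sketch in Go, assuming the daemon process is named dockerd as it is on 1.12:

package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

func main() {
	// Locate the daemon; on docker 1.12 the process is named "dockerd".
	out, err := exec.Command("pidof", "dockerd").Output()
	if err != nil {
		fmt.Println("could not find dockerd:", err)
		return
	}
	pid, err := strconv.Atoi(strings.Fields(string(out))[0])
	if err != nil {
		fmt.Println("unexpected pidof output:", err)
		return
	}

	// SIGUSR1 asks the daemon to dump all goroutine stacks; the dump lands in
	// the daemon log (or in a file whose path is logged, depending on version).
	if err := syscall.Kill(pid, syscall.SIGUSR1); err != nil {
		fmt.Println("signal failed:", err)
		return
	}
	fmt.Println("sent SIGUSR1 to dockerd pid", pid)
}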

Thanks,

Guangya



krma...@gmail.com

Aug 2, 2017, 3:52:14 AM
to Kubernetes developer/contributor discussion, krma...@gmail.com
Thanks Guangya for the pointers.

It seems that in our case docker-containerd is NOT running for some reason. Is this a known issue in 1.12.6? Note that in our "docker ps" hang case, we are unable to get the stack trace. I didn't find any obvious Docker bug tracking unexpected docker-containerd exits.

On a related note, does the kubelet check Docker's health and restart it, or surface that as a health signal somewhere?

-mayank

David Ashpole

Aug 2, 2017, 11:07:11 AM
to krma...@gmail.com, Kubernetes developer/contributor discussion
The kubelet reports the NodeCondition "NotReady" with a message related to the runtime (e.g. docker) if the runtime becomes unavailable.  The kubelet may also restart the runtime if it becomes unresponsive.
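For reference, that condition can be read straight from the API; a minimal illustrative sketch (the kubeconfig path is a placeholder, and it assumes a pre-0.18 client-go where List takes no context argument):

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Pre-0.18 client-go: List takes only ListOptions, no context.
	nodes, err := clientset.CoreV1().Nodes().List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// When the runtime is down, the kubelet flips the Ready condition and
	// puts the runtime problem into its Reason/Message.
	for _, n := range nodes.Items {
		for _, c := range n.Status.Conditions {
			if c.Type == "Ready" {
				fmt.Printf("%s Ready=%s reason=%s message=%s\n",
					n.Name, c.Status, c.Reason, c.Message)
			}
		}
	}
}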

krma...@gmail.com

Aug 7, 2017, 2:08:05 AM
to Kubernetes developer/contributor discussion, krma...@gmail.com
Thanks David. I don't believe the kubelet is restarting the runtime when it becomes unresponsive, or else the method by which it checks for the runtime being unresponsive is not foolproof; otherwise we would never have seen the Docker hang in the first place.

In our case, docker remains in the hung state, and the problem is resolved only when we manually restart docker. Is the kubelet going to restart docker if it sees docker in a hung state, or is it only going to restart docker-containerd?

Regards
Mayank