docker hang with kubernetes 1.5.3


krma...@gmail.com

Jul 28, 2017, 3:10:46 PM
to Kubernetes developer/contributor discussion
Hi All

We are seeing a Docker hang when running Kubernetes 1.5.3 on CentOS 7. The command "docker ps" hangs on the machine. Sometimes it works with "docker ps -n 5" or some similar number, but hangs beyond that number. "docker version" also works.

This generally reproduces easily when we create 50-60 Deployments of a Java app. The app itself doesn't consume many resources and just sits there idling.
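For illustration, one way to generate that kind of load from Go (everything here is a stand-in: the kubeconfig path, the image, and the names are placeholders, and it assumes a client-go release where the Deployment types live in k8s.io/api/extensions/v1beta1 and Create takes no context argument):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	extv1beta1 "k8s.io/api/extensions/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	replicas := int32(1)
	for i := 1; i <= 60; i++ {
		name := fmt.Sprintf("idle-app-%d", i)
		d := &extv1beta1.Deployment{
			ObjectMeta: metav1.ObjectMeta{Name: name},
			Spec: extv1beta1.DeploymentSpec{
				Replicas: &replicas,
				Template: corev1.PodTemplateSpec{
					ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": name}},
					Spec: corev1.PodSpec{
						Containers: []corev1.Container{{
							Name:    "app",
							Image:   "openjdk:8-jre", // stand-in for the idle Java app
							Command: []string{"sleep", "infinity"},
						}},
					},
				},
			},
		}
		// Pre-0.18 client-go: Create takes only the object, no context.
		if _, err := clientset.ExtensionsV1beta1().Deployments("default").Create(d); err != nil {
			fmt.Println("create", name, "failed:", err)
		}
	}
}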

The Docker version I am using is:

Client:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:20:01 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.6
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   78d1802
 Built:        Tue Jan 10 20:20:01 2017
 OS/Arch:      linux/amd64


Are there known issues with this version of Docker that mean we should not be using it with Kubernetes?



The strace of "docker ps" shows the following:

futex(0xc820062908, FUTEX_WAKE, 1)      = 1
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4
setsockopt(4, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
connect(4, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, 23) = 0
epoll_create1(EPOLL_CLOEXEC)            = 5
epoll_ctl(5, EPOLL_CTL_ADD, 4, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1170749536, u64=139639147477088}}) = 0
getsockname(4, {sa_family=AF_LOCAL, NULL}, [2]) = 0
getpeername(4, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, [23]) = 0
futex(0xc820062908, FUTEX_WAKE, 1)      = 1
read(4, 0xc82034b000, 4096)             = -1 EAGAIN (Resource temporarily unavailable)
write(4, "GET /v1.24/containers/json HTTP/"..., 95) = 95
futex(0xc820062d08, FUTEX_WAKE, 1)      = 1
futex(0x132ccc8, FUTEX_WAIT, 0, NULL^CProcess 154003 detached
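In case it is useful for others debugging this, here is a rough sketch of polling the same endpoint with a deadline, so a wedged daemon shows up as a timeout error rather than blocking the way the CLI does. It uses the Docker Go client and is purely illustrative:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	// Talks to /var/run/docker.sock, the same socket the strace above shows
	// the CLI connecting to. Set DOCKER_API_VERSION=1.24 in the environment
	// if the client library defaults to a newer API than the daemon speaks.
	cli, err := client.NewEnvClient()
	if err != nil {
		panic(err)
	}

	// Bound the call so a hung daemon surfaces as an error.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// This issues GET /containers/json, the request visible in the strace.
	containers, err := cli.ContainerList(ctx, types.ContainerListOptions{})
	if err != nil {
		fmt.Println("docker appears unresponsive:", err)
		return
	}
	fmt.Println("docker responded; containers:", len(containers))
}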



The kubelet logs show some interesting lines, which are likely a result of Docker being down:



GET /healthz: (161.703µs) 500
8160 [running]:
netes/pkg/httplog.(*respLogger).recordStatus(0xc4216650a0, 0x1f4)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:219 +0xbb
netes/pkg/httplog.(*respLogger).WriteHeader(0xc4216650a0, 0x1f4)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/httplog/log.go:198 +0x35
or(0x5c578a0, 0xc4216650a0, 0xc4210e0210, 0xa3, 0x1f4)
o/src/net/http/server.go:1738 +0xda
netes/pkg/healthz.handleRootHealthz.func1(0x5c578a0, 0xc4216650a0, 0xc421bfb680)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/healthz/healthz.go:109 +0x490
dlerFunc.ServeHTTP(0xc4211ba0a0, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
o/src/net/http/server.go:1726 +0x44
netes/vendor/github.com/emicklei/go-restful.(*Container).HandleWithFilter.func1(0x5c578a0, 0xc4216650a0, 0xc421bfb680)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:301 +0x84
dlerFunc.ServeHTTP(0xc4211ba0c0, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
o/src/net/http/server.go:1726 +0x44
erveMux).ServeHTTP(0xc4211d0000, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
o/src/net/http/server.go:2022 +0x7f
netes/vendor/github.com/emicklei/go-restful.(*Container).ServeHTTP(0xc4211d2000, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/emicklei/go-restful/container.go:287 +0x4d
netes/pkg/kubelet/server.(*Server).ServeHTTP(0xc4211b60a0, 0x5c578a0, 0xc4216650a0, 0xc421bfb680)
io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/server.go:739 +0x10e
verHandler.ServeHTTP(0xc420247d00, 0x5c5d860, 0xc422f9c270, 0xc421bfb680)
o/src/net/http/server.go:2202 +0x7d
onn).serve(0xc42332ba00, 0x5c60560, 0xc420997b00)
o/src/net/http/server.go:1579 +0x4b7
et/http.(*Server).Serve
o/src/net/http/server.go:2293 +0x44d
r output: "[+]ping ok\n[+]syncloop ok\n[-]pleg failed: PLEG took longer than expected: pleg was last seen active at 2017-07-28 18:50:44.467884266 +0000 GMT\nhealthz check failed\
ient/1.1] 127.0.0.1:57126]

12.348513   97509 container_manager_linux.go:426] errors moving "docker-containerd" pid: failed to find pid namespace of process '\U0001c064'

23.223091   97509 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}

 desired_state_of_world_populator.go:182] kubeContainerRuntime.findAndRemoveDeletedPods returned error operation timeout: context deadline exceeded.
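The failing PLEG check above can also be polled directly against the kubelet. A minimal sketch, assuming the kubelet's default --healthz-port of 10248 (adjust if yours is configured differently):

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"time"
)

func main() {
	// 10248 is the kubelet's usual default --healthz-port; a failing check
	// such as the PLEG one above turns this endpoint into a 500.
	httpClient := &http.Client{Timeout: 5 * time.Second}
	resp, err := httpClient.Get("http://127.0.0.1:10248/healthz")
	if err != nil {
		fmt.Println("kubelet healthz unreachable:", err)
		return
	}
	defer resp.Body.Close()

	body, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(resp.StatusCode, string(body))
}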


Guang Ya Liu

Jul 29, 2017, 9:55:22 PM
to krma...@gmail.com, Kubernetes developer/contributor discussion
I think this is probably caused by your app hanging. There are already some issues tracking this in the Docker community:


One suggestion for checking this problem is to follow https://docs.docker.com/engine/admin/#force-a-stack-trace-to-be-logged to see what is wrong with Docker.
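For reference, that page boils down to sending SIGUSR1 to the daemon so it dumps its goroutine stacks, roughly the equivalent of "kill -USR1 $(pidof dockerd)". A small illustrative sketch in Go, assuming the daemon process is named dockerd as it is on 1.12:

package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
	"syscall"
)

func main() {
	// Locate the daemon; on docker 1.12 the process is named "dockerd".
	out, err := exec.Command("pidof", "dockerd").Output()
	if err != nil {
		fmt.Println("could not find dockerd:", err)
		return
	}
	pid, err := strconv.Atoi(strings.Fields(string(out))[0])
	if err != nil {
		fmt.Println("unexpected pidof output:", err)
		return
	}

	// SIGUSR1 asks the daemon to dump all goroutine stacks; the dump lands in
	// the daemon log (or in a file whose path is logged, depending on version).
	if err := syscall.Kill(pid, syscall.SIGUSR1); err != nil {
		fmt.Println("signal failed:", err)
		return
	}
	fmt.Println("sent SIGUSR1 to dockerd pid", pid)
}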

Thanks,

Guangya



krma...@gmail.com

Aug 2, 2017, 3:52:14 AM
to Kubernetes developer/contributor discussion, krma...@gmail.com
Thanks Guangya for the pointers.

It seems that in our case docker-containerd is NOT running for some reason. Is this a known issue in 1.12.6? Note that in our "docker ps" hang case, we are unable to get the stack trace. I didn't find any obvious Docker bug tracking unexpected docker-containerd exits.

On a related note, does the kubelet check Docker's health and restart it, or surface that as a health signal somewhere?

-mayank

David Ashpole

Aug 2, 2017, 11:07:11 AM
to krma...@gmail.com, Kubernetes developer/contributor discussion
The kubelet reports the NodeCondition "NotReady" with a message related to the runtime (e.g. docker) if the runtime becomes unavailable.  The kubelet may also restart the runtime if it becomes unresponsive.
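For reference, that condition can be read straight from the API; a minimal illustrative sketch (the kubeconfig path is a placeholder, and it assumes a pre-0.18 client-go where List takes no context argument):

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; adjust for your environment.
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Pre-0.18 client-go: List takes only ListOptions, no context.
	nodes, err := clientset.CoreV1().Nodes().List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// When the runtime is down, the kubelet flips the Ready condition and
	// puts the runtime problem into its Reason/Message.
	for _, n := range nodes.Items {
		for _, c := range n.Status.Conditions {
			if c.Type == "Ready" {
				fmt.Printf("%s Ready=%s reason=%s message=%s\n",
					n.Name, c.Status, c.Reason, c.Message)
			}
		}
	}
}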

krma...@gmail.com

Aug 7, 2017, 2:08:05 AM
to Kubernetes developer/contributor discussion, krma...@gmail.com
Thanks David. I don't believe the kubelet is restarting the runtime when it becomes unresponsive, or else the method by which it checks for the runtime being unresponsive is not foolproof; otherwise we would never have seen the Docker hang in the first place.

In our case, docker remains in the hung state, and the problem is resolved only when we manually restart docker. Is the kubelet going to restart docker if it sees docker in a hung state, or is it only going to restart docker-containerd?

Regards
Mayank