GKE 1.4.6 nodes' kubernetes.service crash looping looking for dead Docker

Brad Fitzpatrick

Nov 23, 2016, 11:57:31 AM
to kubernet...@googlegroups.com
I upgraded Go's build system to GKE 1.4.6 the other day.

I'm now finding that Docker dies and the GKE nodes go unhealthy (their kubelet.service crash loops on boot, looking for Docker).

$ kubectl get nodes
NAME                                       STATUS     AGE
gke-buildlets-default-pool-a4a240f8-0pf0   NotReady   10h
gke-buildlets-default-pool-a4a240f8-7uwv   NotReady   10h
gke-buildlets-default-pool-a4a240f8-spbd   NotReady   10h

..

  Ready                 Unknown         Wed, 23 Nov 2016 10:12:50 +0000         Wed, 23 Nov 2016 10:13:35 +0000         NodeStatusUnknown               Kubelet stopped posting node status.  

....

If I ssh into one of the nodes and run sudo journalctl -f, I see kubelet.service crash looping repeatedly with:

Nov 23 16:24:22 gke-buildlets-default-pool-a4a240f8-7uwv systemd[602]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Nov 23 16:24:22 gke-buildlets-default-pool-a4a240f8-7uwv systemd[602]: kubelet.service: Unit entered failed state.
Nov 23 16:24:22 gke-buildlets-default-pool-a4a240f8-7uwv systemd[602]: kubelet.service: Failed with result 'exit-code'.
Nov 23 16:24:22 gke-buildlets-default-pool-a4a240f8-7uwv kubelet[9136]: error: failed to run Kubelet: failed to create kubelet: failed to get runtime version: docker: failed to get docker version: Cannot connect to the Docker daemon. Is the docker daemon running on this host?


And yup, Docker is dead:

# journalctl -u docker.service --since="10 hours ago" 
....
....
Nov 23 11:40:28 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]: time="2016-11-23T11:40:28.546703577Z" level=warning msg="container 464da48e2869e518e7bc5e052ced1e25b8ce70ba15168d1d34599b825dc61519 restart canceled"
Nov 23 11:40:28 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]: time="2016-11-23T11:40:28.766556953Z" level=warning msg="container d1f1425cfe29a34783668a39bb50ad5db3f218a7d0caae12d8ba81387776b708 restart canceled"
Nov 23 11:40:33 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]: time="2016-11-23T11:40:33.516533433Z" level=error msg="Force shutdown daemon"
Nov 23 11:40:33 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]: time="2016-11-23T11:40:33Z" level=info msg="stopping containerd after receiving terminated"
Nov 23 11:40:33 gke-buildlets-default-pool-a4a240f8-7uwv docker[18440]: time="2016-11-23T11:40:33Z" level=fatal msg="containerd: serve grpc" error="accept unix /var/run/docker/libcontainerd/docker-containerd.sock: use of closed network connection"
Nov 23 11:40:34 gke-buildlets-default-pool-a4a240f8-7uwv sh[4983]: + [ ! -s /var/lib/docker/repositories-overlay ]
Nov 23 11:40:34 gke-buildlets-default-pool-a4a240f8-7uwv sh[4983]: + rm -f /var/lib/docker/repositories-overlay
Nov 23 11:40:59 gke-buildlets-default-pool-a4a240f8-7uwv sh[5113]: + [ ! -s /var/lib/docker/repositories-overlay ]
Nov 23 11:40:59 gke-buildlets-default-pool-a4a240f8-7uwv sh[5113]: + rm -f /var/lib/docker/repositories-overlay
....
Nov 23 11:47:35 gke-buildlets-default-pool-a4a240f8-7uwv sh[15220]: + [ ! -s /var/lib/docker/repositories-overlay ]
Nov 23 11:47:35 gke-buildlets-default-pool-a4a240f8-7uwv sh[15220]: + rm -f /var/lib/docker/repositories-overlay
Nov 23 11:47:36 gke-buildlets-default-pool-a4a240f8-7uwv docker[15226]: time="2016-11-23T11:47:36.494367508Z" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to allocate gateway (169.254.123.1): Address already in use"
Nov 23 11:47:36 gke-buildlets-default-pool-a4a240f8-7uwv sh[15264]: + [ ! -s /var/lib/docker/repositories-overlay ]
Nov 23 11:47:36 gke-buildlets-default-pool-a4a240f8-7uwv sh[15264]: + rm -f /var/lib/docker/repositories-overlay
Nov 23 11:47:37 gke-buildlets-default-pool-a4a240f8-7uwv docker[15270]: time="2016-11-23T11:47:37.607995725Z" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to allocate gateway (169.254.123.1): Address already in use"
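
With a journal this long, a small filter can separate the fatal errors from the repeated pre-start noise. A minimal sketch, assuming only POSIX sed, sort and uniq; the function name fatal_summary is hypothetical and not part of Docker or systemd:

```shell
# Hypothetical helper: read docker.service journal lines on stdin and print
# each distinct level=fatal message with a count, most frequent first.
fatal_summary() {
  # Keep only level=fatal lines and strip everything up to "msg=", so the
  # per-line timestamps and PIDs don't make identical errors look distinct.
  sed -n 's/.*level=fatal msg=//p' | sort | uniq -c | sort -rn
}
```

On a node, piping journalctl -u docker.service --since="10 hours ago" into fatal_summary should collapse the repeated gateway allocation failures above into a single counted entry.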

And I can't restart docker:

# systemctl start docker.service
Nov 23 16:53:55 gke-buildlets-default-pool-a4a240f8-7uwv sh[15731]: + [ ! -s /var/lib/docker/repositories-overlay ]
Nov 23 16:53:55 gke-buildlets-default-pool-a4a240f8-7uwv sh[15731]: + rm -f /var/lib/docker/repositories-overlay
Nov 23 16:53:56 gke-buildlets-default-pool-a4a240f8-7uwv docker[15737]: time="2016-11-23T16:53:56.413732453Z" level=fatal msg="Error starting daemon: Error initializing network controller: Error creating default \"bridge\" network: failed to allocate gateway (169.254.123.1): Address already in use"
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
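
The fatal error says the bridge gateway address 169.254.123.1 is already in use. One way to check whether a live interface on the node already holds that address is to filter the output of ip -o -4 addr show. A minimal sketch; find_addr_owner is a hypothetical helper name, and it assumes iproute2's one-line (-o) output format:

```shell
# Hypothetical helper: given an IPv4 address, print the interface(s) that
# already hold it, reading `ip -o -4 addr show` output on stdin.
find_addr_owner() {
  # With -o, each address is one line: "<idx>: <ifname>    inet <addr>/<len> ..."
  # so field 2 is the interface name and field 4 is the address with prefix.
  awk -v want="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == want) print $2 }'
}
```

Run as ip -o -4 addr show | find_addr_owner 169.254.123.1. A printed interface name would be consistent with the error; an empty result would suggest the conflict is being detected somewhere other than the live interfaces.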


# docker --version
Docker version 1.11.2, build 4dc5990


Any clues?

Is this a known issue?


Robert Bailey

Nov 23, 2016, 11:59:00 AM
to kubernet...@googlegroups.com
What version did you upgrade from? Was it 1.3.x or an earlier 1.4 release?

--
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-users+unsubscribe@googlegroups.com.
To post to this group, send email to kubernetes-users@googlegroups.com.
Visit this group at https://groups.google.com/group/kubernetes-users.
For more options, visit https://groups.google.com/d/optout.

brad...@gmail.com

Nov 23, 2016, 12:06:36 PM
to Kubernetes user discussion and Q&A
Upgrade was perhaps a misleading word.

I nuked the world and recreated it.

(It was 1.2 before, but none of that state remains.)