Cannot deploy pods when using kubelet-wrapper with rkt


Stefan Vetter

Aug 9, 2016, 12:09:00 PM
to CoreOS Dev
Hi all,

First of all, I still get error messages when using a downloaded kubelet, but pods do get deployed.


I am using Kubernetes 1.3.2, rkt 1.10.1, and CoreOS 1068.8.0.


Here are my findings (kubelet-wrapper):

Aug 09 14:35:09 node3.cluster1.kubernetes.cluster.int sudo[4487]: E0809 14:35:09.942522    4523 kubelet.go:2594] Failed cleaning pods: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"

Aug 09 14:35:09 node3.cluster1.kubernetes.cluster.int sudo[4487]: E0809 14:35:09.942455    4523 kubelet.go:2247] Error listing containers: &errors.errorString{s:"couldn't list pods: rpc error: code = 4 desc = \"context deadline exceeded\""}

Aug 09 14:35:04 node3.cluster1.kubernetes.cluster.int sudo[4487]: E0809 14:35:04.942120    4523 generic.go:197] GenericPLEG: Unable to retrieve pods: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"

Aug 09 14:35:04 node3.cluster1.kubernetes.cluster.int sudo[4487]: E0809 14:35:04.941377    4523 kubelet.go:927] Container garbage collection failed: rpc error: code = 4 desc = "context deadline exceeded"

Aug 09 14:35:04 node3.cluster1.kubernetes.cluster.int sudo[4487]: E0809 14:35:04.941322    4523 rkt.go:1866] rkt: Failed to list pods: rpc error: code = 4 desc = "context deadline exceeded"

Aug 09 14:33:04 node3.cluster1.kubernetes.cluster.int sudo[4487]: E0809 14:33:04.944400    4523 manager.go:235] Docker container factory registration failed: failed to validate Docker info: failed to detect Docker info: Cannot connect to the Docker daemon. Is the docker daemon running on this host?.

Aug 09 14:33:04 node3.cluster1.kubernetes.cluster.int sudo[4487]: E0809 14:33:04.940714    4523 kubelet.go:933] Image garbage collection failed: unable to find data for container /




Here are my findings (without wrapper):


E0809 16:00:35.278281   46910 pod_workers.go:183] Error syncing pod ff3cfae6-5e48-11e6-9c00-001999c214fd, skipping: failed to SyncPod: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"

E0809 16:02:34.972125   46910 kubelet.go:2247] Error listing containers: &errors.errorString{s:"couldn't list pods: rpc error: code = 4 desc = \"context deadline exceeded\""}

E0809 16:02:34.972156   46910 kubelet.go:2594] Failed cleaning pods: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"

E0809 16:02:37.278562   46910 generic.go:197] GenericPLEG: Unable to retrieve pods: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"

E0809 16:02:39.183965   46910 rkt.go:65] detectRktContainers: listRunningPods failed: rpc error: code = 2 desc = "grpc: the client connection is closing"
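
For triage, the kubelet lines above follow the glog header layout (severity letter, month and day, time, thread id, source file and line). The following is a minimal Python sketch, not part of the original post, that groups such lines by source file and gRPC status. The names `summarize`, `GLOG_RE`, `RPC_RE`, and `GRPC_CODES` are made up for illustration; the mapping of code 4 to DeadlineExceeded and code 2 to Unknown follows the standard gRPC status codes.

```python
import re
from collections import Counter

# glog header: severity, mmdd, hh:mm:ss.uuuuuu, thread id, file:line], message.
# search() is used so journald-prefixed lines are handled as well.
GLOG_RE = re.compile(
    r"(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)"
)
RPC_RE = re.compile(r"rpc error: code = (\d+)")

# Only the two gRPC status codes that appear in this thread.
GRPC_CODES = {2: "Unknown", 4: "DeadlineExceeded"}


def summarize(lines):
    """Count (source file, gRPC status) pairs across kubelet log lines."""
    counts = Counter()
    for line in lines:
        match = GLOG_RE.search(line)
        if not match:
            continue  # not a glog-formatted line, e.g. api-service output
        rpc = RPC_RE.search(match.group("msg"))
        if rpc:
            code = int(rpc.group(1))
            status = GRPC_CODES.get(code, f"code {code}")
        else:
            status = "no rpc error"
        source_file = match.group("src").split(":")[0]
        counts[(source_file, status)] += 1
    return counts


# Two lines copied from the log excerpt above.
sample = [
    "E0809 16:02:37.278562   46910 generic.go:197] GenericPLEG: Unable to "
    "retrieve pods: couldn't list pods: rpc error: code = 4 desc = "
    '"context deadline exceeded"',
    "E0809 16:02:39.183965   46910 rkt.go:65] detectRktContainers: "
    "listRunningPods failed: rpc error: code = 2 desc = "
    '"grpc: the client connection is closing"',
]

for key, count in summarize(sample).items():
    print(key, count)
```

Feeding the full journal output through `summarize` should make it easier to see whether every failure is a deadline on the rkt API service or whether other errors are mixed in.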




Thanks, any help is very much appreciated!



Stefan

Stefan Vetter

Aug 9, 2016, 10:41:25 PM
to CoreOS Dev
To add this:

I am also experiencing this issue with Kubernetes (hyperkube) 1.3.4 and rkt 1.9.1.

Stefan Vetter

Aug 9, 2016, 10:42:09 PM
to CoreOS Dev
Maybe I should just switch back to Docker, as it seems rkt does not work properly with the kubelet... :-(


On Tuesday, August 9, 2016 at 18:09:00 UTC+2, Stefan Vetter wrote:

Euan Kemp

Aug 10, 2016, 12:36:52 PM
to CoreOS Dev
The logs of the "rkt-api" unit are a good place to check when you get rpc errors because that's the service it's trying to communicate with.

I recall getting a similar error with older versions of rkt due to a bug in the api-service leading to a panic. Updating to a newer rkt version (which can be done by updating to a newer CoreOS release) might help, but if not the api-service logs would help us figure out what's going on.
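
As a rough illustration of checking those logs, here is a small Python sketch that builds and runs a `journalctl` query for the unit. The unit name `rkt-api` comes from this thread; the helper names `rkt_api_log_command` and `read_rkt_api_logs` and the default of 100 lines are assumptions made for the example.

```python
import subprocess


def rkt_api_log_command(unit="rkt-api", lines=100):
    """Build a journalctl command showing the most recent entries of a unit."""
    return ["journalctl", "--no-pager", "-u", unit, "-n", str(lines)]


def read_rkt_api_logs(unit="rkt-api", lines=100):
    """Run journalctl and return its output, or "" if journalctl is missing."""
    try:
        result = subprocess.run(
            rkt_api_log_command(unit, lines),
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return ""  # not a systemd host
    return result.stdout


# Show the command that would be run on the node.
print(" ".join(rkt_api_log_command()))
```

Running the printed command directly in a shell on the affected node works just as well; the wrapper is only convenient when collecting logs from several nodes.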

Best,
Euan

Stefan Vetter

Aug 11, 2016, 4:14:09 PM
to CoreOS Dev
Current status:

Remaining problems:
1. It takes very long until a container is created (at least 2-3 minutes)
2. It takes several minutes until the new status of the container is visible in the master/apiserver

Problems solved:
1. Containers can be created/managed now

Actions to get the problem solved:
1. Create directory /var/lib/polkit-1
2. Add Slice=machine.slice to rkt-api.service
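
The two actions above could be scripted roughly as follows. This is only a sketch under assumptions: the post does not say how `Slice=machine.slice` was added, so a systemd drop-in file is used here, and the drop-in file name `10-machine-slice.conf` and the helper `apply_workaround` are hypothetical. `Slice=` is placed in the `[Service]` section, which is where systemd expects it for a service unit.

```python
import tempfile
from pathlib import Path

# Drop-in content; Slice= belongs in the [Service] section of a service unit.
DROP_IN = "[Service]\nSlice=machine.slice\n"


def apply_workaround(root="/"):
    """Create /var/lib/polkit-1 and an rkt-api.service drop-in under root."""
    root = Path(root)

    # Action 1: create the polkit directory.
    polkit_dir = root / "var/lib/polkit-1"
    polkit_dir.mkdir(parents=True, exist_ok=True)

    # Action 2: add Slice=machine.slice via a drop-in (assumed file name).
    drop_in = root / "etc/systemd/system/rkt-api.service.d/10-machine-slice.conf"
    drop_in.parent.mkdir(parents=True, exist_ok=True)
    drop_in.write_text(DROP_IN)
    return polkit_dir, drop_in


# Dry run against a temporary directory instead of the real filesystem.
with tempfile.TemporaryDirectory() as tmp:
    created_dir, created_file = apply_workaround(tmp)
    print(created_dir.relative_to(tmp), created_file.relative_to(tmp))
```

On a real node this would be run as root with the default `root="/"`, followed by `systemctl daemon-reload` and a restart of `rkt-api.service` so the new slice setting takes effect.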


If you have any ideas about this problem, please reply any time.


Errors in log:
Aug 11 20:12:11 node1.cluster1.kubernetes.cluster.int sudo[5671]: E0811 20:12:11.665421    5687 generic.go:197] GenericPLEG: Unable to retrieve pods: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"
Aug 11 20:12:13 node1.cluster1.kubernetes.cluster.int sudo[5613]: api-service: checking for machined registration failed
Aug 11 20:12:13 node1.cluster1.kubernetes.cluster.int sudo[5613]:   └─pod not found
Aug 11 20:12:33 node1.cluster1.kubernetes.cluster.int sudo[5671]: E0811 20:12:33.596852    5687 kubelet.go:2249] Error listing containers: &errors.errorString{s:"couldn't list pods: rpc error: code = 4 desc = \"context deadline exceeded\""}
Aug 11 20:12:33 node1.cluster1.kubernetes.cluster.int sudo[5671]: E0811 20:12:33.596903    5687 kubelet.go:2596] Failed cleaning pods: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"
Aug 11 20:12:34 node1.cluster1.kubernetes.cluster.int sudo[5613]: api-service: checking for machined registration failed
Aug 11 20:12:34 node1.cluster1.kubernetes.cluster.int sudo[5613]:   └─pod not found
Aug 11 20:12:57 node1.cluster1.kubernetes.cluster.int sudo[5671]: W0811 20:12:57.606672    5687 image_manager.go:135] [ImageManager] Failed to monitor images: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"
Aug 11 20:12:58 node1.cluster1.kubernetes.cluster.int sudo[5613]: api-service: checking for machined registration failed
Aug 11 20:12:58 node1.cluster1.kubernetes.cluster.int sudo[5613]:   └─pod not found


Thanks for any help!

Stefan Vetter

Aug 11, 2016, 4:27:27 PM
to CoreOS Dev
Update: it looks like the workaround does not work on all machines :-(

And I don't see any additional log entries there:
Aug 11 20:24:39 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:24:39 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:25:43 node3.cluster1.kubernetes.cluster.int sudo[32845]: E0811 20:25:43.291881   32877 generic.go:197] GenericPLEG: Unable to retrieve pods: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"
Aug 11 20:25:44 node3.cluster1.kubernetes.cluster.int sudo[32845]: E0811 20:25:44.286288   32877 pod_workers.go:183] Error syncing pod 7ce192e0-5ffe-11e6-9211-001999c214fd, skipping: failed to SyncPod: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"
Aug 11 20:25:44 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:25:44 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:25:45 node3.cluster1.kubernetes.cluster.int sudo[32845]: E0811 20:25:45.285416   32877 kubelet.go:2249] Error listing containers: &errors.errorString{s:"couldn't list pods: rpc error: code = 4 desc = \"context deadline exceeded\""}
Aug 11 20:25:45 node3.cluster1.kubernetes.cluster.int sudo[32845]: E0811 20:25:45.285506   32877 kubelet.go:2596] Failed cleaning pods: couldn't list pods: rpc error: code = 4 desc = "context deadline exceeded"
Aug 11 20:25:45 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:25:45 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:26:28 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:26:28 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:26:34 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:26:34 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:26:39 node3.cluster1.kubernetes.cluster.int sudo[32845]: E0811 20:26:39.285875   32877 rkt.go:1859] rkt: Failed to list pods: rpc error: code = 4 desc = "context deadline exceeded"
Aug 11 20:26:39 node3.cluster1.kubernetes.cluster.int sudo[32845]: E0811 20:26:39.285913   32877 kubelet.go:927] Container garbage collection failed: rpc error: code = 4 desc = "context deadline exceeded"
Aug 11 20:26:48 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:26:48 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:26:48 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:26:48 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:26:49 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:26:49 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:26:50 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:26:50 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:26:50 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:26:50 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:27:03 node3.cluster1.kubernetes.cluster.int sudo[32845]: W0811 20:27:03.369172   32877 manager.go:1045] Failed to process watch event {EventType:0 Name:/user.slice/user-0.slice/session-c57.scope WatchSource:1}: this should be impossible!, new handler failing, but factory allowed, name = /user.slice/user-0.slice/session-c57.scope
Aug 11 20:27:03 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:27:03 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:27:03 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:27:03 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:27:05 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:27:05 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found
Aug 11 20:27:09 node3.cluster1.kubernetes.cluster.int sudo[32844]: api-service: checking for machined registration failed
Aug 11 20:27:09 node3.cluster1.kubernetes.cluster.int sudo[32844]:   └─pod not found

Euan Kemp

Aug 16, 2016, 5:30:54 PM
to coreo...@googlegroups.com
To close the loop on this, we chatted offline a bit and I'm pretty sure this is related to a bug in the rkt api-service wherein rkt fly pods result in an increased latency for `ListPods` calls, leading to the kubelet timing out contacting the api-service.

I've filed a rkt issue upstream and am working on a fix for that bug: https://github.com/coreos/rkt/issues/3060

Minimizing the number of running fly pods should reduce the impact of this issue, though if it doesn't, we might be dealing with a different issue entirely. In this case, it did seem to help.
