There were no other non-apiserver pods running on those nodes.
I have not pieced everything together yet, but what appears to have happened is:
1) workers temporarily lost communication with the master
2) saw lots of "failed to delete pod" errors in logs
3) master communication was recovered
4) many nodes were left in inconsistent states between kubernetes and docker: `docker ps` showed 5 pods running, while `kubectl describe node x` showed 3 pods running.
I restarted all worker nodes. That resolved the inconsistent state described in 4, though it did not resolve the "OutOfmemory" issues. Those did not go away until I did a rolling restart of all my master nodes. My *guess* is that the replication controller was stuck in a weird loop. Does that sound like a reasonable explanation? Is there any additional data I could collect that might help get to a root cause?
On Friday, December 9, 2016 at 5:30:50 PM UTC-5, Yu-Ju Hong wrote:
https://github.com/kubernetes/kubernetes/blob/v1.4.5/pkg/kubelet/kubelet.go#L2195
Kubelet could not admit the pod because there wasn't enough memory on the node. Normally this would not happen because the scheduler runs the same predicates to make its decision. However, if there are some non-apiserver pods (e.g., pods from manifest files) on the node, the scheduler would not have the complete picture until kubelet reports these pods. This creates a small window where kubelet may reject pods assigned to it due to insufficient resources.
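To illustrate the window described above, here is a minimal sketch in Go. It is not the actual kubelet code: the `pod` type, the `admit` function, and the memory figures are all hypothetical, and the reason string simply mirrors the one in the event. It shows how the same fit check can pass for the scheduler and fail for the kubelet when the kubelet knows about a pod the scheduler has not seen yet.

```go
package main

import "fmt"

// pod is a hypothetical, simplified stand-in for a Kubernetes pod.
// Only the memory request matters for this sketch.
type pod struct {
	name          string
	memoryRequest int64 // bytes
}

// admit is an illustrative fit check (not the real kubelet predicate):
// the candidate fits only if its memory request, added to the requests of
// the pods already counted on the node, stays within allocatable memory.
func admit(allocatable int64, running []pod, candidate pod) (bool, string) {
	var used int64
	for _, p := range running {
		used += p.memoryRequest
	}
	if used+candidate.memoryRequest > allocatable {
		// Reason string copied from the event in this thread.
		return false, "OutOfmemory"
	}
	return true, ""
}

func main() {
	const gib = int64(1) << 30
	allocatable := 4 * gib // assumed node capacity for the example

	// Pods the scheduler already knows about.
	known := []pod{{"a", 1 * gib}}
	// A pod from a manifest file that the kubelet has not reported yet.
	static := pod{"static", 2 * gib}
	candidate := pod{"foo", 2 * gib}

	// The scheduler's view is missing the static pod, so the pod appears to fit.
	schedulerOK, _ := admit(allocatable, known, candidate)
	// The kubelet's view includes the static pod, so the same check fails.
	kubeletOK, reason := admit(allocatable, append(known, static), candidate)

	fmt.Println(schedulerOK, kubeletOK, reason)
}
```

In this sketch the scheduler counts 3 GiB of requests against 4 GiB and assigns the pod, while the kubelet counts 5 GiB and rejects it.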
Getting the following line in my k8s event stream and can't figure out where it's coming from or what it means.
devtest 2016-12-09 21:17:54 +0000 UTC 2016-12-09 21:17:54 +0000 UTC 1 foo-2327802694-e49ie Pod Warning OutOfmemory {kubelet 96.118.51.255}
devtest 2016-12-09 21:17:55 +0000 UTC 2016-12-09 21:17:55 +0000 UTC 1 foo-2327802694-3xhau Pod Normal Scheduled {default-scheduler } Successfully assigned foo-2327802694-3xhau to 96.118.51.81
What does the OutOfmemory error mean when scoped to a Pod? I can't find that error string anywhere in the k8s code. Running k8s 1.4.5.