kubelet failing with "eviction manager: unexpected err: failed GetNode: "

ayodele abejide

Feb 24, 2017, 5:19:10 PM
to Kubernetes user discussion and Q&A
Hi,

I have searched the internet and found no answers that solve my problem.

Background:

I have a working cluster whose creation I intended to automate via Puppet and Terraform. All seemed to work well, except that I find:



Feb 24 22:05:41 kube-worker02.mydomain.com kubelet[18956]: E0224 22:05:41.775438 18956 eviction_manager.go:204] eviction manager: unexpected err: failed GetNode: node 'kube-worker02.mydomain.com' not found
Feb 24 22:05:51 kube-worker02.mydomain.com kubelet[18956]: I0224 22:05:51.725950 18956 kubelet.go:1155] Image garbage collection succeeded
Feb 24 22:05:51 kube-worker02.mydomain.com kubelet[18956]: W0224 22:05:51.733365 18956 container_manager_linux.go:728] CPUAccounting not enabled for pid: 6095
Feb 24 22:05:51 kube-worker02.mydomain.com kubelet[18956]: W0224 22:05:51.733381 18956 container_manager_linux.go:731] MemoryAccounting not enabled for pid: 6095
Feb 24 22:05:51 kube-worker02.mydomain.com kubelet[18956]: I0224 22:05:51.733391 18956 container_manager_linux.go:434] Discovered runtime cgroups name: /system.slice/docker.service
Feb 24 22:05:51 kube-worker02.mydomain.com kubelet[18956]: W0224 22:05:51.733478 18956 container_manager_linux.go:728] CPUAccounting not enabled for pid: 18956
Feb 24 22:05:51 kube-worker02.mydomain.com kubelet[18956]: W0224 22:05:51.733489 18956 container_manager_linux.go:731] MemoryAccounting not enabled for pid: 18956
Feb 24 22:05:51 kube-worker02.mydomain.com kubelet[18956]: E0224 22:05:51.776184 18956 eviction_manager.go:204] eviction manager: unexpected err: failed GetNode: node 'kube-worker02.mydomain.com' not found

in the kubelet logs.

What have I tried?

I have disabled SSL, set authorization-mode=AlwaysAllow, confirmed that kube-proxy running on the same worker can reach the API server, and confirmed that cAdvisor is running:

curl localhost:4194/api/v2.0/
Supported request types: "appmetrics,attributes,events,machine,ps,spec,stats,storage,summary,version"


I have also rebooted the worker a couple of times and restarted kubelet and kube-proxy many times.

I am at a loss here and don't know what else to try; any help will be appreciated.

Thanks!

Vishnu Kannan

Feb 24, 2017, 5:25:49 PM
to Kubernetes user discussion and Q&A
Kubelet is unable to retrieve the "Node" object that represents the node it's running on from the API server. Has the node successfully registered itself with the apiserver?

Relevant code that is generating the failure is here.
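
For anyone who wants to see quickly which node names the kubelet is failing to look up, here is a minimal sketch in Python. It only scans log lines for the "failed GetNode" message quoted in this thread; the function name missing_nodes is made up for illustration, and the pattern is an assumption based solely on the log format shown above.

```python
import re

# Pattern taken from the eviction manager message quoted in this thread
# (an assumption: other kubelet versions may word it differently).
GETNODE_RE = re.compile(r"failed GetNode: node '([^']+)' not found")


def missing_nodes(log_lines):
    """Return the set of node names the kubelet reported as not found."""
    names = set()
    for line in log_lines:
        match = GETNODE_RE.search(line)
        if match:
            names.add(match.group(1))
    return names


sample = [
    "Feb 24 22:05:41 kube-worker02.mydomain.com kubelet[18956]: E0224 22:05:41.775438 18956 "
    "eviction_manager.go:204] eviction manager: unexpected err: failed GetNode: "
    "node 'kube-worker02.mydomain.com' not found",
    "Feb 24 22:05:51 kube-worker02.mydomain.com kubelet[18956]: I0224 22:05:51.725950 18956 "
    "kubelet.go:1155] Image garbage collection succeeded",
]
print(missing_nodes(sample))
```

Any name this returns can then be compared against the output of kubectl get nodes; if it is absent there, the node never registered.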

--
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-users+unsubscribe@googlegroups.com.
To post to this group, send email to kubernetes-users@googlegroups.com.
Visit this group at https://groups.google.com/group/kubernetes-users.
For more options, visit https://groups.google.com/d/optout.

ayodele abejide

Feb 24, 2017, 5:45:21 PM
to Kubernetes user discussion and Q&A
kubectl get nodes returns "No resources found". I do not know how to verify:

> Has the node successfully registered itself with the apiserver?

> Relevant code that is generating the failure is here.

I have browsed the code, but I don't have enough context to get much information out of it.

Vishnu Kannan

Feb 24, 2017, 6:28:53 PM
to Kubernetes user discussion and Q&A
Your node bootstrapping is failing. I'd recommend focussing on getting your node registered and ignoring the eviction manager error. The latter is a red herring.


ayodele abejide

Feb 24, 2017, 6:39:26 PM
to Kubernetes user discussion and Q&A
> Your node bootstrapping is failing.

How do I debug this?

Vishnu Kannan

Feb 24, 2017, 6:42:18 PM
to Kubernetes user discussion and Q&A
It depends on how you are trying to set up the cluster. I'd recommend starting with a known working solution like "kubeadm" to bootstrap your cluster before customizing cluster bring-up.


Àbéjídé Àyodélé

Feb 24, 2017, 6:45:38 PM
to kubernet...@googlegroups.com
As I said, I have a similar cluster that I set up by hand, and it works. Automating that hand-built setup is where I am running into problems. I am also interested in knowing why I am running into this problem, so I can learn for the future.

Abejide Ayodele
It always seems impossible until it's done. --Nelson Mandela

--
You received this message because you are subscribed to a topic in the Google Groups "Kubernetes user discussion and Q&A" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/kubernetes-users/bpER7WJX-Jc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to kubernetes-users+unsubscribe@googlegroups.com.

Vishnu Kannan

Feb 24, 2017, 6:52:03 PM
to Kubernetes user discussion and Q&A, kubernetes-sig-c...@googlegroups.com
+sig-cluster-lifecycle


Àbéjídé Àyodélé

Feb 24, 2017, 7:02:00 PM
to kubernet...@googlegroups.com, kubernetes-sig-c...@googlegroups.com
Found this in kube-proxy logs, not sure how useful this is:

Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: W0224 23:55:15.159640    3039 server.go:468] Failed to retrieve node info: nodes "kube-worker01.mydomain.com" not found
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: W0224 23:55:15.160308    3039 proxier.go:249] invalid nodeIP, initialize kube-proxy with 127.0.0.1 as nodeIP
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: W0224 23:55:15.160596    3039 proxier.go:254] clusterCIDR not specified, unable to distinguish between internal and external traffic
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: I0224 23:55:15.160832    3039 server.go:227] Tearing down userspace rules.
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: I0224 23:55:15.161206    3039 healthcheck.go:119] Initializing kube-proxy health checker
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: I0224 23:55:15.170297    3039 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: I0224 23:55:15.170731    3039 conntrack.go:66] Setting conntrack hashsize to 32768
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: I0224 23:55:15.170920    3039 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: I0224 23:55:15.170950    3039 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: I0224 23:55:15.172295    3039 proxier.go:802] Not syncing iptables until Services and Endpoints have been received from master
Feb 24 23:55:15 kube-worker01.mydomain.com kube-proxy[3039]: I0224 23:55:15.173529    3039 proxier.go:472] Adding new service "default/kubernetes:https" at 172.32.0.1:443/TCP


abejide...@getbraintree.com

Feb 28, 2017, 9:49:45 PM
to Kubernetes user discussion and Q&A, kubernetes-sig-c...@googlegroups.com
Bump...

I still don't have a solution to this problem.

jda...@redhat.com

Jan 19, 2018, 11:18:11 AM
to Kubernetes user discussion and Q&A
I had the same issue, which was caused by the fact that my kube-controller-manager was misconfigured and therefore not running.

I'd suggest checking the logs of every daemon process to see what else is broken, in case something else is the primary cause.

Jan 19 17:01:04 nixos kube-controller-manager[18471]: F0119 17:01:04.695313 18471 node_controller.go:262] Controller: Invalid clusterCIDR, mask size of clusterCIDR must be less than nodeCIDRMaskSize.
Jan 19 17:01:04 nixos systemd[1]: kube-controller-manager.service: Main process exited, code=exited, status=255/n/a
Jan 19 17:01:04 nixos systemd[1]: kube-controller-manager.service: Unit entered failed state.
Jan 19 17:01:04 nixos systemd[1]: kube-controller-manager.service: Failed with result 'exit-code'.
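
The constraint in that fatal message can be checked before starting the controller manager. Below is a rough sketch in Python that takes the log message literally (the cluster CIDR prefix length must be smaller than the node CIDR mask size). The function name and the example values are hypothetical and not from my configuration; whether the real controller also accepts equal sizes is not something the message settles.

```python
import ipaddress


def cluster_cidr_ok(cluster_cidr, node_cidr_mask_size):
    """Check the rule stated in the log message: the cluster CIDR's mask
    size must be less than the per-node CIDR mask size."""
    network = ipaddress.ip_network(cluster_cidr, strict=False)
    return network.prefixlen < node_cidr_mask_size


# Illustrative values only (hypothetical, not taken from this thread).
print(cluster_cidr_ok("10.244.0.0/16", 24))  # /16 cluster split into /24 per node
print(cluster_cidr_ok("10.244.0.0/24", 16))  # cluster range smaller than a node range
```

The first call describes a cluster range large enough to be carved into per-node ranges; the second describes the kind of misconfiguration the message complains about.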

ayodele abejide

Jan 19, 2018, 11:23:04 AM
to Kubernetes user discussion and Q&A
The problem in my case was a broken kubeconfig (bad YAML) for the kubelet. I later found that the kubelet ignores the --kubeconfig flag if the --require-kubeconfig argument is not passed.
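
A minimal sketch of a sanity check for that second pitfall, assuming the old kubelet behaviour described here (1.5.x era): it flags a kubelet argument list that sets --kubeconfig without enabling --require-kubeconfig. The function name and the file path are hypothetical, and this check only reflects the behaviour reported in this thread, not any newer release.

```python
def kubeconfig_ignored(kubelet_args):
    """Return True if --kubeconfig is given but --require-kubeconfig is not
    enabled, which per this thread made the old kubelet ignore the file."""
    has_kubeconfig = any(
        arg == "--kubeconfig" or arg.startswith("--kubeconfig=")
        for arg in kubelet_args
    )
    require_enabled = any(
        arg in ("--require-kubeconfig", "--require-kubeconfig=true")
        for arg in kubelet_args
    )
    return has_kubeconfig and not require_enabled


# Hypothetical path, for illustration only.
args = ["--kubeconfig=/etc/kubernetes/kubelet.kubeconfig"]
print(kubeconfig_ignored(args))
print(kubeconfig_ignored(args + ["--require-kubeconfig"]))
```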

jda...@redhat.com

Jan 19, 2018, 11:30:14 AM
to Kubernetes user discussion and Q&A
On Friday, January 19, 2018 at 5:23:04 PM UTC+1, ayodele abejide wrote:
> The problem in my case was because I had a broken kubeconfig(bad YAML) for kubelet, and later found kubelet ignores the kubeconfig flag if require-kubeconfig arg is not passed. 

That should be different in the current version, 1.9 (and maybe even earlier; I cannot find it now). There, the kubeconfig is always read and --require-kubeconfig is ignored; the option is to be removed in 1.10.

https://github.com/kubernetes/kubernetes/blob/b7100f1ee7231617891a100dd34b3490a1f578e4/cmd/kubelet/app/options/options.go#L314

Abejide Ayodele

Jan 19, 2018, 11:38:05 AM
to kubernet...@googlegroups.com
> There, kubeconfig is always being read and --require-kubeconfig is ignored, option to be removed in 1.10.

I know; this conversation is from almost a year ago, when we were still running 1.5.2.
