Cause for rescheduling a pod


Matt Hughes

Aug 30, 2016, 12:42:11 PM
to Kubernetes developer/contributor discussion
What, if anything, would cause a pod to get rescheduled on another node?  I can think of at least three potential candidates:

* container repeatedly exits with non-zero exit code
* container repeatedly fails readiness health check
* container repeatedly fails liveness health check

I could have sworn k8s rescheduled on the first condition, but I have a pod with 1k restarts, so I'm pretty sure that's not true.  I also could have sworn I read about this, but I can't find it mentioned in the documentation anymore.  I should mention that I'm using deployments to schedule my pods.


In my case, my pod binds to a port in the unreserved range.  On one VM I was unlucky enough to have another process bind that port, and the pod repeatedly failed to bind, causing it to restart.  There are other remedies to this, but why not try rescheduling this pod on another node?  There are lots of things that can happen to a single node that might cause pod failure: disk space, network connectivity, etc.  Try the same pod on another node in your cluster and it will work.
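
For concreteness, here is a rough sketch of the kind of checks I mean (the name, image, and port are placeholders, not the actual spec):

apiVersion: v1
kind: Pod
metadata:
  name: port-binder                 # placeholder
spec:
  restartPolicy: Always             # the default; restarts happen in place on the same node
  containers:
  - name: app
    image: example.com/app:latest   # placeholder image
    ports:
    - containerPort: 8080           # placeholder port
    readinessProbe:                 # failing this marks the pod NotReady (no Service traffic)
      tcpSocket:
        port: 8080
      periodSeconds: 5
    livenessProbe:                  # failing this makes the kubelet restart the container
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10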

Clayton Coleman

Aug 30, 2016, 12:46:48 PM
to Matt Hughes, Kubernetes developer/contributor discussion
The things that move pods off nodes:

1. User or admin manually deletes the pod
2. Admin drains a node (which just deletes the pods)

In the future there may be:

3. Rescheduler - a component that detects poorly scheduled pods and corrects the situation by deleting them

If this happens in the context of a deployment, the readiness check and the health check are part of what would detect that the pod is bad and start it somewhere else.  If you set a readiness check, your "bound to a port" case would result in the new pod failing readiness and the deployment leaving the old pods around (until you resolved the issue).
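
For example, with a Deployment along these lines (a sketch; names and values are placeholders), a rolling update only retires old pods as new ones pass their readiness check, so a new pod that never binds its port never becomes Ready and the old pods stay in service:

apiVersion: apps/v1                 # extensions/v1beta1 in the releases discussed here
kind: Deployment
metadata:
  name: port-binder                 # placeholder
spec:
  replicas: 2
  selector:
    matchLabels:
      app: port-binder
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0             # never remove a healthy old pod before its replacement is Ready
      maxSurge: 1
  template:
    metadata:
      labels:
        app: port-binder
    spec:
      containers:
      - name: app
        image: example.com/app:latest   # placeholder
        readinessProbe:
          tcpSocket:
            port: 8080              # the port the container is expected to bind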



Prashanth B

Aug 30, 2016, 12:54:53 PM
to Matt Hughes, Kubernetes developer/contributor discussion
On Tue, Aug 30, 2016 at 9:42 AM, Matt Hughes <hughe...@gmail.com> wrote:
> In my case, my pod binds to a port in the unreserved range.  On one VM I was unlucky enough to have another process bind that port, and the pod repeatedly failed to bind, causing it to restart.

If you set hostPort in two pods, the scheduler won't put them on the same node.  The scheduler/kubelet don't understand, however, that you have a non-kube process running on the node consuming resources.
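
As a sketch (names and port are placeholders): a container that declares a hostPort reserves that port on the node in the scheduler's view, so two pods asking for the same hostPort will never land on the same node.

# A second pod declaring the same hostPort cannot be scheduled onto the same node.
apiVersion: v1
kind: Pod
metadata:
  name: port-binder-a               # placeholder
spec:
  containers:
  - name: app
    image: example.com/app:latest   # placeholder
    ports:
    - containerPort: 8080
      hostPort: 31000               # placeholder; the port reserved on the node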

> There are other remedies to this, but why not try rescheduling this pod on another node?  There are lots of things that can happen to a single node that might cause pod failure: disk space, network connectivity, etc.

Some of these are surfaced as the node going NotReady, and the node controller deletes all pods on the node.  Provided you have an RC governing your pods, they'll get recreated and the scheduler will reassign them to healthy nodes.
 
> Try the same pod on another node in your cluster and it will work.


Daniel Smith

Aug 30, 2016, 12:55:55 PM
to Clayton Coleman, Matt Hughes, Kubernetes developer/contributor discussion
4. Node is NotReady for > 5 minutes.
4.1. Node is on a partitioned network segment for > 5 minutes.
5. Pod's namespace is deleted. (although in this case the pod doesn't come back.)
6. Misconfigured and therefore fighting controllers (ControllerRef is going to address this)

The OP's conditions cause *local* restarts, and don't cause the pod to get started on another node.  (And the container will quickly go into a backoff state so that the kubelet doesn't waste too much time.)


Derek Carr

Aug 31, 2016, 8:05:18 AM
to Kubernetes developer/contributor discussion
7. Eviction from Kubelet due to node memory pressure if you have configured eviction thresholds
8. Eviction from Kubelet due to disk pressure (1.4 feature)
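
For reference, these thresholds are configured on the kubelet. A sketch with placeholder values (at the time of this thread they were kubelet flags such as --eviction-hard; newer releases accept the same settings in a KubeletConfiguration file like this):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:                       # evict as soon as a threshold is crossed
  memory.available: "100Mi"         # placeholder values
  nodefs.available: "10%"
evictionSoft:                       # evict only after the grace period below
  memory.available: "300Mi"
evictionSoftGracePeriod:
  memory.available: "1m"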

Thanks

Tim Hockin

Aug 31, 2016, 11:26:34 AM
to Matt Hughes, Kubernetes developer/contributor discussion
On Tue, Aug 30, 2016 at 9:42 AM, Matt Hughes <hughe...@gmail.com> wrote:

> In my case, my pod binds to a port in the unreserved range. On one VM I was
> unlucky enough to have another process bind that port and the pod repeatedly
> failed to bind, causing it to restart. There are other remedies to this,
> but why not try rescheduling this pod on another node? There are lots of
> things that can happen to a single node that might cause pod failure: disk
> space, network connectivity, etc. Try the same pod on another node in your
> cluster and it will work.

This right here is why hostPorts ought to be a very last resort.
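
One common alternative, sketched here with placeholder names: give the container an ordinary containerPort and put a Service in front of the pods, so nothing in the pod spec depends on a particular port being free on the host.

apiVersion: v1
kind: Service
metadata:
  name: port-binder                 # placeholder
spec:
  selector:
    app: port-binder                # matches the pods' labels
  ports:
  - port: 80                        # port clients use on the Service's cluster IP
    targetPort: 8080                # the containerPort inside the pods
  # type: NodePort                  # optional: also expose the Service on a port on every node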

Brian Grant

Aug 31, 2016, 11:47:16 AM
to Matt Hughes, Kubernetes developer/contributor discussion
As Clayton mentioned, the plan is to use the "rescheduler" for this case. It's mentioned in the rescheduler proposal; it just hasn't been implemented yet.
