Hi all,

Looking at the following issue [1], the problem is that the nodes are unable to create tap devices, and this ends up propagating a critical network error. I was expecting this to mark the VMI as Failed and not be re-enqueued. Looking at the code, it seems we always re-enqueue on error [2]. Shouldn't kubevirt stop re-enqueuing if the error is critical?
--
Quique Llorente
CNV networking Senior Software Engineer
--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubevirt-dev/CAHVoYmLU%2BcdE2w6JaaPJE8eSJ3ti_QneqjgiYB%2BSoZLMHhCZXA%40mail.gmail.com.
Hi,

On Thu, Jun 17, 2021 at 10:51 AM Felix Enrique Llorente Pastora <ello...@redhat.com> wrote:
> Looking at the following issue [1] the problem is related to the nodes not being able to create tap devices [...] shouldn't kubevirt stop re-enqueuing if the error is critical?

Right now we behave similarly to the kubelet on errors. The kubelet will, for instance, also indefinitely try to create a pod, even if there is a permanent error on the node and the container sandboxes can't be created.

This approach has pros and cons. The con is that your workload can be stuck indefinitely if you don't resolve the error somehow, which can in the worst case lead to full downtime if your application cannot handle replicas.

The pro is that, in combination with the retry back-off, we don't overload the cluster with recreated pods which would likely end up on the same node again and fail there again. By accident, David just posted a mail [3] where he describes what can happen to a cluster if workloads just fail and get rescheduled. It is, by the way, something we see pretty often in kubevirt CI when nodes have issues. A node with issues tends to kill pods fast, leading to fast re-scheduling to exactly that node, because from the scheduler's perspective it is the most attractive one (no long-running workloads present, which means a lot of free resources from the scheduler's perspective).

However, there are cases where the kubelet can completely reject pods. That is something we can't do right now. It may make sense to have such a flow for certain use-cases.
On Thu, Jun 17, 2021 at 4:09 PM David Vossel <dvo...@redhat.com> wrote:
> On Thu, Jun 17, 2021 at 5:18 AM Roman Mohr <rm...@redhat.com> wrote:
> > Right now we behave here similar to the kubelet on errors. [...]

I think there is a subtle difference between the kubelet and what kubevirt does: the kubelet will retry by starting a new pod if one fails, so any resources it created before failing will get cleaned up with the containers' removal.

But with kubevirt, I think we are trying to re-deploy into the same pod rather than starting a new one.

I am not 100% sure about this, but based on the errors this seemed to be the case.
On Thu, 17 Jun 2021 at 15:34, Edward Haas <edw...@redhat.com> wrote:
> But with kubevirt, I think we are trying to re-deploy into the same pod rather than starting a new one.
> I am not 100% sure about this, but based on the errors this seemed to be the case.

From what I know, I am 99% sure you are right: virt-controller creates the virt-launcher pod, and virt-handler taps into the pod's network namespace to set up networking. If that fails and the VMI is not marked as Failed, it will try again on the very same pod, so re-enqueueing does not mean virt-launcher re-creation.