[kubevirt-dev] Restarting virt-launchers at virt-handler restart


Felix Enrique Llorente Pastora

Jun 23, 2021, 5:36:50 AM
to kubevirt-dev
Hi, 

   Currently virt-handler performs some preparation steps inside the virt-launcher network namespace. If virt-handler restarts, these steps are run again and may end up in an error or a bad state. Would it make sense to resolve this issue by restarting virt-launchers that are not yet in a final state when virt-handler restarts?

BR

--
Quique Llorente

CNV networking Senior Software Engineer

Red Hat EMEA

ello...@redhat.com   


Roman Mohr

Jun 23, 2021, 5:48:45 AM
to Felix Enrique Llorente Pastora, kubevirt-dev
Hi,

On Wed, Jun 23, 2021 at 11:37 AM Felix Enrique Llorente Pastora <ello...@redhat.com> wrote:
Hi, 

   Currently virt-handler performs some preparation steps inside the virt-launcher network namespace. If virt-handler restarts, these steps are run again and may end up in an error or a bad state. Would it make sense to resolve this issue by restarting virt-launchers that are not yet in a final state when virt-handler restarts?

In general the expectation is that these code-paths are written in an idempotent way. I would expect that the networking setup path either sets up the interfaces in a way that it can at any stage detect where it last stopped, or that it does set a checkpoint where it e.g. stores some required config which may get lost in case one gets interrupted in a bad moment.
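
For illustration, a minimal sketch of what such a checkpoint guard could look like (the helper name, checkpoint directory and file layout are made up for the example, not existing KubeVirt code):

    package network

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // runOnce executes a setup step only if its checkpoint file does not exist
    // yet, and records completion afterwards, so that a repeated invocation
    // after a virt-handler restart skips work that already finished.
    func runOnce(checkpointDir, name string, step func() error) error {
        marker := filepath.Join(checkpointDir, name+".done")
        if _, err := os.Stat(marker); err == nil {
            return nil // already completed in a previous run
        } else if !os.IsNotExist(err) {
            return fmt.Errorf("checking checkpoint %q: %v", marker, err)
        }
        if err := step(); err != nil {
            return fmt.Errorf("step %q failed: %v", name, err)
        }
        // Write the marker only after the step succeeded.
        return os.WriteFile(marker, nil, 0o644)
    }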

Do you expect some blockers?

Best regards,
Roman
 


Edward Haas

Jun 23, 2021, 6:00:53 AM
to Roman Mohr, Felix Enrique Llorente Pastora, kubevirt-dev
On Wed, Jun 23, 2021 at 12:48 PM Roman Mohr <rm...@redhat.com> wrote:
Hi,

On Wed, Jun 23, 2021 at 11:37 AM Felix Enrique Llorente Pastora <ello...@redhat.com> wrote:
Hi, 

   Currently virt-handler performs some preparation steps inside the virt-launcher network namespace. If virt-handler restarts, these steps are run again and may end up in an error or a bad state. Would it make sense to resolve this issue by restarting virt-launchers that are not yet in a final state when virt-handler restarts?

In general the expectation is that these code-paths are written in an idempotent way. I would expect that the networking setup path either sets up the interfaces in a way that it can at any stage detect where it last stopped, or that it does set a checkpoint where it e.g. stores some required config which may get lost in case one gets interrupted in a bad moment.

The question is: If the virt-handler restarted, why do we need to trust anything that exists in a half-baked virt-launcher?
It is indeed better to make the code idempotent, or at least to detect it is in a bad shape. The question is, why should we invest and trust that we covered all angles?
What is the downside of tearing down the pod and starting a new one in this specific case? (which is pretty rare anyway)



Roman Mohr

Jun 23, 2021, 6:33:32 AM
to Edward Haas, Felix Enrique Llorente Pastora, kubevirt-dev
On Wed, Jun 23, 2021 at 12:00 PM Edward Haas <edw...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 12:48 PM Roman Mohr <rm...@redhat.com> wrote:
Hi,

On Wed, Jun 23, 2021 at 11:37 AM Felix Enrique Llorente Pastora <ello...@redhat.com> wrote:
Hi, 

   Currently virt-handler performs some preparation steps inside the virt-launcher network namespace. If virt-handler restarts, these steps are run again and may end up in an error or a bad state. Would it make sense to resolve this issue by restarting virt-launchers that are not yet in a final state when virt-handler restarts?

In general the expectation is that these code-paths are written in an idempotent way. I would expect that the networking setup path either sets up the interfaces in a way that it can at any stage detect where it last stopped, or that it does set a checkpoint where it e.g. stores some required config which may get lost in case one gets interrupted in a bad moment.

The question is: If the virt-handler restarted, why do we need to trust anything that exists in a half-baked virt-launcher?
It is indeed better to make the code idempotent, or at least to detect it is in a bad shape. The question is, why should we invest and trust that we covered all angles?
What is the downside of tearing down the pod and starting a new one in this specific case? (which is pretty rare anyway)

I think it goes back to the other email thread regarding pretty much the same topic. Also have a look at the examples there. We have to assume that any call which can return an error will, at some point, actually return one.
I don't see much of a difference to a restart in that case. There is also always a chance that such errors, if not fixable by retrying, occur again after the pods get recreated, which can lead to unexpected retry floods at the cluster level.
Keep in mind that recreating the pod means involving the whole cluster, not just virt-handler and the kubelet.

Best regards,
Roman

Alona Paz

Jun 23, 2021, 7:03:49 AM
to Roman Mohr, Edward Haas, Felix Enrique Llorente Pastora, kubevirt-dev
On Wed, Jun 23, 2021 at 1:33 PM Roman Mohr <rm...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 12:00 PM Edward Haas <edw...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 12:48 PM Roman Mohr <rm...@redhat.com> wrote:
Hi,

On Wed, Jun 23, 2021 at 11:37 AM Felix Enrique Llorente Pastora <ello...@redhat.com> wrote:
Hi, 

   Currently virt-handler performs some preparation steps inside the virt-launcher network namespace. If virt-handler restarts, these steps are run again and may end up in an error or a bad state. Would it make sense to resolve this issue by restarting virt-launchers that are not yet in a final state when virt-handler restarts?

In general the expectation is that these code-paths are written in an idempotent way. I would expect that the networking setup path either sets up the interfaces in a way that it can at any stage detect where it last stopped, or that it does set a checkpoint where it e.g. stores some required config which may get lost in case one gets interrupted in a bad moment.

The question is: If the virt-handler restarted, why do we need to trust anything that exists in a half-baked virt-launcher?
It is indeed better to make the code idempotent, or at least to detect it is in a bad shape. The question is, why should we invest and trust that we covered all angles?
What is the downside of tearing down the pod and starting a new one in this specific case? (which is pretty rare anyway)

I think it goes back to the other email thread regarding pretty much the same topic. Also have a look at the examples there. We have to assume that any call which can return an error will, at some point, actually return one.
I don't see much of a difference to a restart in that case. There is also always a chance that such errors, if not fixable by retrying, occur again after the pods get recreated, which can lead to unexpected retry floods at the cluster level.
Keep in mind that recreating the pod means involving the whole cluster, not just virt-handler and the kubelet.

Since the scenario in the original mail is pretty rare, is there real value in implementing a state machine in the networking setup so it can recover no matter at what stage the crash happened?
I was thinking about having markers that record when we enter the stage where we change the virt-launcher networking and when we finish the networking changes.
If after the crash we have only the first marker, we just move the VM to a failed state.
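
To make the marker idea concrete, a minimal sketch (the marker file names, the sentinel error and the helper are hypothetical, not the actual implementation):

    package network

    import (
        "errors"
        "os"
        "path/filepath"
    )

    // ErrInterruptedSetup signals that a previous run left the "started" marker
    // without the "finished" one, i.e. the setup was interrupted and the
    // resulting state is unknown.
    var ErrInterruptedSetup = errors.New("network setup was interrupted by a previous virt-handler restart")

    // setupWithMarkers brackets the network mutation with a start and an end
    // marker. On a rerun it either skips the work (already finished) or reports
    // the interruption instead of configuring on top of unknown state.
    func setupWithMarkers(markerDir string, configure func() error) error {
        started := filepath.Join(markerDir, "network-setup.started")
        finished := filepath.Join(markerDir, "network-setup.finished")

        if exists(finished) {
            return nil // setup already completed, nothing to do
        }
        if exists(started) {
            return ErrInterruptedSetup
        }
        if err := os.WriteFile(started, nil, 0o644); err != nil {
            return err
        }
        if err := configure(); err != nil {
            return err
        }
        return os.WriteFile(finished, nil, 0o644)
    }

    func exists(path string) bool {
        _, err := os.Stat(path)
        return err == nil
    }

The caller would then map ErrInterruptedSetup to whatever "give up" behaviour is agreed on, e.g. moving the VMI to a failed state.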

I understand that this is less user friendly than a full state machine.
But since the scenario is very rare, comparing the added value of the state machine to the complexity it would add to the code and its maintenance, I believe the markers should be enough.

@Roman Mohr what do you think?
 


Roman Mohr

Jun 23, 2021, 8:01:20 AM
to Alona Paz, Edward Haas, Felix Enrique Llorente Pastora, kubevirt-dev
On Wed, Jun 23, 2021 at 1:03 PM Alona Paz <alka...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 1:33 PM Roman Mohr <rm...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 12:00 PM Edward Haas <edw...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 12:48 PM Roman Mohr <rm...@redhat.com> wrote:
Hi,

On Wed, Jun 23, 2021 at 11:37 AM Felix Enrique Llorente Pastora <ello...@redhat.com> wrote:
Hi, 

   Currently virt-handler performs some preparation steps inside the virt-launcher network namespace. If virt-handler restarts, these steps are run again and may end up in an error or a bad state. Would it make sense to resolve this issue by restarting virt-launchers that are not yet in a final state when virt-handler restarts?

In general the expectation is that these code-paths are written in an idempotent way. I would expect that the networking setup path either sets up the interfaces in a way that it can at any stage detect where it last stopped, or that it does set a checkpoint where it e.g. stores some required config which may get lost in case one gets interrupted in a bad moment.

The question is: If the virt-handler restarted, why do we need to trust anything that exists in a half-baked virt-launcher?
It is indeed better to make the code idempotent, or at least to detect it is in a bad shape. The question is, why should we invest and trust that we covered all angles?
What is the downside of tearing down the pod and starting a new one in this specific case? (which is pretty rare anyway)

I think it goes back to the other email thread regarding pretty much the same topic. Also have a look at the examples there. We have to assume that any call which can return an error will, at some point, actually return one.
I don't see much of a difference to a restart in that case. There is also always a chance that such errors, if not fixable by retrying, occur again after the pods get recreated, which can lead to unexpected retry floods at the cluster level.
Keep in mind that recreating the pod means involving the whole cluster, not just virt-handler and the kubelet.

Since the scenario in the original mail is pretty rare, is there real value in implementing a state machine in the networking setup so it can recover no matter at what stage the crash happened?

I think there are different patterns on how to tackle idempotency here. I am not sure if you need a complex state machine. Some basic checkpoints should  be ok. One very prominent one which is used a lot is when a VM has reached the "running" phase.
If you prefer more state-machine-like thinking you can create checkpoint files and e.g. just blindly erase anything inside the networking setup phase which could be there and start fresh. If you like checkpoint files less, you can accompany every network setup primitive call with a pre-or-post check, to detect that this phase has already happened.
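
As an illustration of the per-primitive pre-check, a sketch of an idempotent bridge-creation step (it uses the vishvananda/netlink package for the example; the helper itself is made up):

    package network

    import (
        "fmt"

        "github.com/vishvananda/netlink"
    )

    // ensureBridge skips creation if a link with the requested name already
    // exists (e.g. from an attempt interrupted by a virt-handler restart), so
    // the call can be repeated safely.
    func ensureBridge(name string) (netlink.Link, error) {
        if link, err := netlink.LinkByName(name); err == nil {
            return link, nil // already created by an earlier attempt
        }
        bridge := &netlink.Bridge{LinkAttrs: netlink.LinkAttrs{Name: name}}
        if err := netlink.LinkAdd(bridge); err != nil {
            return nil, fmt.Errorf("creating bridge %s: %v", name, err)
        }
        return bridge, nil
    }

A stricter version would also verify that an existing link with that name really is a bridge before reusing it.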

Note that `virt-handler` (like other domains in k8s) has a back-off mechanism. Retries will normally not be dangerous for the whole node or the cluster.
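
For reference, a tiny sketch of the kind of exponential back-off that client-go based controllers commonly use when requeueing failed keys (illustration only, not the exact virt-handler configuration):

    package main

    import (
        "fmt"
        "time"

        "k8s.io/client-go/util/workqueue"
    )

    func main() {
        // Failed items are requeued through an exponential rate limiter, so
        // repeated failures back off instead of hammering the node or the API
        // server with immediate retries.
        limiter := workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 300*time.Second)
        for i := 0; i < 5; i++ {
            fmt.Printf("retry %d delayed by %v\n", i+1, limiter.When("vmi-key"))
        }
    }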

I want to highlight that this scenario with the handler restart is just one potential error source. I am not sure how an error from a "primitive" operation is different from a handler restart. It would have the same effect. 
Maybe you can explain a little why it differs.
 
I was thinking about having markers that record when we enter the stage where we change the virt-launcher networking and when we finish the networking changes.
If after the crash we have only the first marker, we just move the VM to a failed state.

I  think it is important that we should not move VMIs to failed on errors in general. Since I can't see the difference between an ordinary error in the code paths and a restart, I would not treat this differently.
 

I understand that this is less user friendly than a full state machine.  

Depending on which situation we talk about, it can be beneficial for the availability of one workload and detrimental for another. For instance, a workload which would run fine after a reschedule can be stuck pretty long, but on the other hand,
if it would not run fine, we would endanger other workloads if we permanently evicted it.


But since the scenario is very rare, comparing the added value of the state machine to the complexity it would add to the code and its maintenance, I believe the markers should be enough.

I think not going to a failed state is there to keep the cluster responsive and not overload it. In k8s the entity which owns an object has to deal with it and retry.
 

@Roman Mohr what do you think?

Maybe you can explain more why you think that a restart is another type of "failure" compared to other interruptions due to errors.

Best regards,
Roman

 

Alona Paz

Jun 23, 2021, 10:21:54 AM
to Roman Mohr, Edward Haas, Felix Enrique Llorente Pastora, kubevirt-dev
On Wed, Jun 23, 2021 at 3:01 PM Roman Mohr <rm...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 1:03 PM Alona Paz <alka...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 1:33 PM Roman Mohr <rm...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 12:00 PM Edward Haas <edw...@redhat.com> wrote:


On Wed, Jun 23, 2021 at 12:48 PM Roman Mohr <rm...@redhat.com> wrote:
Hi,

On Wed, Jun 23, 2021 at 11:37 AM Felix Enrique Llorente Pastora <ello...@redhat.com> wrote:
Hi, 

   Currently virt-handler performs some preparation steps inside the virt-launcher network namespace. If virt-handler restarts, these steps are run again and may end up in an error or a bad state. Would it make sense to resolve this issue by restarting virt-launchers that are not yet in a final state when virt-handler restarts?

In general the expectation is that these code-paths are written in an idempotent way. I would expect that the networking setup path either sets up the interfaces in a way that it can at any stage detect where it last stopped, or that it does set a checkpoint where it e.g. stores some required config which may get lost in case one gets interrupted in a bad moment.

The question is: If the virt-handler restarted, why do we need to trust anything that exists in a half-baked virt-launcher?
It is indeed better to make the code idempotent, or at least to detect it is in a bad shape. The question is, why should we invest and trust that we covered all angles?
What is the downside of tearing down the pod and starting a new one in this specific case? (which is pretty rare anyway)

I think it goes back to the other email thread regarding pretty much the same topic. Also have a look at the examples there. We have to assume that any call which can return an error will, at some point, actually return one.
I don't see much of a difference to a restart in that case. There is also always a chance that such errors, if not fixable by retrying, occur again after the pods get recreated, which can lead to unexpected retry floods at the cluster level.
Keep in mind that recreating the pod means involving the whole cluster, not just virt-handler and the kubelet.

Since the scenario in the original mail is pretty rare, is there real value in implementing a state machine in the networking setup so it can recover no matter at what stage the crash happened?

I think there are different patterns on how to tackle idempotency here. I am not sure if you need a complex state machine. Some basic checkpoints should  be ok. One very prominent one which is used a lot is when a VM has reached the "running" phase.
If you prefer more state-machine-like thinking you can create checkpoint files and e.g. just blindly erase anything inside the networking setup phase which could be there and start fresh. If you like checkpoint files less, you can accompany every network setup primitive call with a pre-or-post check, to detect that this phase has already happened.

Note that `virt-handler` (like other domains in k8s) has a back-off mechanism. Retries will normally not be dangerous for the whole node or the cluster.

I want to highlight that this scenario with the handler restart is just one potential error source. I am not sure how an error from a "primitive" operation is different from a handler restart. It would have the same effect. 
Maybe you can explain a little why it differs.

The main difference that I see is that on handler restart you probably can recover the pod networking (continue from the stage before the crash).
On a primitive error, chances are you will get the same error again and again.
For example, if you failed to create a bridge, why would you succeed on the next retry?
 
 
I was thinking about having markers that record when we enter the stage where we change the virt-launcher networking and when we finish the networking changes.
If after the crash we have only the first marker, we just move the VM to a failed state.

I  think it is important that we should not move VMIs to failed on errors in general. Since I can't see the difference between an ordinary error in the code paths and a restart, I would not treat this differently.

I believe that to decide whether we should try to recover, we should first understand if the error is recoverable/worth recovering.
 
 

I understand that this is less user friendly than a full state machine.  

Depending on which situation we talk about, it can be beneficial for the availability of one workload and detrimental for another. For instance, a workload which would run fine after a reschedule can be stuck pretty long, but on the other hand,
if it would not run fine, we would endanger other workloads if we permanently evicted it.


But since the scenario is very rare, comparing the added value of the state machine to the complexity it would add to the code and its maintenance, I believe the markers should be enough.

I think not going to a failed state is there to keep the cluster responsive and not overload it. In k8s the entity which owns an object has to deal with it and retry.
 

@Roman Mohr what do you think?

Maybe you can explain more why you think that a restart is another type of "failure" compared to other interruptions due to errors.

Currently we move the VM to a failed state if the networking setup fails; I didn't know it was an "issue".
The bug that we have now occurs only if virt-handler restarts during the networking setup. We mark only the end of the setup, so if the network setup is re-run after the restart we may end up
with a corrupted network state (since we start configuring everything from the beginning). Adding a marker at the beginning of the network setup should solve this bug.

Making the whole networking setup recoverable only to fix the described bug, imo, isn't worth the effort.

Now that the discussion has moved to a more general issue - avoiding as much as possible moving the VM to a Failed state, including recovering from primitive errors (probably not recoverable) or from a virt-handler restart during the network setup (very rare) -
maybe (especially if my assumptions of "probably not recoverable" and "very rare" are wrong) it is worth making the effort to make the network setup code more recoverable.

Roman Mohr

Jun 24, 2021, 4:06:41 AM
to Alona Paz, Edward Haas, Felix Enrique Llorente Pastora, kubevirt-dev

I think this has two aspects:
1. Surprisingly, in the past we had quite a few errors, in multiple scenarios, which really were only temporary. They very often manifested themselves as flakes in CI (since there we try to not allow warning events in the well-known test env),
while users just sometimes saw a warning in the logs and the retry worked. A very prominent, still existing scenario is for instance cgroup rewrites by the kubelet, which are not always atomic.
2. If we face a permanent error, we are confronted with a weird unknown (something we did not think about, a bug, ...). There the safest action for cluster health, as an infra component, is to retry and not fail.
 
 
 
I was thinking about having markers that record when we enter the stage where we change the virt-launcher networking and when we finish the networking changes.
If after the crash we have only the first marker, we just move the VM to a failed state.

I  think it is important that we should not move VMIs to failed on errors in general. Since I can't see the difference between an ordinary error in the code paths and a restart, I would not treat this differently.

I believe that to decide whether we should try to recover, we should first understand if the error is recoverable/worth recovering.

Here I think that (2) applies from above.
 
 
 

I understand that this is less user friendly than a full state machine.  

Depending on which situation we talk about, it can be beneficial for the availability of one workload and detrimental for another. For instance, a workload which would run fine after a reschedule can be stuck pretty long, but on the other hand,
if it would not run fine, we would endanger other workloads if we permanently evicted it.


But since the scenario is very rare, comparing the added value of the state machine to the complexity it would add to the code and its maintenance, I believe the markers should be enough.

I think not going to a failed state is there to keep the cluster responsive and not overload it. In k8s the entity which owns an object has to deal with it and retry.
 

@Roman Mohr what do you think?

Maybe you can explain more why you think that a restart is another type of "failure" compared to other interruptions due to errors.

Currently we move the VM to a failed state if the networking setup fails; I didn't know it was an "issue".

Yes, that is really something which we should not do. From my perspective such situations need to be analyzed and unblocked once understood. That does not mean that this has to be a manual process.
I understand that this is kind of an issue when we think about availability for non-scalable applications in VMs.
 
 
The bug that we have now occurs only if virt-handler restarts during the networking setup. We mark only the end of the setup, so if the network setup is re-run after the restart we may end up
with a corrupted network state (since we start configuring everything from the beginning). Adding a marker at the beginning of the network setup should solve this bug.

Making the whole networking setup recoverable only to fix the described bug, imo, isn't worth the effort.

Sounds like a good way to solve this bug fast.
 

Now that the discussion has moved to a more general issue - avoiding as much as possible moving the VM to a Failed state, including recovering from primitive errors (probably not recoverable) or from a virt-handler restart during the network setup (very rare) -
maybe (especially if my assumptions of "probably not recoverable" and "very rare" are wrong) it is worth making the effort to make the network setup code more recoverable.

I think your "probably not recoverable" and "very rare" assumptions are not necessarily wrong. There are just more factors to consider.