Convert ManualRecoveryRequired field for volume migration into a condition

7 views
Skip to first unread message

Alice Frosi

unread,
Oct 2, 2024, 5:04:34 AM10/2/24
to kubevirt-dev, Federico Fossemo, Antonio Cardace, Vladik Romanovsky, Michael Henriksen
Hi everyone,

I'd like some feedback on an API field in kubevirt.
The questionable field is the ManualRecoveryRequired field [1] introduced by the PR [2]. This field is part of the VolumeMigrationState which was added in the VM status in order to persist the migrated volumes information in case the VMI disappeared for any reasons. The ManualRequiredField indicates if the migration failed and the VM is, therefore, in an inconsistent state because it has the new volumes but the copy hasn't finished successfully. For example, if the users try to boot the VM, then could face an unbootable disk.

The question raised by Federico and Antonio is why we have a field and not a condition on the VM. Initially, when I designed this was to have everything in the VolumeMigrationState. However, if this sounds like a bad practice, then we can always change it and convert this to a new condition on the VM. Here, the question: do we want to update this? If so, we would like to modify it before the feature freeze, hence, please raise your concerns on time.


Many thanks,
Alice

Alexander Wels

unread,
Oct 2, 2024, 8:26:23 AM10/2/24
to Alice Frosi, kubevirt-dev, Federico Fossemo, Antonio Cardace, Vladik Romanovsky, Michael Henriksen
Would it make sense to have the condition on the VMI migration resource? That way we can correlate the actual migration with the failure. On the other hand I have seen it not create the VMIM resource due to feature gates not being on and we wouldn't capture the failure then. In general I am for the condition over the boolean field.

Alexander

--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubevirt-dev/CABBoX7M%2B4YO6v_aBEsyRNE3bpczsK6jq_j04fRYAEXZCxkyAPQ%40mail.gmail.com.

Alice Frosi

unread,
Oct 2, 2024, 9:14:42 AM10/2/24
to Alexander Wels, kubevirt-dev, Federico Fossemo, Antonio Cardace, Vladik Romanovsky, Michael Henriksen
Hi Alexander,

it needs to be on the VM since it might be the case when the VMI disappeared like the VM was shutdown or the VMI was accidentally deleted. So, no in general, a condition on the VMI doesn't help. Additionally, this is to prevent the VM to be start or restarted with the wrong set of volumes.

Alice

Lee Yarwood

unread,
Oct 2, 2024, 12:02:43 PM10/2/24
to Alice Frosi, kubevirt-dev, Federico Fossemo, Antonio Cardace, Vladik Romanovsky, Michael Henriksen
Hey Alice,

Why not both? My understanding was that conditions provide a higher level state summary derived from the lower level details already captured in the status of an object [1]. 

That said I'm not entirely sure how to word such a condition and I'm not sold by the use of a bool in the status of the VM to convey that a volume migration has failed in a way that now requires manual intervention.

Shouldn't we capture more details about the failure in status while also exposing a higher level condition highlighting that a specific user intervention is required to rollback the volume migration attempt?

Regards,

Lee



Alexander Wels

unread,
Oct 2, 2024, 12:40:55 PM10/2/24
to Alice Frosi, kubevirt-dev, Federico Fossemo, Antonio Cardace, Vladik Romanovsky, Michael Henriksen
I was talking about a condition on the VMI Migration resource, not the VMI. But they are probably owned by the VMI so if that disappears then the VMIMs will also disappear if that is the case. I would have to check.

Antonio Cardace

unread,
Oct 3, 2024, 5:38:29 AM10/3/24
to Lee Yarwood, Alice Frosi, kubevirt-dev, Federico Fossemo, Vladik Romanovsky, Michael Henriksen
On Wed, Oct 2, 2024 at 6:02 PM Lee Yarwood <lyar...@redhat.com> wrote:
On Wed, 2 Oct 2024 at 10:04, Alice Frosi <afr...@redhat.com> wrote:
Hi everyone,

I'd like some feedback on an API field in kubevirt.
The questionable field is the ManualRecoveryRequired field [1] introduced by the PR [2]. This field is part of the VolumeMigrationState which was added in the VM status in order to persist the migrated volumes information in case the VMI disappeared for any reasons. The ManualRequiredField indicates if the migration failed and the VM is, therefore, in an inconsistent state because it has the new volumes but the copy hasn't finished successfully. For example, if the users try to boot the VM, then could face an unbootable disk.

The question raised by Federico and Antonio is why we have a field and not a condition on the VM. Initially, when I designed this was to have everything in the VolumeMigrationState. However, if this sounds like a bad practice, then we can always change it and convert this to a new condition on the VM. Here, the question: do we want to update this? If so, we would like to modify it before the feature freeze, hence, please raise your concerns on time.


Many thanks,
Alice

Hey Alice,

Why not both? My understanding was that conditions provide a higher level state summary derived from the lower level details already captured in the status of an object [1]. 

I'd say no because they're redundant, I think you can already tell from the status vs spec that the volume migration failed, also if a condition will be used then there's going to be that as well.

That said I'm not entirely sure how to word such a condition and I'm not sold by the use of a bool in the status of the VM to convey that a volume migration has failed in a way that now requires manual intervention.

I agree, that's exactly the same point I raised, usually when you have a bool in the status it's an api smell.

I think the status should just reflect the intrinsic status of the object, it isn't made to convey info to the object owner about something that needs to be done or that something failed, there are plenty of examples in the kubernetes apis.

Conditions on the other hand are used by controllers to tell other controllers/ui/object owner of some higher level information, examples are: there is a mismatch in the replicas of your deployment, node pressure, restart required in case of our VMs, and many others. I think the ManualRecoveryRequired field falls into this category and as such should be represented as a condition and not as a status field, you also have more expressibility with the condition as you're not limited to a bool type but you also have a message,status and reason field.

I think we should change this before the 1.4 FF.

Thanks,
Antonio



Vladik Romanovsky

unread,
Oct 4, 2024, 9:45:53 AM10/4/24
to Alice Frosi, kubevirt-dev, Federico Fossemo, Antonio Cardace, Michael Henriksen
Hey Alice,

I think since https://github.com/kubevirt/kubevirt/pull/12864 is already using this field to prevent a from starting, I'd agree that it would make more sense to have it as a condition.
Mainly since this is not just informative anymore.

On the other hand, conditions are also part of the `status`. If no functionality depends on or is waiting for a specific condition, it doesn't matter to me if this field is part of the conditions or in the VolumeMigrationState struct.

I hope that going forward, we will be able to make the VM automatically recover from failed volume migrations. Then we could remove this condition entirely.

Vladik 

Alice Frosi

unread,
Oct 7, 2024, 4:19:37 AM10/7/24
to Vladik Romanovsky, kubevirt-dev, Federico Fossemo, Antonio Cardace, Michael Henriksen
Hi Vladik,

On Fri, Oct 4, 2024 at 3:45 PM Vladik Romanovsky <vrom...@redhat.com> wrote:
Hey Alice,

I think since https://github.com/kubevirt/kubevirt/pull/12864 is already using this field to prevent a from starting, I'd agree that it would make more sense to have it as a condition.
Mainly since this is not just informative anymore.

No, we aren't using it anymore for preventing to start, I replaced it by checking the existence of VirtualMachineInstanceVolumesChange. Right now, this has become an informative field which can be easily modified/replaced.
 

On the other hand, conditions are also part of the `status`. If no functionality depends on or is waiting for a specific condition, it doesn't matter to me if this field is part of the conditions or in the VolumeMigrationState struct.

Ok, from the comments, it seems a condition is more preferable, I'll modify the code accordingly.
 
Thanks,
Alice

Vladik Romanovsky

unread,
Oct 7, 2024, 8:54:00 AM10/7/24
to Alice Frosi, kubevirt-dev, Federico Fossemo, Antonio Cardace, Michael Henriksen
Hi Alice,

On Mon, Oct 7, 2024 at 4:19 AM Alice Frosi <afr...@redhat.com> wrote:
Hi Vladik,

On Fri, Oct 4, 2024 at 3:45 PM Vladik Romanovsky <vrom...@redhat.com> wrote:
Hey Alice,

I think since https://github.com/kubevirt/kubevirt/pull/12864 is already using this field to prevent a from starting, I'd agree that it would make more sense to have it as a condition.
Mainly since this is not just informative anymore.

No, we aren't using it anymore for preventing to start, I replaced it by checking the existence of VirtualMachineInstanceVolumesChange. Right now, this has become an informative field which can be easily modified/replaced.

If this field is not being used to control the lifecycle of the VM and no functionality depends on it - I see no reason to turn this into a condition and make any further modifications.

Alice Frosi

unread,
Oct 7, 2024, 9:11:49 AM10/7/24
to Vladik Romanovsky, kubevirt-dev, Federico Fossemo, Antonio Cardace, Michael Henriksen
On Mon, Oct 7, 2024 at 2:54 PM Vladik Romanovsky <vrom...@redhat.com> wrote:
Hi Alice,

On Mon, Oct 7, 2024 at 4:19 AM Alice Frosi <afr...@redhat.com> wrote:
Hi Vladik,

On Fri, Oct 4, 2024 at 3:45 PM Vladik Romanovsky <vrom...@redhat.com> wrote:
Hey Alice,

I think since https://github.com/kubevirt/kubevirt/pull/12864 is already using this field to prevent a from starting, I'd agree that it would make more sense to have it as a condition.
Mainly since this is not just informative anymore.

No, we aren't using it anymore for preventing to start, I replaced it by checking the existence of VirtualMachineInstanceVolumesChange. Right now, this has become an informative field which can be easily modified/replaced.

If this field is not being used to control the lifecycle of the VM and no functionality depends on it - I see no reason to turn this into a condition and make any further modifications.

IMO, we need something to communicate to the users or the overlaying tools that a recovery is required. 
For example, Alexander's tool could watch for that condition and restore the VM spec automatically. Using the VirtualMachineInstanceVolumesChange condition is too complex for that condition because it is a VMI condition and could be still set true if the VMI disappeared.

Alice Frosi

unread,
Oct 9, 2024, 5:31:08 AM10/9/24
to Vladik Romanovsky, kubevirt-dev, Federico Fossemo, Antonio Cardace, Michael Henriksen
Hi,

I have opened a PR [1] which converted the ManualRecoveryRequiered field into a condition, please take a looka and provide feedback.


Thanks,
Alice
Reply all
Reply to author
Forward
0 new messages