--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubevirt-dev/CANwfQB-gwcZvQtzmn%2BLxg0BU4F5L6W%2BNFYr%2B-MXFfkqeuFu7Fw%40mail.gmail.com.
Hi Alexander and Vasiliy,

If live migration can insert the extra volume into a new virt-launcher pod, why not make that the default behavior whenever live migration is supported? I.e., every time you do a hotplug, migrate the old VM to a new one with the hotplugged disk. That way, we don't need to keep the extra pods around.
- We'd need the ability to migrate VMs on the same host. There are some limitations here that we hit within the virt stack, but beyond that it presents issues with available resources: we'd need to be able to schedule a pod with identical memory requests/limits on the same node, and that node may or may not have capacity for a second target pod, which would prevent the hotplug from occurring.
- Migrations can be slow (depending on the workload), reducing the responsiveness of the hotplug.
- Depending on the use case and how the network is set up, migrating a VM causes a new pod IP, which isn't something everyone can handle.
Hi David and Vasiliy,

Thanks for listing the limitations. Since we know on our side whether a VM is migratable or not, we should be able to tell when to do a migration and when not to. I believe live migration is a highly desired feature for every production workload, so it should be a more common use case than the non-migratable ones.

> We'd need the ability to migrate VMs on the same host. [...] that node may or may not have capacity for a second target pod, which would prevent the hotplug from occurring.

Agree that it would be best to migrate the VM to the same host, but is this a must-have? As long as we can do live migration, the migration should be transparent to users, with only a very short interruption. In most cases, users probably don't care which node the VM resides on, as long as it is within a particular node pool. Alternatively, we can block the operation if the VM has a node selector but the node doesn't have enough resources.

A side question on this: is it possible to increase the CPU/memory as well during a live migration? This falls into a more general question and use case: what changes can be made without restarting a VM?

> Migrations can be slow (depending on the workload), reducing the responsiveness of the hotplug.

Do we have any benchmark results on how long a migration takes to finish?

> Depending on the use case and how the network is set up, migrating a VM causes a new pod IP, which isn't something everyone can handle.

I believe a VM is not migratable if it uses the pod IP directly, because the pod IP cannot persist. Please correct me if I am wrong.

Thanks,
Zang
On Thu, Apr 8, 2021 at 2:53 PM Zang Li <zan...@google.com> wrote:

> Agree that it would be best to migrate the VM to the same host, but is this a must-have? As long as we can do live migration, the migration should be transparent to users, with only a very short interruption. In most cases, users probably don't care which node the VM resides on, as long as it is within a particular node pool. Alternatively, we can block the operation if the VM has a node selector but the node doesn't have enough resources.

Migrating to the same node is not necessarily a "must-have" for what you are describing to work. However, there are situations where we're limited if "same node" live migration isn't possible, for example if local storage PVCs are being hotplugged to a VMI. We support hotplugging local storage PVCs today, but that would get tricky if we had to depend on migration for hotplug.

> A side question on this: is it possible to increase the CPU/memory as well during a live migration? This falls into a more general question and use case: what changes can be made without restarting a VM?

I haven't put much thought into this, but theoretically we could give the target pod increased CPU/memory requests and limits, then hotplug those additional resources into the running qemu guest after the migration completes.

> Do we have any benchmark results on how long a migration takes to finish?

I've seen non-public benchmarks, and migration times vary depending on the cluster hardware, the guest workload, and the migration tunings KubeVirt is configured with, so it's hard to give accurate numbers here that make any sense.

If we're optimizing for getting the guest workload running on the target pod ASAP (to access a hotplugged disk), we do have the ability to perform post-copy migration, which transfers the workload to the target pod nearly immediately while we continue to stream contents from the source pod. This is similar in concept to what GCP is doing behind the scenes with their compute instance live migration. From a workload performance perspective, any memory-intensive workload that needs low-latency guarantees is definitely going to notice a performance hit while the contents continue to stream from source to target.

All that said, even with post-copy, things can get kind of weird if we use live migration for hotplug. Streaming all the contents of a VM across the network is bandwidth intensive. If we're doing this every time someone hotplugs or unplugs a disk, there's a limit to how many of those migrations can occur in parallel across the cluster. The idea with same-node migration is that it perhaps helps alleviate some of those bandwidth concerns as well.

> I believe a VM is not migratable if it uses the pod IP directly, because the pod IP cannot persist. Please correct me if I am wrong.

Yep, when we're using the "bridge" network binding with the pod network, you're correct: we don't allow migration for VMIs with pod network + bridge binding. However, when using the "masquerade" binding with the pod network, we set up NAT and give the guest an IP address that remains consistent throughout a live migration. So after a live migration occurs, the pod IP technically changes, but the IP the VM guest sees internally on eth0 remains consistent.

I was thinking about the "masquerade" binding when I wrote that bullet point earlier, but I don't think my argument is very strong. It's true that with the masquerade binding the VM's IP within the cluster will change, which might impact how other endpoints contact the VM, but we have exactly the same problem when a VM is stopped and started.
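To make the two bindings concrete, here is a minimal sketch of the relevant VMI spec fragment. The field names follow the KubeVirt VirtualMachineInstance API (interfaces with `masquerade`/`bridge`, a `pod` network); the VMI name and the rest of the spec are omitted, so treat this as an illustration rather than a complete manifest.

```shell
# Emit a minimal VMI spec fragment showing the interface binding
# discussed above. With masquerade + pod network, the guest-visible IP
# survives live migration; bridge + pod network blocks migration.
VMI_SPEC=$(mktemp)
cat > "$VMI_SPEC" <<'EOF'
spec:
  domain:
    devices:
      interfaces:
      - name: default
        masquerade: {}   # NAT; guest keeps its internal IP across migration
        # bridge: {}     # with the pod network, this disallows migration
  networks:
  - name: default
    pod: {}
EOF
cat "$VMI_SPEC"
```

Note that even with masquerade, the pod IP other cluster endpoints use still changes after migration, which is the caveat discussed above.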
Hey,

Trying to bring up this discussion again :)

I was looking into the topic of migrating VMs with hotplugged disks recently. Currently there is a draft PR [1] that implements the approach with a single attachment pod holding all the volumes. I tried to enable live migration using that PR as a base, and in the end I think I managed to shape it into a working state. I created a WIP PR [2] and would like to get some feedback from the community on it. There is still work to be done, but the approach in general is there. I hope that helps us move forward and bring the feature closer to release, as there is currently interest and a strong wish to leverage it in other projects.

Thanks,
Vasiliy
Hi Alexander,

Thank you for reviewing the PR. [1] works pretty well, so it does not slow me down or cause trouble :) Still, there are things in the TODO.

Apart from what is mentioned in the PR (code cleanup and testing), there is currently an issue that the attachment pod is not deleted after a successful migration. My assumption was that it should be destroyed along with the virt-launcher pod. In general that is the case, but after a migration virt-launcher stays around in the "Completed" state and only manual deletion removes it from the cluster. So currently I am not sure whether I need to explicitly delete the attachment pod. Since virt-launcher is not deleted after migration, I assume there is some reasoning behind that; maybe it then also makes sense to keep the attachment pod around as long as virt-launcher is there. At least that seems like the easiest and most straightforward approach. Any thoughts on that?
Yeah, right. Probably it is better to clean up the 'orphaned' attachment pods explicitly. It's not very difficult to implement anyway.

I actually faced one more issue, and I've been looking into it for some time already. It is reproducible with the functional test available in the PR and happens with block volumes: during preparation of the target virt-launcher pod, I write to the cgroup device controller file ('devices.allow') to allow access to the volume and log whether that succeeds. But when the preparation is done and the target virt-launcher tries to resume the VM, I get an 'Operation not permitted' error, and for some reason the rule allowing the block device is no longer present in 'devices.list'. So it looks like the rule is dropped by someone else in the meantime (the container runtime?). Initially I thought this happened because the initialization of one of the virt-launcher containers had not finished and the container runtime was overwriting the device rules, but that doesn't seem to be the case, as I tried waiting until the pod was running. Interestingly, the issue is not always reproducible. Usually it happens after several migrations, e.g. when I try to migrate the VM back to the original node (i.e. node02 -> node01 -> node02), but sometimes it does not happen at all. It seems like a floating issue. Has anyone seen something similar before?
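For readers unfamiliar with the mechanism, the preparation step described here can be sketched as below. This is a hedged illustration, not KubeVirt's actual code: a scratch directory stands in for the real per-container cgroup path (something like /sys/fs/cgroup/devices/kubepods/...), and the device numbers (major 8, minor 16) are made up.

```shell
# Sketch of the cgroup v1 devices-controller write described above.
# A temp directory stands in for the real cgroup path so this is safe
# to run anywhere; device numbers 8:16 are illustrative only.
CGROUP=$(mktemp -d)
touch "$CGROUP/devices.allow" "$CGROUP/devices.list"

# Grant read/write/mknod access to block device major 8, minor 16.
echo 'b 8:16 rwm' > "$CGROUP/devices.allow"

# On a real node the kernel reflects granted rules in devices.list;
# the bug discussed in this thread is that the rule later vanishes
# from there when something rewrites the container's cgroup.
cat "$CGROUP/devices.allow"
```

The symptom reported above is that a rule written this way is present immediately after the write but gone from devices.list by the time the VM resumes.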
Hey there,

I think I have an update on the issue with the cpumanager. I found this commit [1] that actually fixes the problem upstream at the runc level. In fact, the cpumanager updates only cgroup parameters related to CPU, memory, and block I/O; it does not explicitly reset the device rules. It seems it was the implementation of the `update` call on the runc side that was dropping all the rules (apparently it was just reapplying the config to match the existing container spec, and thereby dropping all the 'extra' device rules).

I rebuilt runc from the master branch with that fix included, and so far I do not observe any issues. I can confirm that the hotplug device rules are no longer dropped. But the fix is pretty new (a couple of weeks old) and has not even been released yet. So the problem has been fixed, but the fix is not yet *widely* available. I wonder how to tackle this on the KubeVirt side; I do not see any reliable workaround. Any thoughts?
On Wed, Jun 16, 2021 at 1:14 PM Vasiliy Ulyanov <vasil...@gmail.com> wrote:

> So it seems the problem has been fixed, but the fix is not yet *widely* available. I just wonder how to tackle this on the KubeVirt side. I do not see any reliable workaround for this. Any thoughts?

I am trying to decide whether it is worth trying to build a workaround until the actual fix is widely available. I guess the earliest we could see the fix is k8s 1.22? Maybe we can just document that live migration with a hotplugged block device doesn't work reliably until k8s 1.22. Having to build a workaround and then remove it later seems like a waste of time, but I do want to know what others think about this.