disk.img is lost after a hot-add disk operation


mzzga...@gmail.com

Sep 2, 2021, 6:41:59 AM
to kubevirt-dev
When I start a VM and it is running, I use a local-volume for a hot-add disk operation. The disk mount point in the launcher pod is /var/lib/kubelet/pods/$PODID/volumes/kubernetes.io~empty-dir/hotplug-disks/dv-data-0/disk.img. When the launcher container fails and is forcibly deleted, this file is lost.



Alexander Wels

Sep 2, 2021, 7:52:55 AM
to mzzga...@gmail.com, kubevirt-dev
So /var/lib/kubelet/pods/$PODID/volumes/kubernetes.io~empty-dir/hotplug-disks/<volume_name>/disk.img is a bind mount to the actual file. If the launcher pod dies we unmount the bind mount, but the original file on the original volume should not be affected at all. Are you saying something is deleting the actual file before the unmount happens, and thus the original volume is modified?
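For illustration, the bind-mount behavior described here can be sketched with plain mount commands (run as root; all paths below are made up for the demo, not KubeVirt's real layout):

```shell
# Sketch only: a bind-mounted file survives unmounting (requires root).
set -e
work=$(mktemp -d)
mkdir -p "$work/pvc" "$work/hotplug-disks"
echo "guest data" > "$work/pvc/disk.img"

# Conceptually what virt-handler does: expose the backing file inside the
# pod's emptyDir via a bind mount.
touch "$work/hotplug-disks/disk.img"
mount --bind "$work/pvc/disk.img" "$work/hotplug-disks/disk.img"

# Writes through the mount point land in the backing file.
echo "more data" >> "$work/hotplug-disks/disk.img"

# When the launcher pod dies, the bind mount is removed; the original
# file on the original volume is untouched.
umount "$work/hotplug-disks/disk.img"
cat "$work/pvc/disk.img"
```

The final `cat` shows both lines still present in the backing file: the data outlives the mount.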


mzzga...@gmail.com

Sep 2, 2021, 11:04:26 PM
to kubevirt-dev

When I mount a local-volume type disk and the disk is in the hotplugged state, if the launcher pod corresponding to the VM is forcibly deleted, the disk.img in the corresponding local-volume is also lost.

Alexander Wels

Sep 3, 2021, 8:50:44 AM
to mzzga...@gmail.com, kubevirt-dev
So I just tried this to double check; here is what I did:
- Started a kubevirtci cluster and installed the head of main on it.
- Enabled the hotplug feature gate.
- Started an ephemeral VMI (examples/vmi-fedora).
- Created a blank datavolume.
- Added the volume to the running VMI (this is what actually creates the disk.img file on the host).
- Verified that the volume was attached and accessible inside the VM.
- Looked up where the disk.img resides on the host.
- Killed the VMI pod with --force and --grace-period=0.
- Verified that the disk.img still exists on the host.

What did I not do that you did to get the behavior you are seeing?

mzzga...@gmail.com

Sep 14, 2021, 12:02:52 AM
to kubevirt-dev
We have verified that kubelet's emptyDir cleanup logic is what causes disk.img in the emptyDir to be lost. The problem only occurs occasionally.

Alexander Wels

Sep 15, 2021, 11:02:28 AM
to mzzga...@gmail.com, kubevirt-dev
So you are saying there is a race between virt-handler unbinding the disk image and the kubelet cleaning up the emptyDir. If the kubelet beats virt-handler, it removes everything in the emptyDir, which includes the disk.img. That would be really bad; we can't have data disappearing.

Alexander Wels

Sep 15, 2021, 2:08:59 PM
to mzzga...@gmail.com, kubevirt-dev
So I have been thinking about this, as the thought of potentially losing data really bothers me. And I think we can do the following to fix this. As long as there are any hotplugged disks we can put a finalizer on the virt-launcher pod. Then when you delete the pod (forced or not) it will mark the pod with a deletion timestamp. This will trigger the cleanup code in virt-handler to unbind any hotplugged disks. Then that triggers code that removes the finalizer, and the virt-launcher pod can be safely deleted. We would only add the finalizer if there are any hotplugged disks, and remove it if there are no more hotplugged disks.

Does that sound like a reasonable solution? It ensures we unbind any hotplugged disks before the pod is actually removed and the kubelet cleans up the emptyDir.
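The proposal can be pictured as a finalizer in the virt-launcher pod's metadata. This is only a sketch; the finalizer name below is hypothetical, not necessarily what KubeVirt would use:

```yaml
# Hypothetical sketch: a virt-launcher pod protected by a finalizer while a
# hotplugged disk is attached. Deleting the pod (forced or not) only sets
# deletionTimestamp; the pod object is not removed until virt-handler has
# unbound the disks and stripped the finalizer.
apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-vmi-example
  finalizers:
    - kubevirt.io/hotplug-disk-protection   # hypothetical finalizer name
spec:
  # ... virt-launcher container spec elided ...
```

The finalizer would be added only while hotplugged disks exist, and removed once none remain.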

mzzga...@gmail.com

Sep 16, 2021, 9:58:27 PM
to kubevirt-dev

We have temporarily modified the kubelet code to handle the emptyDir deletion scenario. In empty_dir.go, we check whether a directory matching /var/lib/kubelet/pods/uuid/volumes/kubernetes.io~empty-dir/ exists; if it does, we force an umount before deleting, so that disk.img is not removed.
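The kubelet itself implements this in Go, but the described workaround's effective logic can be approximated in shell (the function name and exact lookup are illustrative):

```shell
# Approximation of the described kubelet patch: before removing an emptyDir,
# unmount anything still mounted underneath it, so that backing files
# (e.g. disk.img) are not deleted through a lingering bind mount.
cleanup_emptydir() {
  local dir="$1"
  # List active mount targets under the emptyDir, deepest first, and unmount
  # each one before the recursive delete below.
  findmnt -rn -o TARGET | grep -F "$dir/" | sort -r | while read -r target; do
    umount "$target"
  done
  rm -rf "$dir"
}
```

With no active mounts under the directory, the function simply removes it; with a stale bind mount present, the unmount happens first and the mount's source is untouched.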

Alexander Wels

Sep 17, 2021, 8:35:58 AM
to mzzga...@gmail.com, kubevirt-dev
That seems like a maintenance nightmare, now it becomes hard to update your k8s version since you are running a custom kubelet. Does my suggested solution make sense to you (with the finalizer) so you don't have to run a custom kubelet?

Alexander Wels

Sep 22, 2021, 9:14:43 AM
to mzzga...@gmail.com, kubevirt-dev
After doing some experimentation with finalizers, it turns out they do not help at all with this race condition. Even with a finalizer on the pod, the emptyDir can be emptied before virt-handler has a chance to unmount the volumes. So the only way to fix it is to have the emptyDir code do the unmount, and I think it is unlikely that the k8s community will accept a PR that does this. So I went looking for alternatives, and I have a proposal for adding a KubeVirt CSI driver [0] that supports ephemeral volumes. Ephemeral volumes have the exact same properties as emptyDir that we are interested in, but having the CSI driver under our control allows us to properly unmount volumes during teardown of the ephemeral volumes.
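CSI ephemeral (inline) volumes are a standard Kubernetes feature; roughly, the pod spec would look like the sketch below. The driver name is hypothetical, not taken from the proposal:

```yaml
# Hypothetical sketch of a CSI ephemeral (inline) volume replacing the
# emptyDir. Teardown goes through the CSI driver's NodeUnpublishVolume,
# so the driver can unmount hotplugged disks before anything is deleted.
apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-example
spec:
  containers:
    - name: compute
      image: example/virt-launcher   # illustrative image
      volumeMounts:
        - name: hotplug-disks
          mountPath: /var/run/kubevirt/hotplug-disks
  volumes:
    - name: hotplug-disks
      csi:
        driver: csi.kubevirt.io   # hypothetical driver name
```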

Alexander

Alexander Wels

Sep 29, 2021, 1:28:48 PM
to mzzga...@gmail.com, kubevirt-dev
So after some discussion in the proposal, I think we have arrived at an immediate solution to this problem, a PR with the fix [0] has been merged, and is being backported to 0.45 and 0.44. Basically the idea is that instead of bind mounting the directory with the disk.img, we bind mount the disk.img file itself. This way when the emptyDir tries to clean up, it can't remove the disk.img file because it is busy until the virt-handler gets a chance to unmount the disk.img file. Before it would be unable to remove the directory itself, but the contents were getting deleted.
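The difference between bind mounting the directory and bind mounting the file can be demonstrated with plain mount commands (run as root; paths are illustrative, not KubeVirt's real layout):

```shell
# Illustrative demo (requires root): why bind mounting the disk.img *file*
# instead of its *directory* protects the data during emptyDir cleanup.
set -e
work=$(mktemp -d)
mkdir -p "$work/pvc" "$work/emptydir-old" "$work/emptydir-new"
echo "vm data" > "$work/pvc/disk.img"

# Old behavior: bind mount the directory. A cleanup that removes the
# *contents* of the emptyDir deletes the backing file through the mount.
mount --bind "$work/pvc" "$work/emptydir-old"
rm -f "$work/emptydir-old/disk.img"        # succeeds: source file is gone
umount "$work/emptydir-old"
[ ! -e "$work/pvc/disk.img" ] && echo "directory bind mount: data lost"

# New behavior: bind mount the file itself. Unlinking a mount point fails
# with EBUSY, so cleanup cannot delete it before virt-handler unmounts.
echo "vm data" > "$work/pvc/disk.img"
touch "$work/emptydir-new/disk.img"
mount --bind "$work/pvc/disk.img" "$work/emptydir-new/disk.img"
rm -f "$work/emptydir-new/disk.img" 2>/dev/null \
  || echo "file bind mount: rm refused (busy)"
umount "$work/emptydir-new/disk.img"
cat "$work/pvc/disk.img"                   # original data intact
```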

@mzzga...@gmail.com I think this should solve your problem without the need for a custom kubelet.

mzzga...@gmail.com

Oct 30, 2021, 2:05:00 AM
to kubevirt-dev
Great, the community is very active