kubevirt support qcow progress


高鹏

Mar 2, 2021, 9:13:35 PM
to kubevirt-dev
Hi,

We use KubeVirt with local disks, but the OS image is raw. Comparing the transmission bandwidth consumed, we would like to use the qcow format for OS images.
I would like to ask when the community will support this, or to hear the reasons and risks behind not supporting it at present.


Thanks,

Mazzystr

Mar 3, 2021, 11:27:51 AM
to 高鹏, kubevirt-dev
KubeVirt and the Containerized Data Importer definitely support qcow2... https://kubevirt.io/user-guide/operations/containerized_data_importer/#supported-image-formats

Thanks,
/Chris Callegari

--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubevirt-dev/a3aeb702-b4df-4f05-b071-9c2d57a3d9c6n%40googlegroups.com.

Adam Litke

Mar 3, 2021, 2:19:13 PM
to Mazzystr, 高鹏, kubevirt-dev
On Wed, Mar 3, 2021 at 11:28 AM Mazzystr <mazz...@gmail.com> wrote:
KubeVirt and the Containerized Data Importer definitely support qcow2... https://kubevirt.io/user-guide/operations/containerized_data_importer/#supported-image-formats

As Chris says, qcow2 is supported as an import format.  If the qcow2 image is sparse then you can transmit it efficiently over the network.  See virt-sparsify(1) for a nice tool to ensure that your disk image is as small as possible before transmission.
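To illustrate why sparseness matters for transmission, here is a quick coreutils demonstration, plus (commented, illustrative) preparation commands of the kind mentioned above:

```shell
# A sparse file has a large apparent size but allocates almost no blocks.
truncate -s 1G disk.img
du -k --apparent-size disk.img    # ~1048576 KiB apparent
du -k disk.img                    # ~0 KiB actually allocated

# Shrinking a real image before shipping it (paths illustrative):
#   virt-sparsify disk.img disk-sparse.img
#   qemu-img convert -O qcow2 -c disk.img disk.qcow2   # -c also compresses
```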

Several reasons why we are not using qcow2:
  • For security reasons libvirt must know if it is safe to interpret a disk image as qcow2.  To support qcow2 we would need to enhance the VMI API with a "format" field and teach users how to properly set it for each VM disk.
  • Qcow2 metadata overhead is non-deterministic and reduces the amount of usable space on the PVC
  • We do not want to rely on qcow2 features (such as qcow2 snapshots and backing chains) because they tightly couple the compute layer to the storage layer.
Is there any other reason why you want to use qcow2 besides transmission efficiency? 
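For illustration only, the hypothetical "format" field from the first bullet might look something like this on a VMI disk (this field does not exist in the API today):

```yaml
spec:
  domain:
    devices:
      disks:
      - name: rootdisk
        disk:
          bus: virtio
        format: qcow2            # hypothetical field, not part of the VMI API
  volumes:
  - name: rootdisk
    persistentVolumeClaim:
      claimName: vm-disk-pvc
```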


--

Adam Litke

He / Him / His

Associate Manager - OpenShift Virtualization Storage

ali...@redhat.com   

mzzga...@gmail.com

Mar 3, 2021, 10:09:15 PM
to kubevirt-dev
Hi,
I think 
1. Libvirt is already a very mature solution using the qcow format
2. We use local disks and want online snapshot capability, but online snapshots cannot be used with local disks
    So our preliminary plan is to use the qcow format to implement online disk snapshots through virt-launcher (qemu-img snapshot, virsh snapshot, etc.)
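For context, the internal qcow2 snapshot operations referred to above look roughly like this with qemu-img (offline only; a running guest would go through virsh instead; image name illustrative):

```shell
# Internal qcow2 snapshots live inside the image file itself.
qemu-img snapshot -c before-upgrade vm-disk.qcow2   # create
qemu-img snapshot -l vm-disk.qcow2                  # list
qemu-img snapshot -a before-upgrade vm-disk.qcow2   # apply (revert)
qemu-img snapshot -d before-upgrade vm-disk.qcow2   # delete
```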

Thanks

Alice Frosi

Mar 4, 2021, 3:46:09 AM
to mzzga...@gmail.com, kubevirt-dev
On Thu, Mar 4, 2021 at 4:09 AM mzzga...@gmail.com <mzzga...@gmail.com> wrote:
Hi,
I think 
1. Libvirt is already a very mature solution using the qcow format
2. We use local disks and want online snapshot capability, but online snapshots cannot be used with local disks
    So our preliminary plan is to use the qcow format to implement online disk snapshots through virt-launcher (qemu-img snapshot, virsh snapshot, etc.)
Where do you plan to store the snapshots? If it is the PVC where the image is stored, then you also need to account for this additional space. This is also not easy to estimate when you request a PVC.

 

dvo...@redhat.com

Mar 4, 2021, 11:25:25 AM
to kubevirt-dev
On Wednesday, March 3, 2021 at 10:09:15 PM UTC-5 mzzga...@gmail.com wrote:
Hi,
I think 
1. Libvirt is already a very mature solution using the qcow format
2. We use local disks and want online snapshot capability, but online snapshots cannot be used with local disks
    So our preliminary plan is to use the qcow format to implement online disk snapshots through virt-launcher (qemu-img snapshot, virsh snapshot, etc.)


Realistically, it's unlikely we're going to officially support qcow2 files on PVCs. That's been a difficult conversation. The simplicity of knowing that a disk will always be in raw format on a PVC affords us the ability to make assumptions that become much more complex once qcow2 (and qcow2 snapshots) enter the picture.

There is a path for you to achieve what you want, though, regardless of what the KubeVirt community agrees on.

You can create a sidecar hook [1] [2] which allows you to modify the libvirt domain XML however you want. This would give you the ability to convert our "raw" assumption for PVC disks to "qcow2". [1] shows how the VMI definition looks, [2] shows how the sidecar works, and [3] shows how we're testing this end to end. Make sure to enable the "Sidecar" feature gate on the KubeVirt CR when installing KubeVirt to enable this behavior.
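The heart of such a hook is just a domain XML rewrite. A minimal sketch of that mutation (the KubeVirt hook gRPC plumbing is omitted; the function name and sample XML are illustrative):

```python
# Hypothetical core of a sidecar hook's OnDefineDomain step: receive the
# libvirt domain XML and flip the disk driver type from "raw" to "qcow2".
import xml.etree.ElementTree as ET

def force_qcow2(domain_xml: str) -> str:
    root = ET.fromstring(domain_xml)
    for driver in root.iter("driver"):
        if driver.get("type") == "raw":
            driver.set("type", "qcow2")   # reinterpret the PVC disk as qcow2
    return ET.tostring(root, encoding="unicode")

sample = """<domain><devices><disk type='file' device='disk'>
<driver name='qemu' type='raw'/><source file='/var/run/kubevirt/disk.img'/>
</disk></devices></domain>"""
print(force_qcow2(sample))
```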

hopefully that helps

- David


mzzga...@gmail.com

Mar 8, 2021, 5:02:17 AM
to kubevirt-dev
Thanks, I want to try it.

John Snow

Mar 8, 2021, 12:33:09 PM
to Adam Litke, Mazzystr, 高鹏, kubevirt-dev
On 3/3/21 2:18 PM, Adam Litke wrote:
> * For security reasons libvirt must know if it is safe to interpret a
> disk image as qcow2.  To support qcow2 we would need to enhance the
> VMI API with a "format" field and teach users how to properly set it
> for each VM disk.

I suppose there's nothing that can really be done here, though it
doesn't seem like a huge ordeal to allow a user to configure a VM image
as being raw/qcow2.

> * Qcow2 metadata overhead is non-deterministic and reduces the amount
> of usable space on the PVC

I suppose the non-deterministic attributes being referenced here are
when you are using things like snapshots and incremental backups and so on.

> * We do not want to rely on qcow2 features (such as qcow2 snapshots
> and backing chains) because they tightly couple the compute layer to
> the storage layer.

What's the vision for offering these features, if any?

There's increasing awareness at the QEMU developer level that we cannot
expect large distributed systems to rely on e.g. qcow2, but I am not
sure there's a cohesive vision of how to manage that going forward.

I know Alice had done some work into investigating the marriage of qemu
storage daemon and CSI, but in my limited knowledge only recall that
someone else was working on a different angle for the same/similar problems.

What's the latest news?

--js

Adam Litke

Mar 8, 2021, 2:31:51 PM
to John Snow, Alice Frosi, Mazzystr, 高鹏, kubevirt-dev
On Mon, Mar 8, 2021 at 12:33 PM John Snow <js...@redhat.com> wrote:
On 3/3/21 2:18 PM, Adam Litke wrote:
>   * For security reasons libvirt must know if it is safe to interpret a
>     disk image as qcow2.  To support qcow2 we would need to enhance the
>     VMI API with a "format" field and teach users how to properly set it
>     for each VM disk.

I suppose there's nothing that can really be done here, though it
doesn't seem like a huge ordeal to allow a user to configure a VM image
as being raw/qcow2.

If anything, I could see a PVC annotation as a nice place to store such a format designator.  It's a solvable problem but I would still like to understand the specific use cases that people are targeting with qcow2 so that we can consider if kubernetes offers a way to provide this without creating virtual machine specific features.  A major goal of kubevirt is to remain kubernetes-native and to provide a converged infrastructure for running VMs and containers.
 
>   * Qcow2 metadata overhead is non-deterministic and reduces the amount
>     of usable space on the PVC

I suppose the non-deterministic attributes being referenced here are
when you are using things like snapshots and incremental backups and so on.

Yes, but even just allocation of the image causes overhead since qemu needs to implement a block indexing data structure in the image file.
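As a rough back-of-envelope illustration (not qemu's exact accounting): with default 64 KiB clusters, every guest cluster needs an 8-byte L2 entry plus roughly a 2-byte refcount, so even a snapshot-free, fully allocated image carries metadata on the order of 0.01-0.02% of the virtual size:

```python
# Rough qcow2 metadata estimate for a fully allocated image with default
# 64 KiB clusters. Real overhead also depends on snapshots, preallocation
# mode, and internal fragmentation, so treat this as an approximation.
CLUSTER = 64 * 1024

def qcow2_metadata_bytes(virtual_size: int, cluster: int = CLUSTER) -> int:
    clusters = -(-virtual_size // cluster)   # ceil: guest clusters
    l2 = clusters * 8                        # 8-byte L2 entry per cluster
    l1 = -(-l2 // cluster) * 8               # 8-byte L1 entry per L2 table
    refcount = clusters * 2                  # ~2-byte refcount per host cluster
    return l1 + l2 + refcount

size = 100 * 2**30
overhead = qcow2_metadata_bytes(size)
print(f"~{overhead / 2**20:.1f} MiB metadata for a 100 GiB image ({overhead / size:.4%})")
```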
 
>   * We do not want to rely on qcow2 features (such as qcow2 snapshots
>     and backing chains) because they tightly couple the compute layer to
>     the storage layer.

What's the vision for offering these features, if any?

Snapshots should be provided by the CSI storage drivers and be triggered via the kubernetes APIs.  This enables storage drivers to optimize the performance and storage requirements of snapshots and also keeps containerized workloads and vms converged so that integration with other systems such as cluster backup and replication are seamless.  The same can be said with respect to encryption.  Changed-block tracking will be more difficult but this feature is desired for containerized workloads as well.  It will be best to work in the CSI and kubernetes storage SIG to push these priorities.
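Declaratively, such a CSI snapshot request is just a Kubernetes object; a minimal sketch (names illustrative, API version depends on your cluster) might be:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: vm-disk-snap                       # illustrative name
spec:
  volumeSnapshotClassName: csi-snapclass   # must match an installed CSI driver
  source:
    persistentVolumeClaimName: vm-disk-pvc # the PVC backing the VM disk
```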
 
There's increasing awareness at the QEMU developer level that we cannot
expect large distributed systems to rely on e.g. qcow2, but I am not
sure there's a cohesive vision of how to manage that going forward.

I know Alice had done some work into investigating the marriage of qemu
storage daemon and CSI, but in my limited knowledge only recall that
someone else was working on a different angle for the same/similar problems.

What's the latest news?

+Alice Frosi to give a more technical answer regarding QSD.  From my point of view, qemu has some really useful software-defined storage features and these can be broken out from the virtualization part of qemu and exposed via a CSI driver.  This is exactly what the QSD research is designed to prove.  Right now we are searching for the best use cases to implement.  It seems that QSD may provide a short to medium term solution to VMs only, whereas the real answer is to provide features that work equally well for containers and VMs.
 

John Snow

Mar 8, 2021, 2:53:32 PM
to Adam Litke, Alice Frosi, Mazzystr, 高鹏, kubevirt-dev
On 3/8/21 2:31 PM, Adam Litke wrote:
>
>
> On Mon, Mar 8, 2021 at 12:33 PM John Snow <js...@redhat.com
> <mailto:js...@redhat.com>> wrote:
>
> On 3/3/21 2:18 PM, Adam Litke wrote:
> >   * For security reasons libvirt must know if it is safe to
> interpret a
> >     disk image as qcow2.  To support qcow2 we would need to
> enhance the
> >     VMI API with a "format" field and teach users how to properly
> set it
> >     for each VM disk.
>
> I suppose there's nothing that can really be done here, though it
> doesn't seem like a huge ordeal to allow a user to configure a VM image
> as being raw/qcow2.
>
>
> If anything, I could see a PVC annotation as a nice place to store such
> a format designator.  It's a solvable problem but I would still like to
> understand the specific use cases that people are targeting with qcow2
> so that we can consider if kubernetes offers a way to provide this
> without creating virtual machine specific features.  A major goal of
> kubevirt is to remain kubernetes-native and to provide a converged
> infrastructure for running VMs and containers.
>

I'm in the same boat -- just trying to understand the use cases and the
demands out there. I just want to understand the problems that the
format poses for you in practice so I can have that in mind as I assess
the use cases and user stories. Nothing more, nothing less O:-)

AFAIUI the major user story is simply:

- I am using kubevirt and I want to use pre-made VM images distributed
as qcow2.

And this can be solved by converting these images on import through
various means, as already addressed in this thread. Actually using them
at runtime is a different story, and the demands for that are not
immediately clear to me at present.

> >   * Qcow2 metadata overhead is non-deterministic and reduces the
> amount
> >     of usable space on the PVC
>
> I suppose the non-deterministic attributes being referenced here are
> when you are using things like snapshots and incremental backups and
> so on.
>
>
> Yes, but even just allocation of the image causes overhead since qemu
> needs to implement a block indexing data structure in the image file.
>

If you didn't pre-allocate the metadata, yes.

> >   * We do not want to rely on qcow2 features (such as qcow2 snapshots
> >     and backing chains) because they tightly couple the compute
> layer to
> >     the storage layer.
>
> What's the vision for offering these features, if any?
>
>
> Snapshots should be provided by the CSI storage drivers and be triggered
> via the kubernetes APIs.  This enables storage drivers to optimize the
> performance and storage requirements of snapshots and also keeps
> containerized workloads and vms converged so that integration with other
> systems such as cluster backup and replication are seamless.  The same
> can be said with respect to encryption.  Changed-block tracking will be
> more difficult but this feature is desired for containerized workloads
> as well.  It will be best to work in the CSI and kubernetes storage SIG
> to push these priorities.
>
> There's increasing awareness at the QEMU developer level that we cannot
> expect large distributed systems to rely on e.g. qcow2, but I am not
> sure there's a cohesive vision of how to manage that going forward.
>
> I know Alice had done some work into investigating the marriage of qemu
> storage daemon and CSI, but in my limited knowledge only recall that
> someone else was working on a different angle for the same/similar
> problems.
>
> What's the latest news?
>
>
> +Alice Frosi <mailto:afr...@redhat.com> to give a more technical answer
> regarding QSD.  From my point of view, qemu has some really useful
> software-defined storage features and these can be broken out from the
> virtualization part of qemu and exposed via a CSI driver.  This is
> exactly what the QSD research is designed to prove.  Right now we are
> searching for the best use cases to implement.  It seems that QSD may
> provide a short to medium term solution to VMs only, whereas the real
> answer is to provide features that work equally well for containers and VMs.
>
Sure. What features do we need, and what level should they be
implemented at long-term? It's not clear to me immediately what the
perceived shortcomings are if qcow2 is taken out of the mix.

I imagine the big features of qcow2 are:

- Filesystem independent sparseness
- Forking snapshots
- Deduplication (shared read-only base images)
- CBT/Incremental/Differential Backups

which ones remain important to address for kubevirt or kubernetes as a
whole?

--js

Alice Frosi

Mar 9, 2021, 4:04:09 AM
to John Snow, Adam Litke, Mazzystr, 高鹏, kubevirt-dev
This is exactly what CDI is doing, and at the end those images are converted to raw and used as raw by qemu.
I apologize if not everyone understands exactly what the QSD is; this is normal. It is a new tool that incorporates the block features of qemu [1]. There has been a very early attempt to create a CSI plugin using this tool on top of other storage drivers (it is not publicly available). This PoC has shown the complexity of managing such a plugin. There is an overlap between the storage provider and the QSD functionalities. Additionally, the plugin does not use libvirt; it has to interact with the QSD directly via QMP calls. It has the same disk hotplug issue that KubeVirt has. In short, a lot of questions still have to be answered. As Adam mentioned, we are searching for a good use case to implement and focus on.
If somebody is interested, I can polish the work and publish it somewhere. I only gave a brief explanation because the work was mentioned in the thread :)
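For readers unfamiliar with the tool, a minimal qemu-storage-daemon invocation of the kind such a plugin might wrap could look like this (paths and node names illustrative; consult the QSD documentation for exact option syntax):

```shell
# Open a qcow2 file and export it as raw blocks over a vhost-user-blk
# unix socket; the consumer never needs to understand qcow2 itself.
qemu-storage-daemon \
  --blockdev driver=file,node-name=file0,filename=disk.qcow2 \
  --blockdev driver=qcow2,node-name=disk0,file=file0 \
  --export type=vhost-user-blk,id=export0,node-name=disk0,addr.type=unix,addr.path=/var/run/qsd.sock
```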


Sure. What features do we need, and what level should they be
implemented at long-term? It's not clear to me immediately what the
perceived shortcomings are if qcow2 is taken out of the mix.

I imagine the big features of qcow2 are:

- Filesystem independent sparseness
- Forking snapshots
- Deduplication (shared read-only base images)
- CBT/Incremental/Differential Backups

which ones remain important to address for kubevirt or kubernetes as a
whole?
About online snapshots and CBT, please keep in mind that the CSI APIs do not exist yet. There is a Kubernetes workgroup that is designing the interfaces, but it is still ongoing work [2] (design document for CBT [3]). For a better overview, I encourage you to have a look at Michael and Ryan's presentation about snapshots [4].

Back to the QEMU snapshot topic: you need to consider the size of the PVC you request, and it is difficult to take into account the additional space for the snapshot (and the metadata). Right now, CDI creates a raw image on top of a PVC, and the requested size of the PVC depends on the requested image size plus some (small) overhead. How, then, do you estimate the additional overhead needed for the snapshot?
Using the CSI interface, Kubernetes creates an object and queries the storage provider for the additional space required by the snapshot. In this case, the additional space is properly taken into account by the CSI call.


Alice


--js

mzzga...@gmail.com

Mar 10, 2021, 2:42:04 AM
to kubevirt-dev
Hi,

With Kubernetes snapshots, online snapshots can be implemented, but there is no way to roll back online, because PVC-based snapshot recovery creates a new PV and imports the data into it. (Local disk)
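A sketch of that restore path (names illustrative): the snapshot is consumed as a dataSource on a brand-new PVC rather than rolled back in place:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-disk-restored        # a NEW claim; the original PVC is untouched
spec:
  dataSource:
    name: vm-disk-snap          # the VolumeSnapshot to populate from
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```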

Fabian Deutsch

Mar 10, 2021, 3:32:37 AM
to Alice Frosi, John Snow, Adam Litke, Mazzystr, 高鹏, kubevirt-dev
I understand why there is recurring interest in qcow2. At the same time, I think it's important to see that a main goal of KubeVirt is to feel cloud native and tie nicely into the concepts that Kubernetes provides. Building qcow2 support into the core of KubeVirt would undermine this; the risk is that we diverge and more easily get into situations where the behavior and expectations between Kube storage and qcow2 become confusing at best and undefined at worst.

However, at the same time I actually wonder if we can make it easier for people to leverage qcow2 externally in KubeVirt. For example, qcow2 can be read with nbdkit (if I understand it correctly), so if container disks were able to consume disks from an NBD socket instead of only from a file, there would be much more flexibility in what kind of storage (format) is used, while at the same time allowing KubeVirt to focus on the core things.

In other words, could KubeVirt look for a well-defined NBD unix socket in the containerDisk in addition to a file?
The containerDisk could then decide what provides this socket, e.g. an NBD server reading a qcow2 file. It's not native support, but it is better than the hook and more well defined.
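As a rough sketch of that shape (qemu-nbd flags; socket path illustrative), a containerDisk-provided NBD server could look like:

```shell
# Serve the qcow2 image over an NBD unix socket; the consumer (libvirt/
# QEMU) reads raw blocks and never needs to trust the qcow2 metadata.
qemu-nbd --format=qcow2 --persistent --socket=/var/run/disk.sock disk.qcow2 &
# A consumer would then attach the disk via nbd+unix:///?socket=/var/run/disk.sock
```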


 



Alice Frosi

Mar 10, 2021, 6:16:09 AM
to Fabian Deutsch, John Snow, Adam Litke, Mazzystr, 高鹏, kubevirt-dev
Fabian, this is the idea behind the QSD CSI plugin. The part that is exposed to KubeVirt is simply a socket (NBD or, better, vhost-user). In this way, the disk is managed by the qemu-storage-daemon, and you can interact with it using CSI (or maybe some additional APIs). This is completely separate from KubeVirt. We just need a way to pass the socket to KubeVirt correctly.

Alice 

Fabian Deutsch

Mar 10, 2021, 9:14:45 AM
to Alice Frosi, John Snow, Adam Litke, Mazzystr, 高鹏, kubevirt-dev
OK, and then maybe what I was bringing up above might help here: in addition to supporting a file on a PV or containerDisk, we could add (or move to) NBD sockets; I suppose they are fine for anything other than block devices and passthrough. If we moved to NBD sockets, then this would be QSD's integration point.

mzzga...@gmail.com

Mar 10, 2021, 10:49:24 PM
to kubevirt-dev
Hi
We are using local disks, and we need to use the CDI clone function to copy data when doing cold migration. Using qcow format disks can save link overhead during transmission.

Alexander Wels

Mar 11, 2021, 7:59:02 AM
to mzzga...@gmail.com, kubevirt-dev
On Wed, Mar 10, 2021 at 10:49 PM mzzga...@gmail.com <mzzga...@gmail.com> wrote:
Hi
We are using local disks, and we need to use the CDI clone function to copy data when doing cold migration. Using qcow format disks can save link overhead during transmission.


CDI clone uses both tar with sparsify and compression to avoid sending zeros over the wire. Have you actually compared the difference in throughput between the two formats using clone?
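A quick local demonstration of the sparse half of that (GNU tar's -S flag; file names illustrative):

```shell
# tar -S detects holes, so the unallocated zeros never enter the archive.
truncate -s 100M sparse.img       # 100 MiB apparent size, ~0 allocated
tar -Scf sparse.tar sparse.img    # -S: store sparse files efficiently
ls -l sparse.img sparse.tar       # the archive is a few KiB, not 100 MiB
```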
 

mzzga...@gmail.com

Mar 15, 2021, 4:34:51 AM
to kubevirt-dev

Where is compression specifically implemented in cdi-clone? What I see is cdi-cloner's pipeToSnappy performing the image upload with io.Copy via a POST to the cdi-uploadserver.

mzzga...@gmail.com

Mar 16, 2021, 10:16:19 PM
to kubevirt-dev
Hi,
We verified raw migration on version 1.25 and compared the time with migration on version 1.31, and found that the cdi-clone time of the two is the same. Comparing the code, we found that 1.25 did not yet introduce the "tar Scv" compressed disk upload; the "tar Scv" compression was introduced in 1.28 and later versions. I would like to ask: since version 1.25 used io.Reader for uploading, why is its efficiency so high? Does it only read the valid data for upload?

Alexander Wels

Mar 19, 2021, 2:26:03 PM
to mzzga...@gmail.com, kubevirt-dev
On Tue, Mar 16, 2021 at 10:16 PM mzzga...@gmail.com <mzzga...@gmail.com> wrote:
Hi,

We verified raw migration on version 1.25 and compared the time with migration on version 1.31, and found that the cdi-clone time of the two is the same. Comparing the code, we found that 1.25 did not yet introduce the "tar Scv" compressed disk upload; the "tar Scv" compression was introduced in 1.28 and later versions. I would like to ask: since version 1.25 used io.Reader for uploading, why is its efficiency so high? Does it only read the valid data for upload?


I am not quite sure what you are asking here. The most recent change we made to cloning was to replace gzip compression with snappy compression, which works better when streaming data. The sparseness of tar has been in cloning since before 1.20, if I remember correctly.
 