
Vhostuser Network Binding Plugin Design Proposal


benoit....@orange.com

Apr 15, 2024, 11:52:13 AM
to kubevi...@googlegroups.com, afr...@redhat.com, Fabian Deutsch

Hi,

 

We are working on a vhostuser network binding plugin for Kubevirt and wanted to share its design with the community.

 

Vhostuser interfaces are required to attach a VM to a userspace/DPDK dataplane such as OVS-DPDK or VPP.

 

The attached design proposal describes how we implemented the vhostuser network binding plugin so far and focuses on the issue of sharing the vhostuser unix socket files between the virt-launcher pod’s compute container, the Multus/CNI pod and the dataplane pod.

Today, we rely on the virt-launcher pod “sockets” emptyDir, but a cleaner approach would be welcome.
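
To make the sharing problem more concrete, here is a rough Go sketch of how the CNI or dataplane side might look up a VM's sockets once the launcher's socket directory is visible to it (the /var/run/vhostuser/<pod-uid> layout is only an assumption for this example, not an actual KubeVirt path):

package main

import (
	"fmt"
	"path/filepath"
)

// findVhostuserSockets lists candidate vhost-user socket files for a given
// virt-launcher pod, assuming its socket directory has been shared under
// sharedRoot/<podUID>/. The layout is illustrative only.
func findVhostuserSockets(sharedRoot, podUID string) ([]string, error) {
	return filepath.Glob(filepath.Join(sharedRoot, podUID, "*.sock"))
}

func main() {
	socks, err := findVhostuserSockets("/var/run/vhostuser", "1234-abcd")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, s := range socks {
		fmt.Println("found vhost-user socket:", s)
	}
}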

 

We already had some discussions with Alice Frosi and Fabian Deutsch, and plan to attend the next community meeting on April 17th to discuss the proposal.

 

A more formal PR for this proposal will follow soon.

 

Regards,

 

Benoit.

 

 

 

 


kubervirt-network-vhostuser-binding-design-proposal.pdf

Miguel Duarte de Mora Barroso

Apr 15, 2024, 12:06:24 PM
to benoit....@orange.com, kubevi...@googlegroups.com, afr...@redhat.com, Fabian Deutsch
On Mon, Apr 15, 2024 at 5:52 PM <benoit....@orange.com> wrote:

Hi,

 

We are working on a vhostuser network binding plugin for Kubevirt and wanted to share its design with the community.

 

Vhostuser interfaces are required to attach VM to a userspace/dpdk dataplane like OVS-DPDK or VPP.

 

The attached design proposal describes how we implemented the vhostuser network binding plugin so far and focuses on the issue of sharing the vhostuser unix socket files between the virt-launcher pod’s compute container, the Multus/CNI pod and the dataplane pod.

Today, we rely on the virt-launcher pod “sockets” emptyDir, but a cleaner approach would be welcome.

 

We already had some discussion with Alice Frosi and Fabian Deutsch, and plan to attend next community meeting on April 17th to discuss about the proposal. 

 

A more formal PR for this proposal will follow soon.


Thank you. This topic is extremely interesting - so much so that we've seen multiple efforts in the past to implement it.

Part of the reason none of them went anywhere is that there was no proposed way to integrate any sort of e2e testing. While I know it is not trivial, please make sure to avoid that pitfall and ensure your proposal covers it.

Also, could you quickly explain the differences between your proposal and [0] (for instance)?

 


Edward Haas

Apr 15, 2024, 12:51:44 PM
to benoit....@orange.com, kubevi...@googlegroups.com, afr...@redhat.com, Fabian Deutsch, Miguel Duarte de Mora Barroso, Alona Paz
On Mon, Apr 15, 2024 at 7:06 PM Miguel Duarte de Mora Barroso <mdba...@redhat.com> wrote:


On Mon, Apr 15, 2024 at 5:52 PM <benoit....@orange.com> wrote:

Hi,

 

We are working on a vhostuser network binding plugin for Kubevirt and wanted to share its design with the community.

 

Vhostuser interfaces are required to attach VM to a userspace/dpdk dataplane like OVS-DPDK or VPP.

 

The attached design proposal describes how we implemented the vhostuser network binding plugin so far and focuses on the issue of sharing the vhostuser unix socket files between the virt-launcher pod’s compute container, the Multus/CNI pod and the dataplane pod.

Today, we rely on the virt-launcher pod “sockets” emptyDir, but a cleaner approach would be welcome.

 

We already had some discussion with Alice Frosi and Fabian Deutsch, and plan to attend next community meeting on April 17th to discuss about the proposal.


Nice, looking forward to hearing more details.

 

A more formal PR for this proposal will follow soon.


Thank you. This topic is extremely interesting - so much we've seen multiple efforts in the past to implement it. 

+1

Part of the reason none of them went anywhere is because there was not a proposed way to integrate any sort of e2e testing in it. While I know it is not trivial, please make sure to avoid that pitfall. Ensure your proposal covers that.

If the plugin is developed in its own repo, outside kubevirt/kubevirt, there are *no* requirements from the KubeVirt side (including e2e tests).
It is actually preferred to have this managed outside the core project.


Also, would you quickly explain the differences between your proposal and [0] (for instance) ? 

 


Benoit Gaussen

Apr 15, 2024, 1:15:44 PM
to kubevirt-dev
Thank you Miguel for your feedback,

Answers inline.

Benoit.

On Monday 15 April 2024 at 18:06:24 UTC+2 Miguel Duarte de Mora Barroso wrote:
On Mon, Apr 15, 2024 at 5:52 PM <benoit....@orange.com> wrote:

Hi,

 

We are working on a vhostuser network binding plugin for Kubevirt and wanted to share its design with the community.

 

Vhostuser interfaces are required to attach VM to a userspace/dpdk dataplane like OVS-DPDK or VPP.

 

The attached design proposal describes how we implemented the vhostuser network binding plugin so far and focuses on the issue of sharing the vhostuser unix socket files between the virt-launcher pod’s compute container, the Multus/CNI pod and the dataplane pod.

Today, we rely on the virt-launcher pod “sockets” emptyDir, but a cleaner approach would be welcome.

 

We already had some discussion with Alice Frosi and Fabian Deutsch, and plan to attend next community meeting on April 17th to discuss about the proposal. 

 

A more formal PR for this proposal will follow soon.


Thank you. This topic is extremely interesting - so much we've seen multiple efforts in the past to implement it. 


We're quite aware of the various implementations and PRs about that. We think the Network Binding Plugin framework was the missing piece for a clean implementation.
 
Part of the reason none of them went anywhere is because there was not a proposed way to integrate any sort of e2e testing in it. While I know it is not trivial, please make sure to avoid that pitfall. Ensure your proposal covers that.

I agree on that point. We are running some CI testing on our implementation right now. It will surely need to be adapted to the KubeVirt e2e test environment; we're not yet familiar with it. I guess the challenge is to have the right hugepages configuration and a working data plane running. DPDK is not mandatory, and I don't think a specific NIC to attach to the data plane is required.
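
For context, this is roughly the kind of VMI configuration such a test would exercise (a sketch using the KubeVirt Go API; the binding name "vhostuser", the network name and the NAD name are just examples):

package main

import (
	"fmt"

	kubevirtv1 "kubevirt.io/api/core/v1"
)

func main() {
	vmi := kubevirtv1.VirtualMachineInstance{}

	// Hugepages-backed guest memory, which a vhost-user data plane requires.
	vmi.Spec.Domain.Memory = &kubevirtv1.Memory{
		Hugepages: &kubevirtv1.Hugepages{PageSize: "2Mi"},
	}

	// An interface using a network binding plugin registered as "vhostuser"
	// (the plugin name is whatever gets configured in the KubeVirt CR).
	vmi.Spec.Domain.Devices.Interfaces = []kubevirtv1.Interface{{
		Name:    "dpdk-net",
		Binding: &kubevirtv1.PluginBinding{Name: "vhostuser"},
	}}
	vmi.Spec.Networks = []kubevirtv1.Network{{
		Name: "dpdk-net",
		NetworkSource: kubevirtv1.NetworkSource{
			Multus: &kubevirtv1.MultusNetwork{NetworkName: "dataplane-nad"},
		},
	}}

	fmt.Printf("%+v\n", vmi.Spec.Domain.Devices.Interfaces)
}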
 
 
Also, would you quickly explain the differences between your proposal and [0] (for instance) ? 

The main difference is that we implemented vhostuser as a Network Binding Plugin following this design [1]. The plugin runs as a sidecar container in the virt-launcher pod and modifies the domain XML to add and configure the vhostuser interfaces according to the VMI spec. Proposal [0] was quite invasive, required annotations specific to the CNI, and in the end it was proposed to implement it as a network binding plugin. So here we are ;)
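
To illustrate what the sidecar does with the domain, here is a minimal sketch (the real plugin implements KubeVirt's hook sidecar OnDefineDomain callback and works on the full libvirt schema; the structs and the socket path below are simplified examples):

package main

import (
	"encoding/xml"
	"fmt"
)

// Minimal, illustrative subset of the libvirt domain schema; only what is
// needed to show the vhostuser interface injection.
type Interface struct {
	XMLName xml.Name `xml:"interface"`
	Type    string   `xml:"type,attr"`
	Source  struct {
		Type string `xml:"type,attr"`
		Path string `xml:"path,attr"`
		Mode string `xml:"mode,attr"`
	} `xml:"source"`
	Model struct {
		Type string `xml:"type,attr"`
	} `xml:"model"`
}

type Devices struct {
	Interfaces []Interface `xml:"interface"`
}

type Domain struct {
	XMLName xml.Name `xml:"domain"`
	Type    string   `xml:"type,attr"`
	Devices Devices  `xml:"devices"`
}

// addVhostuserInterface appends a vhost-user interface backed by a unix
// socket (QEMU acting as the server) to the domain definition.
func addVhostuserInterface(dom *Domain, socketPath string) {
	var iface Interface
	iface.Type = "vhostuser"
	iface.Source.Type = "unix"
	iface.Source.Path = socketPath
	iface.Source.Mode = "server"
	iface.Model.Type = "virtio"
	dom.Devices.Interfaces = append(dom.Devices.Interfaces, iface)
}

func main() {
	dom := Domain{Type: "kvm"}
	addVhostuserInterface(&dom, "/var/run/kubevirt/sockets/net1.sock") // example path
	out, _ := xml.MarshalIndent(dom, "", "  ")
	fmt.Println(string(out))
}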
 

 


Miguel Duarte de Mora Barroso

Apr 15, 2024, 1:20:23 PM
to Benoit Gaussen, kubevirt-dev
On Mon, Apr 15, 2024 at 7:15 PM Benoit Gaussen <benoit....@orange.com> wrote:
Thank you Miguel for your feedback,

Answers inline.

Benoit.

On Monday 15 April 2024 at 18:06:24 UTC+2 Miguel Duarte de Mora Barroso wrote:
On Mon, Apr 15, 2024 at 5:52 PM <benoit....@orange.com> wrote:

Hi,

 

We are working on a vhostuser network binding plugin for Kubevirt and wanted to share its design with the community.

 

Vhostuser interfaces are required to attach VM to a userspace/dpdk dataplane like OVS-DPDK or VPP.

 

The attached design proposal describes how we implemented the vhostuser network binding plugin so far and focuses on the issue of sharing the vhostuser unix socket files between the virt-launcher pod’s compute container, the Multus/CNI pod and the dataplane pod.

Today, we rely on the virt-launcher pod “sockets” emptyDir, but a cleaner approach would be welcome.

 

We already had some discussion with Alice Frosi and Fabian Deutsch, and plan to attend next community meeting on April 17th to discuss about the proposal. 

 

A more formal PR for this proposal will follow soon.


Thank you. This topic is extremely interesting - so much we've seen multiple efforts in the past to implement it. 


We're quite aware of the various implementations and PR about that. We think the Network Binding Plugin framework was the missing part to make a clean implementation.

Good point. Yeah, that effort was created to help the community evolve faster with less "core" people intervention.
 
 
Part of the reason none of them went anywhere is because there was not a proposed way to integrate any sort of e2e testing in it. While I know it is not trivial, please make sure to avoid that pitfall. Ensure your proposal covers that.

I agree on that point. We are running some CI testing on our implementation right now. It surely needs to be adapted to Kubevirt e2e test environment, we're not yet familiar with it. I guess the challenge is to have the right configuration for hugepages and a working data plane running. DPDK is not mandatory and a specific NIC to attach to the data plane is not required I think.

That's arguable, but if you do that, you'd still have gotten a lot further than anyone else :) 
 
 
 
Also, would you quickly explain the differences between your proposal and [0] (for instance) ? 

The main difference is that we implemented vhostuser as a Network Binding Plugin following this design [1]. The plugin is run as a sidecar container in virt-launcher pod, and modifies the domain XML to add and configure vhostuser interface according to the VMI spec. Proposal [0] was quite invasive, required annotations specific to the CNI, and at the end it was proposed to implement it as network binding plugin. So here we are ;)

Nice to hear the effort paid off.  Good luck.

 

 


Alice Frosi

Apr 16, 2024, 3:04:56 AM
to Miguel Duarte de Mora Barroso, Edward Haas, Benoit Gaussen, kubevirt-dev
Hi Miguel and Edward,

You are more familiar with Multus than me and I hope you can help us here.

As Benoit explained in the first email, they are currently finding the vhost-user socket by looking into the pod's filesystem (a similar approach to how the handler connects to the launcher). However, as far as I understand, starting from the next Multus version the CNI plugin won't have access to the host filesystem anymore, so they won't be able to find the socket (@Benoit please correct any wrong details here).

Generally, QEMU supports the vhost-user protocol, where the backend device handling is delegated to a third-party component. This is true for networking, but also for other types of devices, see [1].

I was thinking we could introduce a new plugin mechanism to expose the content of a directory of virt-launcher to an external plugin. See the attachment for a picture of the flow (a rough sketch of the mount step is included below).
I see a couple of advantages of this approach:
- it is generic and could potentially be reused by other device types
- it hides the KubeVirt implementation details. Currently, you need to know where the KubeVirt sockets are located in the virt-launcher filesystem; if we changed the directory path for the sockets, that would break the CNI plugin
- it can isolate the resources dedicated to that particular plugin
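
To be clear about what I have in mind, here is a very rough sketch of the mount step virt-handler could perform (the paths and the use of a plain bind mount are assumptions for illustration, not a settled design):

//go:build linux

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// exposeLauncherDir bind-mounts a directory of the virt-launcher pod's
// filesystem (as seen by virt-handler) to a well-known per-VMI directory on
// the node, so that an external plugin can consume it without knowing any
// KubeVirt internals. All paths are illustrative.
func exposeLauncherDir(launcherSocketDir, nodeDir string) error {
	if err := os.MkdirAll(nodeDir, 0o750); err != nil {
		return err
	}
	// MS_BIND makes the launcher directory content visible under nodeDir.
	return unix.Mount(launcherSocketDir, nodeDir, "", unix.MS_BIND, "")
}

func main() {
	src := "/proc/12345/root/var/run/kubevirt/sockets"     // launcher dir reached via the host /proc (assumed)
	dst := "/var/run/kubevirt/external-plugins/my-vmi-uid" // node-side plugin directory (assumed)
	if err := exposeLauncherDir(src, dst); err != nil {
		fmt.Fprintln(os.Stderr, "bind mount failed:", err)
	}
}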


Many thanks,
Alice

kubevirt-plugin-extension.png

Benoit Gaussen

Apr 16, 2024, 5:17:26 AM
to kubevirt-dev
Thanks Alice for your support.

Some clarifications below.

Benoit.

On Tuesday 16 April 2024 at 09:04:56 UTC+2 Alice Frosi wrote:
Hi Miguel and Edward,

You are more familiar with Multus than me and I hope you can help us here.

As Benoit explained in the first email, they are currently finding the vhost-user socket by looking into the pods filesystem (a similar approach as the handler connects to the launcher). However, as far as I understand, starting from the next Multus version, the CNI plugin won't have access anymore to the host filesystem, therefore they aren't able to find the socket anymore (@Benoit please correct any wrong details here).


In fact, with Multus 4 and the thick plugin, the CNI still has access to the host filesystem through a chroot, but today it does a bind mount of the emptyDir sockets volume from virt-launcher to a host directory (step 1. in the proposal diagram). This requires Bidirectional mountPropagation in the Multus daemonset, which is not the default in deployments.
A cleaner and more straightforward solution, as proposed by Alice, would be great.
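
For reference, this is roughly the mount the Multus daemonset needs today so that the bind mount done by the CNI propagates back to the host (a sketch using the k8s.io/api/core/v1 types; the volume name and paths are just examples):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Bidirectional propagation is what the current approach requires in the
	// Multus daemonset, and it is not the default in most deployments.
	bidirectional := corev1.MountPropagationBidirectional

	hostMount := corev1.VolumeMount{
		Name:             "host-run-vhostuser",      // example volume name
		MountPath:        "/host/var/run/vhostuser", // example container path
		MountPropagation: &bidirectional,
	}

	fmt.Printf("%+v\n", hostMount)
}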

Miguel Duarte de Mora Barroso

Apr 16, 2024, 6:43:37 AM
to Alice Frosi, Edward Haas, Benoit Gaussen, kubevirt-dev
On Tue, Apr 16, 2024 at 9:04 AM Alice Frosi <afr...@redhat.com> wrote:
Hi Miguel and Edward,

You are more familiar with Multus than me and I hope you can help us here.

As Benoit explained in the first email, they are currently finding the vhost-user socket by looking into the pods filesystem (a similar approach as the handler connects to the launcher). However, as far as I understand, starting from the next Multus version, the CNI plugin won't have access anymore to the host filesystem, therefore they aren't able to find the socket anymore (@Benoit please correct any wrong details here).

Generally, QEMU supports the vhost-user protocol where the backend device handling is delegated to a third party component. It is true for networking, but also for other types of devices, see [1].

I was thinking if we could introduce a new plugin mechanism where we could expose the content of a directory of virt-launcher to an external plugin. See the attachment for a picture with the flow.
I see a couple of advantages of this approach:
- is generic and could also be potentially reused by other device types
- hides the KubeVirt implementation details. Currently, you need to know where the KubeVirt sockets are located in the virt-launcher filesystem. Potentially, if we change the directory path for the sockets, this would break the CNI plugin
- can isolate the resources dedicate to that particular plugin

I have to say at this abstraction level it seems to make sense (at least to me). 

What I don't quite understand is how this is any different from mounting a volume from the launcher pod to the node?

Which makes me think I'm missing something :)

Alice Frosi

Apr 16, 2024, 7:02:57 AM
to Miguel Duarte de Mora Barroso, Edward Haas, Benoit Gaussen, kubevirt-dev
Hi Miguel

On Tue, Apr 16, 2024 at 12:43 PM Miguel Duarte de Mora Barroso <mdba...@redhat.com> wrote:


On Tue, Apr 16, 2024 at 9:04 AM Alice Frosi <afr...@redhat.com> wrote:
Hi Miguel and Edward,

You are more familiar with Multus than me and I hope you can help us here.

As Benoit explained in the first email, they are currently finding the vhost-user socket by looking into the pods filesystem (a similar approach as the handler connects to the launcher). However, as far as I understand, starting from the next Multus version, the CNI plugin won't have access anymore to the host filesystem, therefore they aren't able to find the socket anymore (@Benoit please correct any wrong details here).

Generally, QEMU supports the vhost-user protocol where the backend device handling is delegated to a third party component. It is true for networking, but also for other types of devices, see [1].

I was thinking if we could introduce a new plugin mechanism where we could expose the content of a directory of virt-launcher to an external plugin. See the attachment for a picture with the flow.
I see a couple of advantages of this approach:
- is generic and could also be potentially reused by other device types
- hides the KubeVirt implementation details. Currently, you need to know where the KubeVirt sockets are located in the virt-launcher filesystem. Potentially, if we change the directory path for the sockets, this would break the CNI plugin
- can isolate the resources dedicate to that particular plugin

I have to say at this abstraction level it seems to make sense (at least to me). 

What I don't kind of understand is how is this any different from mounting a volume from the launcher pod to the node ? .... 

Virt-launcher cannot use a hostPath volume because this requires the pod to be privileged.

Additionally, in this way the CNI plugin only needs access to a single directory where all the directories dedicated to the different VMs will appear.

Miguel Duarte de Mora Barroso

Apr 16, 2024, 7:18:15 AM
to Alice Frosi, Edward Haas, Benoit Gaussen, kubevirt-dev
On Tue, Apr 16, 2024 at 1:02 PM Alice Frosi <afr...@redhat.com> wrote:
Hi Miguel

On Tue, Apr 16, 2024 at 12:43 PM Miguel Duarte de Mora Barroso <mdba...@redhat.com> wrote:


On Tue, Apr 16, 2024 at 9:04 AM Alice Frosi <afr...@redhat.com> wrote:
Hi Miguel and Edward,

You are more familiar with Multus than me and I hope you can help us here.

As Benoit explained in the first email, they are currently finding the vhost-user socket by looking into the pods filesystem (a similar approach as the handler connects to the launcher). However, as far as I understand, starting from the next Multus version, the CNI plugin won't have access anymore to the host filesystem, therefore they aren't able to find the socket anymore (@Benoit please correct any wrong details here).

Generally, QEMU supports the vhost-user protocol where the backend device handling is delegated to a third party component. It is true for networking, but also for other types of devices, see [1].

I was thinking if we could introduce a new plugin mechanism where we could expose the content of a directory of virt-launcher to an external plugin. See the attachment for a picture with the flow.
I see a couple of advantages of this approach:
- is generic and could also be potentially reused by other device types
- hides the KubeVirt implementation details. Currently, you need to know where the KubeVirt sockets are located in the virt-launcher filesystem. Potentially, if we change the directory path for the sockets, this would break the CNI plugin
- can isolate the resources dedicate to that particular plugin

I have to say at this abstraction level it seems to make sense (at least to me). 

What I don't kind of understand is how is this any different from mounting a volume from the launcher pod to the node ? .... 

Virt-launcher cannot use an hostPath volume because this requires the pod to be privileged.

I see. But *conceptually*, it's like a volume ... But iiuc, it would be virt-handler pulling the strings instead.

Makes sense !
 

Additionally, in this way, the CNI plugin only needs access to a single directory where all the directory dedicated to different VM will appear.

I fail to see the implication. Please elaborate :) 

Alice Frosi

Apr 16, 2024, 7:25:56 AM
to Miguel Duarte de Mora Barroso, Edward Haas, Benoit Gaussen, kubevirt-dev
On Tue, Apr 16, 2024 at 1:18 PM Miguel Duarte de Mora Barroso <mdba...@redhat.com> wrote:


On Tue, Apr 16, 2024 at 1:02 PM Alice Frosi <afr...@redhat.com> wrote:
Hi Miguel

On Tue, Apr 16, 2024 at 12:43 PM Miguel Duarte de Mora Barroso <mdba...@redhat.com> wrote:


On Tue, Apr 16, 2024 at 9:04 AM Alice Frosi <afr...@redhat.com> wrote:
Hi Miguel and Edward,

You are more familiar with Multus than me and I hope you can help us here.

As Benoit explained in the first email, they are currently finding the vhost-user socket by looking into the pods filesystem (a similar approach as the handler connects to the launcher). However, as far as I understand, starting from the next Multus version, the CNI plugin won't have access anymore to the host filesystem, therefore they aren't able to find the socket anymore (@Benoit please correct any wrong details here).

Generally, QEMU supports the vhost-user protocol where the backend device handling is delegated to a third party component. It is true for networking, but also for other types of devices, see [1].

I was thinking if we could introduce a new plugin mechanism where we could expose the content of a directory of virt-launcher to an external plugin. See the attachment for a picture with the flow.
I see a couple of advantages of this approach:
- is generic and could also be potentially reused by other device types
- hides the KubeVirt implementation details. Currently, you need to know where the KubeVirt sockets are located in the virt-launcher filesystem. Potentially, if we change the directory path for the sockets, this would break the CNI plugin
- can isolate the resources dedicate to that particular plugin

I have to say at this abstraction level it seems to make sense (at least to me). 

What I don't kind of understand is how is this any different from mounting a volume from the launcher pod to the node ? .... 

Virt-launcher cannot use an hostPath volume because this requires the pod to be privileged.

I see. But *conceptually*, it's like a volume ... But iiuc, it would be virt-handler pulling the strings instead.

Makes sense !

Not from Kubernetes' perspective, as we don't have such a volume type.
 
 

Additionally, in this way, the CNI plugin only needs access to a single directory where all the directory dedicated to different VM will appear.

I fail to see the implication. Please elaborate :) 

If you read Benoit's last response, they have the issue that to see the socket they need to use Bidirectional mountPropagation for Multus.
With this approach, from the CNI perspective they only need a hostPath volume with HostToContainer mount propagation, and it doesn't need to chroot into the host filesystem anymore.
The CNI plugin pod will detect the sockets as long as virt-handler makes them available in the shared directory.
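
On the CNI pod side this would then reduce to something like the following sketch (again with example names and paths):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Host directory where virt-handler would expose the per-VMI socket
	// directories (example path).
	hostPathDir := corev1.HostPathDirectory
	sharedVolume := corev1.Volume{
		Name: "kubevirt-sockets", // example name
		VolumeSource: corev1.VolumeSource{
			HostPath: &corev1.HostPathVolumeSource{
				Path: "/var/run/kubevirt/external-plugins",
				Type: &hostPathDir,
			},
		},
	}

	// HostToContainer is enough here: mounts made on the host under the
	// shared directory become visible inside the CNI pod.
	h2c := corev1.MountPropagationHostToContainer
	mount := corev1.VolumeMount{
		Name:             sharedVolume.Name,
		MountPath:        "/var/run/kubevirt/external-plugins",
		MountPropagation: &h2c,
	}

	fmt.Printf("%+v\n%+v\n", sharedVolume, mount)
}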

Miguel Duarte de Mora Barroso

Apr 16, 2024, 10:10:27 AM
to Alice Frosi, Edward Haas, Benoit Gaussen, kubevirt-dev
The CNI that chroots into the host namespace is Multus - which in turn calls (via an os exec call) the CNI plugin in the node's mount namespace, because CNIs are run as binaries on the node's filesystem. By definition.

Thus, no matter what you do, Multus will still need to chroot into the correct mount namespace - so the CNI plugin can grab CNI configurations, access the CNI binaries, pass log file paths, access plugin-specific data, etc.

Still, the effort is valid and makes sense. I am just saying you won't avoid the chroot call.
 
The CNI plugin pod will detect the sockets as far as virt-handler makes them available in the shared directory.

I get that - it all boils down to having a way for the launcher pod to make the sockets available in the node's filesystem afaict. 

An alternative would be to request qemu (or libvirt ??) to consume an existing socket file, which would be created by something else - like a device plugin. Just throwing it out there.

Let me reiterate I like your proposal - sounds simpler than what I wrote above.



Alice Frosi

Apr 16, 2024, 10:28:12 AM
to Miguel Duarte de Mora Barroso, Edward Haas, Benoit Gaussen, kubevirt-dev
Ah, thanks for the clarification! I didn't know that.
I still think that decoupling the CNI and KubeVirt would be good. If KubeVirt exposes the sockets in a directory on the host, then the CNI plugin doesn't need to know KubeVirt internals and can simply access the host directory.
 
 
The CNI plugin pod will detect the sockets as far as virt-handler makes them available in the shared directory.

I get that - it all boils down to having a way for the launcher pod to make the sockets available in the node's filesystem afaict. 

An alternative would be to request qemu (or libvirt ??) to consume an existing socket file, which would be created by something else - like a device plugin. Just throwing it out there.

I don't think that is possible. In this particular configuration, QEMU acts as the server and DPDK is the client that needs to connect to it.

Benoit Gaussen

Apr 24, 2024, 10:00:45 AM
to kubevirt-dev
Hi All,

Since the last community meeting (thank you all for the discussion!) I have looked at several solutions to share the sockets:

- a new hostPath volume defined in the vhostuser binding plugin configuration: obviously it works, but it requires a privileged pod for both virt-launcher and the dataplane that consumes the socket

- a PVC volume defined in the vhostuser binding plugin configuration: it must be a per-node volume, for example using the HostPath Provisioner, mounted by both the virt-launcher and dataplane pods. The issue with PVCs is that they are namespaced, so both pods would need to be in the same namespace: that will not happen.

- I finally looked at device plugins: we can think of the dataplane as a switch with port resources, and a VM would request one or several ports on it. The device plugin can define a list of host path mounts to be injected into the pods: the directories that will host the sockets. We'll need to push some annotations to the virt-launcher pod to have a way to define the complete host paths. The CNI reads the pod annotations, names the socket (pushes it in the pod annotations?) and configures the dataplane. Then the vhostuser network binding plugin needs access to the pod annotations (downward API?) to get the socket names and modify the domain XML. A rough sketch of the device plugin Allocate step is included below.

Alice, maybe this device plugin approach is not reusable for other needs? It's also more complex and requires introducing a new component...
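
To make the device plugin idea a bit more concrete, here is a rough sketch of the Allocate step only (the resource layout, paths and env variable name are assumptions, and a real device plugin also has to implement registration, ListAndWatch, etc.):

package main

import (
	"context"
	"fmt"
	"path/filepath"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// vhostuserDevicePlugin sketches only the Allocate step of a device plugin
// exposing "dataplane port" resources. Registration, ListAndWatch and health
// checking are omitted.
type vhostuserDevicePlugin struct {
	hostSocketRoot string // e.g. /var/run/vhostuser (assumed)
}

func (p *vhostuserDevicePlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, containerReq := range req.ContainerRequests {
		if len(containerReq.DevicesIDs) == 0 {
			continue
		}
		// One host directory per allocated "port"; QEMU will create the
		// vhostuser socket there and the dataplane will consume it.
		hostDir := filepath.Join(p.hostSocketRoot, containerReq.DevicesIDs[0])
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Mounts: []*pluginapi.Mount{{
				ContainerPath: "/var/run/vhostuser", // assumed container path
				HostPath:      hostDir,
				ReadOnly:      false,
			}},
			// Env only reaches the container that requested the resource.
			Envs: map[string]string{
				"VHOSTUSER_SOCKET_DIR": hostDir, // assumed variable name
			},
		})
	}
	return resp, nil
}

func main() {
	_ = &vhostuserDevicePlugin{hostSocketRoot: "/var/run/vhostuser"}
	fmt.Println("sketch only: Allocate injects a per-port host directory mount and an env variable")
}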

Regards,

Benoit.

Edward Haas

Apr 25, 2024, 3:16:15 AM
to Benoit Gaussen, kubevirt-dev
On Wed, Apr 24, 2024 at 5:00 PM Benoit Gaussen <benoit....@orange.com> wrote:
Hi All,

Since last community meeting (thank you all for the discussion!) I had a look at several solutions to share the sockets:

- a new hostPath volume defined at the vhostuser binding plugin configuration: obviously it work but requires privileged pod for both virt-launcher and the dataplane that consumes the socket

- a PVC volume defined at the vhostuser binding plugin configuration: must be a per node volume, for example using Hostpath Provisionner, mounted by both virt-launcher and dataplane pods. Issue with PVC is that they are namespaced so both pods need to be in the same namespace: it will not happen.

- I finally looked at device plugin: we can think about the dataplane as a switch with ports resources, and a VM would request one or several ports on it. The device plugin can define a list of host path mounts to be injected in the pods: the directories that will host the sockets. We'll need to push some annotations to the virt-launcher pod, to have a way to define the complete host paths. CNI reads the annotations of the pod, names the socket (push it in pod annotations?) and configures the dataplane. Then vhostuser network binding plugin needs to get access to the pod annotations (downward API?) to get the sockets names and modifies the domain XML.

From what I have managed to learn so far in this thread, this seems to me the only possible option.
The DP can set up the socket at the node level and make it available to the pod through a mount.
It is used for other network resources and is not dependent on components like Multus.

It will probably also remove the need to have annotations on the pod, communicated from the CNI (which is nasty).
(AFAIK the DP can pass env variables into the container, communicating whatever information is needed)

I think you will need to pass a socket and not a folder, but that is an implementation detail that can be sorted out later.

If you have a draft design proposal available, it may be easier to communicate ideas through it.


Alice Frosi

Apr 26, 2024, 3:05:16 AM
to Benoit Gaussen, kubevirt-dev
Hi Benoit,

On Wed, Apr 24, 2024 at 4:00 PM Benoit Gaussen <benoit....@orange.com> wrote:
Hi All,

Since last community meeting (thank you all for the discussion!) I had a look at several solutions to share the sockets:

- a new hostPath volume defined at the vhostuser binding plugin configuration: obviously it work but requires privileged pod for both virt-launcher and the dataplane that consumes the socket

- a PVC volume defined at the vhostuser binding plugin configuration: must be a per node volume, for example using Hostpath Provisionner, mounted by both virt-launcher and dataplane pods. Issue with PVC is that they are namespaced so both pods need to be in the same namespace: it will not happen.

- I finally looked at device plugin: we can think about the dataplane as a switch with ports resources, and a VM would request one or several ports on it. The device plugin can define a list of host path mounts to be injected in the pods: the directories that will host the sockets. We'll need to push some annotations to the virt-launcher pod, to have a way to define the complete host paths. CNI reads the annotations of the pod, names the socket (push it in pod annotations?) and configures the dataplane. Then vhostuser network binding plugin needs to get access to the pod annotations (downward API?) to get the sockets names and modifies the domain XML.

Alice, maybe this device plugin approach is not reusable for other needs? It's also more complex and requires to introduce a new component...

The devil is in the details here.

One thing to consider about device plugins: they expose resources on the entire node. Hence, they aren't namespaced, and any pod in any namespace can consume them. Therefore, there aren't any RBAC controls over who can access and use the socket of your CNI plugin.
If you want to provide a generic way for any pod to consume your CNI plugin, then it is probably a good option.
It would be possible to just define a single resource with an unlimited number of instances and then create a DPDK vhost-user socket on request at pod creation.
 
Alice

Benoit Gaussen

Apr 27, 2024, 1:57:18 PM
to kubevirt-dev
On Thursday 25 April 2024 at 09:16:15 UTC+2 Edward Haas wrote:
On Wed, Apr 24, 2024 at 5:00 PM Benoit Gaussen <benoit....@orange.com> wrote:
Hi All,

Since last community meeting (thank you all for the discussion!) I had a look at several solutions to share the sockets:

- a new hostPath volume defined at the vhostuser binding plugin configuration: obviously it work but requires privileged pod for both virt-launcher and the dataplane that consumes the socket

- a PVC volume defined at the vhostuser binding plugin configuration: must be a per node volume, for example using Hostpath Provisionner, mounted by both virt-launcher and dataplane pods. Issue with PVC is that they are namespaced so both pods need to be in the same namespace: it will not happen.

- I finally looked at device plugin: we can think about the dataplane as a switch with ports resources, and a VM would request one or several ports on it. The device plugin can define a list of host path mounts to be injected in the pods: the directories that will host the sockets. We'll need to push some annotations to the virt-launcher pod, to have a way to define the complete host paths. CNI reads the annotations of the pod, names the socket (push it in pod annotations?) and configures the dataplane. Then vhostuser network binding plugin needs to get access to the pod annotations (downward API?) to get the sockets names and modifies the domain XML.

Per what I managed to learn so far from this thread, to me, this seems the only possible option.
The DP can setup the socket at the node level and make it available to the pod through a mount.
It is used for other network resources and is not dependent on components like Multus.

I think we'll still need a CNI/Multus to configure the port on the dataplane, because the DP doesn't have enough info to configure it by itself.
 
 
It will probably also solve the need to have annotation on the pod, communicated from the CNI (which is nasty).
(AFAIK the DP can pass env variables into the container, communicating whatever information is needed)

 
The DP can pass env variables into the container requesting the resource. In our case it's the "compute" container, not the vhostuser network binding plugin container, that will receive the env.
Annotations would be available to any container through the downward API.
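
If we go the annotation route, the binding plugin sidecar could read them from a downward API file, roughly like this (the mount path and the annotation key are just examples):

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readDownwardAPIAnnotations parses the key="value" lines produced by a
// downward API file exposing metadata.annotations.
func readDownwardAPIAnnotations(path string) (map[string]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	annotations := map[string]string{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		key, rawValue, found := strings.Cut(scanner.Text(), "=")
		if !found {
			continue
		}
		// Values are written as quoted, escaped strings by the downward API.
		if value, err := strconv.Unquote(rawValue); err == nil {
			annotations[key] = value
		}
	}
	return annotations, scanner.Err()
}

func main() {
	annotations, err := readDownwardAPIAnnotations("/etc/podinfo/annotations") // assumed mount path
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read annotations:", err)
		return
	}
	// Hypothetical key carrying the vhostuser socket names.
	fmt.Println("vhostuser sockets:", annotations["vhostuser.network.kubevirt.io/sockets"])
}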
 
I think you will need to pass a socket and not a folder, but that is an implementation detail that can be sorted out later.

Today the vhostuser socket is created in server mode by QEMU. This seems to be the recommended way: OVS-DPDK is deprecating server mode on its side, and it allows the data plane to be restarted without impacting the VM.
With this mode we can only mount a folder.

 
If you have a draft design proposal available, it may be easier to communicate ideas through it.

Sure, I will update the design proposal initially sent here and open a PR as soon as I get back from PTO.

Benoit Gaussen

Apr 27, 2024, 2:04:32 PM
to kubevirt-dev
Hi Alice,

Indeed, any pod can consume the DP resources, but since we need the CNI to configure them with a Network Attachment Definition, we can set RBAC/namespace restrictions on this NAD.
 
Would be possible that you just define a single resource with an unlimited number of instances and then you create a dpdk vhost-user socket on request for the pod creation.

It seems the resource count has to be a finite number, which can be user-configurable. Anyway, the DP can update this count, adding a new resource for each consumed one...

Benoit.
 
 

Alice Frosi

Apr 29, 2024, 2:45:44 AM
to Benoit Gaussen, kubevirt-dev
Another question here: doesn't the DP have the same issues as you currently have with HostToContainer mount propagation? AFAIU, you want to use a DP that will mount an empty directory in virt-launcher (or any pod), and then QEMU running in the pod will create the socket there.
Would you be able to see the socket outside the virt-launcher pod?

Alice 

Benoit Gaussen

May 7, 2024, 4:17:47 AM
to kubevirt-dev
With the DP we shouldn't have mount propagation issues, as there will be no further mount inside the shared socket directory. So yes, the socket created by QEMU should be accessible in the other pod. There may be some issues with directory and socket access rights and SELinux, though?

Benoit.
 


Fabian Deutsch

May 15, 2024, 5:12:56 AM
to Benoit Gaussen, kubevirt-dev
Hi Benoit,

do you think there has been enough discussion to put this topic into a design proposal or a shared Google document, in order to help converge the discussion?

Greetings
- fabian
 

Benoit Gaussen

May 16, 2024, 4:04:48 AM
to kubevirt-dev
Hi Fabian,

Yes, I'll update the design proposal with the possible solutions (volumes and the device plugin proposal) and go through a PR in kubevirt/community.
I hope this can happen quickly.

Benoit.

Fabian Deutsch

May 16, 2024, 4:12:36 AM
to Benoit Gaussen, kubevirt-dev
Hi Benoit,

On Thu, May 16, 2024 at 10:04 AM Benoit Gaussen <benoit....@orange.com> wrote:
Hi Fabian,

Yes I'll update the design proposal with the possible solutions (volumes and device plugin proposal) and go through a PR in kubevirt/community.
I hope this can happen quickly.

Cool - Looking forward to it!
 

Benoit Gaussen

May 22, 2024, 8:29:19 AM
to kubevirt-dev
Hi Fabian, Alice & all,

I finally created PR #294 for this design proposal. Sorry for the delay!

Benoit.

Alice Frosi

Jun 5, 2024, 9:39:25 AM
to Benoit Gaussen, kubevirt-dev, Edward Haas, Fabian Deutsch
Hi Benoit,

First, many thanks for the design proposal!
Now I have the impression that we have a common understanding of the problem, and I feel we could settle the design questions in a meeting.
Would that be fine for you? Does next week work for you? Anyone interested in the topic is also, of course, free to join.

Many thanks,
Alice

Benoit Gaussen

Jun 7, 2024, 12:32:38 PM
to kubevirt-dev
Hi Alice,

Sorry for my late answer. Yes, a meeting would be great to make progress on the design.
Next week would be OK for me if it's not too late to organize; the week after is OK as well.

Thanks,

Benoit.

Alice Frosi

Jun 10, 2024, 4:25:19 AM
to Benoit Gaussen, Edward Haas, kubevirt-dev
Hi Benoit, hi Ed,

Would Wednesday 12 at 1 PM UTC work for you? Please let me know if it does and I'll send an invitation.

Regards,
Alice Frosi

Benoit Gaussen

Jun 10, 2024, 5:53:09 AM
to kubevirt-dev
Hi All,

Wednesday 12 at 12 PM UTC or 2 PM would work better for me, but I can manage 1 PM if nothing else is possible.

Benoit.

Alice Frosi

Jun 10, 2024, 9:06:11 AM
to Benoit Gaussen, Edward Haas, kubevirt-dev
Hi Benoit,

Edward is at a conference this week. Would it work for you next week (June 19) at 12 PM UTC?

Alice

Benoit Gaussen

Jun 10, 2024, 10:18:00 AM
to kubevirt-dev
It's OK for me on June 19, 12 PM UTC.

Benoit.

Alice Frosi

Jun 11, 2024, 4:52:44 AM
to Benoit Gaussen, kubevirt-dev
Hi Benoit, hi Ed,

You can find the invitation to the meeting in the KubeVirt calendar [1] (including the Zoom link [2]). Anyone interested in the topic is, of course, free to join!


Best regards,
Alice


Benoit Gaussen

Oct 14, 2024, 4:00:12 PM
to kubevirt-dev

Hi All,

We made progress on the vhostuser network binding plugin and have a working implementation, along with a device plugin that enables vhostuser socket sharing between virt-launcher pods and the dataplane.

This is documented in PR #294.

However, we encountered an issue with live migration. To summarize, as the migration domain from the source pod overrides the domain created at the destination pod, the socket path must be the same at source and des