Proposal: GPU & vGPU passthrough support for KubeVirt VMs


Vishesh Tanksale

Aug 9, 2019, 1:57:57 AM
to kubevirt-dev

Hi All,


We have been working for some time on KubeVirt. As part of my work, I have implemented NVIDIA GPU and vGPU passthrough support for KubeVirt VMs. I would love to contribute this change to KubeVirt.

To get the ball rolling, I have a brief informal design of my implementation here. Please review and share your thoughts.


Thanks,

Vishesh Tanksale

Roman Mohr

Aug 9, 2019, 4:56:02 AM
to Vishesh Tanksale, kubevirt-dev
Hi Vishesh,

On Fri, Aug 9, 2019 at 7:58 AM Vishesh Tanksale <vtan...@nvidia.com> wrote:

Hi All,


We have been working for some time on KubeVirt. As part of my work, I have implemented NVIDIA GPU and vGPU passthrough support for KubeVirt VMs. I would love to contribute this change to KubeVirt.

That sounds absolutely great.
 

To get the ball rolling, I have a brief informal design of my implementation here. Please review and share your thoughts.


I looked over the design, and I think it is correct from the KubeVirt point of view. Could you also share the intended API in the doc? (By the way, it looks like the doc does not allow comments.) After briefly discussing the API in the doc, the rest could be sorted out on a PR.

One last thing: we are used to testing all our functionality end-to-end on PRs. We can discuss possibilities on the PR too.

Best Regards,

Roman
 



--
You received this message because you are subscribed to the Google Groups "kubevirt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubevirt-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubevirt-dev/f215a800-ed09-4027-9112-b817736ad027%40googlegroups.com.

Vishesh Tanksale

Aug 9, 2019, 1:53:08 PM
to kubevirt-dev
Hi Roman,

Thank you for your reply.

I have opened the doc for comments, and I have also added the API design there.

I have end-to-end tests, but those are very specific to my environment. As you mentioned, we can discuss this further on the PR.

Please let me know if you have any other questions.

Thanks,
Vishesh Tanksale


Fabian Deutsch

Aug 13, 2019, 3:26:46 PM
to Vishesh Tanksale, Vladik Romanovsky, David Vossel, Ihar Hrachyshka, kubevirt-dev
Hey Vishesh!

Great to see that you took the time to work on the proposal and implementation.
Just as Roman mentioned: The general design and direction looks great - and also the comments on your PR (kudos!) are positive.

One remaining question for me (it also emerged on the PR) is: how do we reference the device to be passed through?
Your proposal a) assumes that an NVIDIA DP is exposing the device, and b) has the implementation use this specific resource.
It is also implicit that there is a sort of protocol between the DP and the implementation (the env vars) used to attach the correct device to the VM.
IOW: KubeVirt has knowledge of the resource name, which tells it to pass the device through.

If another vendor provided a DP for its devices, those devices would not be automagically passed through. Instead, KubeVirt would need an update to recognize the new resources and know what to do with them.

The other option is that DPs to be used with KubeVirt must fulfill certain requirements which give KubeVirt enough hints about what to do. E.g., an env var tells KubeVirt that the device should get plain PCI passthrough, and KubeVirt does exactly that.

Thus in the first case (a) KubeVirt is made aware of every new DP, and in the second case (b) KubeVirt establishes a convention which a DP needs to fulfill so that KubeVirt can introspect the resource and do the right thing.

Both have their pros and cons, and neither puts any burden on the user. But I still wonder: are there thoughts on which one we want to start with?

Side note: I do think it's reasonable to keep the device plugins external to core KubeVirt, as this is a pretty specific use case.

Despite these thoughts: great to see this moving forward!

- fabian




Vladik Romanovsky

Aug 13, 2019, 4:37:53 PM
to Fabian Deutsch, Vishesh Tanksale, David Vossel, Ihar Hrachyshka, kubevirt-dev
On Tue, Aug 13, 2019 at 3:26 PM Fabian Deutsch <fdeu...@redhat.com> wrote:
Both have their pros and cons, and neither puts any burden on the user. But I still wonder: are there thoughts on which one we want to start with?

From my point of view, the device plugin and the way the devices are presented to and consumed by KubeVirt can be generic.
I'm not sure why each vendor needs to provide its own plugin.

Currently, the PR uses an env var to pass all PCI and mdev devices; virt-launcher can pass each through in its own way, or we can have two generic variables for GPUs and vGPUs (NVIDIA or Intel).
 

Side note: I do think it's reasonable to keep the device plugins external to core KubeVirt, as this is a pretty specific use case.
I don't see what vendor-specific activities the device plugin should do... I would imagine that it's better for KubeVirt to keep the plugin in the core to make sure that it stays generic.

pawarsm...@gmail.com

Aug 13, 2019, 5:09:05 PM
to kubevirt-dev


On Tuesday, August 13, 2019 at 1:37:53 PM UTC-7, Vladik Romanovsky wrote:



From my point of view, the device plugin and the way the devices are presented to and consumed by KubeVirt can be generic.
I'm not sure why each vendor needs to provide its own plugin.

Currently, the PR uses an env var to pass all PCI and mdev devices; virt-launcher can pass each through in its own way, or we can have two generic variables for GPUs and vGPUs (NVIDIA or Intel).

>>>> (Smitesh): We support the idea of keeping the KubeVirt code vendor-neutral. I think that's how device plugins came into the Kubernetes world in the first place.

Fabian, I think you are hinting at establishing an interface that various device plugins can implement to achieve what needs to be done. I think that's a good idea, and we can retrofit the code according to whatever you define. The current version of the PR was inspired by the way things are done for SR-IOV NICs in KubeVirt.
 
 

Side note: I do think it's reasonable to keep the device plugins external to core KubeVirt, as this is a pretty specific use case.
I don't see what vendor-specific activities the device plugin should do... I would imagine that it's better for KubeVirt to keep the plugin in the core to make sure that it stays generic.

>>>> (Smitesh): A few reasons that come to mind: 1. Each vendor has different ways and tools to monitor devices. 2. There might be additional device configuration device plugins may have to do before allocating a device. 3. In the future, vGPUs may be created on the fly by the device plugin. 4. Allows a KubeVirt-independent way to temporarily disable devices for maintenance/updates, etc.
 


Vladik Romanovsky

Aug 13, 2019, 5:29:19 PM
to pawarsm...@gmail.com, kubevirt-dev
On Tue, Aug 13, 2019 at 5:07 PM <pawarsm...@gmail.com> wrote:

I don't see what vendor-specific activities the device plugin should do... I would imagine that it's better for KubeVirt to keep the plugin in the core to make sure that it stays generic.
>>>> A few reasons that come to mind: 1. Each vendor has different ways and tools to monitor devices.
AFAIK, this is usually done via sysfs. Please correct me if I'm wrong.
2. There might be additional device configuration device plugins may have to do before allocating a device.
I think we are discussing pre-created devices. Please correct me if I'm wrong.
3. In the future, vGPUs may be created on the fly by the device plugin.
I don't know how you envision that, but given a concrete scenario, we could find a generic way to handle it.
 
4. Allows a KubeVirt-independent way to temporarily disable devices for maintenance/updates, etc.
Again, I'm sure it can be done via sysfs. 



Vishesh Tanksale

Aug 13, 2019, 10:22:10 PM
to kubevirt-dev
Hi,

Yes, sysfs is one way of monitoring devices. But NVIDIA has libraries that expose many metrics on GPU health, and using them provides much better insight than sysfs alone. Adding these libraries to virt-handler would add complexity to it, so we decided to keep the device plugin as a separate project.

Thanks,
Vishesh Tanksale 

