VMI Specifications for Nvidia A100 Series


Vaibhav Raizada

Jun 18, 2021, 10:07:26 AM
to kubevirt-dev
Hi,

I am trying to write a VMI specification with GPU access. The GPU card is an Nvidia A100 (A100-SXM4-40GB), but I am not sure how to write the GPU device name in the specification. For a Tesla T4 I used TU104GL_Tesla_T4, but I am not sure what to use for the A100 series.

---
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-gpu
  name: vmi-gpu
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      gpus:
      - deviceName: nvidia.com/??????????????????????????
        name: gpu1
      rng: {}
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: ovaleanu/centos:latest
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        ssh_pwauth: True
        password: centos
        chpasswd: { expire: False }
    name: cloudinitdisk



Thanks,
Vaibhav

Vladik Romanovsky

Jun 18, 2021, 11:44:57 AM
to Vaibhav Raizada, kubevirt-dev
Hi Vaibhav,

Thank you for raising this topic.
Perhaps it would be best to start by reading our user guide about host device assignment with KubeVirt [1].

In general, the name of the resource/device depends on the device plugin that provides it.
KubeVirt has a built-in generic mechanism for discovering and allocating host devices, including GPU and vGPU (PCI/MDEVs in general).

As an admin, you simply provide a list of devices that are permitted in the cluster (as below), naming each resource according to your preference.
KubeVirt will then discover these devices on the cluster nodes and start a device plugin for each.

configuration:
  permittedHostDevices:
    pciHostDevices:
    - pciVendorSelector: "10DE:1EB8"
      resourceName: "nvidia.com/Tesla_T4"

When a user requests this device for a VMI, it needs to be referenced by this name:

gpus:
- deviceName: nvidia.com/Tesla_T4

However, if you are using an external device plugin, it is up to that plugin to choose the resource name for the device it advertises.
In most cases, you can find the name by querying the node with `kubectl describe node [nodeName]`.
The device name can be found in the Capacity/Allocatable sections:

Allocatable:
  nvidia.com/TU104GL_Tesla_T4:   1

Let me know if you have any other questions.
Thanks,
Vladik



Vaibhav Raizada

Jun 21, 2021, 1:17:52 AM
to kubevirt-dev
Hi Vladik,

Thanks for the help. I followed the approach suggested in your mail but still ran into the same problem.
Here is what I did:

1. Found the device and vendor ID:

[root@node003 ~]# lspci -nnv|grep -i nvidia
01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:20b0] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:144e]
        Kernel driver in use: nvidia

2. Modified kubevirt-cr.yaml

---
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  certificateRotateStrategy: {}
  configuration:
    permittedHostDevices:
      pciHostDevices:
      - pciVendorSelector: "10de:20b0"
        resourceName: "nvidia.com/A100-SXM4-40GB"
        externalResourceProvider: true
    developerConfiguration:
      featureGates: []
  customizeComponents: {}
  imagePullPolicy: IfNotPresent
  workloadUpdateStrategy: {}

3. Installed KubeVirt using the above CR definition and the default operator for v0.42.1
4. Fetched the status of the pods and described the pod:
[root@node003 ~]# kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
virt-launcher-vmi-gpu-sgwp6   0/2     Pending   0          7s

[root@node003 ~]# kubectl describe pod virt-launcher-vmi-gpu-sgwp6
Name:           virt-launcher-vmi-gpu-sgwp6
...............................................................................
...............................................................................
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  4s (x3 over 87s)  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/A100-SXM4-40GB.





Thanks,
Vaibhav

Fabian Deutsch

Jun 21, 2021, 4:24:24 AM
to Vaibhav Raizada, kubevirt-dev
Please try removing the `externalResourceProvider: true` line from the config (or setting it to false).
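For reference, a rough sketch of that section of the CR with the flag dropped (reusing the selector and resource name from your config):

configuration:
  permittedHostDevices:
    pciHostDevices:
    - pciVendorSelector: "10de:20b0"
      resourceName: "nvidia.com/A100-SXM4-40GB"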

Vaibhav Raizada

Jun 21, 2021, 6:28:21 AM
to kubevirt-dev
I tried after removing the suggested line, but I still hit the same issue.

[root@node003 ~]# kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
virt-launcher-vmi-gpu-8fw2x   0/2     Pending   0          11m

[root@node003 ~]# kubectl describe pod virt-launcher-vmi-gpu-8fw2x  
Warning  FailedScheduling  14s (x4 over 2m32s)  default-scheduler  0/1 nodes are available: 1 Insufficient nvidia.com/A100-SXM4-40GB.


Thanks,
Vaibhav

Vladik Romanovsky

Jun 21, 2021, 11:07:02 AM
to Vaibhav Raizada, kubevirt-dev

Hi Vaibhav,

To assign any PCI device to a virtual machine, the relevant device needs to be bound to the vfio-pci driver.
KubeVirt will therefore only look for, and start a device plugin for, host devices that are bound to the vfio-pci driver.

Our user guide has a section about how this can be achieved in a dynamic (non-persistent) way. I would suggest reading the whole document.
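Not from the guide itself, but purely as an illustrative sketch: one common non-persistent approach is to rebind a single device through sysfs, along these lines (the PCI address below is only an example and needs to match the actual GPU):

modprobe vfio-pci
# example address only; replace 0000:01:00.0 with the GPU's address from `lspci -nnk`
echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe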

For a persistent configuration, the administrator can create a modprobe configuration file, listing the relevant devices, such as the following:
echo options vfio-pci ids=10de:20b0 > /etc/modprobe.d/vfio.conf
echo vfio-pci > /etc/modules-load.d/vfio-pci.conf

Once the devices are bound to the vfio-pci driver, KubeVirt will be able to discover them, and they will be listed in the Capacity/Allocatable sections of `kubectl describe node [node_name]`.
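For example, a quick check could look something like this (using the node name from your earlier output; the resource name depends on your configuration):

kubectl describe node node003 | grep -i nvidia.com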

Thanks,
Vladik



Vaibhav Raizada

Jun 23, 2021, 12:44:16 AM
to kubevirt-dev
Hi Vladik,

Thanks again. I have bound one of the GPUs to the vfio-pci driver. See the output of the command below:

[root@node003 ~]# lspci -nnk -d 10de:
01:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 SXM4 40GB] [10de:20b0] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:144e]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
41:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 SXM4 40GB] [10de:20b0] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:144e]
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_drm, nvidia

Running "kubectl describe node" gives me the output below, which indicates that the device is available.

Allocatable:
  cpu:                              96
  devices.kubevirt.io/kvm:          110
  devices.kubevirt.io/tun:          110
  ephemeral-storage:                86973087600
  hugepages-1Gi:                    0
  hugepages-2Mi:                    0
  memory:                           1056587164Ki
  pods:                             110

Yet, when I launch the VM, I get the error below:

[root@node003 ~]# kubectl describe pod virt-launcher-vmi-gpu-bcg7n
Name:           virt-launcher-vmi-gpu-bcg7n
.................................................................................
Events:
  Type     Reason                    Age   From               Message
  ----     ------                    ----  ----               -------
  Normal   Scheduled                 32s   default-scheduler  Successfully assigned default/virt-launcher-vmi-gpu-bcg7n to node003
  Warning  UnexpectedAdmissionError  33s   kubelet            Allocate failed due to requested number of devices unavailable for nvidia.com/GA100_A100_SXM4_40GB. Requested: 1, Available: 0, which is unexpected

My kubevirt-cr.yaml has below changes for your reference:

---
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  certificateRotateStrategy: {}
  configuration:
    permittedHostDevices:
      pciHostDevices:
      - pciVendorSelector: "10de:20b0"
        resourceName: "nvidia.com/GA100_A100_SXM4_40GB"
    developerConfiguration:
      featureGates: []
  customizeComponents: {}
  imagePullPolicy: IfNotPresent
  workloadUpdateStrategy: {}


Thanks,
Vaibhav

Vaibhav Raizada

Jun 23, 2021, 1:24:58 AM
to kubevirt-dev
A quick update: after an hour or so, when I run "kubectl describe node", the GPU count is back to zero.

Allocatable:
  cpu:                              96
  devices.kubevirt.io/kvm:          110
  devices.kubevirt.io/tun:          110
  ephemeral-storage:                86973087600
  hugepages-1Gi:                    0
  hugepages-2Mi:                    0
  memory:                           1056587164Ki
  pods:                             110

I will check again after an hour.

Thanks,
Vaibhav

Vaibhav Raizada

Jun 23, 2021, 6:34:08 AM
to kubevirt-dev

I also want to add that I have deployed NVIDIA's kubevirt-gpu-device-plugin:

https://github.com/NVIDIA/kubevirt-gpu-device-plugin

Thanks,
Vaibhav 

Kedar Bidarkar

Jun 23, 2021, 7:33:26 AM
to Vaibhav Raizada, kubevirt-dev
On Wed, Jun 23, 2021 at 4:04 PM Vaibhav Raizada <writeto...@gmail.com> wrote:

I also want to add that I have also deployed Nvidia's Kubevirt-gpu-device-plugin. 

https://github.com/NVIDIA/kubevirt-gpu-device-plugin

1) The NVIDIA device plugin is not required from v0.36.0 onwards, unless there is some specific need for it; that is the release in which the "Generalize host devices assignment" feature was added.

Looking at this mail thread, it appears you are using v0.42.1, so AFAIK the NVIDIA device plugin should not be required.

2) If you are using https://github.com/NVIDIA/kubevirt-gpu-device-plugin for some specific purpose, then you may need to set "externalResourceProvider: true".

3) In both cases though, whether
a) using the NVIDIA device plugin or
b) using KubeVirt's built-in generic mechanism,
we would still need the same steps [1] for configuring the node first.

4) Configuring the node (assuming we need PCI passthrough here):
a) Enable IOMMU and blacklist the nouveau driver on the KVM host
b) echo "options vfio-pci ids=vendor-ID:device-ID" > /etc/modprobe.d/vfio.conf
c) echo 'vfio-pci' > /etc/modules-load.d/vfio-pci.conf

NOTE: Unless you intend to use the NVIDIA plugin for a specific purpose, avoid the "kubectl apply -f nvidia-kubevirt-gpu-device-plugin.yaml" command and let KubeVirt's built-in generic mechanism take over; a rough sketch of reverting it is below.


Best Regards,
Kedar Bidarkar

Fabian Deutsch

Jun 24, 2021, 8:51:17 AM
to Kedar Bidarkar, Vaibhav Raizada, kubevirt-dev
Thanks Kedar.

Vaibhav, were you able to use the GPU after dropping the NVIDIA device plugin?
