1. Issue or feature description
Getting the following error when trying to pass an L40 GPU through to a virtual machine via PCI passthrough; the GPU is never assigned and the VM does not start.
From the nvidia-sandbox-device-plugin-daemonset:
2023/07/10 19:41:03 Nvidia device 0000:e2:00.0
2023/07/10 19:41:03 Iommu Group 128
2023/07/10 19:41:03 Device Id 26b5
2023/07/10 19:41:03 Error accessing file path "/sys/bus/mdev/devices": lstat /sys/bus/mdev/devices: no such file or directory
2023/07/10 19:41:03 Iommu Map map[128:[{0000:e2:00.0}]]
2023/07/10 19:41:03 Device Map map[26b5:[128]]
2023/07/10 19:41:03 vGPU Map map[]
2023/07/10 19:41:03 GPU vGPU Map map[]
2023/07/10 19:41:03 Error: Could not find device name for device id: 26b5
2023/07/10 19:41:03 DP Name 26b5
2023/07/10 19:41:03 Devicename 26b5
2023/07/10 19:41:03 26b5 Device plugin server ready
From the virt-launcher pod trying to allocate the device:
server error. command SyncVMI failed: "failed to create GPU host-devices: the number of GPU/s do not match the number of devices:\nGPU: [{26b5 nvidia.com/26b5 }]\nDevice: []"
{"component":"virt-launcher","level":"warning","msg":"PCI_RESOURCE_NVIDIA_COM_26B5 not set for resource nvidia.com/26b5","pos":"addresspool.go:50","timestamp":"2023-07-11T16:11:34.667518Z"}
2. Steps to reproduce the issue
Try to launch a VM using an L40 GPU (rather than an A40 GPU) with PCI passthrough.
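For context, the relevant portion of the VirtualMachine spec looks roughly like the sketch below. The VM name is hypothetical; the deviceName matches the resource the device plugin currently advertises for the L40 (device ID only, since the name lookup failed):

```yaml
# Hypothetical VirtualMachine fragment requesting the passthrough GPU.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: l40-test-vm   # hypothetical name
spec:
  template:
    spec:
      domain:
        devices:
          gpus:
          - name: gpu1
            deviceName: nvidia.com/26b5
```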
@clrfuerst can you try using the latest kubevirt-gpu-device-plugin image, v1.2.2? Set sandboxDevicePlugin.version=v1.2.2 in ClusterPolicy. Note: the PCI ID database was updated in v1.2.2, so the L40 GPU should now be named with its device name (rather than its device ID) -- you will have to update your HyperConverged configuration accordingly.
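A minimal sketch of that change, assuming the GPU Operator's ClusterPolicy is named gpu-cluster-policy (adjust to match your cluster):

```yaml
# ClusterPolicy fragment (e.g. via `oc edit clusterpolicy gpu-cluster-policy`).
# Only the sandboxDevicePlugin section is shown; other fields are unchanged.
spec:
  sandboxDevicePlugin:
    version: v1.2.2
```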
1. Quick Debug Checklist
kubevirt-hyperconfig
spec:
permittedHostDevices:
pciHostDevices:
- resourceName: "nvidia.com/GA102GL_A40"
pciDeviceSelector: "10DE:2235"
externalResourceProvider: true
- resourceName: "nvidia.com/26b5"
pciDeviceSelector: "10DE:26B5"
externalResourceProvider: true
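After upgrading to v1.2.2, the L40 entry would presumably switch from the device ID to the device name from the PCI ID database. The pci.ids entry for 26b5 is "AD102GL [L40]", so the resource name below is an assumption based on the A40 naming pattern:

```yaml
# Assumed post-upgrade entry; verify the actual resource name advertised
# on the node (oc describe node) after the plugin restarts.
- resourceName: "nvidia.com/AD102GL_L40"
  pciDeviceSelector: "10DE:26B5"
  externalResourceProvider: true
```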
oc describe node XXXX
Capacity:
nvidia.com/26b5: 1
Allocatable:
nvidia.com/26b5: 1