fix: enable pod attributes for gpu-sharing in gke
We make a small fix to the Kubernetes PodMapper transform processor.
Specifically, we update the regular expression used in building the
device mapping so that it properly captures pod attributes for both MIG
and MIG-with-sharing GPUs in GKE.
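
To illustrate the change described above, here is a minimal standalone sketch that compares the pattern before and after this commit. The variable names `oldGKEMigRegex` and `newGKEMigRegex` are hypothetical and exist only for this comparison; the exporter itself keeps a single `gkeMigDeviceIDRegex`.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern before this commit: matches MIG device IDs without a sharing suffix.
	// (Hypothetical name, used here only for comparison.)
	oldGKEMigRegex = regexp.MustCompile(`^nvidia([0-9]+)/gi([0-9]+)$`)
	// Pattern after this commit: additionally accepts an optional "/vgpuN" suffix.
	// (Hypothetical name, used here only for comparison.)
	newGKEMigRegex = regexp.MustCompile(`^nvidia([0-9]+)/gi([0-9]+)(/vgpu[0-9]+)?$`)
)

func main() {
	// One plain MIG device ID and one MIG-with-sharing device ID.
	for _, id := range []string{"nvidia0/gi0", "nvidia0/gi1/vgpu0"} {
		fmt.Printf("%-18s old=%t new=%t\n",
			id, oldGKEMigRegex.MatchString(id), newGKEMigRegex.MatchString(id))
	}
}
```

The old pattern rejects `nvidia0/gi1/vgpu0` because of the trailing `/vgpu0`, while the new pattern accepts both forms.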
pintohutch committed Dec 14, 2024
1 parent 900d465 commit 86575ba
Showing 2 changed files with 16 additions and 1 deletion.
3 changes: 2 additions & 1 deletion pkg/dcgmexporter/kubernetes.go
@@ -36,7 +36,8 @@ import (
var (
connectionTimeout = 10 * time.Second

- gkeMigDeviceIDRegex = regexp.MustCompile(`^nvidia([0-9]+)/gi([0-9]+)$`)
+ // Allow for MIG devices with or without GPU sharing to match in GKE.
+ gkeMigDeviceIDRegex = regexp.MustCompile(`^nvidia([0-9]+)/gi([0-9]+)(/vgpu[0-9]+)?$`)
gkeVirtualGPUDeviceIDSeparator = "/vgpu"
nvmlGetMIGDeviceInfoByIDHook = nvmlprovider.GetMIGDeviceInfoByID
)
14 changes: 14 additions & 0 deletions pkg/dcgmexporter/kubernetes_test.go
@@ -218,6 +218,20 @@ func TestProcessPodMapper_WithD_Different_Format_Of_DeviceID(t *testing.T) {
MetricGPUDevice: "0",
PODGPUID: "0/vgpu",
},
+ {
+ KubernetesGPUIDType: DeviceName,
+ ResourceName: nvidiaResourceName,
+ MetricMigProfile: "1g.10gb",
+ GPUInstanceID: 0,
+ PODGPUID: "nvidia0/gi0/vgpu0",
+ },
+ {
+ KubernetesGPUIDType: DeviceName,
+ ResourceName: nvidiaResourceName,
+ MetricMigProfile: "1g.10gb",
+ GPUInstanceID: 1,
+ PODGPUID: "nvidia0/gi1/vgpu0",
+ },
{
KubernetesGPUIDType: GPUUID,
ResourceName: nvidiaResourceName,
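
The new test cases pair each `PODGPUID` with a `GPUInstanceID`. As a rough sketch of how the updated pattern exposes those values, the snippet below pulls the GPU index and GPU instance ID out of the capture groups. `parseGKEMigDeviceID` is a hypothetical helper written for this illustration, and how the exporter actually consumes the match is an assumption not shown in this diff.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the updated gkeMigDeviceIDRegex in this commit.
var gkeMigDeviceIDRegex = regexp.MustCompile(`^nvidia([0-9]+)/gi([0-9]+)(/vgpu[0-9]+)?$`)

// parseGKEMigDeviceID is a hypothetical helper, not part of the exporter.
// It returns the GPU index (group 1) and GPU instance ID (group 2);
// the optional "/vgpuN" suffix (group 3) is matched but ignored.
func parseGKEMigDeviceID(id string) (gpuIndex, gpuInstanceID string, ok bool) {
	m := gkeMigDeviceIDRegex.FindStringSubmatch(id)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	// The two PODGPUID values added to the test table.
	for _, id := range []string{"nvidia0/gi0/vgpu0", "nvidia0/gi1/vgpu0"} {
		gpu, gi, ok := parseGKEMigDeviceID(id)
		fmt.Printf("%s -> gpu=%s gi=%s ok=%t\n", id, gpu, gi, ok)
	}
}
```

Under this sketch, both shared-MIG IDs resolve to GPU index `0`, with instance IDs `0` and `1` respectively, which lines up with the `GPUInstanceID` fields in the added test cases.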