
[feature request] Add a way to set pod annotations for dcgm exporter #341

Closed · landorg opened this issue Apr 21, 2022 · 7 comments

landorg commented Apr 21, 2022

Our monitoring system (Datadog) requires us to set pod annotations on the exporter pods.
It would be great if you could add a way to set spec.template.metadata.annotations on the DaemonSet.
Thanks!
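
For context, a hand-written sketch (not operator output) of where the requested field would land on the rendered DaemonSet; the annotation key and container details below are illustrative:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-dcgm-exporter
spec:
  selector:
    matchLabels:
      app: nvidia-dcgm-exporter
  template:
    metadata:
      labels:
        app: nvidia-dcgm-exporter
      annotations:                    # <- the field this issue asks to expose
        example.com/agent-config: "value"
    spec:
      containers:
        - name: nvidia-dcgm-exporter
          image: nvcr.io/nvidia/k8s/dcgm-exporter  # tag omitted in this sketch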

shivamerla (Contributor) commented

We will look into adding this in a future release.


syandroo commented Nov 9, 2022

+1 This would be super useful for us too


kosyak commented Mar 13, 2023

I can see daemonsets.annotations in the output of helm -n gpu-operator get values gpu-operator --all (app version v22.9.2). Is it intended for this issue's use case?

When I declare these annotations in the chart values:

daemonsets:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "9400"
    prometheus.io/scrape: "true"

the chart deploys successfully, but the gpu-operator pod crashes with this error:

{"level":"info","ts":1678700104.895753,"logger":"controllers.ClusterPolicy","msg":"Found Resource, skipping update","ServiceAccount":"nvidia-operator-validator","Namespace":"gpu-operator"}
{"level":"info","ts":1678700104.8987215,"logger":"controllers.ClusterPolicy","msg":"Found Resource, updating...","Role":"nvidia-operator-validator","Namespace":"gpu-operator"}
{"level":"info","ts":1678700104.903535,"logger":"controllers.ClusterPolicy","msg":"Found Resource, updating...","ClusterRole":"nvidia-operator-validator","Namespace":"gpu-operator"}
{"level":"info","ts":1678700104.9083395,"logger":"controllers.ClusterPolicy","msg":"Found Resource, updating...","RoleBinding":"nvidia-operator-validator","Namespace":"gpu-operator"}
{"level":"info","ts":1678700104.9132628,"logger":"controllers.ClusterPolicy","msg":"Found Resource, updating...","ClusterRoleBinding":"nvidia-operator-validator","Namespace":"gpu-operator"}
{"level":"info","ts":1678700104.9150171,"msg":"Observed a panic in reconciler: assignment to entry in nil map","controller":"clusterpolicy-controller","object":{"name":"cluster-policy"},"namespace":"","name":"cluster-policy","reconcileID":"558a2f1a-5f56-41fe-a896-23a7b965c55b"}
panic: assignment to entry in nil map [recovered]
	panic: assignment to entry in nil map

goroutine 893 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x1902300, 0x1df3cf0})
	/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/NVIDIA/gpu-operator/controllers.applyCommonDaemonsetMetadata(...)
	/workspace/controllers/object_controls.go:589
github.com/NVIDIA/gpu-operator/controllers.preProcessDaemonSet(0xc002288480, {{0x1e0e8f8, 0xc0011953e0}, 0xc000a34000, {0xc00004a053, 0xc}, {0xc00159a000, 0x10, 0x10}, {0xc0003c5680, ...}, ...})
	/workspace/controllers/object_controls.go:567 +0xab8
github.com/NVIDIA/gpu-operator/controllers.DaemonSet({{0x1e0e8f8, 0xc0011953e0}, 0xc000a34000, {0xc00004a053, 0xc}, {0xc00159a000, 0x10, 0x10}, {0xc0003c5680, 0x10, ...}, ...})
	/workspace/controllers/object_controls.go:3099 +0x4a5
github.com/NVIDIA/gpu-operator/controllers.(*ClusterPolicyController).step(0x2b80c40)
	/workspace/controllers/state_manager.go:885 +0x136
github.com/NVIDIA/gpu-operator/controllers.(*ClusterPolicyReconciler).Reconcile(0xc0003e90e0, {0x1e0e8f8, 0xc0011953e0}, {{{0x0, 0x0}, {0xc000881d80, 0xe}}})
	/workspace/controllers/clusterpolicy_controller.go:135 +0x4e5
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1e0e850?, {0x1e0e8f8?, 0xc0011953e0?}, {{{0x0?, 0x1a78ee0?}, {0xc000881d80?, 0xc0013a35d0?}}})
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00022c8c0, {0x1e0e850, 0xc000b33080}, {0x1982860?, 0xc0009ace80?})
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320 +0x33c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00022c8c0, {0x1e0e850, 0xc000b33080})
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:230 +0x333

This crash doesn't happen without daemonsets.annotations in the chart values.
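
For anyone curious about the panic itself: "assignment to entry in nil map" is the standard Go panic raised when code writes to a map that was declared but never initialized. A minimal, self-contained Go sketch of that failure mode and its conventional fix (an illustration only, not the operator's actual applyCommonDaemonsetMetadata code):

// Illustration of the panic in the log above, not the operator's code.
package main

import "fmt"

func main() {
	var annotations map[string]string // declared but uninitialized: nil

	// Writing to the nil map reproduces the panic in the log:
	//   annotations["prometheus.io/scrape"] = "true"
	//   -> panic: assignment to entry in nil map

	// The conventional fix: initialize the map before the first write.
	if annotations == nil {
		annotations = make(map[string]string)
	}
	annotations["prometheus.io/scrape"] = "true"
	fmt.Println(annotations) // map[prometheus.io/scrape:true]
}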


alep commented Jun 21, 2023

Hi all! Does this work? Has anyone found anything that works in any of the newer releases?

dcgmExporter:
  podAnnotations:

shivamerla (Contributor) commented

The reported issue should be fixed in later releases; please try the latest version. Setting the daemonsets.annotations Helm parameter should be reflected on all DaemonSets that we create.
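
For anyone landing here later, a minimal values sketch (assuming GPU Operator v23.3.0 or newer, with the chart installed as gpu-operator from the nvidia Helm repo); the keys mirror the earlier report:

# values-annotations.yaml -- applied to every DaemonSet the operator creates
daemonsets:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9400"
    prometheus.io/path: /metrics

Apply it with something like helm upgrade gpu-operator nvidia/gpu-operator -n gpu-operator -f values-annotations.yaml; a values file avoids the dot-escaping that --set would require for keys like prometheus.io/scrape.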

cdesiniotis (Contributor) commented

Closing this issue, as GPU Operator v23.3.0+ supports the daemonsets.annotations field for configuring custom annotations on all DaemonSets that the GPU Operator manages.
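
A quick way to spot-check that the annotations landed on the rendered pod template (the DaemonSet and namespace names below assume a default install):

kubectl -n gpu-operator get daemonset nvidia-dcgm-exporter \
  -o jsonpath='{.spec.template.metadata.annotations}'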

changhyuni commented

@cdesiniotis
Is there a way to set annotations per container, rather than globally for all DaemonSets?
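
Not an authoritative answer, but worth noting: Kubernetes annotations live on the pod's metadata, not on individual containers, so there is no per-container annotations field for the operator to expose. Agents that need container-level scoping usually encode the container name in the annotation key instead; a sketch using Datadog's autodiscovery convention (nvidia-dcgm-exporter is an assumed container name, and the check config is illustrative):

daemonsets:
  annotations:
    ad.datadoghq.com/nvidia-dcgm-exporter.check_names: '["openmetrics"]'
    ad.datadoghq.com/nvidia-dcgm-exporter.init_configs: '[{}]'
    ad.datadoghq.com/nvidia-dcgm-exporter.instances: '[{"prometheus_url":"http://%%host%%:9400/metrics","namespace":"dcgm","metrics":["*"]}]'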
