```
$ kbcli version
Kubernetes: v1.27.3-gke.100
KubeBlocks: 0.7.0-alpha.4
kbcli: 0.7.0-alpha.4
```
```
$ kbcli addon enable nvidia-gpu-exporter
addon.extensions.kubeblocks.io/nvidia-gpu-exporter enabled

$ k get pod
NAME                                            READY   STATUS                 RESTARTS   AGE
csi-attacher-s3-0                               1/1     Running                0          4m7s
csi-provisioner-s3-0                            2/2     Running                0          4m7s
csi-s3-8nxsv                                    2/2     Running                0          4m7s
csi-s3-njdjz                                    2/2     Running                0          4m7s
csi-s3-v4fw9                                    2/2     Running                0          4m7s
install-neon-addon-44d69                        0/1     Error                  0          3m36s
install-neon-addon-5zvzx                        0/1     Error                  0          4m14s
install-neon-addon-nhv4s                        0/1     Error                  0          2m52s
install-neon-addon-vzftz                        0/1     Error                  0          3m59s
kb-addon-kubebench-fd7f9cd56-xn9r5              1/1     Running                0          13m
kb-addon-nvidia-gpu-exporter-82z5d              0/1     CreateContainerError   0          4s
kb-addon-nvidia-gpu-exporter-jjn8n              0/1     CreateContainerError   0          4s
kb-addon-nvidia-gpu-exporter-zfp9d              0/1     CreateContainerError   0          4s
kb-addon-snapshot-controller-65fcc74964-s57qd   1/1     Running                0          13m
kubeblocks-5bffff55b8-c8794                     1/1     Running                0          17m
kubeblocks-dataprotection-5d96f4b8cd-wc8xz      1/1     Running                0          17m

$ k describe pod kb-addon-nvidia-gpu-exporter-82z5d
Name:           kb-addon-nvidia-gpu-exporter-82z5d
Namespace:      default
Priority:       0
Node:           gke-yjtest-default-pool-f59be211-2vqs/10.128.0.46
Start Time:     Fri, 01 Sep 2023 10:47:25 +0800
Labels:         app.kubernetes.io/instance=kb-addon-nvidia-gpu-exporter
                app.kubernetes.io/name=nvidia-gpu-exporter
                controller-revision-hash=74d969d6bd
                pod-template-generation=1
Annotations:    <none>
Status:         Pending
IP:             10.104.1.168
IPs:
  IP:           10.104.1.168
Controlled By:  DaemonSet/kb-addon-nvidia-gpu-exporter
Containers:
  nvidia-gpu-exporter:
    Container ID:
    Image:          docker.io/utkuozdemir/nvidia_gpu_exporter:0.3.0
    Image ID:
    Port:           9835/TCP
    Host Port:      0/TCP
    Args:
      --web.listen-address
      :9835
      --web.telemetry-path
      /metrics
      --nvidia-smi-command
      nvidia-smi
      --query-field-names
      AUTO
      --log.level
      info
      --log.format
      logfmt
    State:          Waiting
      Reason:       CreateContainerError
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /dev/nvidia0 from nvidia0 (rw)
      /dev/nvidiactl from nvidiactl (rw)
      /usr/bin/nvidia-smi from nvidia-smi (rw)
      /usr/lib/x86_64-linux-gnu/libnvidia-ml.so from libnvidia-ml-so (rw)
      /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 from libnvidia-ml-so-1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xklqp (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  nvidiactl:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/nvidiactl
    HostPathType:
  nvidia0:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/nvidia0
    HostPathType:
  nvidia-smi:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/bin/nvidia-smi
    HostPathType:
  libnvidia-ml-so:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib64/libnvidia-ml.so
    HostPathType:
  libnvidia-ml-so-1:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib64/libnvidia-ml.so.1
    HostPathType:
  kube-api-access-xklqp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  20s               default-scheduler  Successfully assigned default/kb-addon-nvidia-gpu-exporter-82z5d to gke-yjtest-default-pool-f59be211-2vqs
  Warning  Failed     20s               kubelet            Error: failed to generate container "0f4412cde8b12a9725755ac71e5b6688f99ffcc2809ea5e02d05dbae4962a819" spec: failed to generate spec: failed to mkdir "/usr/bin/nvidia-smi": mkdir /usr/bin/nvidia-smi: read-only file system
  Warning  Failed     20s               kubelet            Error: failed to generate container "e9013e1bcda9a694c59cc5227ff079a69080e01137a5fa8a6f73d83763818bf5" spec: failed to generate spec: failed to mkdir "/usr/bin/nvidia-smi": mkdir /usr/bin/nvidia-smi: read-only file system
  Normal   Pulled     5s (x3 over 20s)  kubelet            Container image "docker.io/utkuozdemir/nvidia_gpu_exporter:0.3.0" already present on machine
  Warning  Failed     5s                kubelet            Error: failed to generate container "a0ab1f4da8105f5c3407805dd3cd85b05731aa6a6fd95b1c7d66eae1279a1964" spec: failed to generate spec: failed to mkdir "/usr/bin/nvidia-smi": mkdir /usr/bin/nvidia-smi: read-only file system
```
This is because `nvidia-gpu-exporter` relies on the `nvidia-smi` binary on the host, but the node operating system that GKE uses (Container-Optimized OS) doesn't ship that binary. Since the `nvidia-smi` hostPath volume has no `type` set, the kubelet tries to create the missing `/usr/bin/nvidia-smi` path on the node and fails, because `/usr` is mounted read-only there.
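As a side note, declaring an explicit hostPath `type` would surface this failure more clearly: the kubelet would reject the pod with a "hostPath type check failed" event instead of attempting the `mkdir` on the read-only filesystem. A hypothetical excerpt of the addon's DaemonSet pod spec (field names per the Kubernetes core/v1 API; the surrounding fields are assumed):

```yaml
# Sketch of one volume entry in the DaemonSet's pod spec.
volumes:
  - name: nvidia-smi
    hostPath:
      path: /usr/bin/nvidia-smi
      # The failing spec leaves this empty (""), which means
      # "create the path if missing" — the mkdir that fails on GKE.
      # "File" makes the kubelet verify the file exists instead.
      type: File
```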
The `nvidia-gpu-exporter` addon should therefore only be made available on EKS.
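One way to express that restriction is through the addon's installable selector, which KubeBlocks addons use to gate auto-installation by cluster type. A sketch against the KubeBlocks `Addon` CRD (the `KubeGitVersion`/`Contains` selector is how other addons in the project match EKS clusters; treat the exact field names as an assumption from that convention):

```yaml
# Hypothetical Addon spec fragment restricting nvidia-gpu-exporter to EKS.
spec:
  installable:
    autoInstall: false
    selectors:
      - key: KubeGitVersion   # EKS embeds "eks" in its Kubernetes git version string
        operator: Contains
        values:
          - eks
```

With a selector like this, `kbcli addon enable nvidia-gpu-exporter` on a GKE cluster would be refused up front rather than failing at pod creation.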