GPU: Error response from daemon: invalid volume specification #1564
Comments
Thanks for reporting this. Did you verify that it only happens with the Docker runtime?
The change that causes this was introduced in version 0.26.1. You can work around it by using 0.26.0 in the meantime.
I remember we have had similar cases with volume mounts where the paths contained colons and Docker was used. Is Docker mandatory here, or could a proper CRI runtime be used?
@tkatila I can confirm that it works fine with containerd.
BMRA/VMRA uses Docker as its default container runtime.
That's a bit old. The oldest Docker version listed, e.g. on the Ubuntu packages site, is v20.10.21, and Ubuntu 20.04 LTS updates are already at 24.0.5: https://packages.ubuntu.com/focal-updates/docker.io Have you tried any newer Docker version?
They could consider updating that default, as Kubernetes deprecated Docker support in k8s v1.20: https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/
I tried a newer version and it reproduces with it:
Pod fails with:
Docker Engine is mentioned among the container runtimes in the k8s docs: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#docker which would suggest it's still "ok" to use it. But to me this is a bug in Docker Engine, as it works fine with containerd and cri-o. My thought process for this is:
I do not want to remove the "by-path" mounting as it's required by distributed training. And adding some CLI arg or env variable to temporarily disable it feels icky.
It seems that a colon in volumes/binds is a known issue:
It looks like there's a workaround that uses the --mount arg with Docker, but there's no clear way to use it from the Kubernetes side. The most suitable fix for this bug seems to be to avoid using /dev/dri/by-path/xxx, as these are basically symlinks to devices in /dev/dri.
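To illustrate the symlink point, here is a minimal Go sketch that resolves a by-path style link to its target. It is not the plugin's code: `resolveDevice` is a hypothetical helper, and the demo builds a fake link in a temp directory (with made-up `card0` target) rather than touching real /dev/dri entries. It only shows that the resolved name no longer carries the colons; it does not address the distributed-training need for by-path mounts mentioned earlier.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveDevice (hypothetical helper) follows symlinks and returns the
// real path that a by-path style link points to.
func resolveDevice(path string) (string, error) {
	return filepath.EvalSymlinks(path)
}

func main() {
	// Assumption: a temp directory stands in for /dev/dri.
	dir, err := os.MkdirTemp("", "dri")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Fake device node and a by-path style symlink with colons in its name.
	target := filepath.Join(dir, "card0")
	if err := os.WriteFile(target, nil, 0o600); err != nil {
		panic(err)
	}
	link := filepath.Join(dir, "pci-0000:b7:00.0-card")
	if err := os.Symlink(target, link); err != nil {
		panic(err)
	}

	resolved, err := resolveDevice(link)
	if err != nil {
		panic(err)
	}
	fmt.Println("link name has colon:    ", strings.Contains(filepath.Base(link), ":"))
	fmt.Println("resolved name has colon:", strings.Contains(filepath.Base(resolved), ":"))
}
```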
Is avoiding Docker not an option?
@mythi BMRA/VMRA still uses Docker as its "primary" container runtime. The product is built around customers and their needs, so avoiding Docker is not an option for us. Downgrading Intel DP to 0.26.0 can be considered a workaround, but not a fix.
Workaround: Prevent Creation of
Environment:
Steps to reproduce:
Expected behaviour: pod running
Actual behaviour:
pod in CreateContainerError state
Warning Failed 2m49s (x12 over 5m3s) kubelet Error: Error response from daemon: invalid volume specification: '/dev/dri/by-path/pci-0000:b7:00.0-card:/dev/dri/by-path/pci-0000:b7:00.0-card:ro'
Likely caused by this commit: 943e34f
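The error message above is easier to read once the colons are counted. Docker's `-v`/bind syntax is `source:destination:mode`, so a spec is expected to have at most three colon-separated fields. The Go sketch below is a simplified illustration, not Docker's actual parser; `countFields` is a hypothetical helper and `/dev/dri/card0` is an assumed example path.

```go
package main

import (
	"fmt"
	"strings"
)

// countFields (hypothetical helper) returns how many colon-separated
// fields a "source:destination:mode" bind spec contains.
func countFields(spec string) int {
	return len(strings.Split(spec, ":"))
}

func main() {
	// Spec taken from the error message in this issue.
	bad := "/dev/dri/by-path/pci-0000:b7:00.0-card:/dev/dri/by-path/pci-0000:b7:00.0-card:ro"
	// Assumed example of a spec without colons in the paths.
	ok := "/dev/dri/card0:/dev/dri/card0:ro"

	fmt.Println("fields without colons in paths:", countFields(ok))
	fmt.Println("fields with by-path names:     ", countFields(bad))
}
```

With colons inside the PCI address, the spec can no longer be split unambiguously into source, destination and mode, which matches the "invalid volume specification" error.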