Options for GPU Sharing between Containers Running on a Workstation #1769
Comments
[1] GPU Aware Scheduling: https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling
If each container (in the cluster) is supposed to have exclusive access to the GPU device, use a sharedDevNum of 1 (the default).
But this does not allow the GPU to be shared between containers, correct? Maybe a bit more context about the use case would help. We are building an application that simplifies the deployment of GPU-enabled containers (for example, using Intel's ITEX and IPEX images). This is not meant for deployments across a cluster of nodes; there is just a single node (the user's laptop or workstation). Each container runs a Jupyter Notebook server. Ideally, a user could be on a workstation with a single GPU and multiple containers running, with each given full access to the GPU. Notebook workloads are typically very bursty, so container A may run a notebook cell that is very GPU intensive while container B is idle. In cases where both containers request GPU acceleration at the same time, ideally that would be handled the same way (or close to the same way) as two applications running directly on the host OS requesting GPU resources.
@frenchwr sharedDevNum is the option you would most likely want. Any container requesting the gpu.intel.com/i915 resource gets full access to the GPU; the plugin does not partition the GPU's resources between containers.
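As an illustration, a minimal sketch of a pod requesting one share of the GPU could look like the following. The pod name, container name, and image tag are assumptions (substitute your own ITEX/IPEX Jupyter image); gpu.intel.com/i915 is the resource name the GPU plugin advertises.

```yaml
# Hypothetical example: a Jupyter container asking for one share of the GPU.
# The image tag is an assumption; replace it with your ITEX/IPEX image.
apiVersion: v1
kind: Pod
metadata:
  name: notebook-a
spec:
  containers:
    - name: jupyter
      image: intel/intel-extension-for-tensorflow:xpu-jupyter  # assumed tag
      ports:
        - containerPort: 8888
      resources:
        limits:
          gpu.intel.com/i915: 1
```

With sharedDevNum greater than 1, several such pods can land on the same node and each container is handed the same /dev/dri device, so they share the GPU much like processes running directly on the host would.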
@tkatila Thanks for clarifying! I agree this sounds like the way to go. A few more follow-up questions: should the resource management option stay disabled, and is there any guidance on choosing the sharedDevNum value?
Yes, that's correct, keep it disabled. To enable resource management you would also need another k8s component (GPU Aware Scheduling, or GAS) [1]. Its setup requires some hassle and I don't see any benefit from it in your case.
I don't think we have any guide for selecting the number, but something between 10 and 100 would be fine. The downside with an extremely large number is that it might incur some extra CPU and network bandwidth utilization. The GPU plugin will detect the number of GPUs, multiply that number by the sharedDevNum value, and advertise the result as the number of gpu.intel.com/i915 resources available on the node.
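As a concrete sketch of that arithmetic: a workstation with one GPU and a sharedDevNum of 10 would advertise gpu.intel.com/i915: 10, so up to ten containers could each request one share. Below is an abbreviated, hedged DaemonSet that passes -shared-dev-num to the plugin; the image tag, the value 10, and the mounts are assumptions modelled on the reference deployment rather than a drop-in manifest.

```yaml
# Sketch: deploy the GPU plugin with device sharing enabled.
# Image tag, shared-dev-num value, and mounts are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: intel-gpu-plugin
  template:
    metadata:
      labels:
        app: intel-gpu-plugin
    spec:
      containers:
        - name: intel-gpu-plugin
          image: intel/intel-gpu-plugin:0.29.0  # assumed tag
          args:
            - "-shared-dev-num=10"  # one GPU x 10 = 10 advertised resources
          volumeMounts:
            - name: devfs
              mountPath: /dev/dri
              readOnly: true
            - name: sysfs
              mountPath: /sys/class/drm
              readOnly: true
            - name: kubeletsockets
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: devfs
          hostPath:
            path: /dev/dri
        - name: sysfs
          hostPath:
            path: /sys/class/drm
        - name: kubeletsockets
          hostPath:
            path: /var/lib/kubelet/device-plugins
```

In practice the plugin is normally installed from the deployment manifests or the device plugin operator in this repo, and the same sharing value can be set there instead.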
Describe the support request
Hello, I'm trying to understand options that would allow multiple containers to share a single GPU.
I see that K8s device plugins in general are not meant to allow a device to be shared between containers.
I also see from the GPU plugin docs in this repo that there is a sharedDevNum option that can be used for sharing a GPU, but I infer this is partitioning the resources on the GPU so each container is only allocated a fraction of the GPU's resources. Is that correct?

My use case is a tool called data-science-stack that is being built to automate the deployment/management of GPU-enabled containers for quick AI/ML experimentation on a user's laptop or workstation. In this scenario we'd prefer that each container have access to the full GPU resources, much like you'd expect for applications running directly on the host. Is this possible?
System (please complete the following information if applicable):