Gpu config api #684

Open. Wants to merge 19 commits into base: main

@at88mph at88mph commented Aug 15, 2024

  • Configurable GPU using gpu-count:<gpu-vendor>
    • Set the NVIDIA_CUDA_MAJOR_VERSION environment variable in User Sessions by querying Kubernetes
  • Cleanup: use a single template so that each job launch file no longer has to be modified every time
  • Added a local lookup of the SKAHA_SERVICE_ID environment variable so that integration tests can run
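
The single-template idea in the list above can be pictured as placeholder substitution. What follows is only a minimal sketch: the `${key}` placeholder syntax, the `cuda.major` key, and the `fill` helper are assumptions made for illustration and are not taken from this PR's code.

```java
import java.util.Map;

public class LaunchTemplateSketch {

    // Hypothetical helper: replaces every literal "${key}" in the template
    // with the matching value. Placeholders with no matching key are left as-is.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed template fragment; the real launch files may look different.
        String template = "env:\n"
                + "  - name: NVIDIA_CUDA_MAJOR_VERSION\n"
                + "    value: \"${cuda.major}\"\n";
        System.out.print(fill(template, Map.of("cuda.major", "12")));
    }
}
```

With one shared template, a new value only needs a new placeholder and a new map entry rather than an edit to every launch file.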


try {
    final int majorNVIDIACUDAVersion = CommandExecutioner.getMajorNvidiaCudaGPUVersion();
    jobLaunchString = setConfigValue(jobLaunchString, SOFTWARE_GPU_NVIDIA_CUDA_MAJOR_VERSION, Integer.toString(majorNVIDIACUDAVersion));
Member

Users can already get the GPU version through their software, so I'm not sure this is needed, though it may be useful.

The general idea is to let users land on the right GPU (brand, version, GPU core count). I think the ideas from ExecutionBroker are useful here and will help us align with that potential integration.

We currently expose the content of k8s-resources at /context. That is just a static config reflecting the underlying capabilities of the cluster; ideally, those values should come from the cluster itself. However, that is probably beyond the scope of this story, as is adding the 'brokering' part of client interaction.

So I think, for now at least, the story is simply to let users specify those three GPU conditions through API params. I haven't gone through this whole PR yet, but I'm guessing that a lot of that is already there. Let's chat about it tomorrow.
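
To make the "three conditions as API params" idea concrete, here is a rough sketch of how such parameters might be validated once received. The `GpuRequest` class, the `fromParams` method, and the rule that version and count must be positive integers are all assumptions for illustration; nothing here reflects the actual skaha API.

```java
import java.util.Locale;

public class GpuRequestSketch {

    /** Hypothetical value object holding the three GPU conditions. */
    static final class GpuRequest {
        final String brand;
        final int majorVersion;
        final int count;

        GpuRequest(String brand, int majorVersion, int count) {
            this.brand = brand;
            this.majorVersion = majorVersion;
            this.count = count;
        }
    }

    // Builds a request from raw string parameters, rejecting missing or malformed input.
    static GpuRequest fromParams(String brand, String majorVersion, String count) {
        if (brand == null || brand.isBlank()) {
            throw new IllegalArgumentException("GPU brand is required");
        }
        return new GpuRequest(
                brand.trim().toLowerCase(Locale.ROOT),
                parsePositive("version", majorVersion),
                parsePositive("count", count));
    }

    // Parses a strictly positive integer, naming the offending parameter on failure.
    static int parsePositive(String name, String raw) {
        final int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException | NullPointerException e) {
            throw new IllegalArgumentException(name + " must be an integer: " + raw);
        }
        if (value < 1) {
            throw new IllegalArgumentException(name + " must be at least 1: " + raw);
        }
        return value;
    }

    public static void main(String[] args) {
        GpuRequest request = fromParams("NVIDIA", "12", "2");
        System.out.println(request.brand + " v" + request.majorVersion + " x" + request.count);
    }
}
```

Rejecting bad input at the API boundary keeps malformed values out of whatever is later written into the job launch configuration.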

Member Author

Thanks. According to CADC-13476, we wanted the major CUDA version supplied so that scripts can look it up.

Also, there should be two (2) parameters specified: the gpu-type parameter and the gpus (count) parameter.
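
As an illustration of the "scripts can look it up" point, the sketch below reads NVIDIA_CUDA_MAJOR_VERSION from the environment inside a session. The variable name comes from the PR description; the `majorCudaVersion` helper and the choice to treat a missing or non-numeric value as "not set" are assumptions.

```java
import java.util.Map;
import java.util.OptionalInt;

public class CudaVersionLookupSketch {

    static final String ENV_NAME = "NVIDIA_CUDA_MAJOR_VERSION";

    // Returns the major CUDA version if the variable is present and numeric,
    // otherwise an empty OptionalInt. The environment is passed in as a map
    // so the lookup can be exercised without touching the real environment.
    static OptionalInt majorCudaVersion(Map<String, String> env) {
        String raw = env.get(ENV_NAME);
        if (raw == null || raw.isBlank()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt major = majorCudaVersion(System.getenv());
        System.out.println(major.isPresent()
                ? "CUDA major version: " + major.getAsInt()
                : "CUDA major version not set");
    }
}
```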

@at88mph at88mph marked this pull request as ready for review September 3, 2024 16:45
…nto gpu-config-api

# Conflicts:
#	deployment/helm/skaha/Chart.yaml
#	deployment/helm/skaha/skaha-config/launch-desktop.yaml
#	deployment/helm/skaha/values.yaml
#	skaha/VERSION
#	skaha/src/intTest/java/org/opencadc/skaha/DesktopAppLifecycleTest.java
#	skaha/src/intTest/java/org/opencadc/skaha/ExpiryTimeRenewalTest.java
#	skaha/src/intTest/java/org/opencadc/skaha/ImagesTest.java
#	skaha/src/intTest/java/org/opencadc/skaha/SessionLifecycleTest.java
#	skaha/src/intTest/java/org/opencadc/skaha/SessionUtil.java
#	skaha/src/main/java/org/opencadc/skaha/session/PostAction.java
#	skaha/src/main/java/org/opencadc/skaha/session/SessionAction.java
2 participants