
Initial remote hermetic cuda toolchain #72

Draft
jsharpe wants to merge 10 commits into main
Conversation

@jsharpe (Member) commented Mar 20, 2023

Depends on #66

The BUILD files generated for each of the downloaded repos are a bit hacky at the moment and BUILD.remote_cuda is probably not fully updated (I only updated the bits I needed).

CUDA 12 requires bringing in libcu++ as an external dep otherwise nv/target is missing.

I've not checked this for reproducibility, but anecdotally I've seen source paths in the debugger that are RBE-worker dependent, so I suspect there are some reproducibility issues with the current setup.

The other thing likely missing is runfiles for runtime dependencies from the remote toolchain.

@@ -2,5 +2,5 @@
cc_binary(
name = "main",
srcs = ["cublas.cpp"],
deps = ["@local_cuda//:cublas"],
jsharpe (Member, Author) commented on the diff:


I haven't come up with a good way of avoiding knowing the resolved toolchain repo name here to find cublas.
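One possible direction (not what this PR does; a hedged sketch with illustrative names only) would be for the module extension to generate a small wrapper repo with stable alias targets that forward to whichever component repos were resolved, so examples never need the concrete repo name:

# BUILD file a module extension could generate into a wrapper repo (hypothetical names)
alias(
    name = "cublas",
    actual = "@cuda_cublas-linux-x86_64//:cublas",  # assumed resolved repo label
    visibility = ["//visibility:public"],
)

The example's deps could then stay on one stable label regardless of whether the toolchain was resolved locally or remotely.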

A collaborator commented on the diff:


@local_cuda//:cuda_runtime is also missing, so the basic CI failed in #66; that should be resolved before this one.

@ahans commented Jun 28, 2024

What's the status here? This would be a very welcome feature for more than one project I'm contributing to! So far the solution is a custom archive with the CUDA SDK and a local patch to rules_cuda that makes it download that archive first and then set it up as if it were local. Having this work out of the box with rules_cuda, and only download what is actually used, would be much nicer, of course!
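For context, the interim workaround described above might look roughly like the following; this is only a sketch, and the archive URL, checksum, strip_prefix, and build_file label are all placeholders (the real setup also needs the local patch to rules_cuda mentioned above).

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Pretend a prepackaged CUDA SDK archive is the "local" installation (illustrative only)
http_archive(
    name = "local_cuda",
    urls = ["https://example.com/mirrors/cuda-sdk-12.x-linux-x86_64.tar.gz"],  # hypothetical mirror
    sha256 = "",  # fill in the real checksum
    strip_prefix = "cuda-sdk-12.x",  # placeholder
    build_file = "//third_party:BUILD.local_cuda",  # hypothetical BUILD file
)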

@jsharpe (Member, Author) commented Jun 28, 2024

> What's the status here? This would be a very welcome feature for more than one project I'm contributing to! [...]

The code in this PR works (although it has bit-rotted a bit; the branch I'm actually using is remote_toolchain in my fork of the repo), but it breaks support for the local setup use case. I don't really have the time at the moment to make both work in a single repo, so any help getting this working would be appreciated. There are likely some pieces that can be broken out into separate PRs and landed independently, so we get there in smaller steps; otherwise this is a rather large change.

@cloudhan (Collaborator) commented:

I think this should be split into multiple steps.

  1. support composing multiple components (say, local_cccl, local_cublas, local_thrust, local_cub) into a unified local_cuda
    • this might allow reusing pip and conda installs
  2. support instantiating those local_* repos from local tarballs
  3. support downloading the tarballs
  4. support parsing the JSON at https://developer.download.nvidia.cn/compute/cuda/redist/ (a sketch of steps 3 and 4 follows this list)
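As a rough illustration of steps 3 and 4, a module extension could fetch the redistrib JSON and instantiate one repository per component. This is a hedged sketch, not the code in this PR: it assumes the manifest maps component -> platform -> {relative_path, sha256}, and the component keys, repo names, and build_file labels are all illustrative.

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

_BASE = "https://developer.download.nvidia.cn/compute/cuda/redist/"

def _cuda_redist_impl(mctx):
    # Fetch the manifest for one release (version hard-coded here only for brevity).
    mctx.download(url = _BASE + "redistrib_12.4.1.json", output = "redistrib.json")
    manifest = json.decode(mctx.read("redistrib.json"))
    for component in ["cuda_nvcc", "cuda_cudart", "libcublas"]:  # assumed component keys
        info = manifest[component]["linux-x86_64"]
        archive = info["relative_path"].split("/")[-1]
        http_archive(
            name = component + "-linux-x86_64",
            urls = [_BASE + info["relative_path"]],
            sha256 = info["sha256"],
            strip_prefix = archive[:-len(".tar.xz")],  # archives unpack into a top-level dir (assumed)
            build_file = "//cuda/private:BUILD." + component,  # hypothetical per-component BUILD file
        )

cuda_redist = module_extension(implementation = _cuda_redist_impl)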

@jsharpe (Member, Author) commented Jun 28, 2024

Ah yes, I remember now: NVIDIA/cccl#622 was the issue I raised so that I could effectively get CCCL into a bzlmod-supporting repository, since a fully hermetic toolchain will require downloading these separately.
IMO the cccl / thrust / cub targets shouldn't be inside local_cuda; they're just libraries, and the fact that they can be provided by a local_cuda repo is incidental. Note that thrust in particular can be used independently of a CUDA install; it works just as well in an OpenMP context on the host.
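As a concrete (but purely illustrative) example of pulling CCCL as its own repository rather than from local_cuda, something like the following could live in a setup macro or module extension; the pinned version, checksum, and build_file are placeholders:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Fetch CCCL (libcu++/Thrust/CUB) independently of the CUDA toolkit (illustrative pin)
http_archive(
    name = "cccl",
    urls = ["https://github.com/NVIDIA/cccl/archive/refs/tags/v2.4.0.tar.gz"],
    strip_prefix = "cccl-2.4.0",
    sha256 = "",  # fill in the real checksum
    build_file = "//cuda/private:BUILD.cccl",  # hypothetical BUILD file for the headers
)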

@cloudhan (Collaborator) commented Jun 28, 2024

local_cuda is a name inherited from the tf_runtime implementation; it should really have been called local_cuda_toolkit, so that every component stated in the doc has a place (maybe overridable).

The last step might not be as trivial as it seems. For example, in CI you might want to override all those links to point at your own server. If we fetch the JSON instead of checking it in directly, we don't need the download URLs to be stable; and if we let the user provide the JSON URL, we don't even need that URL to be stable. (We do need the JSON schema to be stable, though.)
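Letting the user provide the JSON URL could be expressed as a tag class on the extension, roughly like this (attribute and tag names are assumptions, extending the earlier sketch):

# Users could then point the extension at an internal mirror or a pinned manifest in CI.
cuda_redist = module_extension(
    implementation = _cuda_redist_impl,
    tag_classes = {
        "from_json": tag_class(attrs = {
            "url": attr.string(doc = "Redistrib JSON URL, e.g. an internal mirror."),
            "sha256": attr.string(doc = "Optional checksum of the JSON for reproducibility."),
        }),
    },
)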

@jsharpe (Member, Author) commented Jun 28, 2024

Unfortunately the JSON schema hasn't proven to be stable; it changed within the 12.x series of releases. It was only the addition of some extra keys, but that was enough to break the logic I had here.
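One way to tolerate additive schema changes is to treat only dict-valued entries that carry a platform payload as components and ignore everything else; a small sketch (platform key assumed):

# Skip unknown top-level keys (e.g. release metadata) instead of failing on them.
def _components(manifest, platform = "linux-x86_64"):
    out = {}
    for key, value in manifest.items():
        if type(value) == "dict" and platform in value:
            out[key] = value[platform]
    return out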

@ahans commented Jun 28, 2024

Thanks for the update, @jsharpe and @cloudhan, much appreciated! I will look at the mentioned branch in @jsharpe's fork. I don't care too much about supporting anything locally installed myself, but since that has been the only option for rules_cuda, I understand it shouldn't be taken away. I will see if I can help with anything, but no promises.


cuda_toolchain(
name = "nvcc-local",
compiler_executable = "external/cuda_nvcc-linux-x86_64/bin/nvcc",

A reviewer asked:

This bit only works with spawn_strategy=local, right? Is there a way to make it work with sandboxing?

jsharpe (Member, Author) replied:


Nope, it works with sandboxing - in fact I use this with RBE.
