Various fixes to get rules_cuda to work with Lambda-stack #74

Merged: 7 commits, Mar 24, 2023

Conversation

EthanSteinberg
Contributor

The rules for this repository do not currently work with the following commonly used CUDA configuration: https://lambdalabs.com/lambda-stack-deep-learning-software

The reason is that Lambda Stack installs CUDA in /usr/bin, which breaks a couple of assumptions made by this software.

In particular:

  • /usr/lib64 is the wrong library path here; /usr/lib/x86_64-linux-gnu/ is correct.
  • The glob /usr/bin/** is invalid because /usr/bin contains recursive symlinks.

This fixes those two issues.
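
To make the two problems concrete, here is a minimal Python sketch. It is not the rules_cuda implementation (which lives in Bazel repository rules); the function names, the candidate list, and the `libcudart.so` marker file are assumptions chosen for illustration. The first function probes for the library directory instead of hard-coding /usr/lib64, and the second walks a directory while refusing to revisit a directory reached through a symlink loop, which is what a naive recursive glob fails to do.

```python
import os

# Candidate library directories relative to the CUDA root.
# Both paths come from the PR description; the search order is an assumption.
CANDIDATE_LIB_DIRS = ["lib/x86_64-linux-gnu", "lib64"]


def find_cuda_lib_dir(root, marker="libcudart.so"):
    """Return the first candidate directory under root that contains marker, else None."""
    for rel in CANDIDATE_LIB_DIRS:
        path = os.path.join(root, rel)
        if os.path.isfile(os.path.join(path, marker)):
            return path
    return None


def walk_without_cycles(top):
    """Yield files under top, following directory symlinks but visiting each real directory once."""
    seen = set()
    stack = [top]
    while stack:
        current = stack.pop()
        real = os.path.realpath(current)
        if real in seen:
            continue  # a symlink led back to a directory that was already visited
        seen.add(real)
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    yield entry.path
```

On a Lambda Stack machine one would expect `find_cuda_lib_dir("/usr")` to pick the x86_64-linux-gnu directory, but that depends on the installed packages.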

EthanSteinberg requested a review from jsharpe and removed the review requests for cloudhan and ryanleary on March 22, 2023 19:36
@cloudhan
Collaborator

Just curious: in BUILD.local_cuda, all the cc_import targets for .so files are non-versioned. Will this work with Ubuntu's always-versioned .so.<version> files?

@EthanSteinberg
Contributor Author

Ubuntu provides both versioned and unversioned copies, so it works.
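
For anyone who wants to check this on their own machine, the following is a rough Python sketch. It is not part of rules_cuda; the helper names are hypothetical. Given the file names in a library directory, it reports versioned libraries that have no unversioned counterpart, which is the case where a non-versioned cc_import would fail to find its file.

```python
import re

# Matches a ".so" suffix followed by one or more numeric version components.
_VERSION_SUFFIX = re.compile(r"(\.so)(\.\d+)+$")


def unversioned_name(filename):
    """Strip a trailing numeric version from a shared-library file name."""
    return _VERSION_SUFFIX.sub(r"\1", filename)


def missing_unversioned(filenames):
    """Return versioned libraries whose unversioned counterpart is absent from filenames."""
    present = set(filenames)
    return sorted(
        name
        for name in present
        if unversioned_name(name) != name and unversioned_name(name) not in present
    )
```

Feeding it the output of `os.listdir` on the library directory should return an empty list if every versioned library has an unversioned copy or symlink next to it.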

@cloudhan
Collaborator

I need to add a test stage for this before merging. In the future, we might need a process to re-compose a canonical tree, which will also be useful for #72.

@cloudhan
Collaborator

@lalaland Could you please rebase on main and uncomment this line:

# - { os: "ubuntu-22.04", cuda-version: "11.5.1-1ubuntu1", source: "ubuntu" }

@EthanSteinberg
Contributor Author

EthanSteinberg commented Mar 23, 2023

@cloudhan Rebased and enabled. Note that you will have to manually trigger the workflows.

@EthanSteinberg
Contributor Author

@cloudhan Can you try rerunning it? I think you forgot to include the NVIDIA toolkit in your build script. I just added it, so it should work now.

@EthanSteinberg
Contributor Author

Hmm. It looks like Ubuntu ships an incompatible gcc and NVIDIA toolkit in its default packages? That's really quite lame. I added some code to the build script to try to downgrade the compiler.

@EthanSteinberg
Contributor Author

Sorry for the bother, but could someone try triggering the workflow again? I think I finally have the solution.

@EthanSteinberg
Contributor Author

Looks like everything passes now! This should be good to merge.

cloudhan merged commit 738060e into bazel-contrib:main on Mar 24, 2023
3 participants