Packaged RF2-linux.yml pins pytorch-cuda=11.7, may lead to issues with CUDA version #25
Comments
Hi @matspunt, Hope this clarifies the issue. Debadutta
Restricting PyTorch to 2.1.1 and updating pytorch-cuda to 11.8, per instructions here: uw-ipd#25

Motivating issue:

Traceback (most recent call last):
  File "/storage1/fs1/ghaller/Active/lloydt/LT2_Protein-Modeling/RosettaFold2/network/predict.py", line 493, in <module>
    pred.predict(
  File "/storage1/fs1/ghaller/Active/lloydt/LT2_Protein-Modeling/RosettaFold2/network/predict.py", line 316, in predict
    torch.cuda.reset_peak_memory_stats()
  File "/opt/conda/envs/RF2/lib/python3.10/site-packages/torch/cuda/memory.py", line 307, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
AttributeError: module 'torch._C' has no attribute '_cuda_resetPeakMemoryStats'
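If it helps anyone verifying the swap, a quick check like the following (just a sketch; the expected version numbers are simply the pins discussed in this thread) confirms which build of torch actually ended up in the environment:

```python
import torch

print(torch.__version__)          # expect 2.1.1 after the pin change
print(torch.version.cuda)         # expect 11.8
print(torch.cuda.is_available())  # should be True on a GPU node

if torch.cuda.is_available():
    # the call that raised the AttributeError in the traceback above
    torch.cuda.reset_peak_memory_stats()
    print("reset_peak_memory_stats OK")
```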
This was very helpful to me in my HPC environment (RIS WUSTL)!
Wrong. The current yml does not work with the actual RF2 code in this repo.
Hey @stianale, as I mentioned in my comment you need to change
For me that yields the following errors:
I got around those errors, but now the same errors that appeared with the old yml file still arise with the new one:

Running on CPU
The RosettaFold repos are train wrecks as of now, with recipes not being close to working with the code provided... Similar, although not identical, issues are faced with the RF2NA software, and it feels as if it is up to the users themselves to figure a way out of the incompatibilities.
@stianale, I thought I would add to this thread. I was able to get RF2 to install today, August 7th, 2024. I am using a WSL CUDA install of cuda_11.8.r11.8/compiler.31833905_0. First, I edited @lloydtripp's yml file to read:
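(The edited file itself isn't reproduced in this thread; purely as a hypothetical illustration of the kind of pins being discussed here, and not the actual RF2-linux.yml, the relevant section might look something like this.)

```yaml
# Hypothetical sketch only -- not the actual RF2-linux.yml from the repo.
# Versions reflect those discussed in this thread (CUDA 11.8, PyTorch 2.1.1).
name: RF2
channels:
  - pytorch
  - nvidia
  - conda-forge
dependencies:
  - python=3.10
  - cudatoolkit=11.8    # CUDA runtime present before PyTorch is (re)installed
  - pytorch=2.1.1
  - pytorch-cuda=11.8
```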
That way, we can have the needed cudatoolkit already installed before we reinstall PyTorch. To my understanding, the PyTorch error with respect to CUDA arose mainly because CUDA does not appear available given the yml-directed install of PyTorch. Additionally, parts of PyTorch that were used to make RF2 functional are already deprecated.

Installing RF2 will likely continue to be a serious difficulty. I recommend looking into old forums/GitHub posts, or even looking at the backend of Google Colab notebooks; those notebooks have to perform fresh installs of the software on every callable instance, which may provide some clues.

Anyways, here are the steps, in order, that I took to have success:
STEP 6. Lastly, when actually running the predictions I had to either

I can confirm that this has worked for me. Again, while I pose a solution, there may be some underlying difficulties that vary based on your computing environment. But overall, the main issue is that the PyTorch installation directed by the yml file does not natively read your CUDA library. This thread has done a good job identifying the specific CUDA and PyTorch versions that are needed, but there may (likely) come a time when the default pulls for the software grab the wrong dependencies and mess everything up. Here I went directly to the PyTorch website for the installation command, and then recreated deprecated files that are imported in RF2's

I think we can close this ticket.

Hope this helps,
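For reference, the command published on the PyTorch previous-versions page for this pairing was along these lines (worth re-checking pytorch.org, since the exact form can change):

```bash
# PyTorch 2.1.1 built against CUDA 11.8, as listed on pytorch.org (previous versions)
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 \
    pytorch-cuda=11.8 -c pytorch -c nvidia
```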
Hi,

To users: if RF2 defaults to CPU and running `torch.cuda.is_available()` returns False, read below.

Be careful when building your conda environment that the CUDA version that is found (`which nvcc`) in the RF2 conda environment is compatible with the `pytorch-cuda` version in the environment. I.e. if the system CUDA is used, it cannot be greater than 11.7 (see `nvidia-smi`). If the Python CUDA package is used, ensure the `cudatoolkit` version in your environment matches 11.7. Default behaviour for conda is to install the latest version, `cudatoolkit-12.2`, which leads to the PyTorch issue.

To developers: perhaps a dependency on `cudatoolkit=11.7` or `cudatoolkit-dev=11.7` can be added to the environment?

Note: I have used CUDA 12.0 successfully (with an upgraded `pytorch-cuda`) and saw no difference in the performance or output of RoseTTAFold2, but I can't comment in detail on that. 11.7 works fine too.

Cheers,
Mats
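For anyone hitting the same CPU-fallback symptom, a quick way to cross-check the versions described above (assuming `nvcc` and `nvidia-smi` are on PATH inside the RF2 environment):

```bash
# CUDA toolkit visible inside the RF2 conda environment
which nvcc && nvcc --version

# Maximum CUDA version supported by the installed driver
nvidia-smi | head -n 5

# What PyTorch was built against, and whether it can see a GPU
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```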