tensorflow-gpu warnings with cuda-11.2 #17

Open
pkiri056 opened this issue Jan 17, 2021 · 2 comments

@pkiri056

Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda-11.2/lib

Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib/cuda/lib64

These warnings appeared after installing the NVIDIA CUDA Toolkit with the following commands:
sudo apt install system76-cuda-latest
sudo apt install system76-cudnn-10.2
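
To confirm which of the libraries named in the warnings the dynamic loader can actually find, a quick check along these lines may help. It is only a sketch: `can_load` is a hypothetical helper name, and it assumes the check runs in the same shell environment (same `LD_LIBRARY_PATH`) that TensorFlow is started from.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False
    return True


# Library names copied from the warnings above.
for name in ("libcusolver.so.10", "libcudnn.so.8"):
    print(name, "found" if can_load(name) else "missing")
```

A library reported as missing here is one that is either not installed or not on the loader's search path.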

@Yiming-M

Yiming-M commented Feb 8, 2021

Why did you install an incompatible version of cuDNN? There is currently no cudnn-11.2 in the repository, so you might want to downgrade your CUDA version to 11.1 and install system76-cudnn-11.1.

Also, the newest stable version of TensorFlow is 2.4, which supports CUDA 11.0 (also not in the repository), so you may need to install that version from NVIDIA.
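
One way to see which CUDA and cuDNN versions an installed TensorFlow build was compiled against is `tf.sysconfig.get_build_info()` (available in TensorFlow 2.3 and later). The snippet below is a rough sketch: `summarize_build` is a hypothetical helper, and the `cuda_version` / `cudnn_version` keys are only present on CUDA builds, so it falls back to "unknown" otherwise.

```python
def summarize_build(info):
    """Pick the CUDA/cuDNN versions out of a TensorFlow build-info mapping."""
    return {
        "cuda": info.get("cuda_version", "unknown"),
        "cudnn": info.get("cudnn_version", "unknown"),
    }


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # get_build_info() returns a dict-like object describing the build.
        print(summarize_build(tf.sysconfig.get_build_info()))
```

Comparing the reported versions with the installed system76-cuda and system76-cudnn packages shows whether the two line up.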

EDIT: I have tried TensorFlow 2.4 with system76-cuda-11.1 and system76-cudnn-11.1 on the Keras Simple MNIST Convnet example, and the code runs without error.

Epoch 13/15
422/422 [==============================] - 6s 14ms/step - loss: 0.0220 - accuracy: 0.9933 - val_loss: 0.0290 - val_accuracy: 0.9932
Epoch 14/15
422/422 [==============================] - 6s 15ms/step - loss: 0.0224 - accuracy: 0.9922 - val_loss: 0.0277 - val_accuracy: 0.9937
Epoch 15/15
422/422 [==============================] - 6s 14ms/step - loss: 0.0200 - accuracy: 0.9932 - val_loss: 0.0315 - val_accuracy: 0.9933

@theofpa

theofpa commented May 19, 2021

CUDA 11.2 is now the latest version, and the problem persists because cudnn-11.2 is still not available:

$ dpkg -s system76-cuda-latest
Package: system76-cuda-latest
Status: install ok installed
Priority: optional
Section: metapackages
Installed-Size: 9
Maintainer: Michael Aaron Murphy <michael@system76.com>
Architecture: all
Multi-Arch: foreign
Version: 11.2~20.04
Depends: system76-cuda-11.2
Description: Metapackage for the latest version of the CUDA Toolkit
Homepage: https://developer.nvidia.com/cuda-downloads
$ dpkg -l|grep cud
ii  libcudart10.1:amd64                              10.1.243-3                                                amd64        NVIDIA CUDA Runtime Library
ii  system76-cuda                                    0pop1                                                     amd64        NVIDIA CUDA Compiler / Libraries / Toolkit Metapackage
ii  system76-cuda-11.1                               0pop1                                                     amd64        NVIDIA CUDA 11.1 Compiler / Libraries / Toolkit
ii  system76-cuda-11.2                               0pop1                                                     amd64        NVIDIA CUDA 11.2 Compiler / Libraries / Toolkit
ii  system76-cuda-latest                             11.2~20.04                                                all          Metapackage for the latest version of the CUDA Toolkit
ii  system76-cudnn-11.1                              8.0.4                                                     amd64        NVIDIA CUDA Deep Neural Network library (cuDNN) for CUDA 11.1
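
The mismatch in the listing above (a CUDA 11.2 toolkit with no cuDNN package for 11.2) can also be spotted with a small script. This is a minimal sketch that assumes the `system76-cuda-X.Y` / `system76-cudnn-X.Y` naming seen in the output; `cuda_without_cudnn` is a hypothetical helper, not part of any System76 tooling.

```python
import re

PACKAGE = re.compile(r"system76-(cuda|cudnn)-(\d+\.\d+)")


def cuda_without_cudnn(dpkg_output):
    """Return installed CUDA versions that have no matching cuDNN package."""
    cuda, cudnn = set(), set()
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Only consider fully installed packages ("ii" status).
        if len(fields) < 2 or fields[0] != "ii":
            continue
        name = fields[1].split(":")[0]  # drop an ":amd64" style suffix
        match = PACKAGE.fullmatch(name)
        if match:
            target = cuda if match.group(1) == "cuda" else cudnn
            target.add(match.group(2))
    return sorted(cuda - cudnn)


sample = """\
ii  system76-cuda-11.1    0pop1   amd64  NVIDIA CUDA 11.1 Compiler / Libraries / Toolkit
ii  system76-cuda-11.2    0pop1   amd64  NVIDIA CUDA 11.2 Compiler / Libraries / Toolkit
ii  system76-cudnn-11.1   8.0.4   amd64  NVIDIA CUDA Deep Neural Network library (cuDNN) for CUDA 11.1
"""
print(cuda_without_cudnn(sample))
```

In practice the input could come from piping `dpkg -l` into the script rather than from a hard-coded sample.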

Removing the latest version fixes the error:

sudo apt-get remove system76-cuda-11.2
Python 3.8.8 (default, Apr 13 2021, 19:58:26) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2021-05-19 10:36:48.625315: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
>>> tf.config.list_physical_devices('GPU')
2021-05-19 10:36:51.734239: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-05-19 10:36:51.762295: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-19 10:36:51.762755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.4425GHz coreCount: 6 deviceMemorySize: 3.94GiB deviceMemoryBandwidth: 104.43GiB/s
2021-05-19 10:36:51.762797: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-05-19 10:36:51.766115: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-05-19 10:36:51.766185: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-05-19 10:36:51.767462: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-05-19 10:36:51.767767: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-05-19 10:36:51.771203: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-05-19 10:36:51.771963: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-05-19 10:36:51.772119: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-05-19 10:36:51.772257: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-19 10:36:51.772685: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-19 10:36:51.773004: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
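
For longer logs, it can be easier to summarize which libraries were opened and which failed than to read every line. The following is a sketch only: `scan_log` is a hypothetical helper, and the two patterns are based solely on the message wording that appears in this thread, so other TensorFlow versions may phrase them differently.

```python
import re

OPENED = re.compile(r"Successfully opened dynamic library (\S+)")
FAILED = re.compile(r"Could not load dynamic library '([^']+)'")


def scan_log(log_text):
    """Return (opened, failed) library names in first-seen order, without repeats."""
    opened, failed = [], []
    for line in log_text.splitlines():
        match = OPENED.search(line)
        if match and match.group(1) not in opened:
            opened.append(match.group(1))
        match = FAILED.search(line)
        if match and match.group(1) not in failed:
            failed.append(match.group(1))
    return opened, failed


log = """\
I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file
"""
opened, failed = scan_log(log)
print("opened:", opened)
print("failed:", failed)
```

An empty `failed` list together with a non-empty result from `tf.config.list_physical_devices('GPU')` is what the working session above shows.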


Adapted the instructions in system76/docs#598
