Replies: 3 comments
-
As additional info: if I activate anaconda I get the same failure. Output:

chaztikov@priority:
Enable just-in-time compilation with XLA.
2022-09-13 14:59:21.715319: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
Training model...
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7f915c32faf0> and will run it as-is.
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
-
Try disabling XLA: https://deepxde.readthedocs.io/en/latest/modules/deepxde.html#deepxde.config.disable_xla_jit
-
Hi, I am sorry to be a bother here, but my fresh install of deepxde on Ubuntu 22.04 does not seem to be working properly. Any ideas why? On the surface it looks related to XLA compilation issues, but the dcgan_test from tensorflow/examples works (seemingly) without an issue.
Should I try something simple like pip3 removing and purging deepxde? I cloned the repository and used
pip3 install .
Note that I "conda deactivate" before running the Burgers example (as well as the dcgan_test example) below.
Also, I have in my ~/.bashrc the following:
PATH=$HOME/.local/bin/:$PATH
#LD_LIBRARY_PATH=$HOME/.local/lib/:$LD_LIBRARY_PATH
PATH=$HOME/anaconda3/condabin/:$PATH
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/chaztikov/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/chaztikov/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/chaztikov/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/chaztikov/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
export CUDA_DIR="/home/chaztikov/anaconda3/pkgs/cuda-nvcc-11.7.99-0/"
export CUDA_DIR=/usr/local/cuda
export CUDA=/usr/local/cuda
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda
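Given the repeated "Couldn't invoke ptxas" and libdevice warnings further down in the log, a quick sanity check of the toolkit paths might look like this (a sketch; `/usr/local/cuda` is an assumption — point `CUDA_HOME` at whichever toolkit is actually installed):

```shell
# Sanity-check the pieces the TensorFlow/XLA warnings complain about.
CUDA_HOME=/usr/local/cuda   # assumption: adjust to your actual CUDA install

# ptxas must be on PATH, or XLA falls back to (or fails at) driver compilation
command -v ptxas >/dev/null 2>&1 \
  && echo "ptxas found: $(command -v ptxas)" \
  || echo "ptxas NOT on PATH (try adding ${CUDA_HOME}/bin to PATH)"

# libdevice is what the "Can't find libdevice directory" warning searches for
[ -d "${CUDA_HOME}/nvvm/libdevice" ] \
  && echo "libdevice found under ${CUDA_HOME}/nvvm/libdevice" \
  || echo "libdevice MISSING under ${CUDA_HOME}/nvvm/libdevice"

# Point XLA at the toolkit explicitly, as the log itself suggests
export PATH="${CUDA_HOME}/bin:${PATH}"
export XLA_FLAGS="--xla_gpu_cuda_data_dir=${CUDA_HOME}"
echo "XLA_FLAGS=${XLA_FLAGS}"
```

If ptxas or libdevice is reported missing, the conda-packaged `cuda-nvcc` directory exported above (`~/anaconda3/pkgs/cuda-nvcc-11.7.99-0/`) is unlikely to help, since the second `export CUDA_DIR=/usr/local/cuda` overrides it anyway.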
Here is the output from dcgan_test, followed by the output from the Burgers example:
chaztikov@priority:~/git/tensorflow/examples/tensorflow_examples/models/dcgan$ python3 dcgan_test.py
2022-09-13 14:53:26.445512: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-13 14:53:26.591583: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-13 14:53:27.270285: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-09-13 14:53:27.270359: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-09-13 14:53:27.270372: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Running tests under Python 3.10.4: /usr/bin/python3
[ RUN ] DcganTest.test_one_epoch_with_function
2022-09-13 14:53:28.582647: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-13 14:53:29.160738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10031 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1
/usr/lib/python3.10/random.py:370: DeprecationWarning: non-integer arguments to randrange() have been deprecated since Python 3.10 and will be removed in a subsequent version
return self.randrange(a, b+1)
2022-09-13 14:53:31.651282: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8101
2022-09-13 14:53:32.005822: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:53:32.006615: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:53:32.006644: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-13 14:53:32.007282: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:53:32.007351: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
Epoch 0, Generator loss 0.6931471824645996, Discriminator Loss 1.3785552978515625
INFO:tensorflow:time(main.DcganTest.test_one_epoch_with_function): 3.67s
I0913 14:53:32.252396 140095046402048 test_util.py:2460] time(main.DcganTest.test_one_epoch_with_function): 3.67s
[ OK ] DcganTest.test_one_epoch_with_function
[ RUN ] DcganTest.test_one_epoch_without_function
Epoch 0, Generator loss 0.6931471824645996, Discriminator Loss 1.4018439054489136
INFO:tensorflow:time(main.DcganTest.test_one_epoch_without_function): 0.37s
I0913 14:53:32.631311 140095046402048 test_util.py:2460] time(main.DcganTest.test_one_epoch_without_function): 0.37s
[ OK ] DcganTest.test_one_epoch_without_function
[ RUN ] DcganTest.test_session
[ SKIPPED ] DcganTest.test_session
Ran 3 tests in 4.054s
OK (skipped=1)
chaztikov@priority:~/git/tensorflow/examples/tensorflow_examples/models/dcgan$ ls
dcgan.py  dcgan_test.py  __init__.py
chaztikov@priority:~/git/tensorflow/examples/tensorflow_examples/models/dcgan$ cd ~/git/deepxde/
chaztikov@priority:~/git/deepxde$ ls
build/  deepxde/  DeepXDE.egg-info/  docker/  docs/  examples/  .git/  .github/
chaztikov@priority:~/git/deepxde$ cd examples/pinn_forward/
chaztikov@priority:~/git/deepxde/examples/pinn_forward$ python3 Burgers
python3: can't open file '/home/chaztikov/git/deepxde/examples/pinn_forward/Burgers': [Errno 2] No such file or directory
chaztikov@priority:~/git/deepxde/examples/pinn_forward$ python3 Burgers.py
Using backend: tensorflow
2022-09-13 14:54:12.951776: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-13 14:54:13.096667: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-13 14:54:13.806262: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-09-13 14:54:13.806335: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-09-13 14:54:13.806347: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Enable just-in-time compilation with XLA.
2022-09-13 14:54:15.986969: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-13 14:54:16.560942: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-09-13 14:54:16.560996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10011 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1
Compiling model...
'compile' took 0.000464 s
Training model...
/home/chaztikov/.local/lib/python3.10/site-packages/keras/initializers/initializers_v2.py:120: UserWarning: The initializer GlorotNormal is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
warnings.warn(
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7fbd9dfea320> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7fbd9dfea320>: no matching AST found among candidates:
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7fbd9dfea560> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7fbd9dfea560>: no matching AST found among candidates:
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2022-09-13 14:54:18.222399: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x562edf6ef590 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-09-13 14:54:18.222427: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2022-09-13 14:54:18.234732: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2022-09-13 14:54:18.638407: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.639224: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.639250: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-13 14:54:18.639969: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.640034: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2022-09-13 14:54:18.642494: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.642522: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-13 14:54:18.643316: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.643376: W tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:641] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda or modifying $PATH can be used to set the location of ptxas
This message will only be logged once.
2022-09-13 14:54:18.733421: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
/usr/local/cuda
/usr/local/cuda-11.2
/usr/local/cuda
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2022-09-13 14:54:18.904422: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.904462: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-13 14:54:18.905418: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.905498: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Aborted (core dumped)