Replies: 3 comments
-
As additional info: if I activate anaconda I get the same failure. Output:

chaztikov@priority:
Enable just-in-time compilation with XLA.
2022-09-13 14:59:21.715319: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
Training model...
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7f915c32faf0> and will run it as-is.
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
-
Try disabling XLA: https://deepxde.readthedocs.io/en/latest/modules/deepxde.html#deepxde.config.disable_xla_jit
-
Hi, I am sorry to be a bother here, but my fresh install of deepxde on Ubuntu 22.04 does not seem to be working properly. Any ideas why? On the surface it looks related to XLA compilation issues, but the dcgan_test from tensorflow/examples works (seemingly) without an issue.
Should I try something simple like pip3 removing and purging deepxde? I cloned the repository and used
pip3 install .
Note that I "conda deactivate" before running the Burgers example (as well as the dcgan_test example) below.
Also, I have in my ~/.bashrc the following:
PATH=$HOME/.local/bin/:$PATH
#LD_LIBRARY_PATH=$HOME/.local/lib/:$LD_LIBRARY_PATH
PATH=$HOME/anaconda3/condabin/:$PATH
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/chaztikov/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/chaztikov/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/chaztikov/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/chaztikov/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
export CUDA_DIR="/home/chaztikov/anaconda3/pkgs/cuda-nvcc-11.7.99-0/"
export CUDA_DIR=/usr/local/cuda
export CUDA=/usr/local/cuda
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda
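Given the repeated "Couldn't invoke ptxas" and libdevice warnings further down in the log, a quick sanity check of the toolkit paths might look like this (a sketch; `/usr/local/cuda` is an assumption — point `CUDA_HOME` at whichever toolkit is actually installed):

```shell
# Sanity-check the pieces the TensorFlow/XLA warnings complain about.
CUDA_HOME=/usr/local/cuda   # assumption: adjust to your actual CUDA install

# ptxas must be on PATH, or XLA falls back to (or fails at) driver compilation
command -v ptxas >/dev/null 2>&1 \
  && echo "ptxas found: $(command -v ptxas)" \
  || echo "ptxas NOT on PATH (try adding ${CUDA_HOME}/bin to PATH)"

# libdevice is what the "Can't find libdevice directory" warning searches for
[ -d "${CUDA_HOME}/nvvm/libdevice" ] \
  && echo "libdevice found under ${CUDA_HOME}/nvvm/libdevice" \
  || echo "libdevice MISSING under ${CUDA_HOME}/nvvm/libdevice"

# Point XLA at the toolkit explicitly, as the log itself suggests
export PATH="${CUDA_HOME}/bin:${PATH}"
export XLA_FLAGS="--xla_gpu_cuda_data_dir=${CUDA_HOME}"
echo "XLA_FLAGS=${XLA_FLAGS}"
```

If ptxas or libdevice is reported missing, the conda-packaged `cuda-nvcc` directory exported above (`~/anaconda3/pkgs/cuda-nvcc-11.7.99-0/`) is unlikely to help, since the second `export CUDA_DIR=/usr/local/cuda` overrides it anyway.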
Here is the output from dcgan_test, followed by the output from the Burgers example:
chaztikov@priority:~/git/tensorflow/examples/tensorflow_examples/models/dcgan$ python3 dcgan_test.py
2022-09-13 14:53:26.445512: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-13 14:53:26.591583: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-13 14:53:27.270285: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-09-13 14:53:27.270359: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-09-13 14:53:27.270372: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Running tests under Python 3.10.4: /usr/bin/python3
[ RUN ] DcganTest.test_one_epoch_with_function
2022-09-13 14:53:28.582647: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-13 14:53:29.160738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10031 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1
/usr/lib/python3.10/random.py:370: DeprecationWarning: non-integer arguments to randrange() have been deprecated since Python 3.10 and will be removed in a subsequent version
return self.randrange(a, b+1)
2022-09-13 14:53:31.651282: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8101
2022-09-13 14:53:32.005822: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:53:32.006615: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:53:32.006644: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-13 14:53:32.007282: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:53:32.007351: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
Epoch 0, Generator loss 0.6931471824645996, Discriminator Loss 1.3785552978515625
INFO:tensorflow:time(main.DcganTest.test_one_epoch_with_function): 3.67s
I0913 14:53:32.252396 140095046402048 test_util.py:2460] time(main.DcganTest.test_one_epoch_with_function): 3.67s
[ OK ] DcganTest.test_one_epoch_with_function
[ RUN ] DcganTest.test_one_epoch_without_function
Epoch 0, Generator loss 0.6931471824645996, Discriminator Loss 1.4018439054489136
INFO:tensorflow:time(main.DcganTest.test_one_epoch_without_function): 0.37s
I0913 14:53:32.631311 140095046402048 test_util.py:2460] time(main.DcganTest.test_one_epoch_without_function): 0.37s
[ OK ] DcganTest.test_one_epoch_without_function
[ RUN ] DcganTest.test_session
[ SKIPPED ] DcganTest.test_session
Ran 3 tests in 4.054s
OK (skipped=1)
chaztikov@priority:~/git/tensorflow/examples/tensorflow_examples/models/dcgan$ ls
dcgan.py  dcgan_test.py  __init__.py
chaztikov@priority:~/git/tensorflow/examples/tensorflow_examples/models/dcgan$ cd ~/git/deepxde/
chaztikov@priority:~/git/deepxde$ ls
build/  deepxde/  DeepXDE.egg-info/  docker/  docs/  examples/  .git/  .github/
chaztikov@priority:~/git/deepxde$ cd examples/pinn_forward/
chaztikov@priority:~/git/deepxde/examples/pinn_forward$ python3 Burgers
python3: can't open file '/home/chaztikov/git/deepxde/examples/pinn_forward/Burgers': [Errno 2] No such file or directory
chaztikov@priority:~/git/deepxde/examples/pinn_forward$ python3 Burgers.py
Using backend: tensorflow
2022-09-13 14:54:12.951776: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-13 14:54:13.096667: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-13 14:54:13.806262: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-09-13 14:54:13.806335: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-09-13 14:54:13.806347: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Enable just-in-time compilation with XLA.
2022-09-13 14:54:15.986969: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-13 14:54:16.560942: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-09-13 14:54:16.560996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10011 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1
Compiling model...
'compile' took 0.000464 s
Training model...
/home/chaztikov/.local/lib/python3.10/site-packages/keras/initializers/initializers_v2.py:120: UserWarning: The initializer GlorotNormal is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
warnings.warn(
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7fbd9dfea320> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7fbd9dfea320>: no matching AST found among candidates:
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x7fbd9dfea560> and will run it as-is.
Cause: could not parse the source code of <function <lambda> at 0x7fbd9dfea560>: no matching AST found among candidates:
coding=utf-8
lambda x, on: np.array([on_boundary(x[i], on[i]) for i in range(len(x))])
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
2022-09-13 14:54:18.222399: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x562edf6ef590 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-09-13 14:54:18.222427: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2022-09-13 14:54:18.234732: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2022-09-13 14:54:18.638407: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.639224: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.639250: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-13 14:54:18.639969: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.640034: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2022-09-13 14:54:18.642494: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.642522: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-13 14:54:18.643316: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.643376: W tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:641] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda or modifying $PATH can be used to set the location of ptxas
This message will only be logged once.
2022-09-13 14:54:18.733421: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
/usr/local/cuda
/usr/local/cuda-11.2
/usr/local/cuda
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2022-09-13 14:54:18.904422: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.904462: W tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-09-13 14:54:18.905418: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-09-13 14:54:18.905498: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Aborted (core dumped)