Fix Long Run Time (bayes_nonconj) #323

Open
1 task
mmcky opened this issue Mar 22, 2023 · 4 comments
@mmcky
Contributor

mmcky commented Mar 22, 2023

Building the cache for this lecture set from scratch currently takes almost 3 hours.

TensorFlow's XLA compiler reports a number of issues, such as:

/__w/lecture-python.myst/lecture-python.myst/lectures/wealth_dynamics.md: Executing notebook using local CWD [mystnb]
2023-03-22 01:10:05.558920: E external/org_tensorflow/tensorflow/compiler/xla/service/slow_operation_alarm.cc:65] 
********************************
[Compiling module jit_wealth_time_series_for_loop_jax] Very slow compile?  If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
********************************

which suggests jax is again not running as it should.

  • Once the QuantEcon GPU machine is up and running, we should test the lectures and jax on the 3080 to benchmark and diagnose what is going on.
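
The XLA warning above asks for a dump of the compiled module. A minimal sketch of how that could be set up from Python, assuming the flag named in XLA's warning is `--xla_dump_to` and using a hypothetical dump directory `/tmp/xla_dump`; the variable has to be set before jax is imported:

```python
import os

# Assumption: /tmp/xla_dump is a hypothetical output directory, not one from the issue.
dump_dir = "/tmp/xla_dump"

# Keep any flags that are already set and append the dump flag.
existing = os.environ.get("XLA_FLAGS", "")
os.environ["XLA_FLAGS"] = f"{existing} --xla_dump_to={dump_dir}".strip()

print(os.environ["XLA_FLAGS"])

# import jax  # import jax only after XLA_FLAGS has been set
```

Setting the variable inside the notebook rather than in the CI workflow keeps the dump limited to the one lecture being diagnosed.
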
@mmcky
Contributor Author

mmcky commented Dec 13, 2023

@HumphreyYang it looks like the bayes_nonconj lecture may be uninstalling jax because of the following:

# install dependencies
!pip install numpyro pyro-ppl torch jax

It looks like torch is reinstalling all the CUDA infrastructure, which may mean jax isn't running as it should and could explain the long run time.

Not urgent but would you be able to:

  1. Compare the local runtime of this lecture to the web version.
  2. See if we can minimise the installs here so that CUDA libraries are not reinstalled.
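
One way to approach the second item is to install only what the environment is actually missing. The sketch below is an illustration, not the lecture's code: the helper `missing_packages` is hypothetical, and it only checks whether a distribution is present, not whether its version or CUDA build is the right one.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_packages(names):
    """Return the subset of `names` that is not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


# Packages named in the lecture's install cell.
wanted = ["numpyro", "pyro-ppl", "torch", "jax"]
to_install = missing_packages(wanted)
if to_install:
    print("pip install " + " ".join(to_install))
else:
    print("nothing to install")
```

Note that pip may still pull in or replace dependencies of whatever remains on the list, so this narrows the install rather than guaranteeing the CUDA libraries are left alone.
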

@mmcky
Contributor Author

mmcky commented Dec 13, 2023

See the output:

An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
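
A quick way to catch this fallback early is to ask jax which backend it picked before running the lecture. This is a minimal sketch; the function name `jax_backend_report` is hypothetical, and the import is guarded so the check also runs where jax is absent.

```python
def jax_backend_report():
    """Describe which backend jax would use, or note that jax is missing."""
    try:
        import jax
    except ImportError:
        return {"installed": False, "backend": None, "devices": []}
    return {
        "installed": True,
        "backend": jax.default_backend(),  # e.g. "cpu" or "gpu"
        "devices": [str(d) for d in jax.devices()],
    }


report = jax_backend_report()
print(report)
if report["installed"] and report["backend"] == "cpu":
    print("jax is running on the CPU; check that a CUDA-enabled jaxlib is installed.")
```

Running this once before and once after the `pip install` cell would show whether the install step is what switches the backend to CPU.
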

@mmcky changed the title from "Lecture Long Run Time -- Cache" to "Fix Long Run Time (bayes_nonconj)" on Dec 13, 2023
@HumphreyYang
Collaborator

Many thanks @mmcky, I will look into these options.

@HumphreyYang
Collaborator

HumphreyYang commented Dec 15, 2023

Hi @mmcky,

I tested this locally and on Colab. It still runs for around 30 minutes, with each MCMC and SVI simulation taking around 1 minute.

I think this is expected, as the simulation is quite large and MCMC is quite slow.
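
To compare per-simulation timings across machines, a small wall-clock timer is enough. The sketch below is illustrative only: `timed` and `timings` are hypothetical names, and the workload is a stand-in for one MCMC or SVI run.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock seconds spent inside the `with` block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-in workload; in the lecture this would wrap one MCMC or SVI run.
with timed("placeholder"):
    sum(i * i for i in range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

When timing jax code this way, the result should be forced inside the block (for example with `block_until_ready()`), since jax dispatches work asynchronously.
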

> See if we can minimise the installs here so that CUDA libraries are not reinstalled.

Both torch and jax play important roles in this lecture. I think we can add them to our requirements file to minimize reinstalls.
