Add a docker container (and docker-compose file) to run the model/notebooks in a containerize environment #166

leothomas · 2024-02-26T20:22:36Z

In order to make it easier to run the model/notebooks without having to manage installing dependencies across various machines/environments, I've added a micromamba based Dockerfile which will create the conda environment with the specified libraries.

I've also added a docker compose file, in order to specify the build-time and run-time arguments for exposing the jupyter lab port and mounting the current directly as a volume, into the docker container. This will allow users to modify any of the model/notebook code locally, without having to re-build the image.

By default the docker image starts with running jupyter lab but this can be overridden both in the docker-compose or even in the command line with any other python or bash command.

The platform=linux/amd64 build and run-time args enable the image to be built on Mac M1 while maintaining compatibility with Linux.

The container can be run with:
docker-compose up or docker-compose run claymodel <command> where command is a command which override the jupyter lab startup

The container can also be built directly (bypassing the need for docker-compose) with:

docker build . -t clay --platform linux/amd64

and then run with:

docker run --rm -it -v $(pwd):/model -p 8888:8888 -e ENV_NAME=claymodel --platform linux/amd64 clay:latest

weiji14

Thanks @leothomas! Just some small comments for now. Do you think we should also add a .dockerignore file to keep the Docker image a pure virtual environment? Also, what are your thoughts on setting up some CI to push pre-built containers to a docker registry (can be done in a separate PR)?

weiji14 · 2024-02-27T20:33:10Z

environment.yml

+  - pytorch~=2.1.0
+  - pyarrow~=15.0.0


Could you resolve the merge conflict here with the main branch? Also see if removing the pyarrow pin works. I managed to get conda to solve for osx-arm64 recently on #164 after https://github.com/conda-forge/torchvision-feedstock/pull/89/files was merged.

Agreed. You should be able to simply ditch your changes to environment.yml. If not, then keep in mind that whenever you modify environment.yml, you must regenerate conda-lock.yml, so you should never be committing changes to only environment.yml.

Did you mean conflicts with ci/osx-arm64, rather than conflicts with main? I didn't notice any conflicts with main.

I merged the changes from ci/osx-arm64 and, while I'm able to install from conda-lock.yml (as @chuckwondo suggested) I'm not able to install from environment.yml due to cuda being unavailable in the docker container:

> [claymodel 4/4] RUN micromamba create -y -n claymodel --file environment.yml && micromamba clean --all --yes: 135.1 error libmamba Could not solve for environment specs 135.1 The following package could not be installed 135.1 └─ pytorch ~=2.1.0 *cuda12* is not installable because it requires 135.1 └─ __cuda, which is missing on the system. 135.3 critical libmamba Could not solve for environment specs ------ failed to solve: process "/usr/local/bin/_dockerfile_shell.sh micromamba create -y -n claymodel --file environment.yml && micromamba clean --all --yes" did not complete successfully: exit code: 1

I'm thinking that the conda-lock.yml and environment.yml files aren't synchronized?

Let me know what the best course of action is. I can look into making a composite docker image, based off of both micromamba and nvidia/cuda so that we can install pytorch with cuda support in the docker container

Did you mean conflicts with ci/osx-arm64, rather than conflicts with main? I didn't notice any conflicts with main.

I meant with main actually. The changes in the ci/osx-arm64 branch aren't much.

I merged the changes from ci/osx-arm64 and, while I'm able to install from conda-lock.yml (as @chuckwondo suggested) I'm not able to install from environment.yml due to cuda being unavailable in the docker container:

Yet, best to go with conda-lock.yml as Chuck suggested. If you want to install from environment.yml on a device without CUDA GPUs, set CONDA_OVERRIDE_CUDA=12.0 following https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html#overriding-detected-packages.

Hm that's odd - there were no merge conflict with main in the case. I'm might have been missing something. Do the environment.yml and conda-lock.yml files look correct in their current state?

Dockerfile

docker-compose.yml

chuckwondo

In addition to the individual comments/suggestions, please also add a section to README.md about how to run the Docker container (docker compose up) as an alternative to installing things locally, and how to access JupyterLab once the container is started. In particular, mention the URLs that appears in the logging messages:

claymodel-1  |     Or copy and paste one of these URLs:
claymodel-1  |         http://ffd23ea64b9b:8888/lab?token=abebb7b9476b8fff1fb8b543cc552e8c9641b5b38547c3cb
claymodel-1  |         http://127.0.0.1:8888/lab?token=abebb7b9476b8fff1fb8b543cc552e8c9641b5b38547c3cb

Dockerfile

environment.yml

Dockerfile

chuckwondo · 2024-02-28T14:25:27Z

Thanks @leothomas! Just some small comments for now. Do you think we should also add a .dockerignore file to keep the Docker image a pure virtual environment? Also, what are your thoughts on setting up some CI to push pre-built containers to a docker registry (can be done in a separate PR)?

Absolutely add a .dockerignore file to this PR. You can probably start with much (or all) of what's in .gitignore.

However, keep in mind that there are cases where you must be more explicit in .dockerignore.

For example, you cannot use only __pycache__/ in .dockerignore because that will ignore only a top level __pycache__/ directory. To ignore such a directory at all levels, you must use **/__pycache__/ in .dockerignore. Another good candidate for this is to add **/.ipynb_checkpoints/ since there are notebooks in the docs directory.

chuckwondo · 2024-02-28T14:33:35Z

Thanks @leothomas! Just some small comments for now. Do you think we should also add a .dockerignore file to keep the Docker image a pure virtual environment? Also, what are your thoughts on setting up some CI to push pre-built containers to a docker registry (can be done in a separate PR)?

Absolutely add a .dockerignore file to this PR. You can probably start with much (or all) of what's in .gitignore.

However, keep in mind that there are cases where you must be more explicit in .dockerignore.

For example, you cannot use only __pycache__/ in .dockerignore because that will ignore only a top level __pycache__/ directory. To ignore such a directory at all levels, you must use **/__pycache__/ in .dockerignore. Another good candidate for this is to add **/.ipynb_checkpoints/ since there are notebooks in the docs directory.

Alternatively, as I mentioned to you ages ago, I tend to "invert" my use of .dockerignore to make it act more like an "allow" list rather than a "deny" list, which I find to be safer and clearer. For example, consider making a .dockerignore that ignores everything and then does not ignore the things you want to "allow":

*
!environment.yml
!conda-lock.yml
!**/*.py
!**/*.ipynb
!**/*.sh

leothomas · 2024-02-28T20:16:14Z

Awesome! Thank you both! I've addressed some of the request changes:

Regenerating the conda-lock.yml
Installing the micromamba environment from conda-lock.yml rather than environment.yml
Removing the pyarrow dependency
Adding a .dockerignore which ignores everything by default and only allows the files specifically needed
Updating the README with relevant info on the docker image/container and how to run it
Updating the jupyter lab flags to avoid the unnecessary usage of --allow-root and address some warning/debugging logs

Do y'all think it would be valuable to try to see if I can get this to run with a cuda docker image to enable the pytorch-cuda installation?

README.md

docs/partial-inputs-flood-tutorial.ipynb

conda-lock.yml

.github/workflows/test.yml

weiji14

Some suggested changes to reduce diff from merge conflict handling.

docs/partial-inputs-flood-tutorial.ipynb

docs/partial-inputs.ipynb

environment.yml

src/callbacks_wandb.py

weiji14 · 2024-03-17T22:00:52Z

Dockerfile

+
+COPY --chown=$MAMBA_USER:$MAMBA_USER . .
+
+RUN micromamba create -y -n claymodel --file conda-lock.yml && \


Hmm, I'm getting this error when running docker build . -t clay --platform linux/amd64 locally:

0.384 Transaction starting 79.51 critical libmamba Failed to create dir 'info' 79.51 error libmamba Error opening for reading "/opt/conda/pkgs/cudnn-8.9.7.29-h092f7fd_3/info/index.json": No such file or directory 79.51 error libmamba Error when extracting package: [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal 79.51 cudnn-8.9.7.29-h092f7fd_3.conda extraction failed 79.73 critical libmamba Found incorrect download: cudnn. Aborting

Not sure if it's because something changed when I re-locked the conda-lock.yml file manually when doing the merge from main at 5be9fa2#diff-63113c19c5d310b8e350b302f279e2297ba6faa9ac9c99c1c82e83e508447865

Oh jeez - I'm getting a segmentation fault:

2.612 Transaction starting 1441.7 qemu: uncaught target signal 11 (Segmentation fault) - core dumped 1441.7 bash: line 1: 10 Segmentation fault micromamba create -y -n claymodel --file conda-lock.yml ------ Dockerfile:11 -------------------- 10 | 11 | >>> RUN micromamba create -y -n claymodel --file conda-lock.yml && \ 12 | >>> micromamba clean --all --yes 13 | -------------------- ERROR: failed to solve: process "/usr/local/bin/_dockerfile_shell.sh micromamba create -y -n claymodel --file conda-lock.yml && micromamba clean --all --yes" did not complete successfully: exit code: 139

Wondering if it's either related to a lack of available memory or installing CUDA in docker.

I recall that when we installed the model libraries locally we had to install pytorch without CUDA - could the CUDA installation have made its way into the conda-lock file and causing issues?

… a docker container

for more information, see https://pre-commit.ci

yellowcap · 2024-06-06T11:19:36Z

Putting some life into this to see if we can get this to work for binder and friends.

for more information, see https://pre-commit.ci

yellowcap · 2024-06-06T14:07:45Z

Got this to work locally. Docker-compose and Jupyter notebook working fine with the image built from the Dockerfile. 🌮

@leothomas have a look and let me know if you think the small changes make sense to you. Happy to merge this in after some review. Not sure how to use the functionality in things like binder but would be nice to use the dockerized version for those if that unbreaks the deployments!

yellowcap

🐳

yellowcap · 2024-06-11T09:31:07Z

@chuckwondo re-requested your review to unblock this. I think with v1 it's now working ok. If you have some time try it out 🐋

yeelauren · 2024-06-21T22:44:40Z

Hey just wanted to flag - I noticed some of the paths for the notebook are broken in docker as well:

I think it should be two directories up or explicit instructions on changing your workspace directory paths.

weiji14 reviewed Feb 27, 2024

View reviewed changes

weiji14 added the maintenance Boring but important stuff for the core devs label Feb 27, 2024

chuckwondo requested changes Feb 28, 2024

View reviewed changes

Dockerfile Outdated Show resolved Hide resolved

environment.yml Outdated Show resolved Hide resolved

Dockerfile Outdated Show resolved Hide resolved

Dockerfile Outdated Show resolved Hide resolved

chuckwondo requested changes Feb 29, 2024

View reviewed changes

README.md Outdated Show resolved Hide resolved

docs/partial-inputs-flood-tutorial.ipynb Outdated Show resolved Hide resolved

conda-lock.yml Outdated Show resolved Hide resolved

chuckwondo reviewed Feb 29, 2024

View reviewed changes

.github/workflows/test.yml Outdated Show resolved Hide resolved

weiji14 mentioned this pull request Mar 5, 2024

EPIC: Support running on macOS #161

Closed

7 tasks

weiji14 reviewed Mar 12, 2024

View reviewed changes

weiji14 mentioned this pull request Mar 17, 2024

Add osx-arm64 platform to conda-lock.yml file and GitHub Actions CI #164

Merged

weiji14 reviewed Mar 17, 2024

View reviewed changes

weiji14 mentioned this pull request Mar 18, 2024

Binder launch is broken because install requires NVIDIA GPUs #181

Closed

leothomas and others added 7 commits June 6, 2024 12:02

Add dockerfile and docker-compose file for running a conda env within…

fd21b6e

… a docker container

Remove unused code from Dockerfile

c62fa32

Regenerate conda-lock.yml without pytorch-cuda

a92b264

[pre-commit.ci] auto fixes from pre-commit.com hooks

ad89671

for more information, see https://pre-commit.ci

Regenerate conda-lock.yml

2b1497e

Update readme

93d7304

Fixed readme after rebase

ce425e2

yellowcap force-pushed the feature/dockerized-environment branch from 867e3bd to ce425e2 Compare June 6, 2024 11:18

pre-commit-ci bot and others added 2 commits June 6, 2024 11:19

[pre-commit.ci] auto fixes from pre-commit.com hooks

d71794c

for more information, see https://pre-commit.ci

Updates on docker files

a952edc

Merge branch 'main' into feature/dockerized-environment

30f01c3

yellowcap approved these changes Jun 11, 2024

View reviewed changes

yellowcap requested a review from chuckwondo June 11, 2024 09:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a docker container (and docker-compose file) to run the model/notebooks in a containerize environment #166

Add a docker container (and docker-compose file) to run the model/notebooks in a containerize environment #166

leothomas commented Feb 26, 2024

weiji14 left a comment

weiji14 Feb 27, 2024

chuckwondo Feb 28, 2024

leothomas Feb 28, 2024

weiji14 Feb 28, 2024

leothomas Feb 28, 2024

chuckwondo left a comment

chuckwondo commented Feb 28, 2024

chuckwondo commented Feb 28, 2024

leothomas commented Feb 28, 2024

weiji14 left a comment

weiji14 Mar 17, 2024

leothomas Mar 22, 2024

yellowcap commented Jun 6, 2024

yellowcap commented Jun 6, 2024

yellowcap left a comment

yellowcap commented Jun 11, 2024

yeelauren commented Jun 21, 2024


		COPY --chown=$MAMBA_USER:$MAMBA_USER . .

		RUN micromamba create -y -n claymodel --file conda-lock.yml && \

Add a docker container (and docker-compose file) to run the model/notebooks in a containerize environment #166

Are you sure you want to change the base?

Add a docker container (and docker-compose file) to run the model/notebooks in a containerize environment #166

Conversation

leothomas commented Feb 26, 2024

weiji14 left a comment

Choose a reason for hiding this comment

weiji14 Feb 27, 2024

Choose a reason for hiding this comment

chuckwondo Feb 28, 2024

Choose a reason for hiding this comment

leothomas Feb 28, 2024

Choose a reason for hiding this comment

weiji14 Feb 28, 2024

Choose a reason for hiding this comment

leothomas Feb 28, 2024

Choose a reason for hiding this comment

chuckwondo left a comment

Choose a reason for hiding this comment

chuckwondo commented Feb 28, 2024

chuckwondo commented Feb 28, 2024

leothomas commented Feb 28, 2024

weiji14 left a comment

Choose a reason for hiding this comment

weiji14 Mar 17, 2024

Choose a reason for hiding this comment

leothomas Mar 22, 2024

Choose a reason for hiding this comment

yellowcap commented Jun 6, 2024

yellowcap commented Jun 6, 2024

yellowcap left a comment

Choose a reason for hiding this comment

yellowcap commented Jun 11, 2024

yeelauren commented Jun 21, 2024