This project produces Docker images, which provide ready-to-use Artificial Intelligence (AI) / Machine Learning (ML) Python Jupyter environments on a few well-known and stable Linux distributions (e.g., CentOS 9 Stream, CentOS 8 Stream, Debian 12 (Bookworm), Debian 11 (Bullseye), Ubuntu 22.04 LTS (Jammy Jellyfish), Ubuntu 20.04 LTS (Focal Fossa) and Ubuntu 18.04 LTS (Bionic Beaver)).
The Docker images just add some sample Jupyter notebooks and data sets on top of other general-purpose C++/Python Docker images, which are produced by a dedicated project on GitHub and are available on Docker Hub too.
The Python virtual environments are installed thanks to Pyenv and pipenv, as detailed in the dedicated procedure on the Python induction notebook sub-project.
Any additional Python module should be installed in a dedicated virtual environment, controlled by pipenv through local Pipfile (and Pipfile.lock) files, which should be versioned. The Docker images therefore do not install those modules globally; only the pyenv and pipenv utilities are provided (and correctly configured).
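As an illustration of that workflow, here is a minimal sketch that writes a Pipfile declaring one dependency and, on request, lets pipenv create the dedicated virtual environment. The project directory, the PROJECT_DIR and RUN_PIPENV variables and the pandas dependency are assumptions made for the example; they are not part of the Docker images.

```shell
#!/bin/sh
set -eu

# Hypothetical project directory; in practice, a directory under version control
project_dir="${PROJECT_DIR:-$(mktemp -d)}"
mkdir -p "${project_dir}"

# Minimal Pipfile declaring one example dependency (pandas is only an illustration)
cat > "${project_dir}/Pipfile" << 'EOF'
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
pandas = "*"
EOF

# Create the virtual environment and generate Pipfile.lock only on request,
# as that step needs pipenv and network access (e.g., from within the container)
if [ "${RUN_PIPENV:-0}" = "1" ]; then
  (cd "${project_dir}" && pipenv install)
fi

echo "Pipfile written to ${project_dir}/Pipfile"
```

With RUN_PIPENV=1, pipenv install resolves the dependencies and writes Pipfile.lock next to the Pipfile; both files can then be committed, and commands can be run inside the environment with pipenv run.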
Those Docker images are intended to run any collection of Jupyter notebooks, using any collection of data sets, which you may have locally. The Docker images provide the engine (Jupyter Lab), and you provide the gas (Jupyter notebooks and data sets). In keeping with that analogy, some sample gas is provided for convenience:
- Sample Jupyter notebooks, available in the /notebook top directory of the Docker images (when not overshadowed by your own Jupyter notebook volume)
- Sample data sets, available in the /data top directory of the Docker images (when not overshadowed by your own data set volume)
Another GitHub repository features lightweight Python Docker images, aimed at deploying Data Science applications to operational environments such as cloud-based Kubernetes clusters or services (e.g., AWS EKS, Azure AKS, IBM/RedHat OpenShift v4 or Google GKE). Those images are available on their own Docker Hub repository.
- Python Data Science images for every day use:
- Production-ready Python Data Science images:
- Production-ready Python cloud images:
- Production-ready Data Processing Pipelines (DPP) images:
- General purpose C++/Python images:
- Native Docker Python images:
  - On GitHub: https://github.com/docker-library/python
  - On Docker Hub: https://hub.docker.com/_/python
- Native Jupyter ready-to-run Docker images: https://github.com/jupyter/docker-stacks
- Dataquest's Docker for Data Science: https://www.dataquest.io/blog/docker-data-science
- Download the Docker image for your preferred Linux distribution (where <linux-distrib> is one of centos9, centos8, debian12, debian11, ubuntu2204, ubuntu2004 or ubuntu1804):
$ docker pull infrahelpers/python-jupyter:<linux-distrib>
- Launch Jupyter Lab within the Docker image (where <port> corresponds to the local port on which Jupyter Lab is launched; the default is 8888):
$ docker run -d -p <port>:8888 infrahelpers/python-jupyter:<linux-distrib>
- Launch Jupyter Lab within the Docker image, with your own local notebooks and data sets mounted on /notebook and /data (where <port> corresponds to the local port on which Jupyter Lab is launched; the default is 8888):
$ docker run -d -p <port>:8888 -v ${PWD}/notebook/induction:/notebook -v ${PWD}/data/induction:/data infrahelpers/python-jupyter:<linux-distrib>
Jupyter Lab (run from the Docker image) is now available in the Web browser:
http://localhost:8888
Note that the port (8888 by default) may be changed at your convenience; use the same <port> value in the URL.
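The launch step can be wrapped in a small helper that checks the distribution tag before assembling the command. The following is only a sketch: jupyter_run_cmd is a hypothetical function name, and the helper prints the command rather than running it, so that it can be reviewed first.

```shell
#!/bin/sh
set -eu

# Print the docker run command for a given distribution tag and local port.
# jupyter_run_cmd is a hypothetical helper, not part of the project.
jupyter_run_cmd() {
  distrib="$1"
  port="${2:-8888}"   # local port; defaults to 8888
  case "${distrib}" in
    centos9|centos8|debian12|debian11|ubuntu2204|ubuntu2004|ubuntu1804) ;;
    *) echo "Unknown distribution: ${distrib}" >&2; return 1 ;;
  esac
  echo "docker run -d -p ${port}:8888 infrahelpers/python-jupyter:${distrib}"
}

# Example: Debian 12 image, published on local port 9000
jupyter_run_cmd debian12 9000
```

The example call prints docker run -d -p 9000:8888 infrahelpers/python-jupyter:debian12. Once that command has been run, Jupyter Lab is reachable on http://localhost:9000 rather than on port 8888.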
- Clone the Git repository:
$ mkdir -p ~/dev/ml && cd ~/dev/ml
$ git clone https://github.com/machine-learning-helpers/docker-python-jupyter.git
$ cd docker-python-jupyter
- Build the Docker image:
$ docker build -t infrahelpers/python-jupyter:<linux-distrib> <linux-distrib>/
$ docker images
REPOSITORY                    TAG             IMAGE ID       CREATED              SIZE
infrahelpers/python-jupyter   linux-distrib   33a1ad533140   About a minute ago   2.29GB
- (Optional) Push the newly built image to Docker Hub. That step is usually not needed, as the images are automatically built every time there is a change on GitHub:
$ docker login
$ docker push infrahelpers/python-jupyter:<linux-distrib>
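To build the images for all the distributions in one go, the build step can be looped over the tags listed in this README. This is a sketch under two assumptions: each tag matches a sub-directory of the cloned repository (as the build command above suggests), and the DISTRIBS and DRY_RUN variables, as well as the build_all function, are hypothetical names introduced for the example.

```shell
#!/bin/sh
set -eu

# Distribution tags listed in this README; each is assumed to match a sub-directory
DISTRIBS="centos9 centos8 debian12 debian11 ubuntu2204 ubuntu2004 ubuntu1804"

# Print the build commands (DRY_RUN=1, the default here) or execute them (DRY_RUN=0)
build_all() {
  for distrib in ${DISTRIBS}; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "docker build -t infrahelpers/python-jupyter:${distrib} ${distrib}/"
    else
      docker build -t "infrahelpers/python-jupyter:${distrib}" "${distrib}/"
    fi
  done
}

build_all
```

By default the script only prints the seven build commands. Running it with DRY_RUN=0 from the root of the cloned repository performs the builds, which may take a while given the size of the images.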
- Shut down the Docker container:
$ docker ps
CONTAINER ID   IMAGE                       COMMAND                  CREATED          STATUS          PORTS                    NAMES
7b69efc9dc9a   ai/python-jupyter:centos9   "/bin/sh -c 'pipenv …"   48 seconds ago   Up 47 seconds   0.0.0.0:9000->8888/tcp   vigilant_merkle
$ docker kill vigilant_merkle
vigilant_merkle
$ docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
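When the generated container name is not at hand, the containers can also be looked up by the image they were started from. The sketch below assumes the image is named infrahelpers/python-jupyter (adapt it if your local image is named differently, as in the sample output above); stop_jupyter is a hypothetical helper name. It uses docker stop rather than docker kill, so that Jupyter Lab first receives a termination signal and gets a grace period before being killed.

```shell
#!/bin/sh
set -eu

# Stop every running container started from the image of a given distribution tag.
# stop_jupyter is a hypothetical helper, not part of the project.
stop_jupyter() {
  image="infrahelpers/python-jupyter:$1"
  # List the IDs of the running containers created from that image
  ids="$(docker ps -q --filter "ancestor=${image}")"
  if [ -z "${ids}" ]; then
    echo "No running container for ${image}"
    return 0
  fi
  # Word splitting is intended: one argument per container ID
  docker stop ${ids}
}

# Usage (requires a running Docker daemon):
#   stop_jupyter centos9
```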