Skip to content

Latest commit

 

History

History
315 lines (198 loc) · 13.2 KB

README.md

File metadata and controls

315 lines (198 loc) · 13.2 KB

JupyterLab for Data Science and Knowledge Graphs

Publish CPU images Publish GPU image

JupyterLab image with VisualStudio Code server integrated, based on the jupyter/docker-stacks scipy image, with additional packages and kernels installed for data science and knowledge graphs.

Screenshot

🔋 Features

List of features for the images available running on CPU

ghcr.io/maastrichtu-ids/jupyterlab:latest

This is the base image with useful interfaces and libraries for data science preinstalled:

📋️ VisualStudio Code server is installed, and accessible from the JupyterLab Launcher

🐍 Python 3.8 with notebook kernel supporting autocomplete and suggestions (jupyterlab-lsp)

☕️ Java OpenJDK 11 with IJava notebook kernel

🐍 Conda and mamba are installed, each conda environment created will add a new option to create a notebook using this environment in the JupyterLab Launcher (with nb_conda_kernels). You can create environments using different version of Python if necessary.

🧑‍💻 ZSH is used by default for the JupyterLab and VisualStudio Code terminals

The following JupyterLab extensions are also installed: jupyterlab-git, jupyterlab-system-monitor, jupyter_bokeh, plotly, jupyterlab-spreadsheet, jupyterlab-drawio.

ghcr.io/maastrichtu-ids/jupyterlab:knowledge-graph

Extended from ghcr.io/maastrichtu-ids/jupyterlab:latest, it contains

✨️ SPARQL kernel to query RDF knowledge graphs

✨️ Apache Spark and PySpark are installed for distributed data processing

💎 OpenRefine is installed, and accessible from the JupyterLab Launcher

🦀 Oxigraph SPARQL database

⚡️ Blazegraph SPARQL database

☕️ Java .jar programs for knowledge graph processing are pre-downloaded in the /opt folder, such as RDF4J, Apache Jena, OWLAPI, RML mapper.

ghcr.io/maastrichtu-ids/jupyterlab:r-notebook

📈 R kernel

Automatically install your code and dependencies

With those docker images, you can optionally provide the URL to a git repository to be automatically cloned in the workspace at the start of the container using the environment variable GIT_URL

The following files will be automatically installed if they are present at the root of the provided Git repository:

  • The conda environment described in environment.yml will be installed, make sure you added ipykernel and nb_conda_kernels to the environment.yml to be able to easily start notebooks using this environment from the JupyterLab Launcher page. See this repository as example.
  • The python packages in requirements.txt will be installed with pip
  • The debian packages in packages.txt will be installed with apt-get
  • The JupyterLab extensions in extensions.txt will be installed with jupyter labextension

You can also create a conda environment from a file in a running JupyterLab (we use mamba which is like conda but faster):

mamba env create -f environment.yml

You'll need to wait a minute before the new conda environment becomes available on the JupyterLab Launcher page.

📝 Extend a CPU image

The easiest way to build a custom image is to extend the existing images.

For notebooks running on CPU, we use images from the official jupyter/docker-stacks, which run as non root user. So you will need to make sure the folders permissions are properly set for the notebook user.

Here is an example Dockerfile to extend ghcr.io/maastrichtu-ids/jupyterlab:latest:

FROM ghcr.io/maastrichtu-ids/jupyterlab:latest
# Change to root user to install packages requiring admin privileges:
USER root
RUN apt-get update && \
    apt-get install -y vim
RUN fix-permissions /home/$NB_USER
# Switch back to the notebook user for other packages:
USER ${NB_UID}
RUN mamba install -c defaults -y rstudio
RUN pip install jupyter-rsession-proxy

For docker image that are not based on the jupyter/docker-stack, such as the GPU images, you the root user is used by default. See at the further in this README for more information on how to extend GPU images.

🐳 Run a CPU image with Docker

For the ghcr.io/maastrichtu-ids/jupyterlab:latest image volumes should be mounted into /home/jovyan/work folder.

This command will start JupyterLab as jovyan user with sudo privileges, use JUPYTER_TOKEN to define your password:

docker run --rm -it --user root -p 8888:8888 -e GRANT_SUDO=yes -e JUPYTER_TOKEN=password -v $(pwd)/data:/home/jovyan/work ghcr.io/maastrichtu-ids/jupyterlab

You should now be able to install anything in the JupyterLab container, try:

sudo apt-get update

You can check the docker-compose.yml file to run it easily with Docker Compose.

Run with a restricted jovyan user, without sudo privileges:

docker run --rm -it --user $(id -u) -p 8888:8888 -e CHOWN_HOME=yes -e CHOWN_HOME_OPTS='-R' -e JUPYTER_TOKEN=password -v $(pwd)/data:/home/jovyan/work ghcr.io/maastrichtu-ids/jupyterlab:latest

⚠️ Potential permission issue when running locally. The official jupyter/docker-stacks images use the jovyan user by default which does not grant admin rights (sudo). This can cause issues when writing to the shared volumes, to fix it you can change the owner of the folder, or start JupyterLab as root user.

To create the folder with the right permissions, replace 1000:100 by your username:group if necessary and run:

mkdir -p data/
sudo chown -R 1000:100 data/

📦 Build CPU images

Instructions to build the various image aiming to run on CPU.

JupyterLab for Data Science

This repository contains multiple folders with Dockerfile to build various flavor of JupyterLab for Data Science.

With Python 3.8, conda integration, VisualStudio Code, Java and SPARQL kernels

Build:

docker build -t ghcr.io/maastrichtu-ids/jupyterlab .

Run:

docker run --rm -it --user root -p 8888:8888 -e JUPYTER_TOKEN=password -v $(pwd)/data:/home/jovyan/work ghcr.io/maastrichtu-ids/jupyterlab

Push:

docker push ghcr.io/maastrichtu-ids/jupyterlab

JupyterLab for Knowledge graph

With Oxigraph and Blazegraph SPARQL database, and additional python/java library for RDF processing:

docker build -f knowledge-graph/Dockerfile -t ghcr.io/maastrichtu-ids/jupyterlab:knowledge-graph .
docker run --rm -it -p 8888:8888 -e JUPYTER_TOKEN=password ghcr.io/maastrichtu-ids/jupyterlab:knowledge-graph

Python 2.7

With a python2.7 kernel only (python3 not installed). Build and run (workdir is /root):

docker build -t ghcr.io/maastrichtu-ids/jupyterlab:python2.7 ./python2.7
docker run --rm -it -p 8888:8888 -e JUPYTER_TOKEN=password ghcr.io/maastrichtu-ids/jupyterlab:python2.7

Ricopili

Based on https://github.com/bruggerk/ricopili_docker. Build and run (workdir is /root):

docker build -t ghcr.io/maastrichtu-ids/jupyterlab:ricopili ./ricopili
docker run --rm -it -p 8888:8888 -v $(pwd)/data:/root -e JUPYTER_TOKEN=password ghcr.io/maastrichtu-ids/jupyterlab:ricopili

FSL on CPU

Built with https://github.com/ReproNim/neurodocker. Build and run (workdir is /root):

docker build -t ghcr.io/maastrichtu-ids/jupyterlab:fsl ./fsl
docker run --rm -it -p 8888:8888 -v $(pwd)/data:/root -e JUPYTER_TOKEN=password ghcr.io/maastrichtu-ids/jupyterlab:fsl

⚡️JupyterLab on GPU

To deploy JupyterLab on GPU we use the official Nvidia images, we defined the same gpu.dockerfile to install additional dependencies, such as JupyterLab and VisualStudio Code, with different images from Nvidia:

🗜️ TensorFlow with nvcr.io/nvidia/tensorflow:

  • ghcr.io/maastrichtu-ids/jupyterlab:tensorflow

🔥 PyTorch with nvcr.io/nvidia/pytorch:

  • ghcr.io/maastrichtu-ids/jupyterlab:pytorch

👁️ CUDA with nvcr.io/nvidia/cuda:

  • ghcr.io/maastrichtu-ids/jupyterlab:cuda

Volumes should be mounted into the /workspace/persistent or /workspace folder.

📝 Extend a GPU image

The easiest way to build a custom image is to extend the existing images.

Here is an example Dockerfile to extend ghcr.io/maastrichtu-ids/jupyterlab:tensorflow based on nvcr.io/nvidia/tensorflow:

FROM ghcr.io/maastrichtu-ids/jupyterlab:tensorflow
RUN apt-get update && \
    apt-get install -y vim
RUN pip install jupyter-tensorboard

📦 Build GPU images

You will find here the commands to use to build our different GPU docker images, most of them are using the gpu.dockerfile

CUDA on GPU

Change the build-arg and run from the root folder of this repository:

docker build --build-arg NVIDIA_IMAGE=nvcr.io/nvidia/cuda:11.4.2-devel-ubuntu20.04 -f gpu.dockerfile -t ghcr.io/maastrichtu-ids/jupyterlab:cuda .

Run an image on http://localhost:8888

docker run --rm -it -p 8888:8888 -e JUPYTER_TOKEN=password -v $(pwd)/data:/workspace/persistent ghcr.io/maastrichtu-ids/jupyterlab:cuda

PyTorch on GPU

Change the build-arg and run from the root folder of this repository:

docker build --build-arg NVIDIA_IMAGE=nvcr.io/nvidia/pytorch:23.03-py3 -f gpu.dockerfile -t ghcr.io/maastrichtu-ids/jupyterlab:pytorch .

Run an image on http://localhost:8888

docker run --rm -it -p 8888:8888 -e JUPYTER_TOKEN=password -v $(pwd)/data:/workspace/persistent ghcr.io/maastrichtu-ids/jupyterlab:pytorch

Tensorflow on GPU

Change the build-arg and run from the root folder of this repository:

docker build --build-arg NVIDIA_IMAGE=nvcr.io/nvidia/tensorflow:21.11-tf2-py3 -f gpu.dockerfile -t ghcr.io/maastrichtu-ids/jupyterlab:tensorflow .

Run an image on http://localhost:8888

docker run --rm -it -p 8888:8888 -e JUPYTER_TOKEN=password -v $(pwd)/data:/workspace/persistent ghcr.io/maastrichtu-ids/jupyterlab:tensorflow

FSL on GPU

This build use a different image, go to the fsl-gpu folder. And check the README.md for more details.

Build:

docker build -t ghcr.io/maastrichtu-ids/jupyterlab:fsl-gpu ./fsl-gpu

Run (workdir is /workspace):

docker run --rm -it -p 8888:8888 -e JUPYTER_TOKEN=password ghcr.io/maastrichtu-ids/jupyterlab:fsl-gpu

☁️ Deploy on Kubernetes and OpenShift

This image is compatible with OpenShift and OKD security constraints to run as non root user.

We recommend to use this Helm chart to deploy these JupyterLab images on Kubernetes or OpenShift: https://artifacthub.io/packages/helm/dsri-helm-charts/jupyterlab

If you are working or studying at Maastricht University, you can easily deploy this notebook on the Data Science Research Infrastructure (DSRI) 🌉

🕊️ Contribute to this repository

Choose which image fits your need: latest, tensorflow, cuda, pytorch, freesurfer, python2.7...

  1. Fork this repository.

  2. Clone the forked repository

  3. Edit the Dockerfile for the image you want to improve. Preferably use mamba or conda to install new packages, you can also install with apt-get (need to run as root or with sudo) and pip

  4. Go to the folder and rebuild the Dockerfile:

docker build -t jupyterlab -f Dockerfile .
  1. Run the docker image built on http://localhost:8888 to test it
docker run -it --rm -p 8888:8888 -e JUPYTER_TOKEN=yourpassword jupyterlab

If the built Docker image works well feel free to send a pull request to get your changes merged to the main repository and integrated in the corresponding published Docker image.

You can check the size of the image built in MB:

expr $(docker image inspect ghcr.io/maastrichtu-ids/jupyterlab:latest --format='{{.Size}}') / 1000000