This project sets up a data scientist workbench with :
- MLFlow for experiment tracking and model asset management;
- PostgreSQL for a SQL engine and to serve as a backend for MLFlow;
- MinIO to mimic S3 and act as an artifact and data store;
- Jupyterlab as an EDA environment.
-
Install and Configure WSL 2:
- Ensure that WSL 2 is installed and properly configured on your system.
- You can follow the Microsoft instructions to install WSL 2 here.
-
Install Docker Desktop for Windows:
- Download and install Docker Desktop from the official Docker website.
-
Configure Docker to Use WSL 2:
- Open Docker Desktop and go to the settings.
- Under the "General" tab, ensure that "Use the WSL 2 based engine" is checked.
- Under the "Resources" > "WSL Integration" tab, ensure that your WSL 2 distribution is checked.
- Under the "General" tab, check "Expose daemon on tcp://localhost:2375 without TLS".
-
Clone the Repository
git clone https://github.com/sawadogosalif/DS-backbone.git cd DS-backbone
-
Configure Environment Variables Update a
default.env
file in the root directory with the following variables as you want. -
Build and Run Services
docker-compose --env-file default.env up -d
- JupyterLab:
http://sawalle.ds.notebooks
- MLflow:
http://sawalle.ds.mlflow
- MinIO:
http://sawalle.ds.s3
- JupyterLab:
http:localhost:8888
- MLflow:
http:localhost:5555
- MinIO:
http:localhost:9000
Additionally, in the filenotebooks/tracking_example.py
, we demonstrate how to use MLflow efficiently.
- Image: postgres:11
- Port:
5432
- Image: minio/minio:RELEASE.2020-12-18T03-27-42Z
- Port:
9000
- Port:
5000
- Port:
8888
- Image: jupyter/datascience-notebook:latest
- Port:
80
- Image: nginx:1.25.5