Currently, this repo is following Stage 1. Read here.
app/ - application deployment
└── app.py - run application
data/ - data pipeline
├── params - Parameter
└── param.yaml - parameter file
├── log - log folder
├── figures - figures folder
├── notebooks - jupyter notebooks
├── eda.py - exploratory data analysis
├── preprocess.py - preprocessing data
└── validate.py - validating data
storage/
├── artifact - model
├── data-lake - raw data
└── data-warehouse - preprocessed data
Python version 3.8.5 is installed.
python -m venv pipeline-env
pip install -r requirements.txt
N.B: If you are in wsl, make sure run pip install wheel
if needed.
Install GCloud SDK from here.
gcloud auth login
gcloud auth application-default login
dvc remote add -d myremote gs://ml-pipeline-demo-storage
dvc push <file_name.dvc> -r myremote
dvc pull <file_name.dvc> -r myremote
Will run the mlflow server in a central server. If you want to run mlflow
locally, just run mlflow ui
from root folder.
Use wsl or linux as there is no support for windows. Read setup in here.