HSE University, spring 2024

Implementing automated data processing, model training, and experiment tracking with Airflow and MLflow.

The goal of the course was to write a DAG that trains three different regressor models and stores their code and metrics in an MLflow experiment.
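For illustration, here is a minimal sketch of the training-and-logging step for a single regressor, assuming MLflow ≥ 2.0 and scikit-learn; the experiment name, synthetic dataset, and metric choices below are hypothetical placeholders, not the ones used in the actual DAG:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical experiment name; the real DAG creates its own experiment.
experiment_id = mlflow.set_experiment("regressors_demo").experiment_id

# Synthetic data stands in for the data pulled from PostgreSQL/S3.
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(experiment_id=experiment_id, run_name="HistGB"):
    model = HistGradientBoostingRegressor().fit(X_train, y_train)
    preds = model.predict(X_test)
    # Log the fitted model and its regression metrics under the experiment.
    mlflow.sklearn.log_model(model, artifact_path="model")
    mlflow.log_metrics({
        "mse": mean_squared_error(y_test, preds),
        "r2": r2_score(y_test, preds),
    })
```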

The DAG (a structural sketch follows the list):

- Retrieves data from a locally running PostgreSQL server and stores it in S3
- Retrieves the data from S3 and runs preprocessing; the results are stored back in S3
- Initializes the MLflow experiment and passes its experiment ID to the training tasks
- Runs three model training tasks in parallel, logging each model and its regression metrics to MLflow
- Saves the task timestamps to S3
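
A structural sketch of how such a task graph could be wired with the Airflow TaskFlow API (assuming Airflow ≥ 2.4); the task names, S3 keys, and regressor names below are hypothetical placeholders, and the task bodies are omitted:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def train_regressors():
    @task
    def extract_from_postgres_to_s3() -> str:
        ...  # read the table from the local PostgreSQL server, upload it to S3
        return "raw/data.csv"  # hypothetical S3 key

    @task
    def preprocess(raw_key: str) -> str:
        ...  # download raw data from S3, preprocess, store the result back in S3
        return "processed/data.csv"  # hypothetical S3 key

    @task
    def init_mlflow_experiment() -> str:
        ...  # create or reuse the MLflow experiment
        return "experiment_id"

    @task
    def train_model(model_name: str, data_key: str, experiment_id: str) -> str:
        ...  # train one regressor, log the model and its metrics to MLflow
        return model_name

    @task
    def save_timestamps() -> None:
        ...  # collect task timestamps and store them in S3

    raw = extract_from_postgres_to_s3()
    processed = preprocess(raw)
    experiment_id = init_mlflow_experiment()

    # Three training tasks fan out over the (hypothetical) model names.
    runs = [
        train_model(name, processed, experiment_id)
        for name in ("LinearRegression", "RandomForest", "HistGB")
    ]
    runs >> save_timestamps()


train_regressors()
```

Because the three training tasks have no dependencies on each other, the scheduler can run them concurrently once preprocessing and experiment initialization have finished.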

DAG graph: DAG_graph

MLflow metrics of all models: MLflow_metrics

Metadata of one of the models (HistGB): MLflow_HistGB_artifacts

Code for the DAG is available here.