This repository contains Two's Solution to the ZenML Month of MLOps Competition.
The aim of this project is to develop a production-ready ML application for fraud detection using the ZenML MLOps framework. To train our fraud detection model, we make use of the "Synthetic data from a financial payment system" Dataset available on Kaggle.
This repository contains an end-to-end ML solution using ZenML, which covers the following responsibilities:
- Importing the Dataset
- Cleaning the data & engineering informative features
- Detecting data drift of new data
- Training a model to detect fraud on a transactional level
- Evaluating the performance of the model
- Deploying the model to a REST API endpoint
- Providing an interface for users to interact with the model
To address these requirements, we built two pipelines. The Training Pipeline was used for experimentation. The Continuous Deployment Pipeline extends the Training Pipeline: it identifies data drift in new data, trains a model on all available data, and evaluates the model's performance before deploying it to an API endpoint.
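The control flow of the Continuous Deployment Pipeline can be illustrated with a small, framework-free sketch. This is not the repository's implementation: the real pipeline uses ZenML steps and EvidentlyAI for drift detection. The function names, the mean-shift drift check, the choice of recall as the metric, and the idea that deployment is gated on a minimum score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Sequence


@dataclass
class Decision:
    drift_detected: bool
    recall: float
    deploy: bool


def mean_shift_drift(reference: Sequence[float], current: Sequence[float],
                     threshold: float = 3.0) -> bool:
    """Toy drift check (hypothetical): flag drift when the mean of `current`
    is more than `threshold` reference standard deviations away from the
    mean of `reference`."""
    ref_std = pstdev(reference)
    shift = abs(mean(current) - mean(reference))
    if ref_std == 0:
        return shift > 0
    return shift / ref_std > threshold


def fraud_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual fraud cases (label 1) that the model flagged."""
    actual_fraud = sum(1 for t in y_true if t == 1)
    if actual_fraud == 0:
        return 0.0
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / actual_fraud


def deployment_decision(reference: Sequence[float], current: Sequence[float],
                        y_true: Sequence[int], y_pred: Sequence[int],
                        min_recall: float = 0.8) -> Decision:
    """Report drift, evaluate the model, and deploy only if the (assumed)
    minimum recall is met."""
    drift = mean_shift_drift(reference, current)
    recall = fraud_recall(y_true, y_pred)
    return Decision(drift_detected=drift, recall=recall,
                    deploy=recall >= min_recall)
```

In this sketch drift is only reported, while the evaluation result decides whether the model is deployed; a real pipeline may combine the two signals differently.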
To enable the aforementioned pipelines, we made use of the following ZenML Stack:
Artifact Storage: Google Cloud Storage
Container Registry: Google Cloud Container Registry
Data Validator: EvidentlyAI
Experiment Tracker: MLflow
Orchestrator: Google Kubernetes Engine
Model Deployer: Seldon
There are a number of ways of interacting with the code in this repository:
- Executing the Training & Continuous Deployment Pipelines
- Running the Streamlit App
- Running the Tests
To execute the pipelines, you must:
- Ensure you have Python 3.9 installed on your machine
- Install the development requirements:
~ $ pip install -r test-requirements.txt
- Deploy and register the ZenML stack described in the Solution Overview
- Create an .env file from the .env.example template
- To execute the training pipeline:
~ $ python src/run_train_pipeline.py
- To execute the deployment pipeline:
~ $ python src/run_deployment_pipeline.py
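Before running either script, it can help to confirm that the .env file defines every key listed in the .env.example template. The following is a minimal sketch using only the standard library; it assumes both files use plain KEY=VALUE lines, and the helper names are hypothetical rather than part of this repository.

```python
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_text: str, example_text: str) -> List[str]:
    """Return template keys that are absent from the .env contents."""
    return sorted(set(parse_env(example_text)) - set(parse_env(env_text)))


if __name__ == "__main__":
    env_path, example_path = Path(".env"), Path(".env.example")
    if env_path.exists() and example_path.exists():
        print(missing_keys(env_path.read_text(), example_path.read_text()))
```

An empty list means every key from the template is present; it does not check that the values are valid.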
The Streamlit application entrypoint is the app.py file at the root of the repository. We have deployed this app to Streamlit Cloud.
To recreate the app on your local machine, you must:
- Ensure you have Python 3.9 installed on your machine
- Install the Streamlit requirements:
~ $ pip install -r requirements.txt
- Create an .env file according to the .env.example template
- Deploy the Streamlit application
~ $ streamlit run app.py
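A client such as the Streamlit app has to send transaction features to the deployed model's REST endpoint. The sketch below builds such a request with the standard library. It assumes the model is served with the Seldon Core v1 JSON protocol, which wraps inputs in a "data" / "ndarray" payload; the URL and the feature values are placeholders, and the code in app.py may work differently.

```python
import json
from typing import Sequence
from urllib.request import Request


def build_prediction_request(url: str,
                             rows: Sequence[Sequence[float]]) -> Request:
    """Build (but do not send) a POST request carrying one feature vector
    per row, using the Seldon-style {"data": {"ndarray": ...}} payload."""
    payload = {"data": {"ndarray": [list(row) for row in rows]}}
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and features, for illustration only.
request = build_prediction_request(
    "http://localhost:8000/api/v0.1/predictions", [[120.5, 3.0, 1.0]]
)
```

Passing the request to urllib.request.urlopen would send it; the shape of the response depends on how the model was deployed.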
To run the tests, you must:
- Ensure you have Python 3.9 installed on your machine
- Install the test requirements:
~ $ pip install -r test-requirements.txt
- Execute tests using pytest
~ $ pytest
├── .github <- CI Pipeline Definition
├── src
│ ├── pipelines <- Pipeline Definition
│ │ ├── ...
│ ├── steps <- Step Definitions
│ │ ├── ...
│ ├── util <- Utility Definitions
│ │ ├── ...
│ ├── data_exploration.ipynb <- Data Exploration Notebook
│ ├── feature_engineering.ipynb <- Feature Engineering Experimentation Notebook
│ ├── run_deployment_pipeline.py <- Deployment Pipeline Execution Script
│ ├── run_train_pipeline.py <- Training Pipeline Execution Script
├── tests
│ ├── util <- Utility Function Tests
│ │ ├── ...
├── app.py <- Streamlit App
├── docker-requirements.txt <- Step Container Dependencies
├── notebook-requirements.txt <- Notebook Dependencies
├── requirements.txt <- Streamlit App Dependencies
├── test-requirements.txt <- Development Dependencies