The ABC Multistate Bank churn problem, also known as a customer churn problem, is a machine learning problem focused on predicting whether a customer is likely to leave (churn) or stay with the bank, based on historical data. Churn is the process by which customers discontinue their relationship with a company or service; for a bank, it means customers closing their accounts and moving to another bank.
- Problem type: Supervised/Classification
The dataset comes from Kaggle. Sample data:
customer_id | credit_score | country | gender | age | tenure | balance | products_number | credit_card | active_member | estimated_salary | churn |
---|---|---|---|---|---|---|---|---|---|---|---|
15634602 | 619 | France | Female | 42 | 2 | 0 | 1 | 1 | 1 | 101348.88 | 1 |
15647311 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
15619304 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
15701354 | 699 | France | Female | 39 | 1 | 0 | 2 | 0 | 0 | 93826.63 | 0 |
15737888 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |
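To make the schema concrete, the five sample rows above can be loaded and split into features and the `churn` label. This is a minimal sketch using only the standard library. Treating `customer_id` as a non-predictive identifier and `country` and `gender` as categorical columns is an assumption based on the column names, not something the project states.

```python
import csv
import io

# The five sample rows from the table above, as CSV text.
SAMPLE_CSV = """customer_id,credit_score,country,gender,age,tenure,balance,products_number,credit_card,active_member,estimated_salary,churn
15634602,619,France,Female,42,2,0,1,1,1,101348.88,1
15647311,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
15619304,502,France,Female,42,8,159660.8,3,1,0,113931.57,1
15701354,699,France,Female,39,1,0,2,0,0,93826.63,0
15737888,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0
"""

TARGET = "churn"
ID_COLUMN = "customer_id"            # assumption: identifier, not a feature
CATEGORICAL = {"country", "gender"}  # assumption: kept as strings


def load_rows(text):
    """Parse CSV text into dicts, converting non-categorical columns to float."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for key, value in raw.items():
            if key in CATEGORICAL or key == ID_COLUMN:
                row[key] = value
            else:
                row[key] = float(value)
        rows.append(row)
    return rows


def split_features_target(rows):
    """Separate the feature dicts from the binary churn labels."""
    features = [
        {k: v for k, v in r.items() if k not in (TARGET, ID_COLUMN)}
        for r in rows
    ]
    labels = [int(r[TARGET]) for r in rows]
    return features, labels


rows = load_rows(SAMPLE_CSV)
X, y = split_features_target(rows)
churn_rate = sum(y) / len(y)
print(f"{len(rows)} rows, churn rate in sample: {churn_rate:.2f}")
```

The label is binary (1 = churned, 0 = stayed), which is what makes this a supervised classification task.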
As a machine learning problem, the goal is to build a predictive model that can accurately identify customers who are at risk of churning. This model can help banks take proactive measures to retain valuable customers by offering targeted incentives, personalized services, or early intervention strategies.
- Solution type: batch deployment for model training and inference.
The tech stack used:
The project uses:
And the VM used for the project (AWS EC2 instance):
We use a Makefile to reproduce the required environment on any infrastructure.
SHELL=/bin/bash
build-environment-and-services:
@echo "Building Python environment"
pip install pipenv &&\
pipenv install
@echo "Running MLFlow Server on localhost:5000"
rm -rf mlflow.db mlruns/ &&\
nohup mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root ./artifacts \
--host localhost --port 5000 &
@echo "Deploying Prefect Server on localhost:4200"
nohup prefect server start &
@echo "Deploying Monitoring Service"
docker-compose -f monitoring/docker-compose.yml up -d
@echo "Deploying Grafana on localhost:3000"
@echo "Deploying Adminer on localhost:8080"
@echo "The local environment is ready to be used."
Build the entire environment with:
make
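After `make` finishes, it can be useful to confirm that the background services are actually listening. The sketch below is not part of the project. It simply tries a TCP connection to each port named in the Makefile messages (5000, 4200, 3000, 8080); the helper names `is_port_open` and `check_services` are made up for illustration.

```python
import socket

# Ports taken from the echo messages in the Makefile.
SERVICES = {
    "MLflow": 5000,
    "Prefect": 4200,
    "Grafana": 3000,
    "Adminer": 8080,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:8s} {'up' if up else 'DOWN'}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.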
The machine learning (ML) platform deploys and manages models on a stack of AWS, Flask, MLflow, and Prefect. The components are:
AWS (Amazon Web Services): AWS is a cloud computing service that offers a wide range of tools and services to build, deploy, and manage applications. In the context of the ML platform, AWS will provide the infrastructure and services for hosting the platform components, managing data, and deploying machine learning models.
Flask: Flask is a lightweight and flexible web framework for Python. It will be used to create the backend of the ML platform, handling HTTP requests and responses. Flask allows easy integration with other Python libraries and will serve as the API layer to interact with the ML models.
MLflow: MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It allows data scientists to track and version their experiments, package and deploy models, and manage model deployments. MLflow also provides tools for model registry and collaboration between team members.
Prefect: Prefect is a workflow management system that helps in orchestrating complex data workflows, including ML model training, evaluation, and deployment. It provides a way to define, schedule, and monitor workflows, making it easier to automate and manage the deployment pipeline for machine learning models.
UI for deployment:
For experiment tracking and the model registry we use MLflow:
- Register model
- Promote best model to Production
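The two steps above can also be scripted rather than done through the UI. The following is a hedged sketch, not the project's code: the metric name `roc_auc`, the model name, the `model` artifact path, and the shape of the run dictionaries are all assumptions. It uses `mlflow.register_model` and `MlflowClient.transition_model_version_stage`; the latter is stage-based and has been deprecated in newer MLflow releases in favour of aliases, so check the installed version.

```python
def pick_best_run(runs, metric="roc_auc"):
    """Return the run dict with the highest value for `metric`.

    `runs` is a list of dicts like {"run_id": ..., "metrics": {...}}
    (hypothetical shape, chosen for this sketch).
    """
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run reports metric {metric!r}")
    return max(scored, key=lambda r: r["metrics"][metric])


def register_and_promote(run_id, model_name, artifact_path="model",
                         tracking_uri="http://localhost:5000"):
    """Register the model logged by `run_id` and move it to Production."""
    # Imported here so the pure helper above works without MLflow installed.
    import mlflow
    from mlflow.tracking import MlflowClient

    mlflow.set_tracking_uri(tracking_uri)
    # Creates the registered model if it does not exist yet.
    version = mlflow.register_model(f"runs:/{run_id}/{artifact_path}", model_name)
    MlflowClient().transition_model_version_stage(
        name=model_name,
        version=version.version,
        stage="Production",
        archive_existing_versions=True,  # demote the previous Production version
    )
    return version.version
```

A call such as `register_and_promote(pick_best_run(runs)["run_id"], "customer-churn-model")` would then tie the two together, where the model name is again only a placeholder.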
We use Prefect for orchestration in:
- Model training
src/training_pipeline.py
- Model inference (Predict new data)
src/batch_scoring_pipeline.py
- Model monitoring (calculate drift and model performance)
src/monitor_ml_churn_model.py
Deployment is done via Makefile + Dockerfile:
- Clone the repository
git clone https://github.com/abdala9512/mlops-zoomcamp-project-2023.git
- Execute Makefile to create services
make
- Execute model training pipeline
python src/training_pipeline.py
- Promote any model to Production in MLflow
- Execute scoring pipeline
python src/batch_scoring_pipeline.py
- Execute Monitoring pipeline
python src/monitor_ml_churn_model.py
Machine learning monitoring with Grafana and Postgres involves using these two tools to track, visualize, and analyze the performance and behavior of machine learning models deployed in production. Let's break down how each component contributes to the monitoring process:
Machine Learning Models in Production: When machine learning models are deployed in a production environment, they interact with real-world data, and their performance and behavior may change over time. Monitoring these models is essential to ensure they continue to make accurate predictions and maintain their desired performance.
Grafana: Grafana is an open-source data visualization and monitoring tool. It allows you to create interactive and customizable dashboards to visualize and analyze data from various sources, including databases, APIs, and monitoring systems. Grafana is highly extensible and supports numerous data sources, making it suitable for integrating with different monitoring and logging tools.
Postgres (PostgreSQL): Postgres is an open-source, powerful relational database management system (RDBMS). It is often used to store data from applications, including machine learning models. Postgres is known for its performance, scalability, and support for complex queries.
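As an illustration of the kind of drift metric that could be stored in Postgres and plotted in Grafana, here is a small Population Stability Index (PSI) function. The source does not say which drift metric `src/monitor_ml_churn_model.py` computes, so PSI is an assumption chosen because it is simple and widely used; the equal-width binning and the `eps` floor are likewise choices made for this sketch.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the reference range; values from
    `current` outside that range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

For example, `psi(training_credit_scores, scoring_batch_credit_scores)` would compare a feature between training data and a new scoring batch (both variable names are placeholders). Identical samples give 0, and a common rule of thumb treats values above roughly 0.25 as a significant shift.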
Adminer data explorer (PostgreSQL database)
- Run Makefile (local or cloud)
- Execute docker files
echo "Build dockerfile"
docker build -t customer_churn_ml_pipeline ./src/deployment
docker run -v "$(pwd)":/app/ -it customer_churn_ml_pipeline