Predict whether a customer will change telco provider, using this Kaggle dataset.
- Cookiecutter: Data science project structure
- Data Version Control (DVC): Version control of the data assets and pipeline building
- GitHub: Code version control
- GitHub Actions: To create the CI/CD pipeline
- MLflow: For the model registry
- DagsHub: MLflow and DVC integration
- Heroku: To deploy the application
- Flask: To create the web app
- EvidentlyAI: To evaluate and monitor ML models in production
- Pytest: To implement the unit tests
- Flake8: Code linting
Scaffold the project with the Cookiecutter data science template (v1):

```bash
pip install cookiecutter
cookiecutter https://github.com/drivendata/cookiecutter-data-science -c v1
```

Answer the prompts, for example:
- project_name: mlops-project-customer-churn
- repo_name: mlops-project-customer-churn
- author_name: rohmats
- description: End-to-End MLOps Project - Customer Churn Prediction
- Select open_source_license: MIT (option 1)
- s3_bucket / aws_profile [Optional]: press Enter to skip
- Select python_interpreter: python3 (option 1)
Create and activate a conda environment:

```bash
conda create -n customer_churn python=3.9 -y
conda activate customer_churn
```
DVC provides version control for the data assets and makes the pipeline reproducible. Install and initialize it, put the raw training data under DVC tracking, and reproduce the pipeline:

```bash
pip install dvc
dvc init
dvc add data/external/train.csv
dvc repro
```
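`dvc repro` executes the stages defined in `dvc.yaml`. Below is a minimal sketch of such a file, assuming hypothetical stage scripts and paths (not this project's actual pipeline):

```yaml
# dvc.yaml -- illustrative pipeline sketch; stage names and paths are assumptions
stages:
  prepare:
    cmd: python src/data/make_dataset.py data/external/train.csv data/processed/train.csv
    deps:
      - src/data/make_dataset.py
      - data/external/train.csv
    outs:
      - data/processed/train.csv
  train:
    cmd: python src/models/train_model.py data/processed/train.csv models/model.joblib
    deps:
      - src/models/train_model.py
      - data/processed/train.csv
    outs:
      - models/model.joblib
```

With `dvc.yaml` in place, changing the tracked data or a script and re-running `dvc repro` rebuilds only the affected stages.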
Install the DagsHub client and MLflow so that experiments can be logged to the MLflow server DagsHub hosts for the repository:

```bash
pip install dagshub mlflow
```

Runs can then be inspected in the MLflow UI, either hosted on DagsHub or locally via `mlflow ui`.
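A minimal sketch of logging a run to the DagsHub-hosted tracking server; the repo URL, credentials, and logged values below are placeholders, not this project's real configuration:

```python
# track_experiment.py -- sketch of MLflow tracking against DagsHub (placeholder values)
import os

import mlflow

# Placeholder credentials; DagsHub issues an access token per user.
os.environ["MLFLOW_TRACKING_USERNAME"] = "<dagshub-username>"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<dagshub-access-token>"
mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")

with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)  # illustrative hyperparameter
    mlflow.log_metric("accuracy", 0.87)    # illustrative metric value
```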
Run the unit tests with:

```bash
pytest -v
```
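A self-contained sketch of what such a test might look like; the `validate_tenure` helper is defined inline for illustration and is not part of the project's actual `src` code:

```python
# tests/test_input_validation.py -- hypothetical example test
import pytest


def validate_tenure(tenure: float) -> float:
    """Reject negative tenure values before they reach the model."""
    if tenure < 0:
        raise ValueError("tenure must be non-negative")
    return tenure


def test_valid_tenure_passes():
    assert validate_tenure(12.0) == 12.0


def test_negative_tenure_raises():
    with pytest.raises(ValueError):
        validate_tenure(-1.0)
```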
To create a CI/CD pipeline for the project, you can use GitHub Actions. Here are the steps to set it up:

- Create a `.github/workflows` directory in your repository.
- Inside the `workflows` directory, create a YAML file (e.g., `ci-cd.yml`) to define your CI/CD workflow.
- In the YAML file, use the `on` keyword to specify the events that trigger the workflow (e.g., a push to the `main` branch).
- Use the `jobs` keyword to define the jobs of your workflow; for example, one job to build and test the code and another to deploy the application (see the sketch below).
- Configure the necessary environment variables and secrets for your workflow, such as API keys or deployment credentials.
- Commit and push the YAML file to your repository.
With GitHub Actions, you can automate the build, test, and deployment processes of your project, ensuring that your application is always up-to-date and running smoothly.
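A minimal sketch of such a workflow; the job layout, Python version, and the Heroku deployment step (using the community `akhileshns/heroku-deploy` action and a `HEROKU_API_KEY` repository secret) are assumptions, not this project's actual configuration:

```yaml
# .github/workflows/ci-cd.yml -- illustrative sketch, not the project's real workflow
name: ci-cd

on:
  push:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
      - run: pip install -r requirements.txt
      - run: flake8 src tests      # lint
      - run: pytest -v             # unit tests

  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Community action for Heroku deployment; app name and email are placeholders.
      - uses: akhileshns/heroku-deploy@v3.12.12
        with:
          heroku_api_key: ${{ secrets.HEROKU_API_KEY }}
          heroku_app_name: mlops-customer-churn   # placeholder app name
          heroku_email: you@example.com           # placeholder account email
```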
This project uses `flake8` for code linting.
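A possible configuration sketch; the line-length limit and excluded paths are assumptions, not the project's actual settings:

```ini
# .flake8 -- illustrative linting configuration
[flake8]
max-line-length = 88
exclude = .git,__pycache__,notebooks,docs
```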
Project organization:

```
├── artifacts                <- MLflow artifacts
│   └── 1
│       └── 465969c77a7341d1b58ee4b044cbbcf8
│           └── artifacts
│               └── model
├── data                     <- Data directory
│   ├── external             <- Data from third-party sources
│   ├── processed            <- The final, canonical data sets for modeling
│   └── raw                  <- The original, immutable data dump
├── docs
├── models                   <- Trained and serialized models, model predictions, or model summaries
├── notebooks                <- Jupyter notebooks
├── references               <- Data dictionaries, manuals, and all other explanatory materials
├── reports                  <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures
├── src                      <- Source code for use in this project
│   ├── data                 <- Scripts to download or generate data
│   ├── features             <- Scripts to turn raw data into features for modeling
│   ├── models               <- Scripts to train models and then use trained models to make predictions
│   └── visualization        <- Scripts to create exploratory and results-oriented visualizations
├── tests                    <- Unit tests
└── webapp                   <- Web application
    ├── model_webapp_dir     <- Model web application directory
    ├── scripts              <- Scripts to run the web application
    ├── static               <- Static files
    │   └── css
    └── templates            <- HTML templates
```
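The `webapp` directory serves predictions through Flask. Below is a minimal sketch of such an app; the model path, the scikit-learn-style model, and the template name are assumptions, not the project's actual code:

```python
# webapp/app.py -- illustrative Flask app (paths and names are placeholders)
import joblib
from flask import Flask, render_template, request

app = Flask(__name__)
# Assumes a serialized scikit-learn-style model saved during training.
model = joblib.load("model_webapp_dir/model.joblib")


@app.route("/", methods=["GET", "POST"])
def predict():
    if request.method == "POST":
        # Assumes the form fields arrive in the order the model expects.
        features = [float(value) for value in request.form.values()]
        prediction = model.predict([features])[0]
        return render_template("index.html", response=str(prediction))
    return render_template("index.html")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

On Heroku, the app would typically be served by a production server declared in a `Procfile`, e.g. `web: gunicorn app:app` (again an assumption about this project's layout).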
Project based on the [cookiecutter data science project template](https://drivendata.github.io/cookiecutter-data-science/). #cookiecutterdatascience