
MLOps Lending Club Loan Prediction Project

This is the final project for the MLOps ZoomCamp course by DataTalks.Club.

Problem Statement

We need to build an end-to-end machine learning system that predicts whether a particular user will repay the loan. We assess the probability of full repayment based on various features: employment length, annual income, home ownership status, etc.

It should be a fault-tolerant, monitored, and containerized web service that receives an applicant's features and returns whether the loan will be repaid.
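As a rough sketch of that contract, the service takes a feature record and returns a repayment probability plus a binary decision. The field names, threshold, and response shape below are illustrative assumptions, not the project's actual API:

```python
# Hypothetical feature record for a single loan applicant.
record = {
    "emp_length": 10,
    "annual_inc": 85000.0,
    "home_ownership": "MORTGAGE",
}

def to_response(prob_repaid: float, threshold: float = 0.5) -> dict:
    """Turn a model probability into the service's (assumed) response shape."""
    return {
        "repay_probability": round(prob_repaid, 3),
        "loan_repaid": prob_repaid >= threshold,  # binary decision at the threshold
    }

print(to_response(0.87))  # {'repay_probability': 0.87, 'loan_repaid': True}
```

The threshold of 0.5 is a placeholder; in practice it would be tuned to the cost of a false approval versus a false rejection.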

System Description

The system consists of two parts: an experimentation and orchestration part (Prefect + MLFlow) and a serving and monitoring part (the web service plus Grafana/Evidently dashboards).

The project runs locally and uses AWS S3 to store model artifacts via MLFlow. It is containerized and can be easily deployed to the cloud.

Dataset

We took the Lending Club dataset and reduced it to 10k records to speed up training and prototyping. It contains 23 features, and the target column is_bad indicates whether the loan went bad. For this project, we use 3 features: employment length, annual income, and home ownership status.
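The feature/target split can be sketched as follows; the toy records and exact column names (emp_length, annual_inc, home_ownership) are assumptions standing in for the reduced Lending Club sample:

```python
import pandas as pd

# Toy records standing in for the reduced Lending Club sample.
df = pd.DataFrame(
    {
        "emp_length": [10, 3, 1],
        "annual_inc": [85000.0, 42000.0, 30000.0],
        "home_ownership": ["MORTGAGE", "RENT", "OWN"],
        "is_bad": [0, 1, 0],  # target: 1 means the loan went bad
    }
)

FEATURES = ["emp_length", "annual_inc", "home_ownership"]
TARGET = "is_bad"

X = df[FEATURES]  # model inputs: the 3 features used in this project
y = df[TARGET]    # binary target
print(X.shape, y.shape)  # (3, 3) (3,)
```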

How to Run

Serving Part

  1. Clone the repo

       git clone https://github.com/KarimLulu/mlops-loan-prediction.git

  2. Navigate to the project folder

       cd mlops-loan-prediction

  3. Build all required services

       docker-compose build

  4. Create and start the containers

       docker-compose up

  5. In a separate terminal window, send some data records to the system:

       make setup
       pipenv run python -m monitoring.send_data

  6. Open Grafana in the browser and find the Evidently Data Drift Dashboard.
  7. Enjoy live data drift detection!
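Under the hood, sending a record amounts to an HTTP POST against the prediction endpoint, roughly like the sketch below. The URL, port, and field names are assumptions; monitoring.send_data is the authoritative implementation:

```python
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint; check docker-compose.yml for the real host/port.
PREDICT_URL = "http://localhost:9696/predict"

def send_record(record: dict) -> bytes:
    """POST one feature record as JSON and return the raw response body."""
    req = Request(
        PREDICT_URL,
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    # Requires the docker-compose stack from the steps above to be running.
    print(send_record({"emp_length": 5, "annual_inc": 60000.0, "home_ownership": "RENT"}))
```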

[Screenshot: Grafana monitoring dashboard]

Experimentation and orchestration part

  1. Set up the environment and prepare the project

       make setup

  2. Start the Prefect server

       pipenv run prefect orion start --host 0.0.0.0

  3. Install aws-cli and configure an AWS profile

     • If you've already created an AWS account, head to the IAM section, generate your secret key, and download it locally. Instructions

     • Configure aws-cli with your downloaded AWS secret keys:

       $ aws configure
       AWS Access Key ID [None]: xxx
       AWS Secret Access Key [None]: xxx
       Default region name [None]: eu-west-1
       Default output format [None]:

     • Verify the aws config:

       $ aws sts get-caller-identity

  4. Set the S3 bucket name

       export BUCKET_NAME=s3-bucket-name

  5. Run the MLFlow server

       pipenv run mlflow server --default-artifact-root s3://$BUCKET_NAME --backend-store-uri sqlite:///mlflow_db.sqlite

  6. Create the deployment for the training pipeline

       pipenv run python -m prediction_service.train_workflow

  7. Run the deployment

       pipenv run prefect deployment run 'main-flow/model_training_workflow'

  8. Start the agent

       pipenv run prefect agent start -q 'mlops'

  9. Wait until the run finishes and registers the new production model.
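Once the flow completes, the registered model can be pulled from the MLFlow registry by stage. A minimal sketch, assuming a hypothetical registered name loan-prediction (check the MLFlow UI for the real one):

```python
# Hypothetical registered model name; the training flow determines the real one.
MODEL_NAME = "loan-prediction"

def model_uri(name: str, stage: str = "Production") -> str:
    # "models:/<name>/<stage>" resolves to the latest version in that stage.
    return f"models:/{name}/{stage}"

if __name__ == "__main__":
    import mlflow

    # Requires the MLFlow server from step 5 to be running and reachable.
    model = mlflow.pyfunc.load_model(model_uri(MODEL_NAME))
    print(model.metadata)
```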

Run tests and code quality checks

Unit tests

make test

Integration tests

make integration_test

Code formatting and code quality checks (isort, black, pylint)

make quality_checks

Pre-commit hooks

Code formatting pre-commit hooks are triggered on each commit.

CI/CD

A pull request triggers the CI workflow:

  • Environment setup, unit tests, and the integration test