Table of Contents
This project seeks to identify credit card customers that are most likely to churn.
Customer churn is the occurrence of customers discontinuing their usage of a product or service or leaving a business. Predicting it allows for targeted interventions (such as special discounts or follow-ups) to customers which are likely to churn in the near future. The cost of lost revenue is usually much greater than any discounts that could be offered. The task was to predict which customers are likely to churn in the near future.
🌟 Highlights:
- The project follows coding (PEP8) and engineering best practices for implementing software (modular, documented and tested).
- The model has outstanding accuracy and good sensitivity.
- Methods of Explainable AI (XAI) ensure the model is not just a black box but makes transparent and reliable decisions. This facilitates trust in the predictions and makes them more accountable.
On the backend predictions are made using two models:
-
Random Forest
classifier -
Logistic Regression
classifier
File structure:
.
├── churn_library.py # Main
├── churn_notebook.ipynb # Code before refactoring
├── churn_script_logging_and_tests.py # Unit tests module
├── data # Data
│ └── bank_data.csv # Data source in csv format
├── Guide.ipynb # Getting started and troubleshooting
├── images # Generated images
│ ├── eda # Exploratory Data Analysis
│ └── results # Results from training
├── logs # Stored logs
│ └── churn_library.log # Log-file from running unit test
├── models # Stored models
│ ├── logistic_model.pkl # Logistic regression model
│ └── rfc_model.pkl # Random forest model
├── * README.md # Project overview and instructions
├── requirements_py3.6.txt # Python 3.6 module requirements
└── requirements_py3.8.txt # Python 3.8 module requirements
Overall the Random Forest
model seems to have better performance as is visible in the ROC plot:
The Random Forest
forest has a weighted average F1-score of 0.93.
The Logistic regression
forest has a slightly lower weighted average F1-score of 0.89.
Both models display a better performance on the majority class, which is to be expected.
To ensure the model makes transparent predictions methods of explainable AI (XAI) were used.
One such method is the feature importance through Shapley
values:
According to this analysis the most impactful feature is the total transaction count in the last 12
monthsTotal_Trans_Ct
. This feature follows a bimodal distribution.
The second most important is the total transaction amount. On the third place is the revolving balance.
Make sure all module requirements are met or install them with:
python -m pip install -r requirements_py3.6.txt
or alternatively for python==3.8
python -m pip install -r requirements_py3.8.txt
The main code can be found in churn_library.py
.
Upon execution:
- The dataset
/data/bank_data.csv
is imported - Exploratory Data Analysis (EDA) will be performed
- all categorical variables will be encoded with the proportion of churn
- the dataset is split (feature engineering)
- the models are trained
- all results, graphics and models are saved and stored
The program can be run interactively:
ipython churn_library.py
or also purely from the command line (CLI):
python churn_library.py
This project is built with unit tests which also log any errors, info and warning messages to a log file.
To test all functions and log the respective results, please run:
ipython churn_script_logging_and_tests.py
To test a single function function_of_interest
run:
pytest -v churn_script_logging_and_tests.py::function_of_interest
The argument -v
is optional to see which function was tested
- The dataset is open source and stems from LEAPS Analyttica. It consists of 10,000 customers mentioning their age, salary, marital_status, credit card limit, credit card category, etc.
- The project is part of the ML DevOps Engineer Udacity Nanodegree.
Found a bug? Want more features? Find something missing in the documentation? Let me know! Please don't hesitate to file an issue and make a recommendation.
Customer Churn - Modeling of customer churn in credit card data
The MIT License (MIT)
Copyright (c) 2023 Thierry Grimm
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software ("Customer Churn") and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.