Customer churn is a major problem and one of the most important concerns for companies because of its direct effect on revenue. It is therefore important to develop ways to predict which customers are likely to churn, and to find the factors that increase churn so that the necessary actions can be taken to reduce it.
The goal is to predict customer churn from variables such as customer account information and customer activity.
Each row of the data represents a customer, and each column contains one of the customer's attributes.
Customers who left/churned: Exited
Demographic information of customers: Geography, Gender, Age
Customer account information: Tenure, HasCrCard, Balance, IsActiveMember, EstimatedSalary, NumOfProducts, CreditScore
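Geography and Gender are categorical, so they have to be turned into numbers before a model can use them. The minimal sketch below, in plain Python, shows label encoding (one integer code per category, adequate for a two-valued column like Gender) and one-hot encoding (one 0/1 column per category, which avoids implying an order between countries). The sample rows and the `label_encode`/`one_hot_encode` helpers are hypothetical; in practice `pandas.get_dummies` or scikit-learn's `LabelEncoder`/`OneHotEncoder` do the same job.

```python
# Minimal sketch of label encoding and one-hot encoding.
# The rows below are hypothetical sample data, not taken from the dataset.

def label_encode(values):
    """Map each distinct value to an integer code (sorted for reproducibility)."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values, prefix):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}


rows = [
    {"Geography": "France", "Gender": "Female"},
    {"Geography": "Spain", "Gender": "Male"},
    {"Geography": "Germany", "Gender": "Female"},
]

gender_codes, gender_map = label_encode([r["Gender"] for r in rows])
geo_columns = one_hot_encode([r["Geography"] for r in rows], "Geography")

print(gender_map)    # {'Female': 0, 'Male': 1}
print(gender_codes)  # [0, 1, 0]
print(geo_columns)
```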
- Loading Data
- Data Exploration
- Splitting Data into Training, Test and Validation Sets
- Data Visualization
  - Univariate
  - Bivariate
- Finding Missing Values
- Label Encoding
- One-Hot Encoding of Categorical Values
- Feature Scaling and Normalization
- Feature Selection
- Training Models
  - Logistic Regression
  - SVM
  - Decision Tree
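The order of the steps matters: splitting happens before scaling, so the scaling statistics can be learned from the training split alone and nothing leaks from the validation or test data. The sketch below illustrates this in plain Python, assuming min-max normalization (z-score standardization is the other common choice; the source does not say which was used). The 60/20/20 proportions, the seed and the Balance values are hypothetical.

```python
import random

# Minimal sketch: split first, then fit the scaler on the training split only.
# Proportions, seed and Balance values are hypothetical.

def train_val_test_split(values, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle a copy once, then cut it into train / validation / test parts."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_min_max(values):
    """Learn the min and max from the training values."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale with training statistics; unseen values may fall outside [0, 1]."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


balances = [i * 10_000.0 for i in range(10)]
train, val, test = train_val_test_split(balances)
lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
val_scaled = apply_min_max(val, lo, hi)
print(len(train), len(val), len(test))  # 6 2 2
```

With scikit-learn, `train_test_split(..., stratify=y)` would additionally keep the churn ratio the same in every split, which helps when churners are a minority.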
To measure the performance of the models, two metrics are used: the Area Under the ROC Curve (AUC) and accuracy.
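Both metrics can be computed directly from labels and model outputs. The plain-Python sketch below uses the rank interpretation of AUC (the probability that a randomly chosen churner is scored higher than a randomly chosen non-churner, with ties counting as half) and accuracy at an assumed 0.5 threshold. The labels and scores are hypothetical; scikit-learn's `roc_auc_score` and `accuracy_score` compute the same quantities.

```python
# Minimal sketch of accuracy and ROC AUC; labels and scores are hypothetical.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 0.5.
    O(P * N) pairwise comparison, fine for a small illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(roc_auc(y_true, scores))   # 0.75
print(accuracy(y_true, y_pred))  # 0.75
```

AUC does not depend on a threshold, while accuracy does, and accuracy can look high simply by predicting the majority class when churners are rare; reporting both gives a more balanced picture.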
Product Distribution
Salary Distribution
Tenure Distribution
Balance Distribution
Customer Age vs Customer Churn
Account Balance vs Customer Churn
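Bivariate plots such as age vs churn boil down to comparing the churn rate across groups of a numeric feature. The sketch below bins ages into bands and reports the share of `Exited == 1` in each band; the band edges and sample values are hypothetical, and the same helper works for Balance with different edges.

```python
# Minimal sketch: churn rate per age band. Ages, labels and edges are hypothetical.

def churn_rate_by_band(values, exited, edges):
    """Share of customers with Exited == 1 inside each [lo, hi) band."""
    rates = {}
    for lo, hi in zip(edges, edges[1:]):
        in_band = [e for v, e in zip(values, exited) if lo <= v < hi]
        # None marks an empty band rather than dividing by zero
        rates[f"{lo}-{hi - 1}"] = sum(in_band) / len(in_band) if in_band else None
    return rates


ages = [25, 32, 38, 45, 51, 58, 62]
exited = [0, 0, 1, 1, 1, 0, 1]
rates = churn_rate_by_band(ages, exited, edges=[18, 40, 60, 100])
print(rates)
```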
Correlation Heat Map of All Features
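Each cell of a correlation heat map is typically a Pearson correlation coefficient between two columns (assumed here; the source does not name the coefficient). The sketch below computes the full pairwise matrix in plain Python for a few hypothetical columns. A matrix like this is often used during feature selection, for example to spot pairs of strongly correlated features, but the source does not state which selection rule was applied.

```python
import math

# Minimal sketch of a Pearson correlation matrix; the column values are hypothetical.

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for a constant column


def correlation_matrix(columns):
    """Pairwise correlations for a dict of column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


columns = {
    "Age": [25, 35, 45, 55],
    "Exited": [0, 0, 1, 1],
    "Tenure": [8, 6, 4, 2],
}
matrix = correlation_matrix(columns)
print(round(matrix[("Age", "Exited")], 3))  # 0.894
```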
Selected Features
Logistic Regression: Training Data
Logistic Regression: Validation Data
SVM: Training Data
SVM: Validation Data
Decision Tree: Training Data
Decision Tree: Validation Data
Comparing All Classifiers
The model comparison shows that the Decision Tree achieves a higher AUC and accuracy than the other two models.
On previously unseen test data, the model's precision is slightly higher when predicting 1s, i.e. the customers who churn. However, although the model's accuracy is high, it still misses some of the customers who end up churning (false negatives), which lowers its recall on the churn class. The model could be improved by retraining it on more data as it becomes available over time. :-)
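Precision and recall separate the two kinds of error mentioned above: precision asks how many predicted churners really churned, recall asks how many real churners were caught. The sketch below computes both for class 1 from hypothetical labels, showing how a model can have reasonable precision and accuracy while still missing half of the churners.

```python
# Minimal sketch of precision and recall for the churn class (1).
# The labels and predictions are hypothetical.

def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def precision_recall(y_true, y_pred, positive=1):
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(round(precision, 3), recall)  # 0.667 0.5
```

Here 7 of 10 predictions are correct, yet two of the four churners are missed; collecting more data or adjusting the decision threshold are common ways to raise recall.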