This repository contains a sample project for analyzing customer churn in a subscription service. The project uses Python for data analysis and model building, and Tableau for data visualization.
- Project Overview
- Dataset
- Exploratory Data Analysis (EDA)
- Model Building
- Evaluation
- Tableau Visualizations
The goal of this project is to predict customer churn for a subscription service. The analysis involves:
- Data Preprocessing
- Exploratory Data Analysis (EDA)
- Model Building and Training
- Model Evaluation
- Exporting Data for Tableau Visualization
The dataset used for this project is synthetically generated and consists of the following features:
- CustomerID: Unique identifier for each customer
- Gender: Gender of the customer
- Age: Age of the customer
- Tenure: Number of months the customer has been with the company
- SubscriptionPlan: Subscription plan of the customer (Basic, Standard, Premium)
- MonthlyCharges: Monthly charges for the customer
- Churn: Whether the customer has churned (0 = No churn, 1 = Churn)
- TotalCharges: Total charges for the customer (calculated as Tenure * MonthlyCharges)
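As a rough illustration of how a table with these columns could be generated, here is a minimal sketch using NumPy and pandas. The function name `make_synthetic_churn`, the plan prices, the age and tenure ranges, and the 20% churn rate are all assumptions made for this example, not the values used by the project's own generator.

```python
import numpy as np
import pandas as pd


def make_synthetic_churn(n_customers=1000, seed=42):
    """Build a synthetic churn table with the columns described in this section."""
    rng = np.random.default_rng(seed)

    # Hypothetical base price per plan; the project's generator may use other values.
    plan_prices = {"Basic": 20.0, "Standard": 50.0, "Premium": 80.0}
    plans = rng.choice(list(plan_prices), size=n_customers)
    base_price = np.array([plan_prices[p] for p in plans])

    df = pd.DataFrame({
        "CustomerID": np.arange(1, n_customers + 1),
        "Gender": rng.choice(["Male", "Female"], size=n_customers),
        "Age": rng.integers(18, 71, size=n_customers),      # assumed range 18..70
        "Tenure": rng.integers(1, 73, size=n_customers),    # assumed 1..72 months
        "SubscriptionPlan": plans,
        # Plan price plus a little noise, rounded to cents.
        "MonthlyCharges": np.round(base_price + rng.normal(0, 3, n_customers), 2),
        # Assumed 20% churn rate, drawn independently of the other columns.
        "Churn": rng.binomial(1, 0.2, size=n_customers),
    })

    # TotalCharges is derived exactly as the feature list defines it.
    df["TotalCharges"] = df["Tenure"] * df["MonthlyCharges"]
    return df


churn_df = make_synthetic_churn(n_customers=500, seed=0)
print(churn_df.head())
```

Because churn is drawn independently of the other columns in this sketch, a model trained on its output would have nothing to learn from. A more realistic generator would tie the churn probability to features such as Tenure or SubscriptionPlan.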
The EDA section of the code includes:
- Churn Count Visualization: A plot of churned vs. non-churned customer counts, giving a quick look at the class imbalance in the dataset.
- Age Distribution Visualization: A histogram that displays the age distribution of customers, helping to understand the age range and common age groups within the dataset.
- Monthly Charges by Subscription Plan Visualization: A box plot that illustrates the distribution of monthly charges across different subscription plans, highlighting the variations in charges among the plans.
- Correlation Matrix Visualization: A heatmap showing the correlation between different numerical features, which helps in identifying the relationships and dependencies among the features.
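The four plots above can be sketched with pandas and matplotlib as follows. This is an illustrative version, not the project's own plotting code: the function name `plot_eda`, the small random demo frame, and the choice of plain matplotlib (rather than, say, seaborn) are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_eda(df):
    """Draw the four EDA plots on one figure and return (figure, correlation matrix)."""
    fig, axes = plt.subplots(2, 2, figsize=(12, 9))

    # 1. Churn count: bar chart of non-churned (0) vs. churned (1) customers.
    counts = df["Churn"].value_counts().sort_index()
    axes[0, 0].bar([str(label) for label in counts.index], counts.to_numpy())
    axes[0, 0].set_title("Churn count")

    # 2. Age distribution: histogram.
    axes[0, 1].hist(df["Age"].to_numpy(), bins=20)
    axes[0, 1].set_title("Age distribution")

    # 3. Monthly charges by subscription plan: one box per plan.
    plans = ["Basic", "Standard", "Premium"]
    groups = [df.loc[df["SubscriptionPlan"] == p, "MonthlyCharges"].to_numpy()
              for p in plans]
    axes[1, 0].boxplot(groups)
    axes[1, 0].set_xticks(range(1, len(plans) + 1))
    axes[1, 0].set_xticklabels(plans)
    axes[1, 0].set_title("Monthly charges by plan")

    # 4. Correlation matrix of the numerical features, drawn as a heatmap.
    numeric_cols = ["Age", "Tenure", "MonthlyCharges", "TotalCharges", "Churn"]
    corr = df[numeric_cols].corr()
    image = axes[1, 1].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    axes[1, 1].set_xticks(range(len(numeric_cols)))
    axes[1, 1].set_xticklabels(numeric_cols, rotation=45, ha="right")
    axes[1, 1].set_yticks(range(len(numeric_cols)))
    axes[1, 1].set_yticklabels(numeric_cols)
    axes[1, 1].set_title("Correlation matrix")
    fig.colorbar(image, ax=axes[1, 1])

    fig.tight_layout()
    return fig, corr


# Small random demo frame (hypothetical values) so the sketch runs on its own.
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "Age": rng.integers(18, 71, size=n),
    "Tenure": rng.integers(1, 73, size=n),
    "SubscriptionPlan": rng.choice(["Basic", "Standard", "Premium"], size=n),
    "MonthlyCharges": np.round(rng.uniform(15, 90, size=n), 2),
    "Churn": rng.binomial(1, 0.3, size=n),
})
demo["TotalCharges"] = demo["Tenure"] * demo["MonthlyCharges"]

fig, corr = plot_eda(demo)
# fig.savefig("eda_overview.png")  # uncomment to write the figure to disk
```

Since TotalCharges is computed as Tenure * MonthlyCharges, it is expected to correlate with both of those columns in the heatmap.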
The model building process includes:
- Train-test split with stratification
- Feature scaling
- Training a RandomForestClassifier
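The three steps above might look like the following scikit-learn sketch. The function name `train_churn_model`, the 80/20 split, the one-hot encoding of Gender and SubscriptionPlan, the hyperparameters, and the random demo frame are assumptions made for illustration, not the project's confirmed settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_churn_model(df, seed=42):
    """Split, scale, and fit a random forest; returns the pieces needed for evaluation."""
    # One-hot encode the categorical columns (assumed preprocessing step).
    X = pd.get_dummies(
        df.drop(columns=["CustomerID", "Churn"]),
        columns=["Gender", "SubscriptionPlan"],
        drop_first=True,
        dtype=float,
    )
    y = df["Churn"]

    # Stratify on y so train and test keep the same churn rate.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # Fit the scaler on the training split only to avoid leaking test statistics.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train_scaled, y_train)
    return model, scaler, X_test_scaled, y_test, list(X.columns)


# Random demo frame (hypothetical values) so the sketch runs on its own.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "CustomerID": np.arange(1, n + 1),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 71, size=n),
    "Tenure": rng.integers(1, 73, size=n),
    "SubscriptionPlan": rng.choice(["Basic", "Standard", "Premium"], size=n),
    "MonthlyCharges": np.round(rng.uniform(15, 90, size=n), 2),
    "Churn": rng.binomial(1, 0.25, size=n),
})
demo["TotalCharges"] = demo["Tenure"] * demo["MonthlyCharges"]

model, scaler, X_test_scaled, y_test, feature_names = train_churn_model(demo)
print(feature_names)
```

Tree-based models such as random forests are largely insensitive to feature scaling, so the scaling step mainly matters if the same pipeline is later reused with a scale-sensitive model such as logistic regression.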
The model evaluation includes:
- Classification Report: Provides precision, recall, and F1-score for the model, giving a detailed performance summary.
- Confusion Matrix: A matrix that shows the counts of true positives, true negatives, false positives, and false negatives, helping to evaluate the classification accuracy.
- ROC AUC Score: Summarizes the model's ability to distinguish churned from non-churned customers across all classification thresholds; 0.5 corresponds to random guessing and 1.0 to perfect separation.
- ROC Curve Visualization: A plot of the True Positive Rate (TPR) against the False Positive Rate (FPR), showing the performance of the classification model at various threshold settings.
- Feature Importance Visualization: A bar plot that ranks the features based on their importance in the model, indicating which features have the most influence on predicting churn.
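The metrics listed above can be computed with `sklearn.metrics` as in the sketch below. To stay self-contained it trains a small RandomForestClassifier on data from `make_classification`; that stand-in dataset, the generic feature names, and the helper name `evaluate_model` are assumptions for this example rather than part of the project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split


def evaluate_model(model, X_test, y_test, feature_names):
    """Compute the evaluation outputs described above for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    # Probability of the positive class (churn = 1), needed for ROC-based metrics.
    y_prob = model.predict_proba(X_test)[:, 1]

    fpr, tpr, _thresholds = roc_curve(y_test, y_prob)
    importances = pd.Series(model.feature_importances_, index=feature_names)

    return {
        "report": classification_report(y_test, y_pred, zero_division=0),
        # Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_prob),
        "roc_curve": (fpr, tpr),  # plot tpr against fpr to draw the ROC curve
        "feature_importances": importances.sort_values(ascending=False),
    }


# Stand-in data with roughly 20% positives (hypothetical, not the churn dataset).
X, y = make_classification(
    n_samples=500, n_features=5, weights=[0.8, 0.2], random_state=0
)
names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

results = evaluate_model(clf, X_test, y_test, names)
print(results["report"])
print(results["confusion_matrix"])
print(results["feature_importances"])
```

On an imbalanced dataset, the per-class recall in the classification report and the ROC AUC are usually more informative than overall accuracy, since a model that always predicts "no churn" can still score a high accuracy.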
You can view the Tableau visualizations for this project here.