Churn Analysis Project

This repository contains a sample project for analyzing customer churn in a subscription service. The project uses Python for data analysis and model building, and Tableau for data visualization.

Project Overview

The goal of this project is to predict customer churn for a subscription service. The analysis involves:

Data Preprocessing
Exploratory Data Analysis (EDA)
Model Building and Training
Model Evaluation
Exporting Data for Tableau Visualization

Dataset

The dataset used for this project is synthetically generated and consists of the following features:

CustomerID: Unique identifier for each customer
Gender: Gender of the customer
Age: Age of the customer
Tenure: Number of months the customer has been with the company
SubscriptionPlan: Subscription plan of the customer (Basic, Standard, Premium)
MonthlyCharges: Monthly charges for the customer
Churn: Whether the customer has churned (0 = No churn, 1 = Churn)
TotalCharges: Total charges for the customer (calculated as Tenure * MonthlyCharges)
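
The generator script itself is not documented in this README, so the following is only a minimal sketch of how a table with these columns could be produced. The plan prices, age and tenure ranges, churn rate, and the function name `make_churn_data` are all illustrative assumptions, not values taken from the project.

```python
import numpy as np
import pandas as pd


def make_churn_data(n_customers: int = 1000, seed: int = 42) -> pd.DataFrame:
    """Generate a toy churn table with the columns listed above."""
    rng = np.random.default_rng(seed)

    # Assumed base price per plan; the real values are not documented.
    plan_prices = {"Basic": 20.0, "Standard": 40.0, "Premium": 60.0}
    plans = rng.choice(list(plan_prices), size=n_customers)

    # Monthly charge = plan price plus some noise, kept strictly positive.
    base = np.array([plan_prices[p] for p in plans])
    monthly = np.clip(base + rng.normal(0, 5, size=n_customers), 5, None).round(2)

    df = pd.DataFrame({
        "CustomerID": np.arange(1, n_customers + 1),
        "Gender": rng.choice(["Male", "Female"], size=n_customers),
        "Age": rng.integers(18, 80, size=n_customers),      # assumed range 18-79
        "Tenure": rng.integers(1, 73, size=n_customers),    # assumed 1-72 months
        "SubscriptionPlan": plans,
        "MonthlyCharges": monthly,
        "Churn": rng.binomial(1, 0.2, size=n_customers),    # assumed 20% churn
    })
    # TotalCharges follows the definition given in the feature list.
    df["TotalCharges"] = df["Tenure"] * df["MonthlyCharges"]
    return df


df = make_churn_data()
print(df.head())
```

Because `TotalCharges` is derived from `Tenure` and `MonthlyCharges`, it will be strongly correlated with both, which is worth keeping in mind when reading the correlation matrix and feature importances.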

Exploratory Data Analysis (EDA)

The EDA section of the code includes:

Churn Count Visualization: A plot of the number of churned vs. non-churned customers, showing at a glance how imbalanced the two classes are.
Age Distribution Visualization: A histogram that displays the age distribution of customers, helping to understand the age range and common age groups within the dataset.
Monthly Charges by Subscription Plan Visualization: A box plot that illustrates the distribution of monthly charges across different subscription plans, highlighting the variations in charges among the plans.
Correlation Matrix Visualization: A heatmap of the pairwise correlations between the numerical features, which helps identify features that are strongly related to each other or to churn.
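
The four plots above could be sketched roughly as follows. This version uses plain matplotlib on randomly generated stand-in data; the notebook may well use a different plotting library, and the data values, figure layout, and output file name `eda_overview.png` are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in data with the README's column names (values are made up).
rng = np.random.default_rng(0)
n = 500
plan_order = ["Basic", "Standard", "Premium"]
plans = rng.choice(plan_order, size=n)
base = pd.Series(plans).map({"Basic": 20, "Standard": 40, "Premium": 60}).to_numpy()
df = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Tenure": rng.integers(1, 73, size=n),
    "SubscriptionPlan": plans,
    "MonthlyCharges": base + rng.normal(0, 5, size=n),
    "Churn": rng.binomial(1, 0.2, size=n),
})
df["TotalCharges"] = df["Tenure"] * df["MonthlyCharges"]

fig, axes = plt.subplots(2, 2, figsize=(12, 9))

# 1. Churn count: how many customers fall in each class.
churn_counts = df["Churn"].value_counts().sort_index()
axes[0, 0].bar(churn_counts.index.astype(str), churn_counts.values)
axes[0, 0].set(title="Churn count", xlabel="Churn", ylabel="Customers")

# 2. Age distribution as a histogram.
axes[0, 1].hist(df["Age"], bins=20)
axes[0, 1].set(title="Age distribution", xlabel="Age", ylabel="Customers")

# 3. Monthly charges by subscription plan as box plots.
groups = [df.loc[df["SubscriptionPlan"] == p, "MonthlyCharges"] for p in plan_order]
axes[1, 0].boxplot(groups)
axes[1, 0].set_xticks([1, 2, 3])
axes[1, 0].set_xticklabels(plan_order)
axes[1, 0].set(title="Monthly charges by plan", ylabel="MonthlyCharges")

# 4. Correlation matrix of the numerical features as a heatmap.
corr = df.select_dtypes("number").corr()
im = axes[1, 1].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 1].set_xticks(range(len(corr.columns)))
axes[1, 1].set_xticklabels(corr.columns, rotation=45, ha="right")
axes[1, 1].set_yticks(range(len(corr.index)))
axes[1, 1].set_yticklabels(corr.index)
axes[1, 1].set_title("Correlation matrix")
fig.colorbar(im, ax=axes[1, 1])

fig.tight_layout()
fig.savefig("eda_overview.png")
```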

Model Building

The model building process includes:

Train-test split with stratification
Feature scaling
Training a RandomForestClassifier
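
As a rough illustration of these three steps, the sketch below uses scikit-learn on stand-in data. The one-hot encoding of the categorical columns, the 80/20 split, and the hyperparameters are assumptions; the README does not state which settings the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data with the README's column names (values are made up).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 80, size=n),
    "Tenure": rng.integers(1, 73, size=n),
    "SubscriptionPlan": rng.choice(["Basic", "Standard", "Premium"], size=n),
    "MonthlyCharges": rng.uniform(15, 70, size=n).round(2),
    "Churn": rng.binomial(1, 0.2, size=n),
})
df["TotalCharges"] = df["Tenure"] * df["MonthlyCharges"]

# Assumption: categorical columns are one-hot encoded before modelling.
X = pd.get_dummies(df.drop(columns="Churn"), drop_first=True)
y = df["Churn"]

# Stratify on the target so train and test keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training set only, then reuse it on the test set,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
print(f"Test accuracy: {model.score(X_test_scaled, y_test):.3f}")
```

Tree-based models such as random forests are largely insensitive to feature scaling, so the scaling step mainly matters if a scale-sensitive model (for example logistic regression) is swapped in later.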

Evaluation

The model evaluation includes:

Classification Report: Provides precision, recall, and F1-score for the model, giving a detailed performance summary.
Confusion Matrix: The counts of true positives, true negatives, false positives, and false negatives, showing which kinds of errors the model makes rather than only how many.
ROC AUC Score: A single number summarizing how well the model separates churners from non-churners across all classification thresholds (0.5 corresponds to random guessing, 1.0 to perfect separation).
ROC Curve Visualization: A plot of the True Positive Rate (TPR) against the False Positive Rate (FPR), showing the performance of the classification model at various threshold settings.
Feature Importance Visualization: A bar plot that ranks the features based on their importance in the model, indicating which features have the most influence on predicting churn.
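
The metrics above can be computed with scikit-learn along the following lines. This is a sketch on stand-in data: the way churn is made to depend on tenure is purely an assumption so that the model has something to learn, and preprocessing is omitted to keep the focus on evaluation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Stand-in numerical features (values are made up).
rng = np.random.default_rng(0)
n = 1000
tenure = rng.integers(1, 73, size=n)
X = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Tenure": tenure,
    "MonthlyCharges": rng.uniform(15, 70, size=n).round(2),
})
X["TotalCharges"] = X["Tenure"] * X["MonthlyCharges"]

# Assumed churn mechanism: short-tenure customers churn more often.
churn_prob = 1 / (1 + np.exp(0.08 * (tenure - 20)))
y = pd.Series(rng.binomial(1, churn_prob), name="Churn")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard 0/1 predictions
y_score = model.predict_proba(X_test)[:, 1]  # predicted churn probability

# Precision, recall, and F1-score per class.
print(classification_report(y_test, y_pred, zero_division=0))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# ROC AUC and the points of the ROC curve are based on scores, not labels.
auc = roc_auc_score(y_test, y_score)
fpr, tpr, thresholds = roc_curve(y_test, y_score)
print(f"ROC AUC: {auc:.3f}")

# Impurity-based importances from the fitted forest, highest first.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Plotting `tpr` against `fpr` gives the ROC curve, and a bar plot of `importances` gives the feature importance chart.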

Tableau Visualizations

You can view the Tableau visualizations for this project here.
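
Tableau reads flat CSV files, so the export step listed in the overview might look something like the sketch below. The output folder `tableau_exports` and the column names `Feature1`, `Feature2`, and `Correlation` are assumptions; the data is a stand-in with the README's column names.

```python
from pathlib import Path

import numpy as np
import pandas as pd

# Stand-in data with the README's column names (values are made up).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),
    "Tenure": rng.integers(1, 73, size=n),
    "MonthlyCharges": rng.uniform(15, 70, size=n).round(2),
    "Churn": rng.binomial(1, 0.2, size=n),
})
df["TotalCharges"] = df["Tenure"] * df["MonthlyCharges"]

out_dir = Path("tableau_exports")  # hypothetical output folder
out_dir.mkdir(exist_ok=True)

# Row-level data for dashboards that slice by customer attributes.
df.to_csv(out_dir / "churn_data.csv", index=False)

# Wide correlation matrix: one row and one column per feature.
corr = df.select_dtypes("number").corr()
corr.to_csv(out_dir / "correlation_matrix.csv")

# Long format: one row per feature pair, which is the shape a Tableau
# heatmap needs (one field on rows, one on columns, one for colour).
corr_long = (
    corr.rename_axis("Feature1")
    .reset_index()
    .melt(id_vars="Feature1", var_name="Feature2", value_name="Correlation")
)
corr_long.to_csv(out_dir / "correlation_matrix_long.csv", index=False)
print(corr_long.head())
```

Feature importances and ROC curve points can be written the same way, as small two- or three-column tables.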

