Skip to content

Latest commit

 

History

History
39 lines (30 loc) · 2.41 KB

File metadata and controls

39 lines (30 loc) · 2.41 KB

Telco Customer Churn Feature Engineering

Overview

This project focuses on developing a machine learning model to predict potential customer churn for a fictitious telecom company based in California. We leverage data from 7043 customers to identify those likely to discontinue their services. The goal is to perform thorough data analysis and feature engineering before modeling to enhance prediction accuracy.

Data Description

The dataset contains detailed information about the telecom services used by 7043 customers in the third quarter, along with their demographic and account information. Here are the key details:

  • Observations: 7043
  • Variables: 21

Variables

  • Customer Demographics: Includes gender, senior citizen status, partnership and dependents status.
  • Service Information: Details on phone and internet services, including multiple lines, online security, device protection, tech support, and streaming options.
  • Account Information: Data on tenure, contract type, billing method, and charges.

Target Variable

  • Churn: Indicates whether a customer left the service (Yes or No).

Tasks

Exploratory Data Analysis (EDA)

  1. Overall Examination: Understand the general structure and summary of the data.
  2. Variable Identification: Distinguish between numeric and categorical variables.
  3. Detailed Analysis: Deep dive into the distributions and characteristics of both numeric and categorical variables.
  4. Target Analysis: Explore how different variables interact with the churn rate.
  5. Outlier Analysis: Identify and analyze anomalies in the dataset.
  6. Missing Values Analysis: Detect and address missing data points.
  7. Correlation Analysis: Examine the relationships between variables to identify potential collinearity.

Feature Engineering

  1. Data Cleaning: Handle missing data and outliers.
  2. Feature Creation: Develop new variables that could enhance model accuracy.
  3. Encoding: Convert categorical variables into a format suitable for modeling.
  4. Standardization: Scale numeric variables to standardize their range.
  5. Model Development: Construct a predictive model to identify at-risk customers.

Conclusion

The insights derived from this analysis will guide the strategic decisions to improve customer retention strategies. The final model aims to provide actionable predictions, helping the company to mitigate churn effectively.