Customer Churn Prediction for Music Streaming Business with Apache Spark

1. Project Motivation

Streaming businesses like Spotify, Pandora, or Netflix often provide services on a month-to-month basis, which customers find appealing. They get high-quality services at a low cost, with the reassurance that they can cancel at any time if unsatisfied.

However, this freedom can be a burden for service providers, which often experience high user turnover. Correctly predicting which customers are likely to cancel lets a provider act before they leave and so reduce the churn rate. Acquiring a new customer often costs much more than retaining an existing one. If at-risk users are accurately identified in advance, service providers can offer them incentives to stay and potentially save millions in revenue.

To help boost the business of Sparkify, a fictitious music streaming app, this project tackles the problem described above with machine learning techniques.

2. Dependencies

The code was developed with Python 3.7.12 and depends on the Python packages listed below:

matplotlib == 3.2.2
numpy == 1.19.5
pandas == 1.1.5
pyspark == 3.1.2
seaborn == 0.11.2
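
As a rough sketch, the pinned versions above can be compared against the current environment before running the notebooks. The `PINNED` dictionary and the `find_mismatches` helper are hypothetical names introduced here for illustration; they are not part of this repository. The sketch assumes `importlib.metadata` is available (Python 3.8+) and falls back to the `importlib_metadata` backport on Python 3.7.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata

# Versions copied from the dependency list in this README.
PINNED = {
    "matplotlib": "3.2.2",
    "numpy": "1.19.5",
    "pandas": "1.1.5",
    "pyspark": "3.1.2",
    "seaborn": "0.11.2",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (wanted, found)} for every package whose installed
    version differs from the pinned one; found is None if not installed."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means the environment matches the list exactly. Other versions may still work, but that has not been verified here.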

3. File Structure & Description

|-- images # contains visualizations generated by Mini_dataset_Submission.ipynb
|-- mini_sparkify_event_data.json # a 128MB subset of the full 12GB dataset
|-- Full_dataset_AWS-EMR.ipynb # model development with the 12GB full dataset & Amazon Elastic MapReduce
|-- Full_dataset_AWS-EMR.html # HTML version of the model development with the 12GB full dataset & Amazon Elastic MapReduce
|-- Mini_dataset_Submission.ipynb # model development with the 128MB mini dataset
|-- README.md

4. Summary of Results

The results are presented in a Medium blog post available here.

5. Acknowledgements

Udacity is credited with simulating the data used in this project, which imitates the data generated by a real-world music streaming service provider.

timchansdp/Churn-Prediction-with-PySpark
