MusicBoxRetentionAnalysis

The reading sequence is:
0_create_data_folders.sh
1_download_data.ipynb
2_unpack_and_clean_files.sh
3_etl_down_sample_by_user.ipynb
4_EDA_with_spark.ipynb
5_feature_engineer_with_spark.ipynb
6_train_model_sklearn.ipynb

Summary of data analysis
Part zero: create_data_folders.sh
■ Create the data folders used for downloading.

Part one: download_data.ipynb
■ Download the data from AWS.

Part two: unpack_and_clean_files.sh
■ Uncompress the downloaded data and combine the files by category.
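The unpack-and-combine step could look roughly like the following Python sketch. The original is a shell script whose file layout is not shown here, so the naming convention `<date>_<category>.log.gz` and the `unpack_and_combine` helper are hypothetical, chosen only to illustrate grouping decompressed files by category.

```python
import gzip
import shutil
from collections import defaultdict
from pathlib import Path


def unpack_and_combine(raw_dir: str, out_dir: str) -> dict:
    """Decompress every *.log.gz file in raw_dir and concatenate files of the
    same category into out_dir/<category>.log.

    Hypothetical naming convention: '<date>_<category>.log.gz',
    e.g. '20170401_play.log.gz'. Returns {category: [source file names]}.
    """
    raw, out = Path(raw_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group the compressed files by the category part of their name.
    groups = defaultdict(list)
    for path in sorted(raw.glob("*.log.gz")):
        stem = path.name[: -len(".log.gz")]
        _, category = stem.split("_", 1)
        groups[category].append(path)

    # Write each category once ("wb"), so re-running does not duplicate data.
    for category, paths in groups.items():
        with open(out / f"{category}.log", "wb") as dst:
            for path in paths:
                with gzip.open(path, "rb") as src:
                    shutil.copyfileobj(src, dst)

    return {cat: [p.name for p in paths] for cat, paths in groups.items()}
```

Files are processed in sorted name order, so with date-prefixed names each combined file ends up in chronological order.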

Part three: etl_down_sample_by_user.ipynb
■ Removed robot users from the data set based on unusual play frequency.
■ Down-sampled by user to cut the data size in half.
■ Described the data structure and contents.
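A minimal plain-Python sketch of the robot filter and the user-level down-sampling follows. The notebook itself may implement these differently (for example in Spark); the `(user_id, date, song_id)` event schema, the per-day play threshold, and the hash-based sampling rule are all assumptions for illustration. Sampling whole users, rather than individual events, keeps each retained user's full history intact, which matters for retention analysis.

```python
import hashlib
from collections import Counter


def remove_robot_users(play_events, max_plays_per_day):
    """Drop all events of users whose plays on any single day exceed
    max_plays_per_day (hypothetical threshold rule).

    play_events: list of (user_id, date, song_id) tuples (assumed schema).
    Returns (cleaned_events, robot_user_ids).
    """
    daily = Counter((uid, day) for uid, day, _ in play_events)
    robots = {uid for (uid, _), n in daily.items() if n > max_plays_per_day}
    cleaned = [e for e in play_events if e[0] not in robots]
    return cleaned, robots


def keep_user(user_id, fraction=0.5):
    """Deterministically keep roughly `fraction` of users by hashing the id."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % 10_000 < fraction * 10_000


def down_sample_by_user(events, fraction=0.5):
    """Keep all events of sampled users and none of the others."""
    return [e for e in events if keep_user(e[0], fraction)]
```

Because the sampling rule depends only on the user id, re-running it on a different day's files selects the same users.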

Part four: EDA_with_spark.ipynb
Explored the following questions:
■ What's the user activity from 04/01 to 05/12?
■ What's the device distribution?
■ What's the percentage of paid songs?
■ Why do users stop playing songs before they end?
■ What's the weekly retention rate for users?
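One common way to define weekly retention, assumed here rather than taken from the notebook, is the share of users active in week w who are also active in week w + 1. A minimal sketch in plain Python, where the `(user_id, date)` input format and the `weekly_retention` helper are hypothetical:

```python
from collections import defaultdict
from datetime import date


def weekly_retention(activity, start):
    """activity: iterable of (user_id, date) pairs; weeks are 7-day buckets
    counted from `start`. Returns {week: share of that week's active users
    who are also active in the following week}. The last week is omitted
    because its following week is not observed."""
    users_by_week = defaultdict(set)
    for uid, day in activity:
        users_by_week[(day - start).days // 7].add(uid)
    if not users_by_week:
        return {}
    last = max(users_by_week)
    return {
        w: len(users & users_by_week.get(w + 1, set())) / len(users)
        for w, users in sorted(users_by_week.items())
        if w < last
    }
```

For the 04/01 to 05/12 window this would be called as `weekly_retention(pairs, start=date(2017, 4, 1))`, where the year 2017 is a placeholder assumption.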

Part five: feature_engineer_with_spark.ipynb
■ Created and engineered features for retention analysis.
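The exact features are defined in the notebook; as an illustration only, a typical setup splits time at a cutoff date, builds activity features from plays before the cutoff, and labels a user as retained if they play anything in the following month. The windows, feature names, and 30-day label below are hypothetical choices for this sketch.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_features(events, cutoff, windows=(1, 7, 30), label_days=30):
    """events: iterable of (user_id, date) plays (assumed schema).
    Features use plays strictly before `cutoff`; label = 1 if the user plays
    anything in the `label_days` days starting at `cutoff`. Users with no
    plays before the cutoff get no row."""
    history = defaultdict(list)  # user -> play dates before the cutoff
    retained = set()             # users active in the label window
    label_end = cutoff + timedelta(days=label_days)
    for uid, day in events:
        if day < cutoff:
            history[uid].append(day)
        elif day < label_end:
            retained.add(uid)

    rows = {}
    for uid, days in history.items():
        # Plays within the last w days before the cutoff, for each window w.
        row = {f"plays_last_{w}d": sum(1 for d in days if (cutoff - d).days <= w)
               for w in windows}
        row["active_days"] = len(set(days))
        row["days_since_last_play"] = (cutoff - max(days)).days
        row["label"] = int(uid in retained)
        rows[uid] = row
    return rows
```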

Part six: train_model_sklearn.ipynb
■ Based on the generated features, built logistic regression, random forest, GBDT, and neural network models to predict monthly retention.
■ Compared the performance of each model through cross-validation.
■ Used grid search to fine-tune the random forest model, since it performed well in cross-validation.
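The comparison and tuning steps map onto standard scikit-learn tools. The sketch below uses synthetic data from `make_classification` as a stand-in for the engineered feature table; the ROC AUC metric, 5 folds, model settings, and parameter grid are assumptions, not the notebook's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real (user features, retention label) table.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Scale inputs for the models that are sensitive to feature scale.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gbdt": GradientBoostingClassifier(random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ),
}

# Mean 5-fold cross-validated ROC AUC for each model.
cv_scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}

# Grid search over a small, hypothetical random-forest parameter grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="roc_auc")
search.fit(X, y)
print(cv_scores, search.best_params_, search.best_score_)
```

ROC AUC is used here because retention labels are often imbalanced, where plain accuracy can be misleading; with the real data, swap `X, y` for the engineered features and labels.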

PursueHappinessDirectly/MusicBoxRetentionAnalysis