This GitHub repository hosts the code and report for "Capstone project - Arvato Customer Segmentation", which I developed and completed as part of the Udacity Machine Learning Engineer Nanodegree program.
In this project, I used supervised and unsupervised machine learning algorithms on real-life data provided by Bertelsmann Arvato Analytics. More specifically, I worked with 4 demographics datasets and 2 metadata files provided by Arvato Financial Services, with the goal of helping a client mail-order company target its most probable next customers.
- Data Description
- Technical Overview
- Requirements
- Results
- Acknowledgements
- Author
Demographics Data:
Customer Segmentation
- General Population demographics
- Customer demographics
Customer Acquisition
- Training data
- Test data
Metadata providing attribute information:
- DIAS Information Levels - Attributes
- DIAS Attributes - Values
The project is divided into the following steps:
- Data Exploration and Pre-processing
- Feature Engineering
- Dimensionality Reduction
- Clustering
- Selection of Supervised Learning Models
- Model Tuning
- Model Evaluation
- Predictions on the Test Dataset
- Submission to Kaggle
Details are in Report.pdf.
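The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from the notebook: the array shapes, the number of components and clusters, the median imputation, and the ROC AUC metric are all assumptions made for the example, and scikit-learn's `GradientBoostingClassifier` stands in for the lightgbm and xgboost models so that the block depends only on numpy and scikit-learn. Feature engineering, model tuning, and the Kaggle submission are left out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the demographics tables (hypothetical data).
population = rng.normal(size=(2000, 20))
customers = rng.normal(loc=0.5, size=(500, 20))
population[rng.random(population.shape) < 0.05] = np.nan  # simulate missing values

# Pre-processing and dimensionality reduction, fitted on the general population
# and then applied unchanged to the customers.
reducer = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    PCA(n_components=5, random_state=0),
)
pop_reduced = reducer.fit_transform(population)
cust_reduced = reducer.transform(customers)

# Clustering: fit on the population, then compare how each group is
# distributed over the clusters.
n_clusters = 4  # assumed value, chosen only for the example
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pop_reduced)
pop_share = np.bincount(kmeans.labels_, minlength=n_clusters) / len(pop_reduced)
cust_share = np.bincount(kmeans.predict(cust_reduced), minlength=n_clusters) / len(cust_reduced)

# Clusters ordered from most to least over-represented among customers.
over_represented = np.argsort(cust_share - pop_share)[::-1]

# Supervised part: an imbalanced binary response on synthetic features.
X = rng.normal(size=(1500, 20))
y = (X[:, 0] + rng.normal(scale=1.0, size=1500) > 1.5).astype(int)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-in gradient boosting model, evaluated with ROC AUC on held-out data.
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
val_auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])

print("cluster share (population):", np.round(pop_share, 3))
print("cluster share (customers): ", np.round(cust_share, 3))
print("validation ROC AUC:", round(val_auc, 3))
```

Fitting the scaler, PCA, and k-means on the general population and only transforming the customers keeps both groups in the same cluster space, so the difference in cluster shares points to the segments where customers are over-represented.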
The Jupyter Notebook is written in Python (version 3.x required).
The required libraries for this project are listed in the requirement.txt file.
The main packages include: numpy, pandas, matplotlib, seaborn, scikit-learn, lightgbm and xgboost.
The results are documented in the Jupyter Notebook. Please refer to Arvato Project Workbook.ipynb.
I would like to thank Udacity for offering this Capstone project and Arvato Financial Services for providing the real-life data.
The syllabus of this Machine Learning Nanodegree Program is here
Funing Tian
Contact: here
Email: tian.570@osu.edu