This GitHub repository hosts the Capstone project I developed and completed as part of the Udacity Machine Learning Engineer Nanodegree.
In this project, I worked on 4 demographics datasets provided by Arvato Financial Services. The intermediate goal was to extract the similarities and differences between the general population and the current customer base of a German mail-order company. The final goal was to predict which individuals are more likely to become new customers, so that the company's campaign can target them.
This project employs both unsupervised techniques (PCA for dimensionality reduction, k-Means clustering for customer segmentation) and supervised techniques (from scikit-learn...).
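The unsupervised part can be illustrated with a minimal sketch. It is not the notebook's actual code: the data is synthetic (the Arvato data is private), and the number of features, components and clusters are arbitrary assumptions chosen for illustration. It fits the scaler, PCA and k-Means on a stand-in "population", then compares how the population and a stand-in "customer" group are distributed across clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the general population and the customer base
# (assumption: 20 numeric features; the real tables differ).
population = rng.normal(size=(1000, 20))
customers = rng.normal(loc=0.5, size=(300, 20))

# Fit every step on the population only, then reuse it for the customers.
scaler = StandardScaler().fit(population)
pca = PCA(n_components=5, random_state=0).fit(scaler.transform(population))
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(
    pca.transform(scaler.transform(population))
)


def cluster_shares(data):
    """Return the fraction of rows assigned to each cluster."""
    labels = kmeans.predict(pca.transform(scaler.transform(data)))
    return np.bincount(labels, minlength=kmeans.n_clusters) / len(labels)


population_shares = cluster_shares(population)
customer_shares = cluster_shares(customers)

# Positive values mark clusters where customers are over-represented
# compared with the general population.
over_represented = customer_shares - population_shares
print(over_represented)
```

Clusters with a clearly positive difference describe the segments of the population that most resemble existing customers.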
You can access the mentor review I received for my Proposal submission (see folder), here.
Alternatively, you can access a PDF version in this repo, here.
You can access the mentor review I received for my Project submission, here.
Alternatively, you can access a PDF version in this repo, here.
- Project_Notebook
- Proposal
- Report
- metadata (Information xlsx files given prior to project)
- README.md
- attributes.csv: csv file created to assist the data preprocessing step
- kaggle.csv: csv file created for Kaggle Competition submission
The Jupyter Notebook is written in Python (version 3.x required).
This project requires the Anaconda distribution of Python 3.6 and the libraries listed in the requirement.txt file.
The main packages used are:
- numpy: scientific computing tools
- pandas: data structures and data analysis tools
- matplotlib: data visualisation tools
- seaborn: data visualisation tools
- scikit-learn (sklearn): Machine Learning library in Python
You can have a look at the Leaderboard of the Kaggle Competition. At the time of writing, my first and only submission (the kaggle.csv file) ranks 174th out of 270 participants.
The ROC AUC score (area under the ROC curve) I obtained with this submission is: 0.74612
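For readers unfamiliar with the metric, here is a small sketch of how a ROC AUC score can be computed with scikit-learn. It does not reproduce the submission: the dataset is synthetic, and the class imbalance, the logistic regression model and all variable names are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary dataset with few positives (assumption: 5% positives).
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.95], random_state=0
)

# Stratify so that both classes appear in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ROC AUC is computed from scores (here, the predicted probability of the
# positive class), not from hard 0/1 predictions.
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"ROC AUC: {auc:.5f}")
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.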
I would like to thank everyone involved in offering me this Capstone Project, both at Udacity (for their support) and at Arvato Financial Services (for giving us access to their private data).
Have a look at the Machine Learning Engineer Nanodegree, offered by Udacity (via the School of AI), here.
The syllabus for this online programme can be found here.
- Dilay Fidan Ercelik
- Contact: here
- Email: dilay.ercelik.19@ucl.ac.uk