No additional libraries need to be installed to run this code in an Anaconda environment. The code should run without issues on any Python 3.x version.
This project's goal is to predict which individuals are most likely to become customers of a company in Germany.
The prediction is based on an analysis of demographic data for the company's customers compared with demographic data for the general population of Germany.
Unsupervised learning techniques were used to perform customer segmentation and identify the parts of the population that best describe the company's core customer base.
After that, a classification model was built to predict which individuals are most likely to respond to a marketing campaign and become customers of the company.
The data used was provided by Bertelsmann Arvato Analytics, and represents a real-life data science task.
This project is composed of the following steps:
- Data cleaning: preparation of the data provided.
- Customer Segmentation Report: attribute analysis of established customers and the general population, used to create customer segments and identify people of interest within the population.
- Classification Model: the previous analysis is used to predict which individuals will respond to the marketing campaign, so that the company can focus on them instead of the entire population. The PyCaret library is used for this task.
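As a rough illustration of the segmentation idea, the sketch below compares how often each cluster appears among customers with how often it appears in the general population. It is not taken from the notebooks: the function names and the toy cluster labels are hypothetical, and it assumes every row has already been assigned a cluster label by some unsupervised model.

```python
from collections import Counter


def cluster_shares(labels):
    """Return the fraction of rows assigned to each cluster label."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: count / total for cluster, count in counts.items()}


def representation_ratio(customer_labels, population_labels):
    """Ratio of customer share to population share, per population cluster.

    A ratio above 1 means the cluster is over-represented among customers.
    """
    customer = cluster_shares(customer_labels)
    population = cluster_shares(population_labels)
    return {
        cluster: customer.get(cluster, 0.0) / share
        for cluster, share in sorted(population.items())
    }


# Hypothetical cluster assignments, for illustration only (not project data).
population_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
customer_labels = [0, 2, 2, 2, 2]

ratios = representation_ratio(customer_labels, population_labels)
for cluster, ratio in ratios.items():
    print(f"cluster {cluster}: {ratio:.2f}")
```

Clusters with a ratio well above 1 would be the natural candidates for describing the core customer base, while clusters with a ratio near 0 describe people who rarely become customers.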
Below are additional details about the project structure:
- 0_data_sampling.ipynb : notebook that takes in the full datasets and exports a sample of each, so that the other steps of the project are easier to work through.
- 1_data_preparation.ipynb : notebook containing all the data preparation steps.
- 2_cluster_prep.ipynb : notebook testing different approaches for the unsupervised learning model.
- 3_customer_segmentation.ipynb : notebook containing the unsupervised learning model used to create customer segments.
- 4_classification_model.ipynb : notebook containing the classification model that predicts which individuals to target with the marketing campaign.
- /data : contains all the data files used in this project:
  - both full-dataset .csv files used in the 0_data_sampling notebook to create the samples. These files were deleted from the repo because they are too large.
  - both .csv files (sample_AZDIAS and sample_CUSTOMERS) used in the 1_data_preparation notebook.
  - both .csv mail-out files (Udacity_MAILOUT_052018_TRAIN.csv and Udacity_MAILOUT_052018_TEST.csv) used for training and testing the supervised model in the 4_classification_model notebook.
  - all four .csv files (clean_AZDIAS, clean_CUSTOMERS, clean_TRAIN and clean_TEST) containing the cleaned and prepared data exported at the end of the 1_data_preparation notebook.
- /data_description : contains two Excel spreadsheets that hold detailed information about the attributes of the datasets.
- /predictions : contains the pred csv file, which is the clean test dataset with the predictions made by the model.
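Since the full datasets are no longer in the repo, the sampling step may need to be reproduced from the original files. The sketch below shows one dependency-free way to draw a fixed-size random sample from a large CSV file. It is an assumption about how such a step could look, not the code in 0_data_sampling.ipynb; the function name sample_csv, its parameters and the default seed are all hypothetical.

```python
import csv
import random


def sample_csv(src_path, dst_path, n_rows, seed=42, delimiter=","):
    """Write a uniform random sample of n_rows data rows to dst_path.

    Uses reservoir sampling, so the full file never has to fit in memory.
    The header row is always copied. Returns the number of rows written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader)
        for index, row in enumerate(reader):
            if index < n_rows:
                # Fill the reservoir with the first n_rows rows.
                reservoir.append(row)
            else:
                # Keep this row with probability n_rows / (index + 1).
                slot = rng.randint(0, index)
                if slot < n_rows:
                    reservoir[slot] = row
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

A call could look like sample_csv("data/full_AZDIAS.csv", "data/sample_AZDIAS.csv", n_rows=50000), where the source file name and the row count are placeholders. The delimiter is left as a parameter because the separator used in the original files is not stated here.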
Each notebook holds one step of the project. The notebooks were written with markdown cells so that they are easy to follow, with conclusions drawn along the way.
Also, a blog post about the findings is available here.
Thanks to Arvato Financial Solutions for providing the data.
Thanks to Udacity, as this project was developed during the Data Science Nanodegree Program.