Arvato is an internationally active services company that develops and implements innovative solutions for business customers from around the world. These include SCM solutions, financial services and IT services, which are continuously developed with a focus on innovations in automation as well as data and analytics.
In this project, demographic data of the German population and customer data were analyzed to perform customer segmentation and customer acquisition. The goal was to characterize the customer segment of the population and to build a model that predicts potential customers for Arvato Financial Solutions.
The project is divided into the following sections:
- Customer Segmentation Report: In this section, the data is analyzed and feature engineering is performed to prepare it for the later steps. Principal Component Analysis (PCA) is applied for dimensionality reduction, and K-Means clustering is then run on the PCA components to group the general population and the customer population into segments. These clusters are studied, with the help of cluster weights and component weights, to determine which features characterize a customer.
- Supervised Learning Model: In this section, customer data with targets indicating the customers' past responses is used to train supervised learning algorithms. The trained model then makes predictions on unseen test data to determine whether a person is a likely customer.
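The segmentation step described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it assumes scikit-learn, uses synthetic arrays in place of the cleaned AZDIAS and CUSTOMERS tables, and the component and cluster counts, as well as the helper `cluster_shares`, are hypothetical choices made only for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the cleaned, numeric population and customer tables.
# The population is an even mix of two groups; customers come mostly from one.
population = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(800, 10)),
    rng.normal(loc=3.0, scale=1.0, size=(800, 10)),
])
customers = np.vstack([
    rng.normal(loc=3.0, scale=1.0, size=(450, 10)),
    rng.normal(loc=0.0, scale=1.0, size=(50, 10)),
])

# Fit scaling, PCA and K-Means on the general population only,
# then reuse the fitted pipeline to assign customers to clusters.
segmenter = make_pipeline(
    StandardScaler(),
    PCA(n_components=3, random_state=0),              # assumed component count
    KMeans(n_clusters=2, n_init=10, random_state=0),  # assumed cluster count
)
segmenter.fit(population)


def cluster_shares(model, data, n_clusters):
    """Return the fraction of rows in `data` assigned to each cluster."""
    labels = model.predict(data)
    return np.bincount(labels, minlength=n_clusters) / len(labels)


population_share = cluster_shares(segmenter, population, 2)
customer_share = cluster_shares(segmenter, customers, 2)

# A ratio above 1 marks a cluster where customers are over-represented.
ratio = customer_share / population_share
overrepresented_cluster = int(np.argmax(ratio))

# Component weights: how strongly each input feature loads on each component.
component_weights = segmenter.named_steps["pca"].components_
```

Comparing `customer_share` with `population_share` shows which clusters hold a larger proportion of customers than of the general population. The cluster centers and `component_weights` can then be inspected to see which features drive those clusters.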
The data is provided by Bertelsmann Arvato Analytics.
- AZDIAS: Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns).
- CUSTOMERS: Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns).
- TRAIN: Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 (columns).
- TEST: Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 (columns).
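The tables have different column counts (369 for CUSTOMERS against 366 for AZDIAS, and 367 for TRAIN against 366 for TEST). A natural first check is therefore which columns appear in only one table. The sketch below assumes pandas and uses tiny made-up frames. The column names and the helper `extra_columns` are placeholders, not names taken from the real data.

```python
import pandas as pd

# Tiny stand-ins for two of the tables; all column names are placeholders.
azdias = pd.DataFrame({"ID": [1, 2], "FEATURE_A": [3, 4]})
customers = pd.DataFrame(
    {"ID": [7, 8], "FEATURE_A": [1, 2], "EXTRA_COL": ["x", "y"]}
)


def extra_columns(wider, narrower):
    """Return columns present in `wider` but missing from `narrower`, in order."""
    return [col for col in wider.columns if col not in narrower.columns]


customer_only = extra_columns(customers, azdias)

# Keep only the shared columns, in the same order as the population table,
# so both tables can pass through the same preprocessing steps.
customers_aligned = customers.drop(columns=customer_only)[azdias.columns]
```

Aligning the columns this way lets a transformation fitted on one table be applied to the other without shape mismatches.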
Additionally, two more files describe the attributes:
- DIAS Attributes Values: Explains the value encodings.
- DIAS Information Attributes: Explains the meanings of the column names.
The project covers the following technical steps:
- Exploratory Data Analysis
- Data Preprocessing (Feature Engineering)
- Dimensionality Reduction (PCA)
- K-Means Clustering
- Supervised Learning
- Model Evaluation
- Predictions on Test data
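The last three steps can be illustrated with a short sketch. It is not the project's actual model: it assumes scikit-learn, generates a synthetic dataset with a rare positive class (the imbalance is an assumption made for illustration), and picks gradient boosting and ROC AUC only as plausible examples of an algorithm and a metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labelled campaign data (assumed ~5% responders).
X, y = make_classification(
    n_samples=4000,
    n_features=20,
    n_informative=6,
    weights=[0.95, 0.05],
    random_state=0,
)

# Hold out a stratified validation split so both classes appear in each part.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate on predicted probabilities; accuracy would be misleading
# when one class is rare.
val_scores = model.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_scores)

# Score unseen rows (here a few validation rows stand in for the test table).
X_unseen = X_val[:5]
test_scores = model.predict_proba(X_unseen)[:, 1]
```

Each value in `test_scores` is the estimated probability that a person responds, which can be used to rank individuals for a campaign.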
The main findings are described in a post available on Medium.
Credit goes to Arvato Bertelsmann and Udacity for providing the data.