All the necessary libraries to run the code were already available in the Anaconda distribution of Python, except:
- scikit-optimize, which can be installed through
!pip install scikit-optimize
; - lightgbm, which can be installed through
!pip install lightgbm
.
This script was written using Python version 3.*.
This was the chosen project to be developed as the Capstone Project for the Udacity Data Scientist Nanodegree. I am passionate about data, and especially data related to people somehow.
Customer Segmentation is all about understanding people, how they behave, what they like, how they think, and so on. I really enjoy putting together the pieces of this puzzle.
Besides that, I felt like it would be a great challenge to work on real data, giving me the chance to overcome the problems that come up in the daily life of any data scientist.
- Notebook - Jupyter Notebook with the script containing the whole solution of the project.
- joblib files (
azdias
,customers
,train
, andtest
) - compacted versions of the datasets used in this project.
All the process as well as the results and findings were documented in this Medium post.
Credits must be given to the Arvato Bertelsmann company for providing the data, and also Udacity for proposing this amazing Capstone Project.