This data science (DS) project is the result of the 8th FLAI Machine Learning Competition. The goal of the contest was to build the model that detects employee churn with the highest F1 score.
This model took 3rd place 🥉 in the competition.
For easier reading, the work is divided into 4 parts: 3 notebooks containing the Exploratory Data Analysis (EDA) and one containing the machine learning algorithm. The goal of the first stage (EDA) was to understand the datasets.
- Exploratory Data Analysis - Part 1: EDA - Part 1
- Exploratory Data Analysis - Part 2: EDA - Part 2
- Exploratory Data Analysis - Part 3: EDA - Part 3
- ML: Algorithm
Following the EDA, the data was pre-processed. Randomized hyperparameter searches were then run with OPTUNA, and a voting classifier combining the best models was used to predict churn. Finally, the decision threshold that gives the highest F1 score was selected.
Things I learned in this competition:
- OPTUNA for hyperparameter optimization
- Clustering with k-means (I used k-means in my 9th submission, but it did not give the expected result)
- How to find the threshold on a precision-recall curve that maximizes F1
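Picking the F1-maximizing threshold from a precision-recall curve can be sketched with scikit-learn's `precision_recall_curve`. The helper name `best_f1_threshold` and the toy labels and scores are hypothetical; in practice the scores would be the classifier's predicted churn probabilities on a validation set. F1 is computed at every candidate threshold as `2PR / (P + R)` and the best one is kept.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) that maximizes F1 along the precision-recall curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall have one more entry than thresholds (the final P=1, R=0
    # point has no threshold), so drop it to align each (P, R) with its threshold.
    precision, recall = precision[:-1], recall[:-1]
    denom = precision + recall
    # Guard against 0/0 where both precision and recall are zero.
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)
    best = int(np.argmax(f1))
    return thresholds[best], f1[best]


# Toy example: 1 = churn, scores are predicted churn probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.3, 0.35, 0.8, 0.45, 0.6, 0.9, 0.2])

thr, best_f1 = best_f1_threshold(y_true, scores)
print(f"threshold={thr:.2f}, F1={best_f1:.3f}")  # threshold=0.35, F1=0.889
churn_pred = (scores >= thr).astype(int)  # apply the tuned threshold instead of 0.5
```

Tuning the threshold on the same data used for the final evaluation would overstate F1, so the threshold is normally chosen on a validation split or out-of-fold predictions.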