Grade: 18.3 out of 20
Abstract:
This study developed a predictive model for the Government of Newland, a new habitable planet in the year 2048, to classify new citizens into one of two classes: “income higher than average” (1) or “income lower than or equal to the average” (0). The main goal of this end-to-end machine learning project was to reach the highest possible accuracy on the predictions for a Test dataset. To accomplish that, we started with data exploration and pre-processing: coherence checks, outlier identification, missing value imputation, feature engineering, and, finally, feature selection. We obtained satisfactory results with our baseline approach, using several classifiers, including decision trees, logistic regression, neural networks, support vector machines, and some ensembles. To improve on the baseline performance, we developed three other approaches, using, among other techniques, oversampling and undersampling and different scalers for the metric features. The best result was obtained with a Gradient Boosting Classifier (an ensemble model), which showed no evidence of overfitting or underfitting and achieved a micro-averaged F1-score of 0.8677 on the Training dataset and 0.8640 on the Validation dataset.
Keywords: Machine Learning, Supervised Learning, Predictive Modelling, Binary Classification, Ensemble.
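As a side note on the metric reported in the abstract, the following is a minimal sketch of how a micro-averaged F1-score can be computed, and why it matches plain accuracy when every sample has exactly one label, as in this binary task. It is not the project's code: the function names `micro_f1` and `accuracy` and the label lists are hypothetical and serve only as an illustration.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1          # correct prediction for this class
            elif p == label and t != label:
                fp += 1          # predicted this class, but it was another one
            elif p != label and t == label:
                fn += 1          # missed a sample of this class
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels (1 = income above average, 0 = otherwise); not project data.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

print(micro_f1(y_true, y_pred))   # 0.75
print(accuracy(y_true, y_pred))   # 0.75
```

Each misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so the pooled F1 reduces to the share of correct predictions. This is why a micro-averaged F1-score is consistent with the stated goal of maximising accuracy.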
Group08 members:
Carolina Pina
Mariana Camarneiro
Matilde Pires
Rui Monteiro
Vasco Pestana
MSc: Data Science and Advanced Analytics - Nova IMS
Course: Machine Learning
2020/2021