Binary Classification (Machine Failure Prediction) and Recall Optimization in Imbalanced Datasets.
Comparing scenarios and algorithms to evaluate performance.
XGBoost, L1 Regularization, Johnson Transformation, QQ Plots, Statistical Distributions
Recall improved from 47.26% to 95.21% using the Synthetic Minority Oversampling Technique (SMOTE) and Extreme Gradient Boosting (XGBoost).
The low baseline recall of 47.26% was obtained by applying Johnson transformations to the features and L1 regularization with alpha = 0.07, so a substantial recall improvement was needed.
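As an illustration of the baseline's preprocessing step, the sketch below fits a Johnson SU distribution to a skewed synthetic feature and applies its normalizing transform, checking the result with QQ-plot statistics. The data, parameters, and the use of SciPy's `johnsonsu` are assumptions for illustration, not the project's actual features.

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed feature (drawn from a known Johnson SU law so
# the fit is well-posed); stands in for a real sensor reading.
x = stats.johnsonsu.rvs(-2.0, 1.5, size=500, random_state=0)

# Fit Johnson SU parameters, then apply the normalizing transform
# z = a + b * asinh((x - loc) / scale), which maps the data toward N(0, 1).
a, b, loc, scale = stats.johnsonsu.fit(x)
z = a + b * np.arcsinh((x - loc) / scale)

# QQ-plot correlation against the normal: closer to 1 means more normal.
_, (_, _, r_before) = stats.probplot(x, dist="norm")
_, (_, _, r_after) = stats.probplot(z, dist="norm")
print(round(r_before, 3), round(r_after, 3))
```

A value of `r_after` close to 1 indicates the transformed feature is near-normal, which is the point of applying the Johnson transformation before the L1-regularized baseline.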
The Synthetic Minority Oversampling Technique (SMOTE) was necessary because the dataset was heavily imbalanced (a 98:2 class ratio).
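SMOTE's core idea can be sketched in a few lines: synthesize new minority-class samples by interpolating between a minority point and one of its nearest minority neighbors. This is a simplified illustration only (in practice a library implementation such as imbalanced-learn's `SMOTE` would be used); the data and counts below are assumed to mimic a 98:2 imbalance.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Simplified sketch of SMOTE's core idea: create synthetic minority
    points by interpolating between a minority sample and one of its
    k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    new_points = []
    for _ in range(n_new):
        i = rng.integers(n)                             # random minority sample
        j = neighbors[i, rng.integers(min(k, n - 1))]   # one of its neighbors
        t = rng.random()                                # interpolation factor
        new_points.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.array(new_points)

# Hypothetical minority class: 10 samples with 3 features; synthesize 480
# new points to move a 98:2 split toward balance.
rng = np.random.default_rng(1)
X_min = rng.normal(loc=2.0, size=(10, 3))
X_new = smote_like_oversample(X_min, n_new=480, k=3, rng=2)
print(X_new.shape)  # (480, 3)
```

Because each synthetic point is a convex combination of two real minority points, it always lies between existing samples rather than in arbitrary regions of feature space.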
With feature engineering, multicollinearity removal, and categorical-variable encoding, the model combining SMOTE with Extreme Gradient Boosting (XGBoost) improved recall on the imbalanced dataset, obtaining:
Accuracy: 0.9985
Precision: 0.9542
Recall: 0.9521
F1 Score: 0.9531
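The four metrics above follow the standard confusion-matrix formulas from the linked reference. The counts in this sketch are hypothetical, chosen only to show the arithmetic; they are not the project's actual confusion matrix.

```python
# Hypothetical confusion-matrix counts (not the project's real numbers),
# used only to illustrate the metric formulas.
tp, fp, fn, tn = 90, 10, 5, 895

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.985
precision = tp / (tp + fp)                          # 0.9
recall = tp / (tp + fn)                             # ~0.9474
f1 = 2 * precision * recall / (precision + recall)  # ~0.9231
print(accuracy, precision, round(recall, 4), round(f1, 4))
```

Note that with a 98:2 imbalance a model predicting "no failure" for everything already scores ~98% accuracy, which is why recall, not accuracy, is the metric being optimized here.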
Formulas taken from: https://www.tutorialexample.com/an-introduction-to-accuracy-precision-recall-f1-score-in-machine-learning-machine-learning-tutorial/
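The overall workflow can be sketched end-to-end on synthetic data: an L1-regularized logistic regression baseline versus a boosted-tree model trained on an oversampled training set. Here scikit-learn's `GradientBoostingClassifier` and random duplication of minority samples stand in for XGBoost and SMOTE; the dataset, the `C = 1/alpha` mapping, and all parameters are assumptions, so the numbers will not reproduce the project's reported metrics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic 98:2 imbalanced dataset standing in for the machine-failure data.
X, y = make_classification(n_samples=4000, n_features=10, weights=[0.98],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: L1-regularized logistic regression (C ~ 1/alpha with alpha = 0.07,
# an assumed mapping to the baseline's regularization strength).
base = LogisticRegression(penalty="l1", solver="liblinear", C=1 / 0.07)
base.fit(X_tr, y_tr)
recall_base = recall_score(y_te, base.predict(X_te))

# Oversample the minority class (random duplication as a crude stand-in for
# SMOTE), then fit a boosted-tree model (stand-in for XGBoost).
rng = np.random.default_rng(0)
minority = np.where(y_tr == 1)[0]
extra = rng.choice(minority, size=len(y_tr) - 2 * len(minority), replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

boost = GradientBoostingClassifier(random_state=0)
boost.fit(X_bal, y_bal)
recall_boost = recall_score(y_te, boost.predict(X_te))
print(round(recall_base, 3), round(recall_boost, 3))
```

Both recall scores are computed on the untouched test split; oversampling is applied only to the training data, which is essential to avoid leaking synthetic points into evaluation.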