Aim: Predict the probability of a potential sale on a website analytics data set.
- First, I performed some simple exploratory data analysis (EDA) tasks: descriptive statistics using `describe()`, and the number of unique values in each column using `nunique`.
  - This helped me find some outliers in the data, which I removed.
- Secondly, I performed visual EDA tasks, e.g., a correlation heatmap, that helped me gain 5 key insights into the nature of the given data set.
- Thirdly, I preprocessed the data by performing:
  - One-hot encoding on the categorical variables; and
  - Missing value imputation by creating 'missing value' indicators.
- Fourthly, as part of feature engineering, I computed mutual information scores to estimate the relative potential of each feature, considered by itself, as a predictor of the target.
- Fifthly, I created new features and performed scaling (though it was not required, ouch!).
- Lastly, I trained a logistic regression model, along with 2 other models, to achieve an ROC-AUC of 0.92.
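
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the actual project code: the data is synthetic, every column name (`page_views`, `session_minutes`, `visitor_type`, `sale`) is hypothetical, and the median fill alongside the missing-value indicator is an assumption. Outlier removal, the visual EDA, the new features, and the 2 other models are left out, so the score it prints is unrelated to the 0.92 reported above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the analytics data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "page_views": rng.poisson(5, n).astype(float),
    "session_minutes": rng.exponential(3.0, n),
    "visitor_type": rng.choice(["new", "returning"], n),
})
# The target loosely depends on page_views so the model has some signal to learn.
logit = 0.6 * df["page_views"] - 3.5
df["sale"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
# Blank out 50 values to mimic missing data.
df.loc[rng.choice(n, 50, replace=False), "session_minutes"] = np.nan

# Simple EDA: descriptive statistics and unique-value counts per column.
summary = df.describe()
unique_counts = df.nunique()

# Missing-value indicator, followed by a median fill (the fill is an assumption).
df["session_minutes_missing"] = df["session_minutes"].isna().astype(int)
df["session_minutes"] = df["session_minutes"].fillna(df["session_minutes"].median())

# One-hot encode the categorical column.
X = pd.get_dummies(df.drop(columns="sale"), columns=["visitor_type"], dtype=int)
y = df["sale"]

# Mutual information of each feature with the target, considered by itself.
mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
mi = mi.sort_values(ascending=False)

# Logistic regression with scaling, scored by ROC-AUC on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(mi)
print(f"ROC-AUC: {auc:.3f}")
```

Putting the scaler inside the pipeline means it is fit on the training split only, which avoids leaking test-set statistics into the model.
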
S.D.G.