Number of data points: 42,538
Attributes: 115
Task is to predict views of TED talk videos => Regression problem
- Accuracy
- Confusion Matrix
- Precision, Recall & F1 Score
- Unnecessary variables, such as those containing zip codes, IDs, or only a single category, have been dropped.
- Features like Grade, interest rate (in %), and sub_grade, which carry more information, have been handled with ordinal encoding.
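As a rough illustration of the ordinal encoding step, here is a minimal sketch using scikit-learn's `OrdinalEncoder`. The column names (`grade`, `sub_grade`, `int_rate`), the A to G grade scale, the five sub-grades per grade, and the toy values are all assumptions made for the example; they are not taken from the project data.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "grade": ["A", "C", "B", "A"],
    "sub_grade": ["A1", "C3", "B2", "A5"],
    "int_rate": ["10.5%", "15.2%", "12.0%", "9.8%"],
})

# Interest rate stored as a percentage string: strip '%' and convert to float.
df["int_rate"] = df["int_rate"].str.rstrip("%").astype(float)

# Explicit category order (assumed scale) so that A < B < ... < G
# and A1 < A2 < ... < G5.
grade_order = list("ABCDEFG")
sub_grade_order = [f"{g}{n}" for g in grade_order for n in range(1, 6)]

encoder = OrdinalEncoder(categories=[grade_order, sub_grade_order])
encoded = encoder.fit_transform(df[["grade", "sub_grade"]])

# Keep the encoded values in new columns so the originals stay inspectable.
df["grade_enc"] = encoded[:, 0]
df["sub_grade_enc"] = encoded[:, 1]
```

Passing the category order explicitly matters here: without it, the encoder sorts the categories it happens to see, which may not match the intended ranking.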
4.1 Used f_regression to score feature importance and dropped features whose p-value was above the 0.3 threshold.
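The p-value filter can be sketched roughly as follows. This is a minimal example on synthetic data, assuming a numeric feature matrix and target; the feature names `informative` and `noise` and the constant `P_THRESHOLD` are hypothetical, and only the 0.3 cut-off comes from the text above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_regression

# Synthetic data: one feature drives the target, the other is pure noise.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "informative": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
y = 3.0 * X["informative"] + rng.normal(scale=0.5, size=n)

# f_regression returns one F-statistic and one p-value per feature.
f_scores, p_values = f_regression(X, y)
p_series = pd.Series(p_values, index=X.columns)

# Drop every feature whose p-value exceeds the threshold.
P_THRESHOLD = 0.3
to_drop = p_series[p_series > P_THRESHOLD].index.tolist()
X_selected = X.drop(columns=to_drop)
```

Note that `f_regression` scores each feature on its own (univariate linear relationship with the target), so it does not account for interactions between features. For a categorical target, `f_classif` from the same module is the analogous function.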
After loading the data, we started with EDA to understand it through different univariate, bivariate, and multivariate analyses, and also handled outliers and NaN values. In feature engineering, we created some new features and removed unwanted ones that added little value. Features such as Grade, Sub_Grade, interest rate, and term played major roles in determining whether a person is a defaulter or not. Finally, we compared all the models on accuracy, confusion matrix, precision, and recall, and all the models performed well in this case.
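The model comparison described above could be computed along these lines. This is only a sketch: the labels, the two sets of predictions, and the names `model_a`, `model_b`, and `evaluate` are made up for illustration and do not reflect the project's actual models or results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical ground truth (1 = defaulter) and predictions from two models.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
predictions = {
    "model_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "model_b": [0, 0, 1, 1, 1, 0, 1, 1],
}


def evaluate(y_true, y_pred):
    """Collect the comparison metrics for one model's predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }


results = {name: evaluate(y_true, y_pred) for name, y_pred in predictions.items()}

for name, metrics in results.items():
    print(name, {k: v for k, v in metrics.items() if k != "confusion_matrix"})
    print(metrics["confusion_matrix"])
```

When defaulters are a small share of the data, accuracy alone can look good even for a poor model, which is why precision, recall, and the confusion matrix are worth reporting side by side.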