This project analyzes Lending Club loan data from 2007 to 2015, available on Kaggle. The dataset contains over 887K observations and 74 variables, one of which describes the loan status.
-
Conducted regression analysis to predict loan interest rates from initial borrower characteristics, applying L1 and L2 regularization and dimension reduction techniques.
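
As a rough illustration of this step, the sketch below compares L1 (Lasso), L2 (Ridge), and PCA-based regression using scikit-learn. It runs on synthetic data standing in for the borrower features and interest rate. The `alpha` values, the number of principal components, and the helper `cv_rmse` are assumptions made for the example, not the settings used in the project.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for borrower features (X) and interest rate (y);
# the real project would load the Lending Club data instead.
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Hyperparameters below are illustrative assumptions, not tuned values.
models = {
    "lasso_l1": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "ridge_l2": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "pca_ols": make_pipeline(StandardScaler(), PCA(n_components=10),
                             LinearRegression()),
}

def cv_rmse(model, X, y, folds=5):
    """Mean cross-validated root mean squared error (hypothetical helper)."""
    scores = cross_val_score(model, X, y, cv=folds,
                             scoring="neg_root_mean_squared_error")
    return float(-scores.mean())

results = {name: cv_rmse(model, X, y) for name, model in models.items()}
for name, rmse in results.items():
    print(f"{name}: CV RMSE = {rmse:.2f}")
```

Scaling inside the pipeline keeps the penalty and the principal components from being dominated by features measured on larger scales, and it is refit on each training fold so no information leaks from the held-out fold.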
-
Applied feature selection to identify the features associated with applicants likely to default on their loans, and extended the analysis to determine which loan category was most likely to default.
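
The summary does not say which feature selection method was used, so the sketch below assumes a simple univariate filter (scikit-learn's `SelectKBest` with an ANOVA F-test) on synthetic data. The category labels in `purpose` are hypothetical and randomly assigned, so the resulting rates only demonstrate the calculation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in: y = 1 marks a defaulted loan (assumption).
X, y = make_classification(n_samples=1000, n_features=12, n_informative=4,
                           n_redundant=0, random_state=0)

# Keep the 4 features with the strongest univariate association with default.
selector = SelectKBest(score_func=f_classif, k=4).fit(X, y)
selected = np.flatnonzero(selector.get_support())
print("selected feature indices:", selected)

# Default rate per loan category; the labels are hypothetical and random.
rng = np.random.default_rng(0)
purpose = rng.choice(["debt_consolidation", "credit_card", "small_business"],
                     size=len(y))
default_rate = {str(p): float(y[purpose == p].mean())
                for p in np.unique(purpose)}
riskiest = max(default_rate, key=default_rate.get)
print("highest default rate:", riskiest)
```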
-
Performed cross-validated repeated undersampling to address class imbalance and implemented logistic regression, LDA, and random forest models to identify charged-off loans, obtaining an accuracy of 95% on test data.
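
One possible reading of cross-validated repeated undersampling is sketched below. Within each stratified fold, the majority class of the training split is randomly downsampled several times, a logistic regression is fit on each balanced sample, and the predicted probabilities are averaged on the held-out fold. The exact procedure used in the project is not stated, so the function names, the number of repeats, the 0.5 threshold, and the use of balanced accuracy as the metric are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold

# Synthetic imbalanced data: roughly 10% positives standing in for charge-offs.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

def undersample(X, y, rng):
    """Return a balanced sample: all minority rows (y == 1) plus an
    equally sized random subset of majority rows (y == 0)."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, keep])
    return X[idx], y[idx]

def cv_repeated_undersampling(X, y, n_repeats=5, n_folds=5, seed=0):
    """Average balanced accuracy over folds; each fold averages the
    probabilities of models trained on repeated undersamples."""
    rng = np.random.default_rng(seed)
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = []
    for train, test in folds.split(X, y):
        proba = np.zeros(len(test))
        for _ in range(n_repeats):
            # Undersample the training split only; the test fold keeps
            # its original class distribution.
            X_bal, y_bal = undersample(X[train], y[train], rng)
            clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
            proba += clf.predict_proba(X[test])[:, 1]
        pred = (proba / n_repeats >= 0.5).astype(int)
        scores.append(balanced_accuracy_score(y[test], pred))
    return float(np.mean(scores))

score = cv_repeated_undersampling(X, y)
print(f"mean balanced accuracy: {score:.3f}")
```

Balanced accuracy is used here because plain accuracy can look high on imbalanced data even when few charge-offs are detected. Swapping in `LinearDiscriminantAnalysis` or `RandomForestClassifier` for the classifier would follow the same pattern.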