This project analyzes Lending Club loan data from 2007 to 2015, available on Kaggle. The dataset contains over 887K observations and 74 variables, one of which describes the loan status.
Conducted regression analysis to predict loan interest rates from initial borrower characteristics, applying L1 and L2 regularization and dimension-reduction techniques.
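
The regression setup can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the feature matrix, the made-up "interest rate" target, the `alpha` values, and the use of PCA as the dimension-reduction step are all assumptions chosen for the example, using scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for borrower features (NOT the Lending Club data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
# Hypothetical "interest rate": driven by the first three features plus noise.
y = (12.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2]
     + rng.normal(scale=0.5, size=500))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Three candidate pipelines; alpha and n_components are illustrative values.
models = {
    "lasso (L1)": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "ridge (L2)": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "pca + ridge": make_pipeline(
        StandardScaler(), PCA(n_components=5), Ridge(alpha=1.0)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on held-out data
    print(f"{name}: R^2 = {scores[name]:.3f}")

# L1 regularization can shrink some coefficients exactly to zero.
lasso_coefs = models["lasso (L1)"].named_steps["lasso"].coef_
n_zero = int(np.sum(lasso_coefs == 0.0))
print(f"lasso zeroed {n_zero} of {lasso_coefs.size} coefficients")
```

In practice the regularization strength would normally be tuned by cross-validation (for example with `LassoCV` or `RidgeCV`) rather than fixed by hand as it is here.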
Applied feature selection to identify features associated with applicants likely to default on their loans, and extended the analysis to determine which loan category was most likely to default.
Performed cross-validated repeated undersampling to address class imbalance, and implemented Logistic Regression, Linear Discriminant Analysis (LDA), and Random Forest models to identify charged-off loans, obtaining 95% accuracy on test data.
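
One plausible reading of cross-validated repeated undersampling is sketched below. The summary does not say exactly how undersampling and cross-validation were combined, so this is an assumption: inside each training fold, the majority class is randomly undersampled several times, a logistic regression is fit on each balanced subset, and the averaged probabilities are scored on the untouched validation fold. The helper names, the synthetic data, and the choice of balanced accuracy as the metric are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold


def undersample_indices(y, rng):
    """Return indices of a balanced subset: every minority row plus an
    equally sized random draw (without replacement) from the majority."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    minority_idx = np.flatnonzero(y == minority)
    majority_idx = np.flatnonzero(y != minority)
    sampled = rng.choice(majority_idx, size=minority_idx.size, replace=False)
    return np.concatenate([minority_idx, sampled])


def cv_repeated_undersampling(X, y, n_splits=5, n_repeats=10, seed=0):
    """Assumes binary labels coded 0/1. Returns one score per fold."""
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, val_idx in cv.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        probs = np.zeros(val_idx.size)
        # Undersample only the training fold; the validation fold keeps
        # its original class distribution.
        for _ in range(n_repeats):
            idx = undersample_indices(y_tr, rng)
            clf = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
            probs += clf.predict_proba(X[val_idx])[:, 1]
        preds = (probs / n_repeats >= 0.5).astype(int)
        fold_scores.append(balanced_accuracy_score(y[val_idx], preds))
    return fold_scores


# Synthetic imbalanced data (roughly 10% positives), NOT the Lending Club data.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
fold_scores = cv_repeated_undersampling(X, y)
print("balanced accuracy per fold:", np.round(fold_scores, 3))
```

The sketch reports balanced accuracy rather than plain accuracy because, on imbalanced data, a classifier that always predicts the majority class already achieves high plain accuracy. The same loop would work with LDA or a random forest substituted for the logistic regression.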