The goal is to help a client determine whether they should invest in p2p loans, and how to go about doing this by devising investment strategies with the help of machine learning models. We will analyze, explore and process the data, develop and evaluate various classification and regression models to provide strategies for high returns with low risk for investors.
Lending Club is an online lending marketplace for peer-to-peer loans. Peer to peer lending is the practice of lending loans from investor to borrowers without any involvement of banks or other Non-banking financial company. The reason customers opt for such a platform is that investors get higher returns on the amount funded and borrowers can easily secure personal loans hassle free and at low operational cost.
It allows borrowers to get unsecured loans in the range of $1000 to $40,000 in a timely manner. The loan period usually lasts 3 years. The easy-to-use mobile application lets investors select from a variety of loan listings along with the information provided about the borrower including loan amount, grade and purpose. Through this, investors reap profit from the interest gained from these loans.
The information available for each loan consists of all the details of the loans at the time of their issuance as well as more information relative to the latest status of loan such as how much principal has been paid so far, how much interest, if the loan was fully paid or defaulted, or if the borrower is late on payments etc.
Loan data – The level of data is at loan ID level. If a person borrowed a loan from LC, then a row of data along with the loan characteristics and borrower characteristics are present in the data.
• On first impressions we see that the classes are imbalanced. We have 13.8% of charged off loans compared to 86.2% of fully paid loans.
• The loans are divided into grades (A-G) with the best interest rate being offered on the A category of the loan and the worst being offered at G.
• We find that the majority of loans are given for the purpose of debt consolidation which we know is a common occurrence because majority of people have different kind of loans that they are paying such as car, home, credit card etc so to have them under one bracket would be highly beneficial for the ease of paying back multiple loans.
• We also noticed that the majority of features have missing values. Such features can be removed in further prediction tasks as they would not be beneficial to the model
• We also detected some outliers and removed them such that not much of the data is lost
We have taken a few different approaches to design the best investment strategies for investors.
-
Classification models to predict default risk
• Decision Tree • GLM (Classification) • Random Forest Classification • GBM
Best Model -GBM with highest specificity
-
Regression models to predict annual returns
• GLM • Random Forest Regression
Best Model – Random Forest with lowest MSE
Investors are hesitant to invest in lower grade loans which can yield high returns with high risk. We can minimize the risk factor by predicting lower grade loans that are less likely to default while also giving high returns.