Problem Statement: Identify the patterns, variables, or driving factors in the data that help avoid denying loans to applicants who are capable of repaying them.
A little about the Data: We have two types of data for analysis:
- Current applicants: details such as application status, loan credit amount, income, living situation, family status, and assets owned.
- Previous application data for each applicant.
- The current applicant file has a variable Target, which takes the values 0 and 1: 0 stands for applicants who have paid the loan and 1 for those who have difficulty paying.
Analysis Steps:
- All applicants who have difficulty paying the loan are grouped in a dataframe called defaulters; all others are grouped in a dataframe called repayers.
- The rest of the analysis compares the other variables in the dataset across these two groups.
- Similarly, the previous applications data has 4 categories based on the status of the loan: approved, cancelled, refused, and unused offer.
- For the analysis, I used only the records with Approved or Refused status and compared the other variables across these two categories.
- The previous application file also contains the current application ID, which shows how many times the current applicant has previously applied for a loan and what the status of each application was.
- I further segmented the data on this basis to verify the findings derived from the EDA.
- This helped identify the likely driving factors for granting a loan to a capable applicant.
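The segmentation steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the code from the attached analysis: the column names SK_ID_CURR, TARGET, NAME_CONTRACT_STATUS, and AMT_INCOME are assumptions chosen for the example, as are the toy values.

```python
import pandas as pd

# Toy stand-ins for the two files. Column names and values are assumptions
# made for this sketch, not taken from the actual dataset.
current = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3, 4],
    "TARGET": [0, 1, 0, 1],
    "AMT_INCOME": [50000, 30000, 70000, 25000],
})
previous = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 3, 3, 4],
    "NAME_CONTRACT_STATUS": [
        "Approved", "Refused", "Refused", "Approved", "Canceled", "Unused offer",
    ],
})

# Split current applicants by the target flag (1 = difficulty paying).
defaulters = current[current["TARGET"] == 1]
repayers = current[current["TARGET"] == 0]

# Keep only the Approved and Refused previous applications.
prev_subset = previous[previous["NAME_CONTRACT_STATUS"].isin(["Approved", "Refused"])]

# Count previous applications per applicant and status.
status_counts = (
    prev_subset.groupby(["SK_ID_CURR", "NAME_CONTRACT_STATUS"])
    .size()
    .unstack(fill_value=0)
    .reset_index()
)

# Attach the history to the current applicants; applicants with no
# Approved/Refused history get a count of 0.
history = current.merge(status_counts, on="SK_ID_CURR", how="left")
history[["Approved", "Refused"]] = history[["Approved", "Refused"]].fillna(0).astype(int)

print(history)
```

The resulting history frame can then be compared between defaulters and repayers, for example by grouping on TARGET and averaging the Refused count.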
Insights:
Below are some important relationships found during the EDA. More details are available in the attached code.
Conclusion:
- The current and previous application data are imbalanced, and the distributions of their variables are skewed.
- We were able to confirm a few observations from the EDA.
- We can conclude that the following variables could be considered driving factors when identifying applicants capable of repaying the loan: age, car and flat ownership, population density of the area of residence, overall living situation assessment, and documents submitted.
- In certain cases we also observed outliers where the applicant repaid the loan. Further analysis of these outliers could show what the driving factors for approving the loan might be.
- The top 10 correlation matrix for both datasets further confirmed the EDA findings.
- It would have been useful if variables such as loan term, loan purpose, down payment rate, and down payment amount were tracked for current applicants as well.
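As a rough illustration of two checks mentioned in the conclusion, the sketch below measures class imbalance in the target and lists the strongest pairwise correlations. It is one possible approach on toy data, not necessarily how the attached code computes them. The helper top_correlations and the columns a, b, c are hypothetical names introduced for this example.

```python
import numpy as np
import pandas as pd


def top_correlations(df, n=10):
    """Return the n strongest absolute pairwise correlations among numeric columns."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the strict upper triangle so each pair appears once
    # and self-correlations are excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)


# Toy data; TARGET and the feature columns are assumptions for this sketch.
toy = pd.DataFrame({
    "TARGET": [0, 0, 0, 1],
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [1.0, 0.0, 1.0, 0.0],
})

# Share of each target class; a large gap indicates imbalance.
imbalance = toy["TARGET"].value_counts(normalize=True)

# Strongest correlations among the feature columns.
top = top_correlations(toy.drop(columns="TARGET"), n=2)

print(imbalance)
print(top)
```

Running the same helper separately on the defaulters and repayers dataframes would allow the strongest correlations in each group to be compared side by side.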