Credit_Risk_Analysis

Data preparation, Statistical reasoning, Machine Learning

Overview of the analysis

This analysis uses the credit card credit dataset from LendingClub, a peer-to-peer lending services company. Credit risk is an inherently unbalanced classification problem, because good loans easily outnumber risky loans. We therefore employ different techniques to train and evaluate models with unbalanced classes: we oversample the data using the RandomOverSampler and SMOTE algorithms, and undersample it using the ClusterCentroids algorithm. We then use a combined over- and undersampling approach with the SMOTEENN algorithm. Next, we compare two machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk. Once the analyses are performed, we evaluate the performance of these models and make a written recommendation on whether they should be used to predict credit risk.

Results

Oversampling algorithms:

Naive RandomOverSampler Result
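RandomOverSampler balances the classes by duplicating randomly chosen minority-class rows (sampling with replacement) until every class is as large as the majority class. The sketch below illustrates that idea in plain Python; it is not imbalanced-learn's implementation, and the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=1):
    """Hypothetical helper: duplicate randomly chosen rows of every non-majority
    class (sampling with replacement) until each class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_res.append(rng.choice(rows))  # a copy of an existing row, not a new one
            y_res.append(label)
    return X_res, y_res


# Toy data (hypothetical): 6 low-risk rows and 2 high-risk rows with one feature.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = ["low_risk"] * 6 + ["high_risk"] * 2
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({'low_risk': 6, 'high_risk': 6})
```

Because the added rows are exact copies, the classifier can overfit to the few high-risk examples it sees repeatedly, which is the motivation for SMOTE.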

SMOTE Oversampler Result
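SMOTE synthesizes new minority rows instead of copying existing ones: each synthetic row lies on the line segment between a minority row and one of its nearest minority neighbours. A minimal sketch of that interpolation step, assuming numeric features and Euclidean distance; the `smote` helper and the toy points are hypothetical, and imbalanced-learn's `SMOTE` (whose `k_neighbors` defaults to 5) handles many more cases.

```python
import math
import random


def smote(minority, n_new, k=2, seed=1):
    """Hypothetical helper: create n_new synthetic minority rows by interpolating
    between a random minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows (Euclidean distance)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: math.dist(base, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


minority = [[1.0, 1.0], [2.0, 1.0], [1.5, 2.0]]
new_rows = smote(minority, n_new=4)
print(new_rows)  # four new points lying between existing high-risk points
```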

Undersampling

ClusterCentroids Result
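ClusterCentroids undersamples the majority class by running K-means on it and replacing its rows with the cluster centroids. The sketch assumes a binary problem with numeric features and uses a hand-written Lloyd's K-means with as many clusters as there are minority rows; the helper names and toy data are hypothetical, and imbalanced-learn delegates the clustering to scikit-learn's `KMeans` by default.

```python
import random


def kmeans_centroids(points, k, iterations=20, seed=1):
    """Plain Lloyd's K-means: returns k centroids of `points`."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared Euclidean distance)
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroids_undersample(X, y, majority_label, seed=1):
    """Hypothetical helper: replace the majority rows by len(minority) centroids."""
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    others = [(x, lab) for x, lab in zip(X, y) if lab != majority_label]
    centroids = kmeans_centroids(majority, k=len(others), seed=seed)
    X_res = centroids + [x for x, _ in others]
    y_res = [majority_label] * len(centroids) + [lab for _, lab in others]
    return X_res, y_res


# Toy data: two low-risk blobs of three rows each, and two high-risk rows.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5], [6, 5]]
y = ["low_risk"] * 6 + ["high_risk"] * 2
X_res, y_res = cluster_centroids_undersample(X, y, majority_label="low_risk")
print(X_res, y_res)  # two low-risk centroids (one per blob) plus the two high-risk rows
```

The centroids are new points rather than real loans, so undersampling this way discards detail from the majority class; that loss of information is a known drawback of the approach.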


Combination (Over and Under) Sampling

SMOTEENN algorithm
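SMOTEENN first oversamples with SMOTE and then cleans the result with Edited Nearest Neighbours (ENN), which removes rows whose neighbourhood disagrees with their label. The sketch below shows only a simplified cleaning step, assuming the rows were already oversampled: a row is kept when a strict majority of its `k` nearest neighbours share its label. imbalanced-learn's `EditedNearestNeighbours` by default requires all neighbours to agree, so treat this as an illustration rather than a reimplementation; the helper name and toy data are hypothetical.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Hypothetical, simplified ENN: keep a row only if a strict majority of its
    k nearest neighbours (excluding itself) share its label."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(x, X[j]))[:k]
        votes = Counter(y[j] for j in nearest)
        if votes[label] * 2 > k:  # strict majority of neighbours agrees
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


# Toy data: a low-risk cluster, a high-risk cluster, and one noisy high-risk row
# (e.g. a synthetic point) sitting inside the low-risk cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6], [0.5, 0.5]]
y = ["low_risk"] * 4 + ["high_risk"] * 5
clean_X, clean_y = edited_nearest_neighbours(X, y)
print(clean_X)  # the noisy [0.5, 0.5] row is removed
```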

Ensemble Learning Algorithms:

Balanced Random Forest Classifier
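A balanced random forest differs from a standard random forest mainly in how each tree's training sample is drawn: the bootstrap sample is balanced so that the minority class is not drowned out. The sketch draws such balanced bootstrap index sets in plain Python, assuming every class is resampled (with replacement) to the minority-class size; the tree fitting is omitted, and `balanced_bootstrap_indices` is a hypothetical helper, not imbalanced-learn's API.

```python
import random
from collections import Counter


def balanced_bootstrap_indices(y, seed=None):
    """Hypothetical helper: one balanced bootstrap sample. For every class, draw
    (with replacement) as many row indices as the minority class has rows."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))
    return sample


y = ["low_risk"] * 95 + ["high_risk"] * 5
# Each of 3 hypothetical trees would be fitted on one of these index sets.
samples = [balanced_bootstrap_indices(y, seed=s) for s in range(3)]
for s in samples:
    print(Counter(y[i] for i in s))  # 5 low_risk and 5 high_risk indices per tree
```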

Easy Ensemble AdaBoost Classifier
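EasyEnsembleClassifier trains AdaBoost learners on balanced subsets built by randomly undersampling the majority class, then combines their outputs. To keep the sketch short and dependency-free, each AdaBoost learner is replaced by a hypothetical single-feature threshold stump and the stump outputs are averaged; the helper names and toy data are assumptions, not imbalanced-learn's API.

```python
import random


def easy_ensemble_subsets(y, minority_label, n_subsets=10, seed=0):
    """Each subset keeps every minority row plus an equally sized random sample
    (without replacement) of majority rows."""
    rng = random.Random(seed)
    minority = [i for i, lab in enumerate(y) if lab == minority_label]
    majority = [i for i, lab in enumerate(y) if lab != minority_label]
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_subsets)]


def fit_stump(X, y, idx, minority_label):
    """Hypothetical stand-in for AdaBoost: threshold halfway between the class means
    of the first feature; returns 1.0 for 'high risk' and 0.0 otherwise."""
    hi = [X[i][0] for i in idx if y[i] == minority_label]
    lo = [X[i][0] for i in idx if y[i] != minority_label]
    mean_hi, mean_lo = sum(hi) / len(hi), sum(lo) / len(lo)
    threshold = (mean_hi + mean_lo) / 2
    return lambda x: 1.0 if (x[0] > threshold) == (mean_hi > mean_lo) else 0.0


def easy_ensemble(X, y, minority_label, n_subsets=10, seed=0):
    stumps = [fit_stump(X, y, idx, minority_label)
              for idx in easy_ensemble_subsets(y, minority_label, n_subsets, seed)]
    # Soft vote: average the high-risk score of all stumps.
    return lambda x: sum(s(x) for s in stumps) / len(stumps)


X = [[float(i)] for i in range(20)]
y = ["low_risk"] * 16 + ["high_risk"] * 4
score = easy_ensemble(X, y, "high_risk")
print(score([19.0]), score([0.0]))  # 1.0 0.0
```

Every learner sees all high-risk rows but a different slice of the low-risk rows, so the ensemble as a whole still uses most of the majority data while each learner trains on balanced classes.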


Summary

All the models used to perform the credit risk analysis show weak precision in identifying high-risk credit. The ensemble models brought the most improvement, especially in the sensitivity (recall) for high-risk credit. The Easy Ensemble AdaBoost classifier reaches a recall of 92%, so it detects almost all high-risk credit, whereas the Balanced Random Forest classifier did not show any significant improvement (sensitivity of 70%) over logistic regression trained with oversampling, undersampling, or both. On the other hand, all models have low precision in detecting high-risk loans, so many low-risk credits would still be falsely flagged as high risk, which would penalize the bank's credit strategy. For these reasons, these models are not sufficient to predict credit risk for commercial application.
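The trade-off described above follows from how the two metrics are computed: recall divides the true positives by all truly high-risk loans, while precision divides them by every loan the model flags. With the hypothetical counts below (not the notebooks' results), a model that catches 92% of high-risk loans still has a precision below 10%, because it also flags hundreds of low-risk loans.

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # share of loans flagged high risk that really are high risk
    recall = tp / (tp + fn)     # share of truly high-risk loans that get flagged
    return precision, recall


# Hypothetical counts: 100 truly high-risk loans, 92 of them flagged,
# plus 900 low-risk loans wrongly flagged as high risk.
p, r = precision_recall(tp=92, fp=900, fn=8)
print(f"precision={p:.3f} recall={r:.2f}")  # precision=0.093 recall=0.92
```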

Resources

Data: LoanStats_2019Q1.csv from LendingClub
Software and tools: Jupyter Notebook; pandas, NumPy, scikit-learn, and imbalanced-learn libraries

Repository: Rutgers-Data-Science-Bootcamp/Credit_Risk_Analysis (notebooks: credit_risk_resampling.ipynb, credit_risk_ensemble.ipynb)