
In this project, data pre-processing is used to handle a dataset riddled with problems such as missing values, explicit duplicates, implicit duplicates, and numerous categories.


dwiputris/Data_Pre-processing_credit_scoring


Data_Pre-processing_credit_scoring

In this project, data pre-processing is applied to a credit-scoring dataset with several problems, such as missing values, explicit duplicates, implicit duplicates, and categorisation issues.

Each issue is handled with an appropriate technique, which in the end enables an analysis that yields conclusions and answers to the hypotheses.

First, missing values are replaced with the median of the observed values, computed within groups defined by other, related factors.
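
A minimal pandas sketch of group-wise median imputation. It assumes hypothetical columns `income_type` (the related factor) and `total_income` (the column with gaps); the README does not name the columns or grouping factors actually used.

```python
import numpy as np
import pandas as pd

# Made-up sample; "income_type" and "total_income" are hypothetical column names
df = pd.DataFrame({
    "income_type": ["employee", "employee", "employee", "retiree", "retiree"],
    "total_income": [30000.0, np.nan, 50000.0, 20000.0, np.nan],
})

# Median of the observed (non-missing) incomes within each income_type,
# aligned back to every row of df
group_median = df.groupby("income_type")["total_income"].transform("median")

# Replace only the missing incomes with their group's median
df["total_income"] = df["total_income"].fillna(group_median)
print(df)
```

Grouping keeps each filled value close to what similar borrowers report, instead of using one overall median for everyone.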

Second, explicit duplicates are dropped.
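
Explicit duplicates are rows that are identical in every column. A short sketch with pandas `drop_duplicates`, using made-up data and hypothetical column names:

```python
import pandas as pd

# Made-up sample with one row repeated verbatim
df = pd.DataFrame({
    "children": [1, 1, 0],
    "education": ["secondary", "secondary", "higher"],
    "purpose": ["buying a car", "buying a car", "wedding"],
})

n_before = len(df)
# Remove rows identical in every column (the first occurrence is kept),
# then rebuild a contiguous 0..n-1 index
df = df.drop_duplicates().reset_index(drop=True)
print(f"removed {n_before - len(df)} duplicate row(s)")
```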

Third, implicit duplicates (the same value written with different letter case) are handled by converting text to lowercase.
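
A sketch of the lowercase step on a hypothetical `education` column. Lowercasing collapses case variants into one category, and it can also turn rows that differed only by case into exact copies, so `drop_duplicates` is run again here as an assumed follow-up.

```python
import pandas as pd

# Made-up sample: the same education level spelled with different cases
df = pd.DataFrame({
    "education": ["Secondary Education", "secondary education",
                  "SECONDARY EDUCATION", "bachelor's degree"],
    "debt": [0, 0, 1, 0],
})

distinct_before = df["education"].nunique()

# Normalise case so the spellings collapse into a single category
df["education"] = df["education"].str.lower()
distinct_after = df["education"].nunique()

# Rows that only differed by case are now exact copies, so drop them
df = df.drop_duplicates().reset_index(drop=True)

print(distinct_before, distinct_after, len(df))
```

If stray whitespace is also present, `.str.strip()` can be chained after `.str.lower()`.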

Fourth, columns are converted to suitable data types.
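
The README does not list which columns were converted; this sketch assumes a float income column that should hold whole numbers and a child count that was read in as text.

```python
import pandas as pd

# Made-up sample: income as float, children read in as strings
df = pd.DataFrame({
    "total_income": [40465.38, 23341.93, 145885.95],
    "children": ["0", "1", "2"],
})

# Drop the fractional part of income (astype truncates toward zero);
# this raises if NaNs remain, so it belongs after missing values are filled
df["total_income"] = df["total_income"].astype("int64")

# Parse numeric strings into integers
df["children"] = pd.to_numeric(df["children"])

print(df.dtypes)
```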

Fifth, unreasonable and implausible values are replaced with reasonable ones.
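
The README does not say which values were judged implausible or what replaced them, so the rules in this sketch are hypothetical: negative counts and durations are treated as sign errors, and more than `MAX_CHILDREN` (an assumed cap of 10) children is replaced with the median of the plausible values.

```python
import pandas as pd

# Made-up sample with a negative child count, an outlier of 20 children
# and negative employment durations
df = pd.DataFrame({
    "children": [0, 1, -1, 2, 20],
    "days_employed": [-8437.7, 4024.8, -5623.4, 926.2, -2879.2],
})

# Assumption: negative counts and durations are sign errors
df["children"] = df["children"].abs()
df["days_employed"] = df["days_employed"].abs()

# Assumption: more than MAX_CHILDREN children is implausible; such values are
# replaced with the median of the plausible ones (truncated to an integer)
MAX_CHILDREN = 10
plausible = df["children"] <= MAX_CHILDREN
median_children = int(df.loc[plausible, "children"].median())
df.loc[~plausible, "children"] = median_children

print(df)
```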

Lastly, the hypotheses are addressed. The conclusions drawn are:

  1. The number of children a borrower has does not affect the timeliness of repayment.
  2. Borrowers who are in a civil partnership or unmarried have a higher probability of default.
  3. The higher the income level, the lower the probability of default.
  4. Loans taken to buy a car or to pay for education have a higher probability of default.
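
The conclusions compare default rates across groups, which requires grouping income and loan purpose into categories first. A minimal pandas sketch of that comparison, assuming a 0/1 `debt` column (1 = defaulted), hypothetical keyword rules for purpose, and hypothetical income thresholds. The sample is made up, so its numbers are not the project's results.

```python
import pandas as pd

# Made-up sample; debt == 1 is assumed to mean the borrower defaulted
df = pd.DataFrame({
    "family_status": ["married", "civil partnership", "unmarried",
                      "married", "unmarried", "civil partnership"],
    "total_income": [120000, 25000, 40000, 150000, 20000, 90000],
    "purpose": ["housing", "buying a car", "to get a supplementary education",
                "wedding ceremony", "car purchase", "property"],
    "debt": [0, 1, 1, 0, 1, 0],
})


def purpose_category(text: str) -> str:
    """Map a free-text loan purpose to a category via hypothetical keyword rules."""
    if "car" in text:
        return "car"
    if "educ" in text:
        return "education"
    if "wedding" in text:
        return "wedding"
    if "hous" in text or "property" in text:
        return "real estate"
    return "other"


df["purpose_category"] = df["purpose"].apply(purpose_category)

# Hypothetical income bands: (0, 30000] low, (30000, 100000] medium, above that high
df["income_level"] = pd.cut(
    df["total_income"],
    bins=[0, 30_000, 100_000, float("inf")],
    labels=["low", "medium", "high"],
)


def default_rate(data: pd.DataFrame, column: str) -> pd.Series:
    """Share of borrowers with debt == 1 in each group, highest first."""
    return (
        data.groupby(column, observed=True)["debt"]
        .mean()
        .sort_values(ascending=False)
    )


for col in ["family_status", "income_level", "purpose_category"]:
    print(default_rate(df, col), end="\n\n")
```

The same `default_rate(df, "children")` call covers the first hypothesis. Raw rates are descriptive only; judging whether a difference between groups is meaningful would need a statistical test.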
