The Imbalance Problem in Classification

Final dissertation for the BSc in Statistics at Università degli Studi di Napoli Federico II.

Abstract

This dissertation presents several approaches to dealing with class imbalance. First, it analyzes the pros and cons of pre-processing methods, which can be divided into:
• Undersampling methods
• Oversampling methods
• Hybrid methods
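
To make the distinction concrete, here is a minimal sketch of the two simplest pre-processing strategies, random undersampling and random oversampling, written with the Python standard library only. It is not taken from the dissertation: the function name `random_resample`, its parameters, and the toy data are assumptions made for illustration. Hybrid methods, which combine both directions, are not shown.

```python
import random
from collections import Counter, defaultdict


def random_resample(X, y, strategy="under", seed=0):
    """Balance classes by random under- or oversampling (illustrative helper)."""
    rng = random.Random(seed)

    # Group the observations by class label.
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    # Undersampling shrinks every class to the smallest class size;
    # oversampling grows every class to the largest class size.
    sizes = [len(rows) for rows in by_class.values()]
    target = min(sizes) if strategy == "under" else max(sizes)

    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) > target:
            # Drop rows at random, without replacement.
            chosen = rng.sample(rows, target)
        else:
            # Keep all rows and duplicate random ones until the target is met.
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


# Hypothetical toy data: 90 majority (0) vs 10 minority (1) observations.
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

X_under, y_under = random_resample(X, y, strategy="under")
X_over, y_over = random_resample(X, y, strategy="over")
print(Counter(y_under), Counter(y_over))
```

Undersampling discards information from the majority class, while oversampling by duplication can encourage overfitting to repeated minority rows; this trade-off is the kind of pro and con the analysis weighs.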

Second, it analyzes cost-sensitive solutions, which modify existing algorithms (e.g. Decision Tree, SVM, Ensemble Methods) to change the weight given to each class.
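
As a rough illustration of what changing class weights means, the sketch below computes weights inversely proportional to class frequency, w_c = n / (k · n_c), and uses them in a weighted error rate. This is one common heuristic (it matches the `class_weight="balanced"` option in scikit-learn) and is not necessarily the scheme used in the dissertation; the helper names and toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n / (k * n_c): rarer classes get larger weights."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Misclassification rate where each observation counts by its class weight."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical toy labels: 90 majority (0) vs 10 minority (1) observations.
y_true = [0] * 90 + [1] * 10
weights = balanced_class_weights(y_true)

# A trivial classifier that always predicts the majority class.
always_majority = [0] * len(y_true)

plain_error = sum(t != p for t, p in zip(y_true, always_majority)) / len(y_true)
cost_error = weighted_error(y_true, always_majority, weights)
print(weights, plain_error, cost_error)
```

With these weights the always-majority classifier no longer looks good: its unweighted error is low because the minority class is small, while the weighted error penalizes every missed minority observation more heavily. A cost-sensitive learner minimizes a loss of this weighted kind during training.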

Finally, all of these methods are applied to two datasets. The first dataset concerns churn: all the pre-processing methods are applied to it and tested with an SVM classifier. The second concerns spam, and all the cost-sensitive methods are applied to it instead.

gabrielecola/Imbalance_classification_problem