Final dissertation for the BSc in Statistics at Università degli Studi di Napoli Federico II.
The document presents several approaches to dealing with class imbalance. First, it analyzes the pros and cons of the pre-processing methods, which can be divided into:
• Undersampling methods
• Oversampling methods
• Hybrid methods
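As a rough illustration of the first two families, the sketch below implements the simplest random variants of undersampling and oversampling in plain Python. The function names `random_undersample` and `random_oversample` and the toy data are hypothetical and chosen only for this example. The dissertation may rely on more refined algorithms, and hybrid methods would combine both ideas.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect the feature rows of each class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows so every class is as large as the biggest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Toy data (an assumption for illustration): 8 majority rows, 2 minority rows.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print(Counter(y_under))  # both classes reduced to 2 rows
print(Counter(y_over))   # both classes raised to 8 rows
```

Undersampling discards information from the majority class, while oversampling by duplication can encourage overfitting on the minority class; this trade-off is the kind of pro and con the analysis weighs.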
Second, it analyzes the cost-sensitive solutions, which modify existing algorithms (e.g., Decision Tree, SVM, Ensemble Methods) to change the weight assigned to each class.
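A minimal sketch of the class-weighting idea, assuming the common "balanced" heuristic in which each class receives the weight n_samples / (n_classes * class_count). The helper names `balanced_class_weights` and `weighted_error` are hypothetical, and the dissertation may use a different weighting scheme.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class inversely to its frequency (assumed heuristic)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Misclassification rate where each error costs the weight of its true class."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Toy labels (an assumption for illustration): 8 majority, 2 minority.
y_true = [0] * 8 + [1] * 2
weights = balanced_class_weights(y_true)

# A trivial classifier that always predicts the majority class.
always_majority = [0] * 10

plain_error = sum(t != p for t, p in zip(y_true, always_majority)) / len(y_true)
cost_error = weighted_error(y_true, always_majority, weights)

print(weights)      # minority class gets the larger weight
print(plain_error)  # unweighted error rate
print(cost_error)   # weighted error rate
```

On this toy data the always-majority classifier looks acceptable under the unweighted error but is penalized heavily once minority mistakes carry a larger weight, which is the effect a cost-sensitive algorithm exploits during training.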
Finally, all these methods are applied to two datasets. The first dataset concerns churn: all the pre-processing methods are applied to it and tested with an SVM classifier. The second dataset concerns spam: all the cost-sensitive methods are applied to it instead.