Preprocessing steps, feature extraction techniques, and a Naive Bayes classifier implemented in C++. All of the steps are also implemented in Python for comparative analysis.

prakharjadaun/Feature-Extraction-for-Spam-Email-Detection

Feature Extraction for Spam Email Detection 📧

The proposed system detects and filters out spam emails using machine learning algorithms, with the goal of high accuracy and good performance. It saves users time and reduces the risks posed by spam.

📋 Project Description

Email is a popular and preferred form of written communication in everyday life. The problem with email is spam. Over the past decade, unsolicited bulk emails have become a major problem for email users, and a huge amount of spam flows into users' mailboxes every day.

The growing volume of spam causes many important emails to be lost in a sea of junk mail. To reduce this problem, we implement ways to differentiate spam from important emails.

Doing so reduces the time spent looking for an important email, which in turn reduces the hassle of the process. Our goal is to filter as accurately as possible, separating spam emails from ham (legitimate emails).

🗃️ Project Features

The main feature of our project is determining whether a received email is spam or ham. This is useful for students and working professionals who deal with email every day. The project also aims to help prevent phishing attempts by filtering spam from ham emails.

A. Pre-processing

  • Removal of Special Characters
  • Removal of Numbers
  • Lowercase Conversion
  • Tokenization
  • Removal of Stop words
  • Stemming
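
The pre-processing steps listed above can be sketched as a single Python function. This is a minimal illustration, not the repository's implementation: the stop-word set `STOP_WORDS` is a deliberately tiny hypothetical list, and `crude_stem` is a simple suffix stripper standing in for a real stemmer such as the Porter stemmer.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "you", "your", "for"}


def crude_stem(token):
    """Strip a few common suffixes. A crude stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left untouched.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Apply the six pre-processing steps in order and return a list of tokens."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # 1. remove special characters
    text = re.sub(r"\d+", " ", text)             # 2. remove numbers
    text = text.lower()                          # 3. lowercase conversion
    tokens = text.split()                        # 4. tokenization on whitespace
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 5. stop-word removal
    return [crude_stem(t) for t in tokens]       # 6. stemming


if __name__ == "__main__":
    print(preprocess("You WON 1000 dollars!!! Claiming prizes is easy."))
```

Running the steps in this order means the stop-word check sees lowercase tokens, so the stop-word list only needs lowercase entries.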

B. Feature Extraction

  • Bag of words
  • Tf-Idf
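
As a rough sketch of the two feature extraction techniques, the Python below builds bag-of-words count vectors and then weights them with TF-IDF. The function names are hypothetical, and the weighting assumes term frequency as count divided by document length and an unsmoothed inverse document frequency of ln(N / df); libraries often use smoothed variants, so the numbers may differ from theirs.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per tokenized document."""
    vocab = sorted({token for doc in docs for token in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[token] for token in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return the vocabulary and one TF-IDF vector per tokenized document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each vocabulary term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for row, doc in zip(counts, docs):
        length = len(doc) or 1  # avoid division by zero for empty documents
        weights.append([(c / length) * w for c, w in zip(row, idf)])
    return vocab, weights


if __name__ == "__main__":
    emails = [["free", "prize", "free"], ["meeting", "prize"]]
    print(bag_of_words(emails))
    print(tf_idf(emails))
```

With this weighting, a term that appears in every document gets a weight of zero, which reflects the idea that such a term does not help tell spam from ham.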

C. Classification

  • Naive Bayes Algorithm (also implemented in C++)
  • Random Forest Classifier
  • Support Vector Machine
  • MLP Classifier
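
To illustrate the first classifier in the list, here is a minimal multinomial Naive Bayes sketch in Python. It is not the repository's C++ or Python code: the class name, the `alpha` smoothing parameter (Laplace smoothing), and the choice to ignore tokens unseen in training are all assumptions made for this example.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over tokenized documents (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed default)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {token for doc in docs for token in doc}
        self.class_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total_docs = len(docs)
        return self

    def _log_score(self, doc, c):
        # Log prior plus the sum of smoothed log likelihoods for each token.
        score = math.log(self.class_counts[c] / self.total_docs)
        denom = sum(self.token_counts[c].values()) + self.alpha * len(self.vocab)
        for token in doc:
            if token in self.vocab:  # tokens never seen in training are skipped
                score += math.log((self.token_counts[c][token] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self._log_score(doc, c))


if __name__ == "__main__":
    train_docs = [
        ["free", "prize", "claim"],
        ["win", "free", "cash"],
        ["meeting", "tomorrow", "agenda"],
        ["project", "meeting", "notes"],
    ]
    train_labels = ["spam", "spam", "ham", "ham"]
    model = NaiveBayes().fit(train_docs, train_labels)
    print(model.predict(["free", "cash", "prize"]))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a token that never appeared in one class from forcing that class's probability to zero.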

📊 Dataset Preparation

Note: The datasets created for this project have been uploaded here: Datasets
