SecureEmail : Feature Extraction for Spam Email Detection 🎫

The proposed system detects spam emails using machine learning algorithms and aims to separate them from legitimate mail accurately and efficiently. This saves users time and reduces the risks that spam emails pose.

📋 Project Description

Email is a popular and preferred form of written communication in everyday life, but it has one major problem: spam. Over the past decade, unsolicited bulk email has become a serious problem for email users, and a huge amount of spam flows into users' mailboxes every day.

The growing volume of spam causes many important emails to be lost in a sea of junk mail. To address this, we implement methods that distinguish spam from important (ham) emails.

This reduces the time spent looking for important emails, along with the hassle that comes with it. Our goal is to filter emails as accurately as possible, separating spam from ham.

🗃️ Project Feature

The main feature of our project is determining whether a received email is spam or ham. This is especially useful for students and working professionals who deal with email every day. The project also aims to help prevent phishing attempts by filtering spam out of legitimate (ham) email.

A. Pre-processing

Removal of Special Characters
Removal of Numbers
Lowercase Conversion
Tokenization
Removal of Stop words
Stemming

B. Feature Extraction

Bag of words
TF-IDF

C. Classification

Naive Bayes Algorithm (also implemented in C++)
Random Forest Classifier
Support Vector Machine
MLP Classifier

📊 Dataset Preparation

Note: The datasets created in this project have been uploaded here: Datasets

.vscode
OleanderStemmingLibrary-master
01 - Final Dataset.csv
02 - Preprocessing.cpp
03 - TotalWords.cpp
04 - Processed data.txt
05 - Preprocessing + stemming.ipynb
06 - Updated.csv
07 - unique words.txt
08 - BagOfWords.cpp
09 - TFIDF.cpp
10 - TF IDF and Naives Bayes.ipynb
11 - Main.cpp
12 - Mathimp.cpp
13- preprocessing.cpp
14 - NaiveBayes.cpp
15 - FinalMain.cpp
16 - sample.cpp
17 - sample bag.txt
LICENSE
README.md

Repository: sanidhyajadaun/SecureEmail

Folders and files

Latest commit

History

Repository files navigation

SecureEmail : Feature Extraction for Spam Email Detection 🎫

📋 Project Description

🗃️ Project Feature

📊 Dataset Preparation

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages