Email_Spam_Classifier

This project was done for the Skill4U Machine Learning program. It is a spam email classifier built with machine learning. The model is trained with the Gaussian Naïve Bayes (GaussianNB) algorithm.

Problem Statement

Spamming is one of the most common attacks. It compromises large numbers of machines by sending unwanted messages, viruses, and phishing attempts through email. We chose this project because many people now try to deceive recipients by sending fake emails.

According to recent figures, about 40% of all email is spam, which amounts to about 15.4 billion emails per day and costs Internet users about $355 million per year. Automatic email filtering is currently the most effective way to deal with spam.

Proposed Solution

The proposed solution is a Gaussian Naïve Bayes classifier with two classes: spam and ham. GaussianNB assumes that the data for each label is drawn from a simple Gaussian distribution. The scikit-learn library provides the Gaussian Naïve Bayes implementation we use for classification.
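As a minimal sketch of this idea, the snippet below fits scikit-learn's `GaussianNB` on a handful of made-up word-count vectors. The four-word vocabulary and every count are illustrative assumptions, not values from the project's dataset.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy bag-of-words counts. Rows are emails, columns are a hypothetical
# vocabulary: "free", "winner", "meeting", "report". All values are made up.
X_train = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [4, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 1, 3, 2],
    [1, 0, 2, 4],
], dtype=float)
y_train = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = ham (not spam)

# GaussianNB estimates a per-class mean and variance for every feature.
model = GaussianNB()
model.fit(X_train, y_train)

# Classify two unseen count vectors: one spam-like, one ham-like.
X_new = np.array([[3, 2, 0, 0], [0, 0, 3, 3]], dtype=float)
predictions = model.predict(X_new)
print(predictions)
```

In a real pipeline the feature matrix would come from the vectorized email bodies rather than hand-written counts.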

Execution Plan

We propose the following approach to classify emails.

Dataset

The dataset used to train our model was taken from Kaggle: https://www.kaggle.com/datasets/nitishabharathi/email-spam-dataset

This dataset contains 3 CSV files, and each file contains 2 columns.
The first column is the body of the email.
The second column contains the labels: 0 for not spam, 1 for spam.
The 3 files contain 18650 values in total.
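A minimal sketch of reading several two-column CSV files into one list of (body, label) pairs is shown below. The column names `Body` and `Label`, the helper `combine_csv_files`, and the in-memory sample files are assumptions made for illustration. The real files may use different headers.

```python
import csv
import io

def combine_csv_files(files):
    """Read (body, label) rows from each open CSV file into one list."""
    rows = []
    for f in files:
        reader = csv.DictReader(f)
        for row in reader:
            # "Body" and "Label" are assumed column names.
            rows.append((row["Body"], int(row["Label"])))
    return rows

# Two tiny in-memory files stand in for the real CSV files.
file_a = io.StringIO("Body,Label\nWin a free prize now,1\nMeeting at 10am,0\n")
file_b = io.StringIO("Body,Label\nCheap loans available,1\n")

combined = combine_csv_files([file_a, file_b])
spam_count = sum(label for _, label in combined)
print(len(combined), spam_count)
```

With files on disk, the same function would be called with handles opened via `open(path, newline="", encoding="utf-8")`.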

How the data was cleaned

We cleaned the data using the NLTK library for Python and plain Python functions.

Balanced the dataset
Combined the 3 CSV files into 1 dataset
Removed links from the body column
Removed unnecessary symbols from the body column
Converted all text to lowercase
Performed word tokenization
Used lemmatization to reduce different forms of the same word to one base form
Removed stop words
Vectorized the data with the bag-of-words method
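Several of the steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the stop-word list is a tiny hypothetical stand-in for NLTK's, tokenization is a plain whitespace split, the symbol filter also drops digits, balancing is done by undersampling, and lemmatization is left out because it needs NLTK's WordNet data.

```python
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (NLTK ships a much larger one).
STOP_WORDS = {"a", "an", "the", "to", "is", "at", "and", "you", "your", "for"}

def balance(rows, seed=0):
    """Undersample the larger class so both labels have the same count."""
    spam = [r for r in rows if r[1] == 1]
    ham = [r for r in rows if r[1] == 0]
    n = min(len(spam), len(ham))
    rng = random.Random(seed)
    return rng.sample(spam, n) + rng.sample(ham, n)

def clean(text):
    """Strip links and symbols, lowercase, tokenize, and drop stop words."""
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # remove links
    text = re.sub(r"[^A-Za-z\s]", " ", text)           # remove symbols and digits
    tokens = text.lower().split()                       # lowercase, then split on whitespace
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(token_lists):
    """Return a sorted vocabulary and one count vector per document."""
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

rows = [
    ("WIN a FREE prize!!! Visit http://example.com now", 1),
    ("Free free offer, claim your prize at www.example.org", 1),
    ("Meeting moved to 10am, see the agenda", 0),
]
balanced = balance(rows)
tokens = [clean(body) for body, _ in balanced]
vocabulary, vectors = bag_of_words(tokens)
print(vocabulary)
print(vectors)
```

Each row of `vectors` has one entry per vocabulary word, which is the numeric form a classifier such as GaussianNB expects.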

Algorithm

[Figure: Algorithm comparison graph]

We use the Gaussian NB algorithm for classification. We tested several classification algorithms, and GaussianNB gave the best results on the test data.
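A comparison of this kind could be sketched as below. The source does not list which algorithms were tried, so `MultinomialNB` and `LogisticRegression` are assumed candidates, and the word counts are synthetic. The scores here say nothing about the project's actual results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB

rng = np.random.default_rng(0)

# Synthetic word counts: 200 "ham" and 200 "spam" emails over 5 made-up words.
ham = rng.poisson(lam=[0.2, 0.3, 2.0, 2.5, 1.0], size=(200, 5))
spam = rng.poisson(lam=[2.5, 2.0, 0.3, 0.2, 1.0], size=(200, 5))
X = np.vstack([ham, spam]).astype(float)
y = np.array([0] * 200 + [1] * 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate classifiers (assumed for illustration).
candidates = {
    "GaussianNB": GaussianNB(),
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Fit each candidate and record its accuracy on the held-out test split.
scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)

best = max(scores, key=scores.get)
print(scores, best)
```

Scoring on a held-out split rather than the training data keeps the comparison from rewarding overfitting.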

Metrics

[Figures: ROC curve, confusion matrix, classification report]

Model evaluation: After training and finding the best parameters, we reached `90.07%` accuracy on our test data.
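The metrics named in this section can be computed with `sklearn.metrics`, as in the sketch below. The labels, predictions, and scores are made-up values chosen to keep the arithmetic easy to follow. They are not the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)

# Made-up ground truth, hard predictions, and spam probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
y_score = [0.1, 0.2, 0.3, 0.6, 0.9, 0.8, 0.7, 0.4]

accuracy = accuracy_score(y_true, y_pred)   # fraction of correct predictions
matrix = confusion_matrix(y_true, y_pred)   # rows = true class, columns = predicted class
auc = roc_auc_score(y_true, y_score)        # area under the ROC curve
report = classification_report(y_true, y_pred, target_names=["ham", "spam"])

print(accuracy)
print(matrix)
print(auc)
print(report)
```

The confusion matrix shows where the errors fall (ham flagged as spam versus spam let through), which a single accuracy figure hides.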

Target Audience

About 14.5 billion spam emails are sent every day, almost 45 percent of the world's email traffic. Internet Service Providers (ISPs) use spam filters so that they do not deliver malicious incoming emails or links to the recipient.

Demonstration

The demo shows how this model works. You can also try it yourself by scanning the QR code.

[Figures: Demo; QR code (scan to see for yourself)]

Advantages

[Figure: No more Spam]

Benefits
- It is very effective and adaptive, so it is hard to fool.
- It is based on text classification methods.
- It is phenomenally accurate.
- It learns new spammer tactics automatically.
- It adapts to changing spam.
- It protects you.


Hamas-ur-Rehman/Email_Spam_Classifier
