Spam eMail Detection using Naive Bayes Classification Algorithm

In statistics, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong independence assumptions between the features. They are among the simplest Bayesian network models, but coupled with kernel density estimation, they can achieve higher accuracy levels.
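As a brief sketch of what the independence assumption buys: for a class label $y$ (spam or non-spam) and features $x_1, \dots, x_n$ (symbols introduced here for illustration), Bayes' theorem with conditionally independent features reduces to a product of per-feature likelihoods. The Gaussian form in the last line is the likelihood that scikit-learn's GaussianNB assumes, with a mean $\mu_{y,i}$ and variance $\sigma_{y,i}^2$ estimated per class and per feature.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)

\hat{y} \;=\; \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)

P(x_i \mid y) \;=\; \frac{1}{\sqrt{2\pi\sigma_{y,i}^{2}}}
\exp\!\left( -\frac{(x_i - \mu_{y,i})^{2}}{2\sigma_{y,i}^{2}} \right)
```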

Project Description

In this project, a model is trained on a set of emails labelled as either spam or non-spam. The training set contains 702 emails, divided equally between the two categories. The model is then tested on 260 emails: it predicts the category of each one, and its predictions are compared with the known correct classifications to measure accuracy. The data comes in two folders: train-mails, used to train the model, and test-mails, used to test its accuracy. The first line of each email is the subject; the content starts on the third line.
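The email layout described above (subject on the first line, content from the third line) could be handled roughly as follows. This is a minimal sketch, not the notebook's actual code: the function names, the lowercasing, the filter that drops non-alphabetic and single-character tokens, and the `max_words=3000` cap are all assumptions made for illustration.

```python
from collections import Counter


def email_body_words(raw_text):
    """Return the words of an email's body, skipping the subject line.

    Assumes the layout described in this README: line 1 is the subject
    and the content starts on line 3.
    """
    lines = raw_text.splitlines()
    words = []
    for line in lines[2:]:  # index 2 is the third line
        words.extend(line.lower().split())
    # Assumed cleaning rule: keep alphabetic tokens longer than one character.
    return [w for w in words if w.isalpha() and len(w) > 1]


def build_dictionary(raw_emails, max_words=3000):
    """Count body words across all emails and keep the most frequent ones."""
    counts = Counter()
    for raw in raw_emails:
        counts.update(email_body_words(raw))
    return counts.most_common(max_words)


# Hypothetical email text, used only to exercise the functions.
sample = "Subject: win a prize\n\nclaim your free prize now\nfree entry today"
print(build_dictionary([sample], max_words=3))
```

The resulting list of (word, count) pairs can serve as the vocabulary from which per-email word-count features are built.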

Steps

Cleaning and Preparing Data
Building the Algorithms
Training and Predicting Results
Evaluation
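The last three steps can be sketched end to end on toy data. The example below uses the GaussianNB and accuracy_score imports listed under Packages; the `extract_features` helper, the four-word vocabulary, and the tiny hand-made emails are hypothetical stand-ins for the real dictionary and the train-mails and test-mails folders, so treat it as an illustration of the workflow rather than the notebook's implementation.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB


def extract_features(emails_words, vocabulary):
    """Build a word-count matrix: one row per email, one column per vocabulary word."""
    index = {word: i for i, word in enumerate(vocabulary)}
    features = np.zeros((len(emails_words), len(vocabulary)))
    for row, words in enumerate(emails_words):
        for word in words:
            if word in index:
                features[row, index[word]] += 1
    return features


# Toy data (assumed for illustration): label 1 = spam, 0 = non-spam.
vocabulary = ["free", "prize", "meeting", "report"]
train_words = [
    ["free", "prize", "free"],
    ["prize", "free"],
    ["meeting", "report"],
    ["report", "meeting", "meeting"],
]
train_labels = np.array([1, 1, 0, 0])
test_words = [["free", "prize"], ["meeting", "report"]]
test_labels = np.array([1, 0])

# Train on the labelled emails, then predict the held-out ones.
model = GaussianNB()
model.fit(extract_features(train_words, vocabulary), train_labels)
predictions = model.predict(extract_features(test_words, vocabulary))

# Evaluate by comparing predictions with the known labels.
accuracy = accuracy_score(test_labels, predictions)
print("Accuracy:", accuracy)
```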

Requirements

Python. Python is an interpreted, high-level and general-purpose programming language.

Google Colab. Google Colab is a free, hosted Jupyter notebook environment that runs in the browser.

Packages

Import the following packages before running the code. NumPy and scikit-learn must be installed in the Python environment; Google Colab provides both by default.

import os
import numpy as np
from collections import Counter
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from google.colab import drive
drive.mount('/content/drive')

After running drive.mount('/content/drive'), follow the instructions in the output to authorize access to Google Drive so that the notebook can read the data directories.

Launch

Download the data file provided and decompress it. In Google Drive, create the following folder structure and upload the decompressed data into it:

/content/drive/MyDrive/MSBA_Colab_2020/ML_Algorithms/CA02/Data

where /content/drive/MyDrive is the path at which the root of your Google Drive is mounted in Colab.
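Once the data is uploaded, the two folders can be located from the base path. The sketch below assumes that train-mails and test-mails sit directly inside the Data folder; adjust the paths if the decompressed archive nests them differently. The helper names `data_dirs` and `list_emails` are hypothetical.

```python
import os

# Base path from this README; valid only after Google Drive has been mounted.
BASE_DIR = "/content/drive/MyDrive/MSBA_Colab_2020/ML_Algorithms/CA02/Data"


def data_dirs(base_dir=BASE_DIR):
    """Return the (train, test) folder paths under the base data folder.

    Assumes both folders are direct children of base_dir.
    """
    return (
        os.path.join(base_dir, "train-mails"),
        os.path.join(base_dir, "test-mails"),
    )


def list_emails(folder):
    """Return the full paths of all entries in a folder, sorted by name."""
    return sorted(os.path.join(folder, name) for name in os.listdir(folder))


train_dir, test_dir = data_dirs()
print(train_dir)
print(test_dir)
```

Calling `list_emails(train_dir)` in Colab should then return one path per training email, which is a quick way to confirm the upload landed in the expected place.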

Known Bugs

Please download the .ipynb file and open it in Google Colab to display the markup comments correctly.

Authors

Silvia Ji - GitHub

License

This project is licensed under the MIT License.

Acknowledgements

The project template and dataset were provided by Arin Brahma at Loyola Marymount University.
