AyushSingh13/nb-spam-classification

CS475 Project - Spam Classification

Nitin Kumar, Avi Mahajan, Archan Patel, Ayush Singh, Navjyoth Thakur

Abstract

Our project deals with spam text messages. Spam messages have become very common, so our goal is to filter them out for the benefit of the user. Our implementation will use classification to decide whether a given message is “spam” or “not spam”. We will try logistic regression and SVMs (whether to include neural networks is still to be decided). We will show how these algorithms are applied to NLP and spam message filtering, discuss what parameters each one has and how to tune them, and finally compare the prediction accuracies of the tuned models to determine which performs best. Using our knowledge from class, we will discuss our observations and why they make sense (or why they don’t).
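Whichever models are compared, they need the same two pieces: a way to turn a message into features and a way to score predictions. The sketch below assumes a simple bag-of-words representation (lowercased alphanumeric tokens) and plain accuracy as the comparison metric; the function names are hypothetical and not taken from this repository.

```python
import re
from collections import Counter


def tokenize(message):
    """Lowercase a message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


def bag_of_words(message):
    """Map a message to a word -> count dictionary (a sparse count vector)."""
    return Counter(tokenize(message))


def accuracy(predicted, actual):
    """Fraction of predicted labels that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, `bag_of_words("WINNER!! Claim your FREE prize, free entry")` counts `free` twice, and `accuracy(["spam", "ham", "ham"], ["spam", "ham", "spam"])` returns 2/3. If most messages in the data are not spam, accuracy alone can flatter a model that rarely predicts spam, so per-class precision and recall may be worth reporting alongside it.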

Possible Methods

  • K-nearest neighbors
  • Logistic Regression
  • SVM
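Two of the listed methods can be sketched in plain Python, assuming bag-of-words count features. The k-NN classifier below ranks training messages by cosine similarity and takes a majority vote, and the logistic regression is trained by stochastic gradient descent on log loss with an L2 penalty. This is a hypothetical, self-contained sketch: function names and default hyperparameters are illustrative, not tuned. A linear SVM would follow the same training pattern with hinge loss instead of log loss and is omitted for brevity; in practice a library implementation would likely be used.

```python
import math
import re
from collections import Counter


def features(message):
    """Bag-of-words counts for one message (lowercased word tokens)."""
    return Counter(re.findall(r"[a-z0-9']+", message.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, message, k=3):
    """Majority label among the k training messages most similar to `message`.

    `train` is a list of (text, label) pairs; k is the hyperparameter to tune.
    """
    query = features(message)
    ranked = sorted(train, key=lambda pair: cosine(query, features(pair[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(train, epochs=200, lr=0.5, l2=0.01):
    """Fit logistic regression by stochastic gradient descent on log loss.

    "spam" is the positive class (1). lr (learning rate) and l2 (penalty
    strength) are hyperparameters. For speed, the L2 penalty is applied only
    to weights of words present in the current example.
    """
    data = [(features(text), 1.0 if label == "spam" else 0.0) for text, label in train]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in data:
            z = bias + sum(weights.get(w, 0.0) * v for w, v in x.items())
            error = sigmoid(z) - y  # derivative of log loss with respect to z
            bias -= lr * error
            for w, v in x.items():
                old = weights.get(w, 0.0)
                weights[w] = old - lr * (error * v + l2 * old)
    return weights, bias


def predict_logistic(model, message, threshold=0.5):
    """Label a message "spam" if its predicted spam probability reaches threshold."""
    weights, bias = model
    z = bias + sum(weights.get(w, 0.0) * v for w, v in features(message).items())
    return "spam" if sigmoid(z) >= threshold else "ham"
```

Here `k`, `lr`, `l2`, and `epochs` are the kind of parameters the abstract proposes to optimize, for example by trying several values and keeping whichever gives the best accuracy on held-out messages.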

Timeline:

  • Read the suggested reading material (4/19/2017)
  • Choose an implementation (4/20/2017)
  • Build the implementation (adapting a strong existing online implementation to fit our data) (4/28/2017)
  • Write an analysis of the implementation (5/3/2017)

Suggested Reading:

  • Sebastiani, Fabrizio. "Machine learning in automated text categorization." ACM Computing Surveys (CSUR) 34.1 (2002): 1-47.
  • Guzella, Thiago S., and Walmir M. Caminhas. "A review of machine learning approaches to spam filtering." Expert Systems with Applications 36.7 (2009): 10206-10222.
  • Blanzieri, Enrico, and Anton Bryl. "A survey of learning-based techniques of email spam filtering." Artificial Intelligence Review 29.1 (2008): 63-92.

Links

Stanford CS124 lecture slides on Naive Bayes: https://web.stanford.edu/class/cs124/lec/naivebayes.pdf

Extremely short summary of Naive Bayes: https://stats.stackexchange.com/questions/91177/machine-learning-techniques-for-spam-detection-and-in-general-for-text-classifi
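The Naive Bayes idea in these links can be sketched as a multinomial classifier: choose the class c that maximizes log P(c) + Σ log P(w | c) over the words w of the message, where P(w | c) = (count(w, c) + α) / (total words in c + α·|V|), V is the training vocabulary, and α = 1 gives add-one (Laplace) smoothing. This is a minimal pure-Python sketch under those assumptions; the class name and the choice to ignore words outside the vocabulary are illustrative, not this repository's code.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase a message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)          # messages per class, for P(c)
        self.word_counts = defaultdict(Counter)      # count(w, c)
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(message))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def log_posterior(self, message, label):
        """log P(label) + sum of log P(word | label), up to a constant shared by all classes."""
        n = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for word in tokenize(message):
            if word in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
        return score

    def predict(self, message):
        """Return the class with the highest log posterior."""
        return max(self.class_counts, key=lambda c: self.log_posterior(message, c))
```

Working in log space avoids floating-point underflow when multiplying many small word probabilities, and α is one of the parameters that could be tuned.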

Dataset: https://www.kaggle.com/uciml/sms-spam-collection-dataset
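A loader for this dataset might look like the sketch below. It assumes the Kaggle file is a CSV with a header row, the label (`ham` or `spam`) in the first column and the message text in the second, plus possibly extra columns that are ignored; check the actual header before relying on this. The function names are hypothetical, and the split is a simple seeded shuffle rather than anything prescribed by the dataset.

```python
import csv
import random


def load_sms_spam(lines):
    """Read (text, label) pairs from CSV lines whose first two columns are label, text.

    `lines` can be an open file or any iterable of CSV lines. The first row is
    treated as a header; extra columns and short rows are ignored.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed line
        label, text = row[0].strip(), row[1]
        pairs.append((text, label))
    return pairs


def train_test_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and return (train, test) lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Usage might look like `with open("spam.csv", encoding="latin-1", newline="") as f: pairs = load_sms_spam(f)`, where the file name and the Latin-1 encoding are assumptions to adjust to the downloaded file. A fixed `seed` keeps the split reproducible, so every model is compared on the same test messages.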

TensorFlow email phishing tutorial: https://jrmeyer.github.io/tutorial/2016/02/01/TensorFlow-Tutorial.html
