A Spam Filter built for VIT-Pugaar.
Pugaar is an Android and web application for lodging complaints and petty issues related to the maintenance of the VIT University, Vellore Men's Hostel. Our goal is to eliminate handwritten complaints, which take days to process, by automating the whole process. Once a complaint is in the system, an employee is informed directly by SMS about the issue and the details of the person who made the complaint.
Android Application by:
Web Application by:
- @MINOSai: made it with Vue.js
- @bhaveshpraveen: made the REST API using Django.
- @greed2411: made the Spam filter with custom dataset and scikit-learn.
At its core, ASF removes spam: it selectively passes only the complaints meant for the VIT University Men's Hostel.
It passes complaints that fall under the categories Electrical, Toiletries, Room and Air Conditioners.
Examples include "door knob broken", "smoke alarm is not working", "bathroom tap is loose" and "AC not cool enough".
The dataset was typed by hand from the past few years' complaint records of J Block.
As for the algorithm, MultinomialNB (multinomial Naive Bayes) is used here.
The core feature of Naive Bayes is its independence assumption. The classifier applies Bayes' theorem while assuming that the features are conditionally independent given the class; here, that means the presence of one word should not affect the presence of another. This assumption (the reason the method is called "naive") can be an advantage or a disadvantage depending on how people phrase their complaints. For a basic hostel complaint, an ML algorithm such as NB is enough. We could have gone with DenseNets if we had more data, but we only had 1500 records.
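To make the independence assumption concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the project's code (which uses scikit-learn's MultinomialNB); the toy samples, the labels and the function names `train` and `predict` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Independence assumption: each word contributes its own factor,
            # regardless of which other words appear in the text.
            # The +1 and +len(vocab) terms are Laplace smoothing.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the real dataset.
toy = [
    ("fan not working", "complaint"),
    ("tap is leaking", "complaint"),
    ("tubelight broken", "complaint"),
    ("win free money now", "spam"),
    ("free recharge offer", "spam"),
    ("call now for offer", "spam"),
]
model = train(toy)
print(predict(model, "tap broken"))
print(predict(model, "free offer now"))
```

Because every word is scored on its own, word order and word combinations are invisible to the model, which is what makes the method both cheap to train and "naive".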
The testing accuracy of the Naive Bayes model is 99.436% on a held-out 25% of the data. This suggests that the feature engineering is effective and that the model generalizes well.
Actual / Predicted | Spam | Complaint |
---|---|---|
Spam | 190 | 0 |
Complaint | 2 | 163 |
- The Spam and Spam intersection gives the True Negatives: 190.
- The Spam and Complaint intersection gives the False Positives: 0 (spam that got classified as a complaint).
- The Complaint and Spam intersection gives the False Negatives: 2 (complaints that got classified as spam).
- The Complaint and Complaint intersection gives the True Positives: 163.
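As a quick check, the reported accuracy can be recomputed from the four cells of the confusion matrix. The sketch below uses only the counts from the table; precision, recall and F1 are derived here for illustration and are not figures reported by the project.

```python
# Counts from the confusion matrix, with "complaint" as the positive class.
tn, fp, fn, tp = 190, 0, 2, 163

total = tn + fp + fn + tp            # size of the 25% test split
accuracy = (tp + tn) / total         # share of all test samples classified correctly
precision = tp / (tp + fp)           # share of predicted complaints that are real complaints
recall = tp / (tp + fn)              # share of real complaints that were let through
f1 = 2 * precision * recall / (precision + recall)

print(f"test samples: {total}")
print(f"accuracy:  {accuracy:.5f}")
print(f"precision: {precision:.5f}")
print(f"recall:    {recall:.5f}")
print(f"f1:        {f1:.5f}")
```

The accuracy works out to 353/355, which agrees with the reported 99.436% up to rounding of the last digit.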
The trained model, built with scikit-learn's Pipeline class, was serialized with joblib to the pickle file spam_data_pugaar.pkl.
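A minimal sketch of that kind of workflow, assuming a `CountVectorizer` feature step in front of `MultinomialNB`. The actual feature engineering in this repository may differ, and the toy texts, labels and step names below are hypothetical. The sketch imports the standalone `joblib` package; with scikit-learn 0.18.1 the same module was bundled as `sklearn.externals.joblib`.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: True marks a complaint, False marks spam.
texts = [
    "fan not working",
    "tap is leaking",
    "tubelight broken",
    "win free money now",
    "free recharge offer",
    "call now for offer",
]
labels = [True, True, True, False, False, False]

# Chain the (assumed) bag-of-words step and the classifier into one object,
# so raw strings can be passed straight to fit() and predict().
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", MultinomialNB()),
])
pipeline.fit(texts, labels)

# Serialize the whole fitted pipeline, then load it back as a fresh object.
path = os.path.join(tempfile.mkdtemp(), "spam_data_pugaar.pkl")
joblib.dump(pipeline, path)
restored = joblib.load(path)

predictions = list(restored.predict(["tap broken", "free offer now"]))
print(predictions)
```

Dumping the whole pipeline rather than only the classifier keeps the vocabulary learned by the vectorizer together with the model, so the loaded object can classify raw text directly.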
- pandas - For reading the data and manipulation.
- scikit-learn - For preprocessing, feature engineering and running MultinomialNB.
Clone the repository to your local machine:
git clone https://github.com/greed2411/ASF.git
If the model sees the statement as a complaint, it returns True; if it sees it as spam, it returns False.
>>> from asf import check
>>> check('tubelight broken.')
True # satisfies as a complaint
>>> check('floor mopping')
True
>>> check('send me nudes xD')
False # doesn't satisfy as a complaint
>>> check('Bahen ke laude')
False
>>> check("रंडी")
False
>>> check(None)
True
Environment: Ubuntu 16.10, Python 3.6.0 (Anaconda 4.3.1, 64-bit), scikit-learn 0.18.1.