Multinomial Naive Bayes Spam Message Identifier

Identifying and distinguishing spam messages using the multinomial Naive Bayes model.

What is a Naive Bayes classifier?

In statistics, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features. At the time of writing, scikit-learn provides five types of Naive Bayes classifier:

1- Bernoulli Naive Bayes classifier

2- Categorical Naive Bayes classifier

3- Complement Naive Bayes classifier

4- Gaussian Naive Bayes classifier

5- Multinomial Naive Bayes classifier

In this repository, we use the multinomial Naive Bayes classifier to detect spam messages. We chose it for its simple implementation, high accuracy, and vector-based implementation. Other approaches can also be used to detect spam messages, such as the Complement Naive Bayes classifier, or tf-idf features in place of raw word counts.

More about the multinomial Naive Bayes classifier

MultinomialNB implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word vector counts, although tf-idf vectors are also known to work well in practice). The distribution is parametrized by vectors $\theta_y = (\theta_{y1}, \ldots, \theta_{yn})$ for each class $y$, where $n$ is the number of features (in text classification, the size of the vocabulary) and $\theta_{yi}$ is the probability $P(x_i \mid y)$ of feature $i$ appearing in a sample belonging to class $y$.
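To make "word vector counts" concrete, here is a minimal sketch of turning messages into count vectors using only the standard library. The helper names (`build_vocabulary`, `to_count_vector`), the whitespace tokenization, and the toy messages are assumptions made for illustration; they are not taken from the notebook.

```python
from collections import Counter


def build_vocabulary(messages):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = {tok for msg in messages for tok in msg.lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def to_count_vector(message, vocabulary):
    """Return the word-count vector x = (x_1, ..., x_n) for one message."""
    counts = Counter(message.lower().split())
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[tok]] = count
    return vector


# Toy messages, invented for this example.
messages = ["win cash now", "call me now", "win win prize"]
vocab = build_vocabulary(messages)
vectors = [to_count_vector(m, vocab) for m in messages]
```

Each position in a vector corresponds to one vocabulary word, so `n` in the formulas is `len(vocab)`.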

The parameters $\theta_y$ are estimated by a smoothed version of maximum likelihood, i.e. relative frequency counting:

$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

where $N_{yi} = \sum_{x \in T} x_i$ is the number of times feature $i$ appears in samples of class $y$ in the training set $T$, $N_y = \sum_{i=1}^{n} N_{yi}$ is the total count of all features for class $y$, and $\alpha \ge 0$ is the smoothing parameter.
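The smoothed estimate can be worked through by hand on a tiny example. The sketch below is plain Python rather than scikit-learn's implementation. The count vectors, the four-word vocabulary, and the equal class priors are all assumptions chosen for illustration, and `estimate_theta` and `predict` are hypothetical helper names.

```python
import math


def estimate_theta(count_vectors, alpha=1.0):
    """Smoothed estimate (N_yi + alpha) / (N_y + alpha * n) for one class."""
    n = len(count_vectors[0])                       # number of features
    n_yi = [sum(x[i] for x in count_vectors) for i in range(n)]
    n_y = sum(n_yi)                                 # total count for the class
    return [(c + alpha) / (n_y + alpha * n) for c in n_yi]


# Hypothetical counts over the vocabulary ["call", "cash", "me", "win"].
spam = [[0, 1, 0, 2], [0, 2, 0, 1]]
ham = [[1, 0, 1, 0], [2, 0, 1, 0]]

theta = {"spam": estimate_theta(spam), "ham": estimate_theta(ham)}
log_prior = {"spam": math.log(0.5), "ham": math.log(0.5)}  # assumed equal priors


def predict(x):
    """Pick the class maximizing log P(y) + sum_i x_i * log(theta_yi)."""
    scores = {
        y: log_prior[y] + sum(xi * math.log(t) for xi, t in zip(x, theta[y]))
        for y in theta
    }
    return max(scores, key=scores.get)
```

With `alpha=1.0`, the word "call" never appears in the spam samples yet still gets probability 1/10 instead of 0, which is the purpose of the smoothing term.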

Dataset

We used the SMS Spam Collection dataset from the UCI Machine Learning Repository to train our model. It can be accessed via the link below: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
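A minimal sketch of reading the data, assuming the file uses one message per line with the label (`ham` or `spam`) and the text separated by a tab. The `load_sms` helper and the two sample rows are invented for illustration, and the notebook may load the data differently.

```python
import csv
import io


def load_sms(lines):
    """Parse 'label<TAB>text' rows into (texts, labels), with spam=1 and ham=0."""
    # QUOTE_NONE keeps quotation marks inside messages as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    texts, labels = [], []
    for row in reader:
        if len(row) != 2:  # skip blank or malformed lines
            continue
        label, text = row
        texts.append(text)
        labels.append(1 if label == "spam" else 0)
    return texts, labels


# Two made-up rows standing in for the real file.
sample = io.StringIO("ham\tSee you at 5?\nspam\tWIN a \"free\" prize now\n")
texts, labels = load_sms(sample)
```

For the real file, the same function can be given an open file object instead of the `io.StringIO` sample, for example one opened with `encoding="utf-8"` and `newline=""`.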

Results of the trained model

The accuracy of our multinomial Naive Bayes model is 99.01 %.

The precision of our multinomial Naive Bayes model is 97.89 %.

The recall of our multinomial Naive Bayes model is 94.56 %.

We can use the confusion matrix to observe the performance of our model:

[Figure: confusion matrix of the trained model]
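As a sketch of how accuracy, precision, and recall follow from the four cells of a confusion matrix, the snippet below computes them from raw counts. The counts used here (139 true positives, 3 false positives, 8 false negatives, 965 true negatives) are an assumption: they were back-calculated so that they reproduce the reported percentages, not read from the notebook.

```python
def metrics(tp, fp, fn, tn):
    """Return accuracy, precision and recall as percentages."""
    total = tp + fp + fn + tn
    accuracy = 100 * (tp + tn) / total   # share of all messages classified correctly
    precision = 100 * tp / (tp + fp)     # share of predicted spam that really is spam
    recall = 100 * tp / (tp + fn)        # share of actual spam that was caught
    return accuracy, precision, recall


# Assumed, back-calculated counts (spam is the positive class).
accuracy, precision, recall = metrics(tp=139, fp=3, fn=8, tn=965)
print(f"{accuracy:.2f} {precision:.2f} {recall:.2f}")  # prints "99.01 97.89 94.56"
```

Under these assumed counts, the model lets 8 spam messages through and flags 3 legitimate messages as spam, which is why recall is lower than precision.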

Steps

More information is available in the Jupyter Notebook file, Multinomial_nb_Spam_Identifier.ipynb.
