This project analyzes an email dataset containing more than 5000 email samples, each labeled as spam or not spam. We built a model that classifies a given email as spam (junk email) or ham (good email) using the Naive Bayes classification algorithm, with an accuracy score of ~99%.
#Naive Bayes Classifier Introduction
Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features, given the class label.
#Checking the distribution of data
We can see some extreme outliers, so we set a threshold on the text length and plot the histogram again. The threshold here is 10000; it is used only for this plot and is not applied in the algorithm implementation.
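The thresholding step above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the emails are held as a list of strings and that "length" means character count. The names `texts`, `LENGTH_THRESHOLD`, and `length_histogram` are hypothetical. It uses only the standard library and returns bin counts instead of drawing the plot.

```python
from collections import Counter

LENGTH_THRESHOLD = 10000  # value quoted in the text; used for plotting only


def length_histogram(texts, threshold=LENGTH_THRESHOLD, bin_width=1000):
    """Count texts per length bin, skipping texts longer than `threshold`."""
    lengths = [len(t) for t in texts]
    kept = [n for n in lengths if n <= threshold]
    # Map each length to the lower edge of its bin, e.g. 2500 -> 2000.
    bins = Counter((n // bin_width) * bin_width for n in kept)
    dropped = len(lengths) - len(kept)
    return dict(sorted(bins.items())), dropped


# Hypothetical sample: three ordinary texts and one extreme outlier.
texts = ["hi", "a" * 1500, "b" * 2500, "c" * 50000]
histogram, dropped = length_histogram(texts)
print(histogram)
print("outliers dropped:", dropped)
```

The resulting dictionary can be passed to any plotting library as bar positions and heights. The original data stays untouched, which matches the note that the threshold is not applied during training.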
Below are metrics about the results:
#Confusion Matrix
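A confusion matrix counts how often each true label was predicted as each class. The sketch below shows how those counts might be produced. It assumes a small from-scratch multinomial Naive Bayes with add-one smoothing and whitespace tokenization, which is not necessarily what the project used. The toy emails and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-class document counts and word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(model, doc):
    """Return the class with the highest log posterior for `doc`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing; each word contributes independently,
            # which is the "naive" assumption.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def confusion_matrix(y_true, y_pred, positive="spam"):
    """Return counts of true/false positives and negatives."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts


# Toy data for illustration only; not drawn from the project's dataset.
train_docs = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = ["claim free money", "team meeting tomorrow"]
test_labels = ["spam", "ham"]

model = train_naive_bayes(train_docs, train_labels)
predictions = [predict(model, doc) for doc in test_docs]
matrix = confusion_matrix(test_labels, predictions)
print(matrix)
```

Here "spam" is treated as the positive class, so a false positive is a good email flagged as spam, usually the costlier mistake for a mail filter.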
We achieved a mean accuracy of 98.84% with a standard deviation of 0.4%. The model is in the low-bias, low-variance region, as the learning curve plotted below shows.
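A mean accuracy with a spread is typically obtained by scoring the model on several cross-validation folds and summarizing the per-fold results. The sketch below assumes that setup, which the text does not state explicitly. The fold scores are placeholder values for illustration, not the project's results, and `summarize_accuracy` is a hypothetical helper.

```python
import statistics


def summarize_accuracy(fold_scores):
    """Return the mean and population standard deviation of fold accuracies."""
    mean = statistics.mean(fold_scores)
    std = statistics.pstdev(fold_scores)
    return mean, std


# Placeholder per-fold accuracies; NOT the results reported above.
fold_scores = [0.98, 0.99, 0.99, 0.98, 0.99]
mean, std = summarize_accuracy(fold_scores)
print(f"accuracy: {mean:.2%} (+/- {std:.2%})")
```

A high mean indicates low bias, and a small spread across folds indicates low variance, which is the reading given for the learning curve.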