Text-Classification for 20-Newsgroups

• The dataset is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.

• Built a vocabulary from the dataset, which was used as the feature set.

• Implemented a Multinomial Naive Bayes classifier from scratch to classify each document into the appropriate newsgroup.
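
The two steps above (vocabulary building and a from-scratch Multinomial Naive Bayes) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `tokenize`, `build_vocabulary`, `MultinomialNB`, the `max_features` cutoff, the Laplace smoothing value `alpha=1.0`, and the toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed, simplistic tokenizer: lowercase, split on whitespace,
    # keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def build_vocabulary(docs, max_features=2000):
    # Keep the most frequent words across all documents as the feature set.
    counts = Counter(w for doc in docs for w in tokenize(doc))
    return [w for w, _ in counts.most_common(max_features)]


class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        docs_per_class = Counter(labels)
        word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            word_counts[label].update(
                w for w in tokenize(doc) if w in self.vocab
            )
        # log P(class), estimated from document frequencies
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(docs)) for c in self.classes
        }
        # log P(word | class) with add-alpha smoothing
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(word_counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / denom)
                for w in self.vocab
            }
        return self

    def predict_one(self, doc):
        # Words outside the vocabulary are ignored.
        tokens = [w for w in tokenize(doc) if w in self.vocab]
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][w] for w in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy data standing in for the real newsgroup documents.
train_docs = [
    "the engine and the brakes of the car",
    "car engine oil change",
    "the rocket reached orbit",
    "nasa launched a rocket into orbit",
]
train_labels = ["rec.autos", "rec.autos", "sci.space", "sci.space"]

vocab = build_vocabulary(train_docs)
model = MultinomialNB().fit(train_docs, train_labels, vocab)
print(model.predict_one("rocket orbit"))
print(model.predict_one("car engine"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps words unseen in a class from driving its score to negative infinity.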

Dataset : http://archive.ics.uci.edu/ml/datasets/Twenty+Newsgroups

Results :

• Naive Bayes from scratch : 0.8474

• scikit-learn Naive Bayes : 0.8476