Text-Classification for 20-Newsgroups

• The dataset is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.

• Built a vocabulary from the dataset, which was used as the feature set.

• Implemented a Multinomial Naive Bayes classifier from scratch to classify each document into the appropriate newsgroup.
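
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `build_vocabulary` and `MultinomialNaiveBayes`, the whitespace tokenizer, the Laplace smoothing parameter `alpha`, and the four toy documents are all assumptions made for the example. It uses only the Python standard library.

```python
import math
from collections import Counter


def build_vocabulary(documents, max_features=None):
    """Map the most frequent tokens to column indices (hypothetical helper)."""
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(max_features))}


class MultinomialNaiveBayes:
    """Illustrative multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is an assumed default

    def fit(self, documents, labels, vocabulary):
        self.vocabulary = vocabulary
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        word_counts = {c: Counter() for c in self.classes}

        # Count how often each vocabulary word occurs in each class.
        for doc, label in zip(documents, labels):
            for tok in doc.lower().split():
                if tok in vocabulary:
                    word_counts[label][tok] += 1

        # Log prior: fraction of training documents in each class.
        n_docs = len(documents)
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}

        # Log likelihood: smoothed relative frequency of each word per class.
        vocab_size = len(vocabulary)
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(word_counts[c].values()) + self.alpha * vocab_size
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / denom)
                for w in vocabulary
            }
        return self

    def predict(self, documents):
        predictions = []
        for doc in documents:
            # Words outside the vocabulary are ignored.
            tokens = [t for t in doc.lower().split() if t in self.vocabulary]
            scores = {
                c: self.log_prior[c] + sum(self.log_likelihood[c][t] for t in tokens)
                for c in self.classes
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy data invented for this example; the real dataset has about 20,000 documents.
train_docs = [
    "the rocket launch was delayed",
    "nasa plans a new orbit mission",
    "the goalie made a great save",
    "the team won the hockey game",
]
train_labels = ["sci.space", "sci.space", "rec.sport.hockey", "rec.sport.hockey"]

vocab = build_vocabulary(train_docs)
model = MultinomialNaiveBayes().fit(train_docs, train_labels, vocab)
predictions = model.predict(["rocket orbit mission", "hockey team goalie"])
print(predictions)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word unseen in one class from driving that class's probability to zero.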

Results:

• Naive Bayes from scratch: 0.8474

• scikit-learn Naive Bayes: 0.8476