Document classification into four defined categories (World, Sports, Business, Sci/Tech), with text pre-processing in NLTK and models ranging from Naïve Bayes to a Convolutional Neural Network (CNN) and an RCNN.

saurabh1907/document-classification-ml-nlp

Document Classification using NLP and Machine Learning

Objective

Performed document classification into four defined categories (World, Sports, Business, Sci/Tech). Trained a range of models, from Naïve Bayes to a Convolutional Neural Network (CNN) and an RCNN, and compared their accuracy. Combining different feature engineering techniques with Natural Language Processing (NLP) features produced an accurate text classifier.
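As a rough illustration of the pre-processing and feature engineering step, the sketch below tokenizes text, drops stopwords, and computes TF-IDF weights. It is not the notebook's code: the README says pre-processing uses NLTK, but to stay dependency-free this sketch uses a regex tokenizer and a small hand-picked stopword list (both assumptions), and TF-IDF is one common feature choice rather than a confirmed detail of this project. NLTK's word_tokenize and its English stopwords corpus would be natural drop-in replacements.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption; NLTK's English
# stopwords corpus is much larger).
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "with", "at", "by", "from",
}


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(N / df) + 1, so terms found in every document keep a small
          non-zero weight instead of being zeroed out.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append({
            term: (count / length) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    headlines = [
        "The striker scored twice in the final",
        "Shares of the chip maker fell on weak earnings",
    ]
    docs = [preprocess(h) for h in headlines]
    print(docs[0])  # ['striker', 'scored', 'twice', 'final']
    print(tfidf(docs)[0])
```

Each document becomes a sparse term-to-weight mapping; words that are frequent in one document but rare across the corpus get the highest weights, which is what makes them useful features for a classifier.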

Tech Stack

  • Language: Python
  • Libraries: Pandas, NumPy, Matplotlib, scikit-learn, NLTK, Keras (TensorFlow backend)
  • Models: Naïve Bayes, Logistic Regression, Random Forest, XGBoost, Shallow Neural Network, Convolutional Neural Network (CNN), RCNN
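Naïve Bayes is the simplest model in the list above. scikit-learn, which the README lists, provides it as sklearn.naive_bayes.MultinomialNB; the from-scratch sketch below only makes the arithmetic visible and is not taken from the notebook. It scores each class as log P(class) plus the sum of log P(word | class) over the document's words, with add-alpha (Laplace) smoothing so that an unseen word/class pair does not zero out a class. The tiny training set is hypothetical, and the accuracy helper mirrors the README's comparison of models by accuracy.

```python
import math
from collections import Counter, defaultdict


class SimpleMultinomialNB:
    """Multinomial Naive Bayes over token lists with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        class_counts = Counter(labels)
        # log P(class): fraction of training documents in each class.
        self.log_prior = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        # Word counts per class.
        term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            term_counts[label].update(doc)
        # log P(word | class) with add-alpha smoothing over the vocabulary.
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(term_counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((term_counts[c][t] + self.alpha) / denom)
                for t in self.vocab
            }
        return self

    def predict(self, doc):
        """Return the class with the highest log-posterior score."""
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for t in doc:
                if t in self.vocab:  # words never seen in training are skipped
                    score += self.log_likelihood[c][t]
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Hypothetical, already pre-processed toy data.
    train_docs = [
        ["striker", "scored", "goal"],
        ["match", "goal", "team"],
        ["shares", "earnings", "market"],
        ["market", "stocks", "fell"],
    ]
    train_labels = ["Sports", "Sports", "Business", "Business"]
    model = SimpleMultinomialNB().fit(train_docs, train_labels)
    print(model.predict(["goal", "team"]))    # Sports
    print(model.predict(["market", "fell"]))  # Business
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on long documents. In a real comparison, accuracy would be measured on a held-out test split rather than on the training data.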

Implementation

Open the document_classifier.ipynb Jupyter notebook for the implementation details.

The trained model can be downloaded from the link below:

https://drive.google.com/drive/folders/10Ivt175DEkILxwHsF2Ltti8IZpVLtOyo?usp=sharing

The notebook also demonstrates loading the model and using it for real-time predictions.
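Real-time prediction comes down to turning raw text into the model's input, calling the model, and mapping its output back to a category name. The sketch below covers that last step under two assumptions: the model returns one row of four class scores per input (as the predict method of a Keras classifier with a 4-way softmax output does), and the columns follow the order World, Sports, Business, Sci/Tech used in this README. That order should be checked against the label encoding in the notebook. The vectorize argument and StubModel are hypothetical stand-ins for the notebook's pre-processing and the downloaded model; a Keras model saved with model.save can be reloaded with tensorflow.keras.models.load_model.

```python
# Assumed output-column order; check it against the label encoding used in training.
CATEGORIES = ["World", "Sports", "Business", "Sci/Tech"]


def decode(rows):
    """Turn per-class score rows into (label, confidence) pairs."""
    results = []
    for row in rows:
        scores = [float(s) for s in row]
        if len(scores) != len(CATEGORIES):
            raise ValueError(
                f"expected {len(CATEGORIES)} scores per row, got {len(scores)}"
            )
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax
        results.append((CATEGORIES[best], scores[best]))
    return results


def predict_texts(model, texts, vectorize):
    """Vectorize raw texts, run model.predict, and decode each output row.

    vectorize is hypothetical: whatever maps raw strings to the model's
    input (for example the tokenizer and padding used at training time).
    """
    return decode(model.predict(vectorize(texts)))


class StubModel:
    """Hypothetical stand-in that always favors the second column."""

    def predict(self, batch):
        return [[0.1, 0.7, 0.1, 0.1] for _ in batch]


if __name__ == "__main__":
    print(predict_texts(StubModel(), ["Late goal seals the title"], lambda t: t))
    # [('Sports', 0.7)]
```

Keeping decode separate from the model call makes the label mapping testable without loading the model, and it works unchanged on a NumPy array returned by Keras because each row is iterable.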