nikhilmehta011/IMDB_Reviews_Analysis

IMDB_Reviews_Analysis

Dataset -- https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews. It contains 50k reviews: 25k positive and 25k negative.

1. Analysis.ipynb -- Contains some basic pandas analysis, preprocessing, n-gram analysis, and visualisation.

2. Vader.ipynb -- Usually my first step when doing sentiment analysis. VADER is the sentiment analyser built into the NLTK library and needs just a few lines of code. Its performance, however, is not that good.

3. TFIDF.ipynb -- This notebook combines TF-IDF with logistic regression. After general preprocessing, TF-IDF (using sklearn) is applied to the dataset, and a logistic regression model is trained on the generated features, yielding a final accuracy score of 0.9026 with a 90/10 train/test split.

4. Naive_Bayes_Scratch.ipynb -- This notebook is based on Coursera's NLP Specialization. Sentiment analysis is done using a Naive Bayes algorithm written from scratch. The final accuracy score is 0.9154, which is slightly better than the TFIDF notebook. The main advantage of this method, however, is that both training and testing are very fast.
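
To illustrate the idea behind the from-scratch Naive Bayes approach, here is a minimal sketch in plain Python. It is not the notebook's code: the whitespace tokenizer, the function names (`tokenize`, `train_naive_bayes`, `predict`), and the four toy reviews are assumptions made for illustration. It assumes a log-prior plus Laplace-smoothed log-likelihood-ratio formulation, which the notebook may or may not follow exactly.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_naive_bayes(texts, labels):
    """Return (logprior, loglikelihood) for labels 1 (positive) and 0 (negative)."""
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in zip(texts, labels):
        (pos_counts if label == 1 else neg_counts).update(tokenize(text))

    vocab = set(pos_counts) | set(neg_counts)
    n_pos = sum(pos_counts.values())  # total tokens in positive reviews
    n_neg = sum(neg_counts.values())  # total tokens in negative reviews
    v = len(vocab)

    # Log ratio of the class priors, estimated from document counts.
    d_pos = sum(1 for y in labels if y == 1)
    d_neg = len(labels) - d_pos
    logprior = math.log(d_pos) - math.log(d_neg)

    # Per-word log-likelihood ratio with add-one (Laplace) smoothing.
    loglikelihood = {}
    for word in vocab:
        p_pos = (pos_counts[word] + 1) / (n_pos + v)
        p_neg = (neg_counts[word] + 1) / (n_neg + v)
        loglikelihood[word] = math.log(p_pos / p_neg)
    return logprior, loglikelihood


def predict(text, logprior, loglikelihood):
    """Classify as 1 if the summed log score is positive, else 0."""
    # Words not seen during training contribute nothing to the score.
    score = logprior + sum(loglikelihood.get(w, 0.0) for w in tokenize(text))
    return 1 if score > 0 else 0


# Toy data, invented for this example only (not from the IMDB dataset).
texts = [
    "a great and moving film",
    "great acting great story",
    "a boring and dull film",
    "dull plot terrible acting",
]
labels = [1, 1, 0, 0]

logprior, loglikelihood = train_naive_bayes(texts, labels)
print(predict("great story", logprior, loglikelihood))
print(predict("terrible and dull", logprior, loglikelihood))
```

Training is a single counting pass over the tokens and prediction is a sum of dictionary lookups, which is why this kind of model trains and tests so quickly.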
