The elbow method confirms that the number of clusters should be 2.
This is a binary text classification problem where we predict the sentiment of movie reviews as either positive or negative. The classes are balanced.
The XGBoost model is trained on features aggregated from character-level TF-IDF and word-level TF-IDF. Character-level TF-IDF is used to account for misspellings.
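A minimal sketch of how character-level and word-level TF-IDF features could be combined, assuming scikit-learn and SciPy are available. The toy reviews, the n-gram ranges, and the reading of "aggregated" as column-wise concatenation are assumptions made for illustration, not settings taken from this project.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy reviews with deliberate misspellings (illustrative only, not IMDB data).
reviews = [
    "A wonderfull film with great acting",
    "Terrible plot and awfull pacing",
    "Great acting, wonderful story",
]

# Word-level TF-IDF over unigrams and bigrams (assumed n-gram range).
word_tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 2))
# Character-level TF-IDF: "wonderfull" and "wonderful" share most of their
# character n-grams, so a misspelling still lands near the correct word.
char_tfidf = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))

X_word = word_tfidf.fit_transform(reviews)
X_char = char_tfidf.fit_transform(reviews)

# Assumption: "aggregated" means the two sparse matrices are concatenated
# column-wise into a single feature matrix.
X = hstack([X_word, X_char]).tocsr()
print(X.shape)
```

The resulting sparse matrix has one row per review and one column per word or character n-gram, and could be passed to a gradient-boosting model as-is.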
The XGBoost model minimizes a custom binary logistic objective and uses accuracy score as the evaluation metric.
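For reference, a custom binary logistic objective for gradient boosting usually returns the first and second derivatives of the log loss with respect to the raw margin. The sketch below uses only NumPy; the function names `logistic_grad_hess` and `accuracy` are hypothetical, and the project's actual objective may differ.

```python
import numpy as np


def sigmoid(z):
    """Map raw margins to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def logistic_grad_hess(margin, labels):
    """Gradient and Hessian of the binary log loss w.r.t. the raw margin."""
    p = sigmoid(margin)
    grad = p - labels       # first derivative
    hess = p * (1.0 - p)    # second derivative
    return grad, hess


def accuracy(margin, labels, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    preds = (sigmoid(margin) >= threshold).astype(int)
    return float(np.mean(preds == labels))


# Illustrative raw scores and labels (not real model output).
margin = np.array([2.0, -1.0, 0.0, -3.0])
labels = np.array([1, 0, 1, 0])

grad, hess = logistic_grad_hess(margin, labels)
print(grad, hess, accuracy(margin, labels))
```

With XGBoost's native API, a function of this kind would typically be wrapped to accept `(preds, dtrain)` and passed through the `obj` argument of `xgboost.train`.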
The training phase validates the model to find the optimal number of boosting rounds via early stopping, then sets the classification threshold that maximizes accuracy on a validation set.
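The threshold selection step can be sketched as a simple grid search over candidate thresholds. This is an illustration under assumptions: the helper `best_threshold`, the grid of 101 candidates, and the validation values are all hypothetical, and early stopping is omitted.

```python
import numpy as np


def best_threshold(probs, labels, candidates=None):
    """Return the (threshold, accuracy) pair with the highest accuracy.

    Ties are broken in favor of the smallest candidate threshold.
    """
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)  # assumed grid
    accs = [np.mean((probs >= t).astype(int) == labels) for t in candidates]
    best = int(np.argmax(accs))
    return float(candidates[best]), float(accs[best])


# Illustrative validation-set probabilities and labels (not real results).
val_probs = np.array([0.10, 0.35, 0.40, 0.55, 0.62, 0.80])
val_labels = np.array([0, 0, 1, 0, 1, 1])

threshold, acc = best_threshold(val_probs, val_labels)
print(threshold, acc)
```

Because the threshold is tuned on the validation set, the accuracy it reports there is optimistic; a separate test set gives the unbiased estimate.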
BERT and RoBERTa are pre-trained large language models that are fine-tuned by placing a classifier head on top.
This is an ensemble of XGBoost, BERT and RoBERTa based on majority voting.
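Majority voting over three binary classifiers can be sketched as follows. The function `majority_vote` and the prediction arrays are hypothetical and only illustrate the voting rule, not the project's actual ensemble code.

```python
import numpy as np


def majority_vote(*model_preds):
    """Combine 0/1 predictions from several models by majority vote."""
    votes = np.vstack(model_preds)        # shape: (n_models, n_samples)
    n_models = votes.shape[0]
    # A sample is positive when more than half of the models vote 1.
    return (votes.sum(axis=0) * 2 > n_models).astype(int)


# Illustrative predictions from the three models (not real output).
xgb_preds = np.array([1, 0, 1, 0])
bert_preds = np.array([1, 1, 0, 0])
roberta_preds = np.array([0, 1, 1, 0])

ensemble = majority_vote(xgb_preds, bert_preds, roberta_preds)
print(ensemble)
```

With an odd number of models and binary labels, ties cannot occur, so no tie-breaking rule is needed.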
cd Sentiment-Analysis-IMDB
conda env create -f environment.yml
conda activate sentiment-analysis
pip install -e src/sentiment-analysis
The optional -e flag installs sentiment-analysis in "editable" mode: instead of copying the files into your virtual environment, pip links to the files where they are, so changes to the source take effect without reinstalling.
python -m sentiment_analysis fetch
python -m nltk.downloader all
jupyter notebook notebooks/
You can now use the Jupyter kernel to run the notebooks.