The sentiment classifier uses a Long Short-Term Memory (LSTM) network to process sequences of word indices and determine the sentiment of a review. It includes modules for data processing, model training, and evaluation.
Reviews come from the IMDB dataset.
- Sentiment classification: Classifies reviews into positive or negative sentiment
- Data handling: Processes and loads movie review data
- Vocabulary building: Constructs and saves a vocabulary for mapping words to indices
- Model training: Trains an LSTM model for sentiment analysis
- Evaluation: Evaluates the model and plots training and validation losses and accuracies
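To make the model description concrete, here is a minimal sketch of an LSTM classifier over word indices in PyTorch. It is not the code in classifier.py: the class name `LSTMSentimentSketch`, the layer sizes, and the single-logit output are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class LSTMSentimentSketch(nn.Module):
    """Hypothetical stand-in for the repository's model: embedding -> LSTM -> linear."""

    def __init__(self, vocab_size, embedding_dim=64, hidden_size=128, pad_index=0):
        super().__init__()
        # padding_idx keeps the embedding of the padding token fixed at zero.
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_index)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq_len, embedding_dim)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (num_layers, batch, hidden_size)
        return self.fc(hidden[-1]).squeeze(1)     # one logit per review: (batch,)


# Usage: a batch of 4 reviews, each already encoded as 20 word indices.
model = LSTMSentimentSketch(vocab_size=1000)
batch = torch.randint(0, 1000, (4, 20))
probabilities = torch.sigmoid(model(batch))       # probability of the positive class
```

This sketch reads the final hidden state without masking padded positions. For reviews of very different lengths, `torch.nn.utils.rnn.pack_padded_sequence` is one common way to make the LSTM skip the padding.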
- config.py: Contains configuration settings for the model.
- classifier.py: Defines the SentimentClassifier model.
- dataset.py: Contains the MyDataset class for handling data.
- utils.py: Utility functions for data processing, vocabulary handling, and accuracy calculation.
- train.py: Script to train the model.
- predict.py: Script to make predictions on new reviews.
- saved/: Directory for saving models and vocabulary.
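The vocabulary handling mentioned for utils.py could look roughly like the sketch below. The function names, the `<pad>` and `<unk>` tokens, whitespace tokenization, and the JSON file format are all assumptions; the repository may do each of these differently.

```python
import json
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_vocab(texts, max_size=10000):
    """Map the most frequent words to indices, reserving 0 and 1 for PAD and UNK."""
    counts = Counter(token for text in texts for token in tokenize(text))
    vocab = {PAD: 0, UNK: 1}
    for word, _ in counts.most_common(max_size - len(vocab)):
        vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, max_len=8):
    """Convert text to a fixed-length list of indices, truncating or padding."""
    ids = [vocab.get(token, vocab[UNK]) for token in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def save_vocab(vocab, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f)


def load_vocab(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


vocab = build_vocab(["good movie", "bad movie"])
encoded = encode("great movie", vocab, max_len=4)  # "great" is out of vocabulary
```

Saving the vocabulary alongside the model matters because the indices learned at training time must be reused unchanged at prediction time.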
- Clone the repository:
git clone https://github.com/yourusername/sentiment-classifier.git
- Navigate into the project directory:
cd sentiment-classifier
- Install the required dependencies:
pip install torch torchvision torchtext matplotlib
To train the model, run the following command:
python train.py
This will:
- Load training and testing data.
- Build or load the vocabulary.
- Train the model and save the best-performing model.
- Plot and save training and validation losses and accuracies.
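The "save the best-performing model" step can be sketched as a loop that checkpoints only when validation improves. This assumes "best" means lowest validation loss, which the README does not specify. The `fit` function and its callback parameters are hypothetical, and the loss values in the usage lines are made-up placeholders, not results from this project.

```python
def fit(num_epochs, train_one_epoch, evaluate, save_checkpoint):
    """Run training epochs and call save_checkpoint whenever validation loss improves."""
    best_val_loss = float("inf")
    history = []
    for epoch in range(1, num_epochs + 1):
        train_loss = train_one_epoch()
        val_loss = evaluate()
        history.append({"epoch": epoch, "train_loss": train_loss, "val_loss": val_loss})
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            # In PyTorch this would typically be torch.save(model.state_dict(), path).
            save_checkpoint(epoch)
    return best_val_loss, history


# Usage with placeholder losses standing in for real training and evaluation.
train_losses = iter([0.69, 0.55, 0.48, 0.45])
val_losses = iter([0.68, 0.60, 0.62, 0.58])
saved_epochs = []
best, history = fit(4, lambda: next(train_losses), lambda: next(val_losses), saved_epochs.append)
```

The `history` list holds the per-epoch values that a plotting step would need for the loss curves.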
To classify a review, use the predict.py script. Provide the review text as a command-line argument:
python predict.py "This movie was absolutely fantastic!"
This will output the classification of the review as either Positive or Negative.
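As a rough illustration of how such a script might be wired, the sketch below parses the review from the command line and maps a probability to a label. The function names, the 0.5 threshold, and the `predict_proba` callable (standing in for loading the saved model and vocabulary) are assumptions, not the actual contents of predict.py.

```python
import argparse


def label_for(probability, threshold=0.5):
    # Assumed convention: the model outputs the probability of a positive review.
    return "Positive" if probability >= threshold else "Negative"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Classify the sentiment of a movie review.")
    parser.add_argument("review", help="review text, passed as one quoted argument")
    return parser.parse_args(argv)


def main(predict_proba, argv=None):
    """predict_proba is a hypothetical callable: review text -> probability in [0, 1]."""
    args = parse_args(argv)
    label = label_for(predict_proba(args.review))
    print(label)
    return label
```

Quoting the review on the command line matters: without quotes the shell splits it into several arguments and the parser rejects the extras.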
Configuration settings are located in config.py. You can adjust parameters such as vocabulary size, embedding dimension, hidden size, and learning rate.
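One way to group these settings is a small dataclass, sketched below. The field names and default values are illustrative assumptions; the real config.py may use plain module-level constants and different values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # All defaults here are placeholders, not the project's actual settings.
    vocab_size: int = 10_000
    embedding_dim: int = 100
    hidden_size: int = 128
    learning_rate: float = 1e-3


default_config = Config()
# Derive a variant for an experiment without mutating the defaults.
wider_config = replace(default_config, hidden_size=256)
```

Note that changing the vocabulary size, embedding dimension, or hidden size changes the model's shape, so a model saved under the old settings would generally need to be retrained.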
- Python 3.7+
- PyTorch
- TorchText
- Matplotlib