Deep Active Learning

This repository contains the code for the master's thesis "Deep Active Learning for Sentiment Analysis". It holds the Deep Learning models as well as the code for the experiments and the dataset analysis.

Download data

To download the data used in this project, run the following command:

$ ./scripts/download_data.sh

Preprocessing the dataset

To preprocess the dataset, run the following command:

$ ./scripts/run_dataset_preprocessing.sh

This script will perform the following steps:

  • Read both positive and negative reviews from the train directory of the dataset
  • Format the string for both positive and negative reviews (i.e. remove HTML tags)
  • Create a validation set
  • Create vocabulary using the GloVe embeddings
  • Turn the reviews into lists of ids (each id is the row index of the corresponding word in the GloVe matrix)
  • Save the reviews into TFRecord format

These steps are also applied to the test data, except that no validation set is created.
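
The steps above can be illustrated with a minimal Python sketch. It is not the code from the script itself: the function names, the whitespace tokenization, the lowercasing, the `<unk>` entry for out-of-vocabulary words, and the validation fraction are all assumptions made for illustration, and the TFRecord serialization step is left out.

```python
import random
import re

# Matches anything that looks like an HTML tag, e.g. "<br />".
TAG_RE = re.compile(r"<[^>]+>")


def clean_review(text):
    """Replace HTML tags with spaces, lowercase, and collapse whitespace."""
    no_tags = TAG_RE.sub(" ", text)
    return " ".join(no_tags.lower().split())


def build_vocabulary(glove_words):
    """Map each word to its row index in the embedding matrix.

    `glove_words` is assumed to list the words in the same order as the
    rows of the GloVe matrix.
    """
    return {word: row for row, word in enumerate(glove_words)}


def review_to_ids(text, vocab, unk_id):
    """Turn a raw review into a list of embedding row ids."""
    return [vocab.get(token, unk_id) for token in clean_review(text).split()]


def split_validation(examples, fraction, seed=0):
    """Shuffle and hold out `fraction` of the examples for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Toy vocabulary standing in for the real GloVe word list (hypothetical).
glove_words = ["<unk>", "the", "movie", "was", "great"]
vocab = build_vocabulary(glove_words)
ids = review_to_ids("The movie<br />was GREAT", vocab, unk_id=vocab["<unk>"])
train, val = split_validation(range(10), fraction=0.2)
```

In this toy setup, words missing from the vocabulary fall back to the id of `<unk>`; how the actual script handles such words is not stated here.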