Identify toxicity in online comments.
The data for this project comes from Kaggle.
- Download the data from Kaggle.
- Unzip the files and save the .csv files in the 'data/' folder.
- Run the following command to install the required libraries from requirements.txt:
    pip install -r requirements.txt
- Download the pretrained GloVe embeddings (glove.840B.300d) from here or here and save them to the 'data/' folder.
- Ensure the file names specified in config.yaml are consistent with your training and embedding file names.
- Choose your preferred settings in config.yaml before starting training:
  - load_pretrained_embeddings_from_disk defaults to False; change it to True to avoid unpacking the GloVe embeddings on each subsequent run.
  - Set random_seed to keep results reproducible across experiments.
- Run main.py.
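
To illustrate what a setting like load_pretrained_embeddings_from_disk might control, here is a minimal sketch of parsing a GloVe text file once and reusing a cached copy on later runs. It is not taken from main.py: the function names, the pickle cache, and the plain-list vectors are all assumptions made for illustration (a real pipeline would more likely store numpy arrays).

```python
import os
import pickle


def parse_glove_text(path, dim):
    """Parse a GloVe text file where each line is a token followed by `dim` floats."""
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            if len(parts) <= dim:
                continue  # skip empty or malformed lines
            # Take the last `dim` fields as the vector so that tokens
            # containing spaces are still parsed correctly.
            token = " ".join(parts[:-dim])
            embeddings[token] = [float(value) for value in parts[-dim:]]
    return embeddings


def load_embeddings(text_path, cache_path, dim, from_disk):
    """Return {token: vector}, reusing the cache when from_disk is True and it exists.

    Hypothetical helper: `from_disk` stands in for the
    load_pretrained_embeddings_from_disk setting described above.
    """
    if from_disk and os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    embeddings = parse_glove_text(text_path, dim)
    with open(cache_path, "wb") as handle:
        pickle.dump(embeddings, handle)  # cache for subsequent runs
    return embeddings
```

Under these assumptions, the first run would call load_embeddings(..., from_disk=False) to build the cache, and later runs could pass from_disk=True to skip re-parsing the large text file.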