Requirements
- pytorch == 1.8.1
- transformers == 4.2.2
- pytorch-pretrained-bert==0.6.2
- sklearn == 0.21
- scikit-learn==0.23.2
- scipy==1.5.4
- imblearn==0.0
- numpy == 1.19.5
- pandas == 1.1.5
Data CSV Format
| DialogueId | UtteranceId | Speaker | Utterance | Persona? | Persona Type |
Training and Evaluation
- Download and place 200d GloVe word vectors in 'data/'folder
- Prepare data in the given format and place it in 'data/' folder.
- Execution
- Persona Discovery
- python dataloader.py. It prepares and dumps the dataframes in a format which can be used by models and dataloaders to train and evaluate.
- python model.py. Trains the proposed model on the prepared data and dumps the best models in 'models/'
- python getRepresent.py. Extracts the representation for each instance so that we can perform oversampling.
- python sample.py. Use SMOTE to sample theextracted representations.
- python model-classify.py. Train the classification layers using the sampled data and dumps thebest models in 'models/'.
- Persona Type Identification
- python dataloader-model2.py.py. It prepares and dumps the dataframes in a format which can be used by models and dataloaders to train and evaluate.
- Set appropriate parameters in init_parameter.py
- sh scripts/run.sh. Trains the proposed model on the prepared data and dumps the best models in 'models/'
- Persona Value Generation
- python main.py. Trains the proposed model.
- Persona Discovery
- python dataloader.py. It prepares and dumps the dataframes in a format which can be used by models and dataloaders to train and evaluate.