This is the code for our EMNLP 2020 paper: Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets
Coming soon
- We released the datasets used in our paper.
- Note: `orignal+basic` means original (SNLI/MNLI) plus basic (e.g. `i;i`) augmentation. Nothing like `+pa` is included, as that denotes a compositional transformation.
- We will also release the MRS parses, with which people can transform sentences using perturbations they define themselves.
Note: These datasets are not completely parallel to the original SNLI and MNLI. Some sentences are missing because the parser sometimes can't parse the representation. You might need to run your transformation on the data that was left out.
conda env create -f environment.yml
conda activate lit
`transfer`: module that contains all the functions mentioned in our paper. Within it:
- `README.md` gives detailed documentation of the current configuration of our parser.
- `transfer_example.py` is an illustrative example of how to use our parser.
- `transfer_snli_parallel.py` is the script we used (some local modifications are needed) to parse SNLI in parallel. Parallel processing is strongly encouraged.
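
Since parallel processing is encouraged and the parser can fail on some sentences, the general pattern might look like the sketch below. This is not the code from `transfer_snli_parallel.py`: `toy_transform`, `transform_in_parallel`, and all parameter names are hypothetical stand-ins used only to illustrate a parallel map that records failures instead of aborting.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def toy_transform(sentence):
    """Hypothetical stand-in for a real transformation (not part of this repo)."""
    if not sentence.strip():
        raise ValueError("empty sentence cannot be parsed")
    return "It is not true that " + sentence[0].lower() + sentence[1:]


def _safe_apply(task):
    """Run one transformation and capture any error instead of raising."""
    transform, index, sentence = task
    try:
        return index, transform(sentence), None
    except Exception as exc:  # a parse failure should not stop the whole run
        return index, None, repr(exc)


def transform_in_parallel(sentences, transform, max_workers=4, chunksize=64,
                          executor_cls=ProcessPoolExecutor):
    """Apply `transform` to every sentence using a pool of workers.

    Returns two dicts keyed by the sentence's position in the input:
    the transformed sentences, and the error messages for failures.
    Pass executor_cls=ThreadPoolExecutor for quick debugging runs.
    """
    tasks = [(transform, i, s) for i, s in enumerate(sentences)]
    transformed, failed = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        for index, result, error in pool.map(_safe_apply, tasks,
                                             chunksize=chunksize):
            if error is None:
                transformed[index] = result
            else:
                failed[index] = error
    return transformed, failed
```

When using the default process pool, call `transform_in_parallel` from inside an `if __name__ == "__main__":` guard and pass a function defined at module level so it can be sent to the worker processes. The `failed` dictionary gives the positions of sentences that still need to be handled, which is one way the resulting dataset ends up not fully parallel to the original.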
`post-process`: after processing the dataset, the parsed output needs some cleaning to put it in the right form.
- `making_sense.py` contains a choice of sentence selectors for scoring the different generated surface sentences.
- `process.py` contains functions that:
  - select generated sentences
  - apply defined rules to generate the contrast set
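
To make the idea of a sentence selector concrete, here is a minimal sketch that picks one surface sentence out of several generated candidates. It does not reproduce the selectors in `making_sense.py`: the function names and the token-overlap score are assumptions chosen for illustration, and any other scoring function with the same signature could be swapped in.

```python
def token_overlap(original, candidate):
    """Jaccard overlap between the lowercased token sets of two sentences."""
    a, b = set(original.lower().split()), set(candidate.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_sentence(original, candidates, score=token_overlap):
    """Return the candidate with the highest score, or None if there are none.

    `score` is any function (original, candidate) -> float; the default
    prefers candidates that stay lexically close to the original sentence.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(original, c))


candidates = [
    "the man was sleeping",
    "a man slept",
    "the man is sleeping now",
]
best = select_sentence("the man is sleeping", candidates)
```

Keeping the scorer as a parameter mirrors the idea of having several selectors to choose from: the selection step stays the same while the scoring criterion changes.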