Skip to content
forked from nicpopovic/FREDo

Code and data for NAACL 2022 paper Few-Shot Document-Level Relation Extraction

Notifications You must be signed in to change notification settings

Xilihutu9/FREDo

 
 

Repository files navigation

Few-Shot Document-Level Relation Extraction

Code and data for the NAACL 2022 paper Few-Shot Document-Level Relation Extraction

Table of Contents

  1. Accessing the data (FREDo benchmark tasks)
  2. Dependencies
  3. Reproducing Results (+ link to trained models)
  4. Relevant Code per Section in Paper
  5. Bibtex for Citing

Accessing the data

Download Sampled Episodes

If you are only interested in the raw data for the tasks, you can directly download sampled episodes here: https://drive.google.com/drive/folders/1PuJSJxqZP4ijxFSBZZ6Fmc0SgR2S8pYU?usp=sharing (test_episodes.zip [~120 MB DL, ~550MB on disk] + train_episodes_single.zip [~330MB DL, ~1.4GB on disk]+ train_episodes_schema.zip [~330MB DL, ~1.4GB on disk])

Training and dev/validation episodes are available in two different versions: "single" or "schema". For "single", each episode contains annotations for only a single relation type. For "schema", each episode contains annotations for multiple relation types sampled via the same procedure as the test episodes (see section 4.3 of paper).

In our paper and code we evaluate macro F1 scores across relation types.

Training and Development Data

  • data/[train, dev, test_docred, test_scierc].json contain all annotated documents used for training and testing.
  • data/*_indices.json contain sampled episodes (only the indices of the documents and which relations are to be annotated/extracted).

You can either use our pipeline or export all episodes as json files to use in your own pipeline.

Test episode sampling is called in train.py (Lines 72+73).

Export Episodes to JSON

If you are evaluating your own model, please use the test episodes as sampled by us. You can run export_test_episodes.py to export all episodes for use with your own pipeline.

Dependencies

Dependencies:

  • torch (1.8.0)
  • transformers (4.18.0)
  • tqdm (4.64.0)
  • wandb (0.12.16)
python3 -m venv venv
source venv/bin/activate
pip install wheel
pip install -U setuptools
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers tqdm wandb

Reproducing Results

In order to reproduce the results from the paper, you can download the models here: https://drive.google.com/drive/folders/1eulLgrGiOwSZawoOGytA6fJZ-b3Rcrob?usp=sharing

The following are the commands for reporducing the results in the paper. Click on "Expected output" for detailed results.

BASELINE 1-DOC

python train.py --support_docs_eval 1 --model dlmnav+sie+sbn --num_epochs 0 --use_markers False
Expected output
3nX3120z6X
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P361              0.20        1.79        0.37    893       
P279              0.02        5.15        0.03    136       
P102              0.42       14.49        0.82    428       
P17               3.76       11.08        5.61    59312     
P35               0.29       12.61        0.56    222       
P495              0.41       10.08        0.79    1081      
P463              0.09        1.62        0.18    493       
P3373             0.18       16.43        0.36    645       
P118              0.13        8.30        0.25    253       
P39               0.03        2.63        0.07    38        
P140              0.12       13.97        0.24    458       
P272              0.07       10.00        0.13    160       
P674              0.28        7.76        0.54    348       
P25               0.16       19.28        0.31    83        
P364              0.03       16.92        0.06    65        
P1001             0.14       10.13        0.27    375       
P194              0.07        8.56        0.14    222       
P582              0.07        3.96        0.14    101       
-                    -           -           -    -         
macro             0.36        9.71        0.60
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          3.60        3.82        3.71    42142     
PART-OF           0.46        1.90        0.74    2529      
CONJUNCTION        1.92        5.24        2.81    6414      
EVALUATE-FOR        1.22        2.79        1.70    4223      
HYPONYM-OF        1.30        3.64        1.92    4482      
COMPARE           0.54        2.49        0.89    2328      
FEATURE-OF        0.33        1.30        0.53    2465      
-                    -           -           -    -         
macro             1.34        3.03        1.76

BASELINE 3-DOC

python train.py --support_docs_eval 3 --model dlmnav+sie+sbn --num_epochs 0 --use_markers False
Expected output
G7BwlKpec6
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P17               5.19        6.11        5.61    33229     
P361              0.50        1.80        0.78    887       
P272              0.14       20.00        0.27    55        
P495              0.69        6.35        1.24    850       
P674              0.45        8.59        0.86    128       
P35               0.32        2.70        0.56    111       
P102              0.76       15.77        1.45    279       
P140              0.23        2.34        0.42    171       
P364              0.32       19.23        0.63    78        
P463              0.34        3.11        0.61    193       
P3373             0.37        7.88        0.71    241       
P194              0.21        8.62        0.41    116       
P1001             0.20        7.78        0.39    167       
P279              0.04        8.47        0.08    59        
P25               0.03        5.26        0.05    19        
P118              0.38       13.83        0.75    94        
P582              0.14        5.56        0.27    36        
P39               0.42       50.00        0.84    2         
-                    -           -           -    -         
macro             0.60       10.75        0.89
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
PART-OF           0.61        1.65        0.89    1335      
USED-FOR          3.98        3.02        3.44    14548     
HYPONYM-OF        2.16        1.56        1.81    2311      
CONJUNCTION        3.07        4.53        3.66    3332      
COMPARE           1.09        3.36        1.64    1162      
EVALUATE-FOR        1.29        1.89        1.53    2382      
FEATURE-OF        0.68        1.21        0.88    1235      
-                    -           -           -    -         
macro             1.84        2.46        1.98

DLMNAV 1-DOC

python train.py --support_docs_eval 1 --model dlmnav --num_epochs 0 --load_checkpoint checkpoints/best_dlmnav.pt
Expected output
38BE1Nj8Pu
loading model from checkpoints/best_dlmnav.pt
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P279              0.37        2.94        0.66    136       
P17              30.40        2.38        4.41    59312     
P463              3.82        8.11        5.19    493       
P495             11.43       16.47       13.50    1081      
P674              4.45       23.56        7.49    348       
P361              3.33        5.04        4.01    893       
P364              1.76       36.92        3.36    65        
P39               0.27       15.79        0.52    38        
P35               5.06       27.48        8.54    222       
P3373             7.12       21.24       10.67    645       
P118             15.54       62.06       24.86    253       
P1001             1.93        6.40        2.96    375       
P194              3.90       30.63        6.92    222       
P25               2.03       45.78        3.89    83        
P102             10.61       14.72       12.33    428       
P272              4.47       47.50        8.17    160       
P582              3.13       27.72        5.62    101       
P140              9.24       10.04        9.62    458       
-                    -           -           -    -         
macro             6.60       22.49        7.37
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          2.54        0.04        0.08    42142     
PART-OF           0.59        0.36        0.44    2529      
FEATURE-OF        0.32        0.16        0.22    2465      
CONJUNCTION        1.06        0.25        0.40    6414      
EVALUATE-FOR        0.91        0.19        0.31    4223      
HYPONYM-OF        6.25        2.57        3.64    4482      
COMPARE           1.31        0.69        0.90    2328      
-                    -           -           -    -         
macro             1.85        0.61        0.86

DLMNAV 3-DOC

python train.py --support_docs_eval 3 --model dlmnav --num_epochs 0 --load_checkpoint checkpoints/best_dlmnav.pt
Expected output
eASm3VD2Bv
loading model from checkpoints/best_dlmnav.pt
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P17              32.74        1.66        3.17    33229     
P361              4.51        4.51        4.51    887       
P674              5.35       31.25        9.14    128       
P495             11.91       14.94       13.26    850       
P140              7.08        8.77        7.83    171       
P39               0.00        0.00        0.00    2         
P463              3.45        6.22        4.44    193       
P194              6.65       31.03       10.96    116       
P35               4.64       27.03        7.93    111       
P364             10.21       62.82       17.56    78        
P272              4.55       58.18        8.44    55        
P582              4.36       38.89        7.84    36        
P1001             1.67        2.40        1.97    167       
P118             16.35       45.74       24.09    94        
P279              1.05        5.08        1.73    59        
P25               1.78       47.37        3.43    19        
P102             12.29       15.41       13.67    279       
P3373            10.77       26.14       15.25    241       
-                    -           -           -    -         
macro             7.74       23.75        8.62
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          0.00        0.00        0.00    14548     
PART-OF           0.00        0.00        0.00    1335      
HYPONYM-OF        8.76        1.69        2.83    2311      
CONJUNCTION        0.65        0.06        0.11    3332      
COMPARE           0.94        0.26        0.41    1162      
EVALUATE-FOR        1.73        0.25        0.44    2382      
FEATURE-OF        0.00        0.00        0.00    1235      
-                    -           -           -    -         
macro             1.73        0.32        0.54

DLMNAV+SIE 1-DOC

python train.py --support_docs_eval 1 --model dlmnav+sie --num_epochs 0 --load_checkpoint checkpoints/best_sie_sbn.pt
Expected output
vxMctq1z4F
loading model from checkpoints/best_sie_sbn.pt
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P279              0.52        2.94        0.89    136       
P17              22.84        3.44        5.98    59312     
P463              3.27       11.56        5.10    493       
P495              5.14        2.68        3.53    1081      
P674              5.90       22.99        9.38    348       
P1001             6.67       17.60        9.68    375       
P35               6.17       22.52        9.68    222       
P39               0.33       18.42        0.64    38        
P361              3.04        2.80        2.92    893       
P364              2.60       35.38        4.85    65        
P25               1.85       48.19        3.56    83        
P3373             5.87       25.89        9.58    645       
P118             11.25       77.08       19.63    253       
P102              6.71       26.17       10.68    428       
P582              2.59       17.82        4.52    101       
P194              4.78       13.96        7.13    222       
P272              5.31       44.38        9.49    160       
P140              6.44       13.76        8.77    458       
-                    -           -           -    -         
macro             5.63       22.64        7.00
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          6.35        1.73        2.72    42142     
PART-OF           0.68        0.51        0.59    2529      
CONJUNCTION        1.61        0.72        0.99    6414      
FEATURE-OF        0.16        0.12        0.14    2465      
EVALUATE-FOR        1.86        1.18        1.45    4223      
HYPONYM-OF        3.76        1.43        2.07    4482      
COMPARE           2.36        1.42        1.77    2328      
-                    -           -           -    -         
macro             2.40        1.02        1.39

DLMNAV+SIE 3-DOC

python train.py --support_docs_eval 3 --model dlmnav+sie --num_epochs 0 --load_checkpoint checkpoints/best_sie_sbn.pt
Expected output
Ta4ae83La8
loading model from checkpoints/best_sie_sbn.pt
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P361              3.92        3.49        3.69    887       
P674              5.29       42.97        9.42    128       
P17              24.60        5.30        8.72    33229     
P495              3.57        2.47        2.92    850       
P140              4.95       13.45        7.23    171       
P102              6.39       25.09       10.18    279       
P364              4.61       60.26        8.56    78        
P463              3.39       14.51        5.49    193       
P39               0.05       50.00        0.11    2         
P194              6.32       24.14       10.02    116       
P1001             5.08       23.35        8.35    167       
P279              1.24       22.03        2.35    59        
P118              7.23       78.72       13.24    94        
P272              3.62       63.64        6.85    55        
P35               3.95       14.41        6.20    111       
P3373             6.77       32.78       11.22    241       
P582              2.58       36.11        4.82    36        
P25               1.21       73.68        2.37    19        
-                    -           -           -    -         
macro             5.26       32.58        6.76
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          6.37        3.62        4.61    14548     
HYPONYM-OF        5.14        2.34        3.21    2311      
FEATURE-OF        0.36        0.40        0.38    1235      
PART-OF           0.74        0.82        0.78    1335      
EVALUATE-FOR        1.57        1.22        1.37    2382      
CONJUNCTION        3.14        1.98        2.43    3332      
COMPARE           2.78        2.58        2.68    1162      
-                    -           -           -    -         
macro             2.87        1.85        2.21

DLMNAV+SIE+SBN 1-DOC

python train.py --support_docs_eval 1 --model dlmnav+sie+sbn --num_epochs 0 --load_checkpoint checkpoints/best_sie_sbn.pt
Expected output
HXGmfwnstC
loading model from checkpoints/best_sie_sbn.pt
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P361              0.33        3.14        0.59    893       
P279              0.04        1.47        0.07    136       
P17               8.89       20.26       12.35    59312     
P35               0.72       23.42        1.40    222       
P495              1.47       25.16        2.78    1081      
P463              0.36       14.00        0.71    493       
P674              0.40       31.03        0.79    348       
P3373             0.98       43.72        1.92    645       
P1001             0.62       29.60        1.22    375       
P194              0.45       32.88        0.88    222       
P118              0.42       34.78        0.84    253       
P272              0.69       36.25        1.36    160       
P39               0.01       13.16        0.02    38        
P582              0.23       15.84        0.46    101       
P102              0.90       49.30        1.76    428       
P140              0.23       12.66        0.44    458       
P364              0.06       15.38        0.12    65        
P25               0.40       21.69        0.78    83        
-                    -           -           -    -         
macro             0.96       23.54        1.58
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          4.62        5.14        4.86    42142     
PART-OF           0.58        1.82        0.88    2529      
CONJUNCTION        3.30        8.53        4.76    6414      
FEATURE-OF        0.39        1.74        0.63    2465      
EVALUATE-FOR        2.08        4.10        2.76    4223      
HYPONYM-OF        2.22        4.46        2.97    4482      
COMPARE           1.74        4.34        2.48    2328      
-                    -           -           -    -         
macro             2.13        4.30        2.76

DLMNAV+SIE+SBN 3-DOC

python train.py --support_docs_eval 3 --model dlmnav+sie+sbn --num_epochs 0 --load_checkpoint checkpoints/best_sie_sbn.pt
Expected output
AmcBdjBC9r
loading model from checkpoints/best_sie_sbn.pt
---- INDOMAIN TEST EVAL -----
type         precision      recall          f1    support   
P17              11.96       17.46       14.20    33229     
P361              0.66        3.83        1.13    887       
P495              2.63       16.12        4.52    850       
P272              0.88       32.73        1.72    55        
P279              0.11        5.08        0.22    59        
P140              1.24       12.28        2.25    171       
P102              2.10       48.03        4.02    279       
P39               0.01       50.00        0.01    2         
P674              1.24       25.00        2.37    128       
P364              0.66       34.62        1.30    78        
P463              0.33        6.22        0.63    193       
P35               0.41        5.41        0.77    111       
P194              1.43       30.17        2.73    116       
P1001             1.88       29.94        3.54    167       
P118              0.62       28.72        1.21    94        
P3373             3.96       42.32        7.25    241       
P582              0.34       13.89        0.66    36        
P25               0.37       21.05        0.72    19        
-                    -           -           -    -         
macro             1.71       23.49        2.74
---- SCIERC TEST EVAL -----
type         precision      recall          f1    support   
USED-FOR          6.06        5.69        5.87    14548     
HYPONYM-OF        4.48        3.81        4.12    2311      
FEATURE-OF        0.46        0.97        0.62    1235      
PART-OF           0.85        1.87        1.17    1335      
CONJUNCTION        5.63        9.12        6.96    3332      
EVALUATE-FOR        3.12        3.57        3.33    2382      
COMPARE           2.83        4.65        3.52    1162      
-                    -           -           -    -         
macro             3.35        4.24        3.66

Relevant Code per Section in Paper

Section 4.3 Test Episode Sampling

Covered by parse_test() in src/data.py (lines 138-324)

Section 5.1.1 Baseline

Covered in src/models/base_model.py Covered in src/models/dlmnav_sbn.py

Run model using:

python train.py --query_docs_train 3 --query_docs_eval 3 --support_docs_train 1 --support_docs_eval 1 --model dlmnav+sie+sbn --num_epochs 0 --use_markers False

Section 5.1.2 DLMNAV

Covered in src/models/base_model.py Covered in src/models/dlmnav_sbn.py

Train model using:

python train.py --query_docs_train 3 --query_docs_eval 3 --support_docs_train 1 --support_docs_eval 1 --model dlmnav

Section 5.1.3 SIE

Covered in src/models/base_model.py Covered in src/models/dlmnav_sie.py

Train model using:

python train.py --query_docs_train 3 --query_docs_eval 3 --support_docs_train 1 --support_docs_eval 1 --model dlmnav+sie

Section 5.1.4 SBN

Covered in src/models/base_model.py Covered in src/models/dlmnav_sbn.py

Train model using:

python train.py --query_docs_train 3 --query_docs_eval 3 --support_docs_train 1 --support_docs_eval 1 --model dlmnav+sie+sbn

Section 5.2 Sampling Training & Development Episodes

Covered by parse_episodes() in src/data.py (lines 144-324)

In order to control sampling strategies, use the following parameters:

  • --balancing_train hard/soft/single
  • --balancing_dev hard/soft/single

The above parameters control how relation types are sampled for train/dev episodes. "hard" balances the annotated relation types so that the resulting episodes contain the same amount of examples for each relation type. "soft" corresponds to the balancing procedure used for the test episodes, which is outlined in section 4.3 of the paper. "single" annotates only a single relation type per document.

  • --ensure_positive True/False

The above parameter controls whether the query documents contain at least a single positive example of the relation type r_s.

Parameters for test sets are found in train.py (lines 70-71)

Cite

If you use the benchmark or the code, please cite this paper:

@inproceedings{popovic_fsdlre_2022, 
 title = "{F}ew-{S}hot {D}ocument-{L}evel {R}elation {E}xtraction", 
 author = "Popovic, Nicholas and Färber, Michael",
 booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", 
 year = "2022"
}

About

Code and data for NAACL 2022 paper Few-Shot Document-Level Relation Extraction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%