This dataset contains 4.2k human factuality judgements for the following model-generated summaries:
CNN/DM:
- 600 summaries generated by BART-Large (Lewis et al., 2020)
- 600 summaries generated by BertSum (Liu and Lapata, 2019)
- 600 summaries generated by PGConv (See et al., 2017)
- 600 summaries generated by BottomUp (Gehrmann et al., 2018)
- 600 summaries generated by AbsRL (Chen and Bansal, 2018)
XSum:
- 600 summaries generated by BART-Large
- 600 summaries generated by BertSum
For each of these 4.2k summaries, one randomly selected sentence (displayed in context) was annotated for factuality by three annotators, and an aggregated judgement (produced by MACE; Hovy et al., 2013) is included. Note that the annotated BART-Large summaries are taken from the constraint-fact dataset.
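Both the three raw votes and the MACE aggregate are included in each record (see the field list below), so other aggregation schemes can be applied on top of the raw votes. As a rough point of comparison, a simple majority vote can be computed as in the sketch below; note this is not how the released aggregate was produced, since MACE estimates each annotator's reliability and weights their votes accordingly.

```python
def majority_vote(votes):
    # Simple majority over binary factuality votes (1 = factually consistent).
    # With three annotators, a tie cannot occur.
    return int(sum(votes) * 2 > len(votes))

print(majority_vote([1, 1, 0]))  # -> 1
print(majority_vote([0, 1, 0]))  # -> 0
```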
The dataset has the following fields:

- `id`: ID between 0 and 4199
- `summary`: Complete summary
- `summary_raw`: Same as `summary`
- `summary_sentence`: The randomly selected summary sentence that annotators judged for factuality
- `summary_sentence_contextleft`: Left context of the `summary_sentence`
- `summary_sentence_contextright`: Right context of the `summary_sentence`
- `model_name`: Name of the model that generated the summary (`abs_rl`, `bart`, `bert_sum`, `bottom_up`, or `pointer_gen_cov`)
- `abstractiveness_constraint`: Abstractiveness constraint used to generate this summary (`none`, `lambda2`, `lambda4`, `1/lambda2`, or `1/lambda1`; see our paper)
- `annotator_comments`: Comments from the annotators
- `annotator_ids`: Anonymized annotator IDs (the annotator ID space is shared with that of the constraint-fact dataset)
- `annotator_votes`: Factuality votes from the annotators (0 = not factually consistent with the displayed document(s); 1 = factually consistent)
- `annotator_votes_combined`: Aggregated factuality judgement from MACE
- `dataset_name`: Name of the dataset (`cnn_dailymail` or `xsum`)
- `document_full`: Complete input document(s)
- `document_short`: Shortened document(s) displayed to the annotators, containing the sentences most similar to the `summary_sentence`
- `document_original`: Original input document(s) from the test set. This is the same as `document_full`, except for XSum, where `document_full` contains the first sentence reinserted but `document_original` does not (see Footnote 6 of the paper).
- `document_id`: Document ID in `dataset_name`
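To make the schema concrete, here is a hypothetical record written as a Python dict. Every value is an invented placeholder, and the exact types of the annotator fields (lists vs. strings) are our assumption, not taken from the released data:

```python
# Hypothetical record illustrating the schema; all values are invented placeholders.
record = {
    "id": 1234,                              # 0..4199
    "dataset_name": "cnn_dailymail",         # or "xsum"
    "model_name": "bart",                    # abs_rl, bart, bert_sum, bottom_up, or pointer_gen_cov
    "abstractiveness_constraint": "none",    # none, lambda2, lambda4, 1/lambda2, or 1/lambda1
    "summary": "Generated summary text.",
    "summary_raw": "Generated summary text.",
    "summary_sentence": "The randomly selected sentence that was judged.",
    "summary_sentence_contextleft": "Summary text before the judged sentence.",
    "summary_sentence_contextright": "Summary text after the judged sentence.",
    "annotator_ids": ["a01", "a07", "a23"],  # anonymized IDs (type assumed)
    "annotator_votes": [1, 1, 0],            # one 0/1 vote per annotator (type assumed)
    "annotator_votes_combined": 1,           # MACE-aggregated judgement
    "annotator_comments": ["", "", ""],
    "document_id": "doc-00042",
    "document_full": "Complete input document(s).",
    "document_short": "Shortened document(s) shown to annotators.",
    "document_original": "Original test-set document(s).",
}
```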
This dataset can be downloaded here: `models_fact_v1.0.tar.gz`
The dataset does not contain the input articles from CNN/DM and XSum, but we provide a script that inserts them from the corresponding Hugging Face datasets. Run the script like this:
python abstractive-factual-tradeoff/misc/unpack.py /path/to/models_fact_v1.0.tar.gz
That will create a directory `/path/to/models_fact_v1.0` next to the tarball. The directory will contain a `data.jsonl` file with the dataset. It will also contain directories with the full `test.source` and `test.target` files for `cnn_dailymail` and `xsum`.
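Once unpacked, the data is straightforward to analyze. The sketch below loads `data.jsonl` (assuming it is standard JSON Lines, as the extension suggests) and computes the share of sentences judged factual per model from the MACE-aggregated votes; the path is a placeholder:

```python
import json
from collections import Counter

# Tally the MACE-aggregated factuality judgements per model.
totals, factual = Counter(), Counter()
with open("/path/to/models_fact_v1.0/data.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        model = record["model_name"]
        totals[model] += 1
        factual[model] += int(record["annotator_votes_combined"])

for model in sorted(totals):
    print(f"{model}: {factual[model] / totals[model]:.1%} judged factual "
          f"({totals[model]} sentences)")
```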