ModelsFact Dataset

This dataset contains human factuality judgements for 4.2k model-generated summaries:

CNN/DM:

  • 600 summaries generated by BART-Large
  • 600 summaries generated by BertSum (Liu and Lapata, 2019)
  • 600 summaries generated by PGConv (See et al., 2017)
  • 600 summaries generated by BottomUp (Gehrmann et al., 2018)
  • 600 summaries generated by AbsRL (Chen and Bansal, 2018)

XSum:

  • 600 summaries generated by BART-Large
  • 600 summaries generated by BertSum

For each of these 4.2k summaries, one randomly selected sentence (displayed in context) was annotated for factuality by three annotators, and an aggregated judgement (produced by MACE) is included. Note that the annotated BART-Large summaries are taken from the constraint-fact dataset.

Fields

  • id: ID between 0 and 4199
  • summary: Complete summary
  • summary_raw: Same as summary
  • summary_sentence: The randomly selected summary sentence that annotators judged for factuality
  • summary_sentence_contextleft: Left context of the summary_sentence
  • summary_sentence_contextright: Right context of the summary_sentence
  • model_name: Name of the model that generated the summary (abs_rl, bart, bert_sum, bottom_up, or pointer_gen_cov)
  • abstractiveness_constraint: Abstractiveness constraint used to generate this summary (none, lambda2, lambda4, 1/lambda2, or 1/lambda1; see our paper)
  • annotator_comments: Comments from the annotators
  • annotator_ids: Anonymized annotator IDs (the annotator ID space is shared with the constraint-fact dataset)
  • annotator_votes: Factuality votes from the annotators (0=not factually consistent with the displayed document(s); 1=factually consistent)
  • annotator_votes_combined: Aggregated factuality judgement from MACE
  • dataset_name: Name of the dataset (cnn_dailymail or xsum)
  • document_full: Complete input document(s)
  • document_short: Shortened document(s) displayed to the annotators, containing the sentences most similar to the summary_sentence
  • document_original: Original input document(s) from the test set. This is the same as document_full except for XSum, where document_full contains the first sentence reinserted, but document_original does not (see Footnote 6 of the paper).
  • document_id: Document ID in dataset_name
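To illustrate how these fields fit together, here is a minimal sketch (assuming the data.jsonl file produced by the unpack step described below, and the 0/1 vote encoding listed above; adjust the path to your unpacked copy) that loads the records and tallies the MACE-aggregated factuality judgement per dataset and model:

import json
from collections import Counter, defaultdict

# Tally the MACE-aggregated factuality judgement per (dataset, model) pair.
counts = defaultdict(Counter)
with open("models_fact_v1.0/data.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        key = (record["dataset_name"], record["model_name"])
        counts[key][int(record["annotator_votes_combined"])] += 1

for (dataset, model), votes in sorted(counts.items()):
    total = sum(votes.values())
    factual = votes[1]  # 1 = judged factually consistent
    print(f"{dataset:15s} {model:18s} {factual}/{total} factual")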

Download

This dataset can be downloaded here: models_fact_v1.0.tar.gz

The dataset does not contain the input articles from CNN/DM and XSum, but we provide a script that reinserts them from the corresponding Hugging Face datasets. Run the script like this:

python abstractive-factual-tradeoff/misc/unpack.py /path/to/models_fact_v1.0.tar.gz

That will create a directory /path/to/models_fact_v1.0 next to the tarball. The directory will contain a data.jsonl file with the dataset. It will also contain directories with the full test.source and test.target files for cnn_dailymail and xsum.
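As a quick sanity check after unpacking, the record counts should match the composition described above (3,000 CNN/DM and 1,200 XSum summaries, 4,200 in total), and every record should have its input document(s) reinserted. A minimal sketch, assuming the directory layout produced by the script:

import json
from collections import Counter

# Count records per dataset and check that document_full was filled in.
counts = Counter()
missing_docs = 0
with open("/path/to/models_fact_v1.0/data.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        counts[record["dataset_name"]] += 1
        if not record["document_full"]:
            missing_docs += 1

print(counts)        # expected: 3000 cnn_dailymail, 1200 xsum
print(missing_docs)  # expected: 0 after the articles have been inserted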