Factual CoCo: A metric for factual consistency in text summarization via counterfactual estimation

This repository provides the PyTorch implementation of Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation.

Requirements

Python version 3.6
Torch version 1.6.0
spaCy v3.1 (Install from here)

You also need to download the spaCy model used for part-of-speech (POS) tagging:

python -m spacy download en_core_web_sm

Alternatively, download en_core_web_sm-3.1.0-py3-none-any.whl from here and then run pip install en_core_web_sm-3.1.0-py3-none-any.whl.
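A quick way to confirm the requirements above are in place is to check that the packages can be found before running anything. This is a minimal standard-library sketch (check_environment is a hypothetical helper, not part of the repository); it only tests importability and reports the Python version, so the exact versions listed above still need to be checked separately.

```python
import importlib.util
import sys


def check_environment():
    """Report the Python version and whether each required package is importable.

    Only importability is checked; exact versions (Torch 1.6.0, spaCy 3.1)
    are NOT verified here.
    """
    report = {"python": "%d.%d" % sys.version_info[:2]}
    # en_core_web_sm is installed as a regular Python package, so find_spec
    # can locate it without loading spaCy or Torch.
    for module in ("torch", "spacy", "en_core_web_sm"):
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(name, status)
```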

Quick Start

1. Clone the source code

git clone http://gitlab.alibaba-inc.com/yuexiang.xyx/factual_coco.git

2. Install fairseq

We provide BART as the scoring model adopted in CoCo and implement it via fairseq, which is included in this repository. You can install fairseq via:

cd factual_coco
pip install --editable ./

If you would like to adopt another summarization model as the scoring model, you can skip this step and implement your own scoring model.
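A custom scoring model presumably needs to expose the probability of each summary token given a source document, since, as we read the paper, CoCo compares these probabilities for key summary tokens under the full source and under a masked source. The sketch below is a hypothetical interface: ScoringModel, token_probs and counterfactual_score are names made up for illustration, not functions from run_coco.py, and the simple averaging over key tokens may differ from the repository's actual scoring code.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ScoringModel(ABC):
    """Hypothetical interface for a custom scoring model (not the repository's API)."""

    @abstractmethod
    def token_probs(self, source: str, summary: Sequence[str]) -> List[float]:
        """Return Pr(y_i | source, y_<i) for every summary token y_i."""


def counterfactual_score(model, source, masked_source, summary, key_positions):
    """Average drop in key-token probability when the source is masked.

    ASSUMPTION: mirrors our reading of the CoCo idea; returns 0.0 when
    there are no key tokens (a design choice made for this sketch).
    """
    if not key_positions:
        return 0.0
    full = model.token_probs(source, summary)
    masked = model.token_probs(masked_source, summary)
    diffs = [full[i] - masked[i] for i in key_positions]
    return sum(diffs) / len(diffs)
```

A larger score means the key tokens depend more on the (masked) evidence in the source, which is the intuition behind using it as a consistency signal.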

3. Provide model path and data path

Before executing run_coco.py to get the CoCo scores, you should provide:

model_path: The path to the scoring model. This is an independent summarization model; it does not need to be the model that generated the evaluated summaries.
data_path: The path to the source documents (named source.txt) and summaries (named summary.txt), with one document/summary per line. (An example is provided in the data folder.)
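A minimal sketch of reading data_path in the layout described above, assuming UTF-8 text and that line i of summary.txt summarizes line i of source.txt. load_pairs is a hypothetical helper, not a function from run_coco.py.

```python
import os


def load_pairs(data_path):
    """Read source.txt and summary.txt (one item per line) into aligned pairs."""
    # ASSUMPTION: both files are UTF-8 encoded.
    with open(os.path.join(data_path, "source.txt"), encoding="utf-8") as f:
        sources = [line.rstrip("\r\n") for line in f]
    with open(os.path.join(data_path, "summary.txt"), encoding="utf-8") as f:
        summaries = [line.rstrip("\r\n") for line in f]
    # Line i of summary.txt is assumed to summarize line i of source.txt,
    # so a length mismatch almost certainly means misaligned data.
    if len(sources) != len(summaries):
        raise ValueError(
            "source.txt has %d lines but summary.txt has %d"
            % (len(sources), len(summaries))
        )
    return list(zip(sources, summaries))
```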

Note: You might need to modify the load_model function in the code according to the scoring model you use.
In this repository, we adopt BART as the scoring model and implement it via fairseq. The checkpoints, including bart.large.cnn and bart.large.xsum, can be downloaded from here.
Taking bart.large.cnn as an example, the model path should include:

bart.large.cnn
│   model.pt
│   dict.txt (it can be a copy of the dict.source.txt or dict.target.txt)
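The layout above can be checked before running run_coco.py. The sketch below (prepare_model_dir is a hypothetical helper) verifies that model.pt is present and, if dict.txt is missing, creates it by copying dict.source.txt or dict.target.txt, assuming one of them sits in the same directory.

```python
import os
import shutil


def prepare_model_dir(model_path):
    """Check that model.pt exists and make sure dict.txt is present."""
    if not os.path.isfile(os.path.join(model_path, "model.pt")):
        raise FileNotFoundError("model.pt not found in %s" % model_path)
    dict_txt = os.path.join(model_path, "dict.txt")
    if not os.path.isfile(dict_txt):
        # dict.txt can be a copy of dict.source.txt or dict.target.txt;
        # ASSUMPTION: those files live in the same directory as model.pt.
        for candidate in ("dict.source.txt", "dict.target.txt"):
            src = os.path.join(model_path, candidate)
            if os.path.isfile(src):
                shutil.copyfile(src, dict_txt)
                break
        else:
            raise FileNotFoundError(
                "no dict.txt, dict.source.txt or dict.target.txt in %s" % model_path
            )
    return dict_txt
```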

4. Get CoCo scores

python3 run_coco.py --model_path /path/to/model --data_path /path/to/data --output_file coco_score.txt --mask token
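To compare mask strategies, the command above can be run once per strategy. The sketch below builds the same argument list from Python; build_coco_command, read_scores and the per-strategy file names are hypothetical helpers, and the one-score-per-line layout of the output file is an assumption that should be checked against run_coco.py.

```python
import sys

MASK_STRATEGIES = ("token", "span", "sent", "doc")


def build_coco_command(model_path, data_path, mask, output_file=None):
    """Return the argument list for one run_coco.py invocation."""
    if mask not in MASK_STRATEGIES:
        raise ValueError("mask must be one of %s" % (MASK_STRATEGIES,))
    if output_file is None:
        # Hypothetical naming scheme: one score file per mask strategy.
        output_file = "coco_score_%s.txt" % mask
    return [sys.executable, "run_coco.py",
            "--model_path", model_path,
            "--data_path", data_path,
            "--output_file", output_file,
            "--mask", mask]


def read_scores(output_file):
    """Parse a score file, ASSUMING one float per non-empty line."""
    with open(output_file, encoding="utf-8") as f:
        return [float(line) for line in f if line.strip()]


# Usage from the repository root (requires the model and data to be in place):
#   import subprocess
#   for m in MASK_STRATEGIES:
#       cmd = build_coco_command("/path/to/model", "/path/to/data", m)
#       subprocess.run(cmd, check=True)
#       scores = read_scores(cmd[cmd.index("--output_file") + 1])
#       print(m, sum(scores) / len(scores))
```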

--mask sets the mask strategy (one of ['token', 'span', 'sent', 'doc']; more details can be found in the paper). You can design your own mask strategies in the mask function.
--output_file denotes the file where the generated CoCo scores are saved.
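The four strategy names suggest masking progressively larger regions of the source document around content that matches the summary's key tokens. The sketch below is one plausible reading of them, written for illustration only: it is not the repository's mask function, and MASK, STRATEGIES, mask_document and the window parameter are hypothetical. The key tokens are assumed to be given already, for example selected with the spaCy part-of-speech tagger mentioned under Requirements.

```python
MASK = "<mask>"  # hypothetical placeholder; the real mask symbol depends on the scoring model
STRATEGIES = ("token", "span", "sent", "doc")


def mask_document(sentences, key_tokens, strategy="token", window=1):
    """Return a masked copy of a tokenized document (a list of token lists).

    token: mask each occurrence of a key token
    span:  also mask `window` neighboring tokens on each side of an occurrence
    sent:  mask every sentence that contains a key token
    doc:   mask the whole document
    """
    if strategy not in STRATEGIES:
        raise ValueError("unknown strategy: %s" % strategy)
    if strategy == "doc":
        return [[MASK] * len(sent) for sent in sentences]
    masked = []
    for sent in sentences:
        hits = [i for i, tok in enumerate(sent) if tok in key_tokens]
        if strategy == "sent":
            masked.append([MASK] * len(sent) if hits else list(sent))
            continue
        # "token" masks only the hit itself; "span" widens it by `window`.
        radius = window if strategy == "span" else 0
        to_mask = set()
        for i in hits:
            to_mask.update(range(max(0, i - radius), min(len(sent), i + radius + 1)))
        masked.append([MASK if i in to_mask else tok for i, tok in enumerate(sent)])
    return masked
```

Working on word tokens keeps the strategies easy to compare; an actual implementation may instead operate on the scoring model's subword tokens.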

Cite

If you find this repository useful for your research or development, please cite the following paper:

@inproceedings{xie2021factual,
    title = "Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation",
    author = "Xie, Yuexiang  and Sun, Fei  and Deng, Yang  and Li, Yaliang  and Ding, Bolin",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    url = "https://aclanthology.org/2021.findings-emnlp.10",
    pages = "100--110"
}
