This repository contains the dataset, format checker, scorer, and baselines for the CLEF2019-CheckThat! Task 1.
For information about the previous edition of the shared task, refer to CLEF2018-CheckThat!
FCPD corpus for the CLEF-2019 LAB on "Automatic Identification and Verification of Claims"
Version 2.0: May 13, 2019 (TEST GOLD LABELS RELEASED)
This file contains the basic information regarding the CLEF2019-CheckThat! Task 1 (Check-Worthiness estimation) dataset provided for the CLEF2019-CheckThat! Lab on "Automatic Identification and Verification of Claims". The current version (2.0, May 13, 2019) adds the gold labels for the test set. All changes and updates to these data sets and tools are reported in the "List of Versions" section of this document.
Table of contents:
- Evaluation Results
- List of Versions
- Contents of the Distribution v1.0
- Subtasks
- Data Format
- Results File Format
- Format checkers
- Scorers
- Baselines
- Notes
- Licensing
- Citation
- Credits
Note that the main evaluation measure is MAP on the primary submission. The teams are ordered according to this score.
Numbers in parentheses indicate the rank of the primary submission with respect to the corresponding measure.

Team Name | submission | MAP | RR | R-P | P@1 | P@3 | P@5 | P@10 | P@20 | P@50 |
---|---|---|---|---|---|---|---|---|---|---|
Copenhagen | primary | .1660 (1) | .4176 (3) | .1387 (4) | .2857 (2) | .2381 (1) | .2571 (1) | .2286 (2) | .1571 (2) | .1229 (2) |
 | contr.-1 | .1496 | .3098 | .1297 | .1429 | .2381 | .2000 | .2000 | .1429 | .1143 |
 | contr.-2 | .1580 | .2740 | .1622 | .1429 | .1905 | .2286 | .2429 | .1786 | .1200 |
TheEarthIsFlat | primary | .1597 (2) | .1953 (11) | .2052 (1) | .0000 (4) | .0952 (3) | .2286 (2) | .2143 (3) | .1857 (1) | .1457 (1) |
 | contr.-1 | .1453 | .3158 | .1101 | .2857 | .2381 | .1429 | .1429 | .1357 | .1171 |
 | contr.-2 | .1821 | .4187 | .1937 | .2857 | .2381 | .2286 | .2286 | .2143 | .1400 |
IPIPAN | primary | .1332 (3) | .2864 (6) | .1481 (2) | .1429 (3) | .0952 (3) | .1429 (5) | .1714 (5) | .1500 (3) | .1171 (3) |
Terrier | primary | .1263 (4) | .3253 (5) | .1088 (8) | .2857 (2) | .2381 (1) | .2000 (3) | .2000 (4) | .1286 (6) | .0914 (7) |
UAICS | primary | .1234 (5) | .4650 (1) | .1460 (3) | .4286 (1) | .2381 (1) | .2286 (2) | .2429 (1) | .1429 (4) | .0943 (6) |
 | contr.-1 | .0649 | .2817 | .0655 | .1429 | .2381 | .1429 | .1143 | .0786 | .0343 |
 | contr.-2 | .0726 | .4492 | .0547 | .4286 | .2857 | .1714 | .1143 | .0643 | .0257 |
Factify | primary | .1210 (6) | .2285 (8) | .1292 (5) | .1429 (3) | .0952 (3) | .1143 (6) | .1429 (6) | .1429 (4) | .1086 (4) |
JUNLP | primary | .1162 (7) | .4419 (2) | .1128 (7) | .2857 (2) | .1905 (2) | .1714 (4) | .1714 (5) | .1286 (6) | .1000 (5) |
 | contr.-1 | .0976 | .3054 | .0814 | .1429 | .2381 | .1429 | .0857 | .0786 | .0771 |
 | contr.-2 | .1226 | .4465 | .1357 | .2857 | .2381 | .2000 | .1571 | .1286 | .0886 |
nlpir01 | primary | .1000 (8) | .2840 (7) | .1063 (9) | .1429 (3) | .2381 (1) | .1714 (4) | .1000 (8) | .1214 (7) | .0943 (6) |
 | contr.-1 | .0966 | .3797 | .0849 | .2857 | .1905 | .2286 | .1429 | .1071 | .0886 |
 | contr.-2 | .0965 | .3391 | .1129 | .1429 | .2381 | .2286 | .1571 | .1286 | .0943 |
TOBB ETU | primary | .0884 (9) | .2028 (10) | .1150 (6) | .0000 (4) | .0952 (3) | .1429 (5) | .1286 (7) | .1357 (5) | .0829 (8) |
 | contr.-1 | .0898 | .2013 | .1150 | .0000 | .1429 | .1143 | .1286 | .1429 | .0829 |
 | contr.-2 | .0913 | .3427 | .1007 | .1429 | .1429 | .1143 | .0714 | .1214 | .0829 |
IIT (ISM) Dhanbad, India | primary | .0835 (10) | .2238 (9) | .0714 (11) | .0000 (4) | .1905 (2) | .1143 (6) | .0857 (9) | .0857 (9) | .0771 (9) |
é proibido cochilar | primary | .0796 (11) | .3514 (4) | .0886 (10) | .1429 (3) | .2381 (1) | .1429 (5) | .1286 (7) | .1071 (8) | .0714 (10) |
 | contr.-1 | .1357 | .5414 | .1595 | .4286 | .2381 | .2571 | .2714 | .1643 | .1200 |
Fire | primary | .0528 (12) | .1365 (12) | .0570 (12) | .0000 (4) | .0476 (4) | .0571 (7) | .0429 (10) | .0500 (10) | .0543 (11) |
- v2.0 [2019/05/13] - TEST GOLD LABELS. The gold labels for the test dataset are released in the test_annotated subfolder.
- v1.0 [2019/03/12] - TRIAL data. The training data for Task 1 contains 19 fact-checked documents (debates, speeches, press conferences, etc.) analysed by factcheck.org.
We provide the following files:

- Main folder: data
  - Subfolder /training
    Contains all training data released with version 1.0.
  - Subfolder /test_annotated
    Contains the gold labels for the test dataset, released with version 2.0.
- README.md - this file.
- clef18.bib - Bibliography of the overview papers from the CLEF-2018 shared task.
- working_notes/clef19_checkthat.bib - Bibliography of the overview and participants' papers.
- working_notes/clef18_checkthat.bib - Bibliography of last year's overview and participants' papers.
Predict which claim in a political debate should be prioritized for fact-checking. In particular, given a debate, a speech, or a press conference, the goal is to produce a ranked list of its sentences, based on their worthiness for fact-checking.
The datasets are TAB-separated text files. The text encoding is UTF-8.
line_number speaker text label
Where:
- line_number: the line number (starting from 1)
- speaker: the person speaking (a candidate, the moderator, or "SYSTEM"; the latter is used for the audience reaction)
- text: a sentence that the speaker said
- label: 1 if this sentence is to be fact-checked, and 0 otherwise
Example:
...
65 TRUMP So we're losing our good jobs, so many of them. 0
66 TRUMP When you look at what's happening in Mexico, a friend of mine who builds plants said it's the eighth wonder of the world. 0
67 TRUMP They're building some of the biggest plants anywhere in the world, some of the most sophisticated, some of the best plants. 0
68 TRUMP With the United States, as he said, not so much. 0
69 TRUMP So Ford is leaving. 1
70 TRUMP You see that, their small car division leaving. 1
71 TRUMP Thousands of jobs leaving Michigan, leaving Ohio. 1
72 TRUMP They're all leaving. 0
...
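For illustration, here is a minimal Python sketch (not part of the official tools) for loading a debate file in this format; the path in the usage comment is an assumed location within the data folder:

```python
import csv

def load_debate(path):
    """Load a TAB-separated debate file into (line_number, speaker, text, label) tuples."""
    rows = []
    with open(path, encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, speaker, text, label in reader:
            rows.append((int(line_number), speaker, text, int(label)))
    return rows

# Usage (file location assumed relative to the repository root):
# debate = load_debate("data/training/20190108_oval_office.tsv")
```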
For this task, the expected results file is a list of claims with the estimated score for check-worthiness. Each line contains a tab-separated line with:
line_number score
Where line_number is the number of the claim in the debate and score is a number, indicating the priority of the claim for fact-checking. For example:
1 0.9056
2 0.6862
3 0.7665
4 0.9046
5 0.2598
6 0.6357
7 0.9049
8 0.8721
9 0.5729
10 0.1693
11 0.4115
...
Your results file MUST contain scores for all lines of the respective input file; otherwise, the scorer will not score it.
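As a rough illustration (not part of the official tools), a results file covering every input line could be produced as follows, assuming you have one score per claim:

```python
def write_results(scores, out_path):
    """Write one 'line_number<TAB>score' row per claim, covering all lines of the input file."""
    with open(out_path, "w", encoding="utf-8") as f:
        for line_number, score in enumerate(scores, start=1):
            f.write(f"{line_number}\t{score:.4f}\n")

# Usage (hypothetical output file name):
# write_results([0.9056, 0.6862, 0.7665], "predictions.tsv")
```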
The checker for the subtask is located in the format_checker module of the project. The format checker verifies that your generated results file complies with the expected format. To launch it run:
python3 format_checker/main.py --pred_file_path=<path_to_your_results_file>
run_format_checker.sh includes example runs of the checker on an ill-formed results file; the corresponding output can be seen in run_format_checker_out.txt.
The check for completeness (whether the results file contains all lines / claims) is NOT handled by the format checker, because it receives only the results file and not the gold one.
Launch the scorers for the task as follows:
python3 scorer/main.py --gold_file_path="<path_gold_file_1, path_to_gold_file_k>" --pred_file_path="<predictions_file_1, predictions_file_k>"
Both --gold_file_path and --pred_file_path take a single string that contains a comma-separated list of file paths. The lists may be of arbitrary positive length (a single file path is also fine), but their lengths must match.

<path_to_gold_file_n> is the path to the file containing the gold annotations for debate n, and <predictions_file_n> is the path to the corresponding file holding the predicted results for debate n, which must follow the format described in the 'Results File Format' section.
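For example, scoring predictions for two debates at once might look as follows (the file names below are placeholders):

python3 scorer/main.py --gold_file_path="data/test_annotated/debate_1.tsv,data/test_annotated/debate_2.tsv" --pred_file_path="predictions_1.tsv,predictions_2.tsv"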
The scorers call the format checkers for the task to verify the output is properly shaped. They also handle checking if the provided predictions file contains all lines / claims from the gold one.
run_scorer.sh provides examples of using the scorers; the results can be viewed in the run_scorer_out.txt file.
For Task 1 (ranking), we report R-Precision, Average Precision, Reciprocal Rank, and Precision@k, as well as their means over multiple debates. The official metric for Task 1, which is used for the competition ranking, is Mean Average Precision (MAP).
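To make the official metric concrete, here is a minimal, illustrative sketch of Average Precision for a single debate (the official implementation is the one in the scorer module):

```python
def average_precision(gold_labels, scores):
    """Average Precision for one debate.

    gold_labels: dict mapping line_number -> 1 (check-worthy) or 0
    scores: dict mapping line_number -> predicted check-worthiness score
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    num_positive = sum(gold_labels.values())
    hits, ap = 0, 0.0
    for rank, line_number in enumerate(ranked, start=1):
        if gold_labels[line_number] == 1:
            hits += 1
            ap += hits / rank
    return ap / num_positive if num_positive else 0.0

# MAP is the mean of average_precision over all debates in the test set.
```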
The baselines module contains a random baseline and a simple n-gram baseline for the task.
If you execute main.py, both baselines will be trained on all debates except 20190108_oval_office.tsv and evaluated on the 20190108_oval_office.tsv debate. The performance of both baselines will be displayed.
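For a sense of what an n-gram baseline involves, here is a hedged sketch (not the repository's implementation) using TF-IDF word n-gram features and logistic regression; the positive-class probability is used as the ranking score:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_ngram_baseline(train_texts, train_labels):
    """Fit a word n-gram TF-IDF + logistic-regression model on the training debates."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(train_texts), train_labels)
    return vectorizer, model

def score_debate(vectorizer, model, texts):
    """Score the sentences of a held-out debate; higher means more check-worthy."""
    return model.predict_proba(vectorizer.transform(texts))[:, 1]
```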
These datasets are free for general research use.
- When referring to the 2019 shared task, cite the following paper:
@InProceedings{clef-checkthat:2019,
author = "Elsayed, Tamer and
Nakov, Preslav and
Barr\'{o}n-Cede{\~n}o, Alberto and
Hasanain, Maram and
Suwaileh, Reem and
{Da San Martino}, Giovanni and
Atanasova, Pepa",
title = "Overview of the CLEF-2019 CheckThat!: Automatic Identification and Verification of Claims",
booktitle = "Experimental IR Meets Multilinguality, Multimodality, and Interaction",
series = "LNCS",
pubblisher = "Springer",
address = "Lugano, Switzerland",
month = "September",
year = 2019
}
- When referring specifically to Task 1, please cite the following:
@InProceedings{clef-checkthat-T1:2019,
author = "Atanasova, Pepa and
Nakov, Preslav and
Karadzhov, Georgi and
Mohtarami, Mitra and
Da San Martino, Giovanni",
title = "Overview of the CLEF-2019 CheckThat! Lab on Automatic Identification and Verification of Claims. Task 1: Check-Worthiness",
crossref = "clef-ceur:19"
}
- To cite participants' papers, refer to the file working_notes/clef19_checkthat.bib.
- To cite papers from the previous edition of the task, refer to working_notes/clef18_checkthat.bib (proceedings with all papers from 2018).
Lab Organizers:
- Pepa Atanasova, University of Copenhagen
- Preslav Nakov, Qatar Computing Research Institute, HBKU
- Mitra Mohtarami, MIT
- Georgi Karadzhov, Sofia University
- Spas Kyuchukov, Sofia University
- Alberto Barrón-Cedeño, Qatar Computing Research Institute, HBKU
- Giovanni Da San Martino, Qatar Computing Research Institute, HBKU
- Tamer Elsayed, Qatar University
- Maram Hasanain, Qatar University
- Reem Suwaileh, Qatar University
Task website: https://sites.google.com/view/clef2019-checkthat/
The official rules are published on the website; please check them.
Contact: clef-factcheck@googlegroups.com