run_scorer_out.txt
Scoring a random baseline for task 1
INFO : Started evaluating results for Task 1 ...
INFO : Reading gold predictions from file /Users/pzg477/git/clef2019-factchecking-task1/data/training/20160926_1pres.tsv
INFO : Reading predicted ranking order from file /Users/pzg477/git/clef2019-factchecking-task1/scorer/data/task1_random_baseline.txt
INFO : ======================================== RESULTS for task1_random_baseline.txt =========================================
INFO : AVERAGE PRECISION: 0.0242
INFO : ========================================================================================================================
INFO : RECIPROCAL RANK: 0.0172
INFO : ========================================================================================================================
INFO : R-PRECISION (R=37): 0.0000
INFO : ========================================================================================================================
INFO : PRECISION@N: @1 @3 @5 @10 @20 @50
INFO : 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
INFO : ========================================================================================================================
INFO : Description of the evaluation metrics:
INFO : !!! THE OFFICIAL METRIC USED FOR THE COMPETITION RANKING IS MEAN AVERAGE PRECISION (MAP) !!!
INFO : R-Precision is Precision at R, where R is the number of relevant line_numbers for the evaluated set.
INFO : Average Precision is precision@N computed only at each relevant line_number and then averaged over the number of relevant line_numbers.
INFO : Reciprocal Rank is the reciprocal of the rank of the first relevant line_number in the list of predictions sorted by score in descending order.
INFO : Precision@N is precision estimated for the first N line_numbers in the provided ranked list.
INFO : The MEAN version of each metric averages it over multiple debates (each with a separate prediction file).
INFO : ========================================================================================================================
INFO : ========================================================================================================================
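Note: for reference, here is a minimal sketch of how Average Precision and Reciprocal Rank can be computed from a ranked list of line_numbers. This is an illustrative reimplementation, not the scorer's task1.py code; the names 'ranked' (line_numbers sorted by predicted score, descending) and 'relevant' (the set of check-worthy line_numbers from the gold file) are assumptions for the example.

    # Illustrative sketch only -- not the actual task1.py implementation.
    def average_precision(ranked, relevant):
        hits, precisions = 0, []
        for i, line_no in enumerate(ranked, start=1):
            if line_no in relevant:
                hits += 1
                precisions.append(hits / i)   # precision at each relevant position
        return sum(precisions) / len(relevant) if relevant else 0.0

    def reciprocal_rank(ranked, relevant):
        for i, line_no in enumerate(ranked, start=1):
            if line_no in relevant:
                return 1.0 / i                # rank of the first relevant line
        return 0.0

The baseline's Reciprocal Rank of 0.0172 corresponds to the first relevant line appearing at roughly rank 58 in its ranking (1/58 ≈ 0.0172).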
**********
Scoring the gold predictions for task 1
INFO : Started evaluating results for Task 1 ...
INFO : Reading gold predictions from file /Users/pzg477/git/clef2019-factchecking-task1/data/training/20160926_1pres.tsv
INFO : Reading predicted ranking order from file /Users/pzg477/git/clef2019-factchecking-task1/scorer/data/task1_gold.txt
INFO : ============================================== RESULTS for task1_gold.txt ==============================================
INFO : AVERAGE PRECISION: 1.0000
INFO : ========================================================================================================================
INFO : RECIPROCAL RANK: 1.0000
INFO : ========================================================================================================================
INFO : R-PRECISION (R=37): 1.0000
INFO : ========================================================================================================================
INFO : PRECISION@N: @1 @3 @5 @10 @20 @50
INFO : 1.0000 1.0000 1.0000 1.0000 1.0000 0.7400
INFO : ========================================================================================================================
INFO : Description of the evaluation metrics:
INFO : !!! THE OFFICIAL METRIC USED FOR THE COMPETITION RANKING IS MEAN AVERAGE PRECISION (MAP) !!!
INFO : R-Precision is Precision at R, where R is the number of relevant line_numbers for the evaluated set.
INFO : Average Precision is precision@N computed only at each relevant line_number and then averaged over the number of relevant line_numbers.
INFO : Reciprocal Rank is the reciprocal of the rank of the first relevant line_number in the list of predictions sorted by score in descending order.
INFO : Precision@N is precision estimated for the first N line_numbers in the provided ranked list.
INFO : The MEAN version of each metric averages it over multiple debates (each with a separate prediction file).
INFO : ========================================================================================================================
INFO : ========================================================================================================================
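Note: the gold run behaves exactly as a perfect ranking should. With R = 37 relevant line_numbers in this debate, Precision@N is 1.0 for every N ≤ 37, while Precision@50 is capped by the number of relevant lines:

    # Worked check of the PRECISION @50 value reported above.
    R = 37                    # relevant line_numbers in 20160926_1pres.tsv
    print(min(R, 50) / 50)    # 0.74 -> matches PRECISION @50 for task1_gold.txt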
**********
Scoring BOTH the gold and random baseline for task 1 (for example purposes only; scoring multiple prediction files for a single debate leads to incorrect averaged metrics)
INFO : Started evaluating results for Task 1 ...
INFO : Reading gold predictions from file /Users/pzg477/git/clef2019-factchecking-task1/data/training/20160926_1pres.tsv
INFO : Reading predicted ranking order from file /Users/pzg477/git/clef2019-factchecking-task1/scorer/data/task1_gold.txt
INFO : ============================================== RESULTS for task1_gold.txt ==============================================
INFO : AVERAGE PRECISION: 1.0000
INFO : ========================================================================================================================
INFO : RECIPROCAL RANK: 1.0000
INFO : ========================================================================================================================
INFO : R-PRECISION (R=37): 1.0000
INFO : ========================================================================================================================
INFO : PRECISION@N: @1 @3 @5 @10 @20 @50
INFO : 1.0000 1.0000 1.0000 1.0000 1.0000 0.7400
INFO : ========================================================================================================================
INFO : Reading gold predictions from file /Users/pzg477/git/clef2019-factchecking-task1/data/training/20160926_1pres.tsv
INFO : Reading predicted ranking order from file /Users/pzg477/git/clef2019-factchecking-task1/scorer/data/task1_random_baseline.txt
INFO : ======================================== RESULTS for task1_random_baseline.txt =========================================
INFO : AVERAGE PRECISION: 0.0242
INFO : ========================================================================================================================
INFO : RECIPROCAL RANK: 0.0172
INFO : ========================================================================================================================
INFO : R-PRECISION (R=37): 0.0000
INFO : ========================================================================================================================
INFO : PRECISION@N: @1 @3 @5 @10 @20 @50
INFO : 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
INFO : ========================================================================================================================
INFO : =================================================== AVERAGED RESULTS ===================================================
INFO : MEAN AVERAGE PRECISION (MAP): 0.5121
INFO : ========================================================================================================================
INFO : MEAN RECIPROCAL RANK: 0.5086
INFO : ========================================================================================================================
INFO : MEAN R-PRECISION: 0.5000
INFO : ========================================================================================================================
INFO : MEAN PRECISION@N: @1 @3 @5 @10 @20 @50
INFO : 0.5000 0.5000 0.5000 0.5000 0.5000 0.3700
INFO : ========================================================================================================================
INFO : Description of the evaluation metrics:
INFO : !!! THE OFFICIAL METRIC USED FOR THE COMPETITION RANKING IS MEAN AVERAGE PRECISION (MAP) !!!
INFO : R-Precision is Precision at R, where R is the number of relevant line_numbers for the evaluated set.
INFO : Average Precision is precision@N computed only at each relevant line_number and then averaged over the number of relevant line_numbers.
INFO : Reciprocal Rank is the reciprocal of the rank of the first relevant line_number in the list of predictions sorted by score in descending order.
INFO : Precision@N is precision estimated for the first N line_numbers in the provided ranked list.
INFO : The MEAN version of each metric averages it over multiple debates (each with a separate prediction file).
INFO : ========================================================================================================================
INFO : ========================================================================================================================
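Note: the AVERAGED RESULTS block above is a plain macro-average of the two per-file scores (which is why scoring two prediction files for the same debate is only shown for illustration):

    # Macro-averaging the two runs scored above (gold + random baseline).
    print((1.0000 + 0.0242) / 2)   # 0.5121 -> MEAN AVERAGE PRECISION (MAP)
    print((1.0000 + 0.0172) / 2)   # 0.5086 -> MEAN RECIPROCAL RANK
    print((1.0000 + 0.0000) / 2)   # 0.5    -> MEAN R-PRECISION
    print((0.7400 + 0.0000) / 2)   # 0.37   -> MEAN PRECISION @50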
**********
TEST ERROR: Scoring task 1 where the provided list of line_numbers contains a line_number that is not in the gold file.
ERROR : Problem with line_number: 1500. They should be consecutive and starting from 1.
ERROR : Bad format for pred file /Users/pzg477/git/clef2019-factchecking-task1/scorer/data/task1_other_line_number.txt. Cannot score.
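Note: a sketch of the kind of validation that triggers this message (an assumption about the scorer's behaviour, not the verbatim task1.py code): predicted line_numbers are expected to be exactly 1..N, so an out-of-range value such as 1500 aborts scoring before any metric is computed.

    # Illustrative check only -- the real scorer may differ in details.
    def line_numbers_are_consecutive(pred_line_numbers):
        for expected, got in enumerate(sorted(pred_line_numbers), start=1):
            if got != expected:
                print('ERROR : Problem with line_number: %d. '
                      'They should be consecutive and starting from 1.' % got)
                return False
        return True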
**********
TEST ERROR: Scoring task 1 where the provided list of line_numbers does not contain all the lines from the gold file.
INFO : Started evaluating results for Task 1 ...
INFO : Reading gold predictions from file /Users/pzg477/git/clef2019-factchecking-task1/data/training/20160926_1pres.tsv
INFO : Reading predicted ranking order from file /Users/pzg477/git/clef2019-factchecking-task1/scorer/data/task1_not_all_lines.txt
ERROR : The predictions do not match the lines from the gold file - missing or extra line_no
Traceback (most recent call last):
  File "task1.py", line 203, in <module>
    thresholds, precisions, avg_precision, reciprocal_rank, num_relevant = evaluate(gold_file, pred_file)
  File "task1.py", line 106, in evaluate
    gold_labels, line_score = _read_gold_and_pred(gold_fpath, pred_fpath)
  File "task1.py", line 47, in _read_gold_and_pred
    raise ValueError('The predictions do not match the lines from the gold file - missing or extra line_no')
ValueError: The predictions do not match the lines from the gold file - missing or extra line_no
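Note: the check behind this traceback amounts to requiring that the gold and predicted line_numbers are the same set. A minimal sketch, assuming that is all _read_gold_and_pred verifies at this point (not its verbatim code):

    # Minimal sketch of the consistency check that raises the error above.
    def check_gold_pred_match(gold_line_numbers, pred_line_numbers):
        if set(gold_line_numbers) != set(pred_line_numbers):
            raise ValueError('The predictions do not match the lines from the gold file'
                             ' - missing or extra line_no')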
**********