
labels for columns in benchmark #1

Open
amueller opened this issue Oct 23, 2020 · 3 comments


amueller commented Oct 23, 2020

Hey! I was trying to run the benchmark, but it's not clear to me where the ground truth labels are stored.
I assume meta_data.csv maps the column ID to the CSV file it came from, right?

Thanks!


pvn25 commented Oct 23, 2020

Hi,

The y_act column in the data_train and data_test files denotes the ground truth label. The label vocabulary is coded as follows:

Numeric: 0
Categorical: 1
Datetime: 2
Sentence: 3
URL: 4
Numbers: 5
List: 6
Not-Generalizable: 7
Custom Object (or Context-Specific): 8
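
A minimal decoding sketch in Python (assuming pandas and that the training split is named data_train.csv; adjust the path to wherever the files actually live):

import pandas as pd

# Integer codes used in y_act, taken from the list above.
LABEL_NAMES = {
    0: "Numeric",
    1: "Categorical",
    2: "Datetime",
    3: "Sentence",
    4: "URL",
    5: "Numbers",
    6: "List",
    7: "Not-Generalizable",
    8: "Custom Object (or Context-Specific)",
}

# The file name here is an assumption; use whatever the repo ships as the training split.
train = pd.read_csv("data_train.csv")
train["label_name"] = train["y_act"].map(LABEL_NAMES)
print(train["label_name"].value_counts())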

I've updated the README to reflect this. Sorry for the confusion.

Yes, meta_data.csv maps every column to its source CSV file, so a join on Record_id between data_train and meta_data gives the source file for each column. The raw CSV files can be downloaded from here.
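
A minimal sketch of that join with pandas (file and column names as described above; what the source-file column is called inside meta_data.csv is not specified here, so the sketch just keeps all metadata columns):

import pandas as pd

train = pd.read_csv("data_train.csv")
meta = pd.read_csv("meta_data.csv")

# Attach the metadata (including the source CSV file) to every training record.
joined = train.merge(meta, on="Record_id", how="left")
print(joined.head())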

Thank you for pointing this out.

@amueller

Thank you for the quick reply, I'll give it a go next week!

@amueller

Took a bit longer, results on the training set are here:

                   precision    recall  f1-score   support

          Numeric       0.78      0.86      0.82      2909
      Categorical       0.51      0.64      0.57      1854
         Datetime       0.00      0.00      0.00       549
         Sentence       0.17      0.69      0.27       293
              URL       0.00      0.00      0.00       120
          Numbers       0.00      0.00      0.00       469
             List       0.00      0.00      0.00       188
Not-Generalizable       0.44      0.60      0.51       848
    Custom Object       0.00      0.00      0.00       706

         accuracy                           0.56      7936
        macro avg       0.21      0.31      0.24      7936
     weighted avg       0.46      0.56      0.50      7936


Some categories are not produced by dabl right now, but I'm also a bit confused by some of the mistakes; see #4.
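
The table above is the standard per-class report format from scikit-learn; a minimal sketch of producing one, where y_true is the y_act column and y_pred is whatever the type detector predicts, mapped onto the same 0-8 codes (both inputs are placeholders here, not part of the benchmark):

from sklearn.metrics import classification_report

def print_report(y_true, y_pred):
    """Print a per-class precision/recall/F1 table like the one above."""
    names = [
        "Numeric", "Categorical", "Datetime", "Sentence", "URL",
        "Numbers", "List", "Not-Generalizable", "Custom Object",
    ]
    print(classification_report(
        y_true, y_pred,
        labels=list(range(9)),
        target_names=names,
        zero_division=0,
    ))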
