
schema, config and eval scripts to make hidden eval dataset work #11

Merged
merged 6 commits into neurips_eval on Nov 11, 2023

Conversation

weiweiy (Contributor) commented Nov 9, 2023

@msaroufim, currently sam_sum is the only dataset that doesn't work with helm-summarize and the downstream eval scripts. Please have a look. Thanks.

name="sam_sum",
scenario_spec=scenario_spec,
adapter_spec=adapter_spec,
metric_specs=get_summarization_metric_specs({"task": "sam_sum", "device": "cpu"}),

@yifanmai Just want to run a sanity check — is using a summarization metric here fine?


weiweiy commented Nov 10, 2023

@msaroufim now all the metrics should work correctly. If we run with an example budget of 3k, we should have enough samples to make the summarization tasks stable.

scenario_spec=scenario_spec,
adapter_spec=adapter_spec,
metric_specs=get_summarization_metric_specs({"task": "sam_sum", "device": "cpu"})
+ get_generative_harms_metric_specs(),
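The `+` in the diff above works because the metric-spec factories return plain Python lists, so the summarization metrics and the generative-harms metrics are simply concatenated into one `metric_specs` list. A minimal self-contained sketch of that pattern (the factory bodies below are hypothetical stand-ins, not HELM's real implementations, and the dicts stand in for HELM's `MetricSpec` objects):

```python
# Hypothetical stand-ins for HELM's metric-spec factories; the real
# functions return lists of MetricSpec objects, modeled here as dicts.
def get_summarization_metric_specs(args):
    # e.g. summarization metrics such as ROUGE, parameterized by task/device
    return [{"metric": "rouge", **args}]

def get_generative_harms_metric_specs():
    # e.g. toxicity and bias metrics
    return [{"metric": "toxicity"}, {"metric": "bias"}]

# Because both factories return lists, "+" concatenates them,
# exactly as in the run-spec snippet above.
metric_specs = get_summarization_metric_specs(
    {"task": "sam_sum", "device": "cpu"}
) + get_generative_harms_metric_specs()
```

This is why the reviewer's question below ("do you wanna keep this?") is purely about whether the extra harms metrics belong in the list, not about the mechanics of combining them.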
Member

do you wanna keep this?

@weiweiy weiweiy merged commit 05b5e50 into neurips_eval Nov 11, 2023
3 of 6 checks passed

2 participants