
schema, config and eval scripts to make hidden eval dataset work #11

Merged
merged 6 commits into neurips_eval on Nov 11, 2023

Conversation

weiweiy (Contributor) commented Nov 9, 2023

@msaroufim, currently sam_sum is the only dataset that doesn't work with helm-summarize and the downstream eval scripts. Please have a look. Thanks.

name="sam_sum",
scenario_spec=scenario_spec,
adapter_spec=adapter_spec,
metric_specs=get_summarization_metric_specs({"task": "sam_sum", "device": "cpu"}),

@yifanmai Just want to run a sanity check — is using a summarization metric here fine?


weiweiy commented Nov 10, 2023

@msaroufim now all the metrics should work correctly. If we run with an example budget of 3k, we should have enough samples to make the summarization tasks stable.

scenario_spec=scenario_spec,
adapter_spec=adapter_spec,
metric_specs=get_summarization_metric_specs({"task": "sam_sum", "device": "cpu"})
+ get_generative_harms_metric_specs(),
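The `+` in the diff above works because the metric-spec factories return plain Python lists, so the summarization metrics and the generative-harms metrics are simply concatenated into one `metric_specs` list. A minimal self-contained sketch of that pattern (the factory bodies below are hypothetical stand-ins, not HELM's real implementations, and the dicts stand in for HELM's `MetricSpec` objects):

```python
# Hypothetical stand-ins for HELM's metric-spec factories; the real
# functions return lists of MetricSpec objects, modeled here as dicts.
def get_summarization_metric_specs(args):
    # e.g. summarization metrics such as ROUGE, parameterized by task/device
    return [{"metric": "rouge", **args}]

def get_generative_harms_metric_specs():
    # e.g. toxicity and bias metrics
    return [{"metric": "toxicity"}, {"metric": "bias"}]

# Because both factories return lists, "+" concatenates them,
# exactly as in the run-spec snippet above.
metric_specs = get_summarization_metric_specs(
    {"task": "sam_sum", "device": "cpu"}
) + get_generative_harms_metric_specs()
```

This is why the reviewer's question below ("do you wanna keep this?") is purely about whether the extra harms metrics belong in the list, not about the mechanics of combining them.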
Member

do you wanna keep this?

@weiweiy weiweiy merged commit 05b5e50 into neurips_eval Nov 11, 2023
3 of 6 checks passed

2 participants