Generate data for model evaluation using the MMLU benchmark #180
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
How does it work out if most of this goes into a new `instructlab.sdg.eval_data` module?
eval_data.py:

```python
def _create_mmlu_evaluation_dataset(...):
    ...

def _create_mmlu_evaluation_task(...):
    ...

def generate_eval_task_data(mmlubench_pipe, task_name, samples):
    mmlubench_data = mmlubench_pipe.generate(samples)
    ...
    _create_mmlu_evaluation_dataset(mmlubench_data)
    ...
    _create_mmlu_evaluation_task(task_name, ...)

def mmlubench_pipe_init(ctx):
    with resources.as_file(
        resources.files(EVAL_PIPELINES_PKG).joinpath("mmlu_bench.yaml")
    ) as yaml_path:
        return Pipeline.from_file(ctx, yaml_path)
```
See also instructlab/eval#35
Ok, I've done a bit of refactoring to match my review comments. I haven't done any testing, however.
src/instructlab/sdg/eval_data.py
```python
task_yaml = {
    "task": task_name,
    "dataset_kwargs": {"data_files": {"test": eval_data_file_path}},
    "include": "_default_mmlu_pr_template_yaml",
```
I'm not entirely clear where this `_default_mmlu_pr_template_yaml` lives - the eval library?
I'm definitely nervous about tight coupling between the libraries - this sort of thing can make it difficult to make changes.
Might it make sense to put `_default_mmlu_pr_template_yaml` in the same codebase that generates this task?
Is this the file it's referencing? https://github.com/instructlab/instructlab/blob/main/tests/testdata/mmlu_branch/_default_mmlu_pr_template_yaml
Oh, sorry - I thought I had linked to that. Yes, that's what it looks like.
Thinking about it afterwards ... and looking at https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/task_guide.md#including-a-base-yaml
Why not just include all that task config in the file we write out? There's no huge benefit to referencing another file and all the coordination that comes with that?
Makes sense - looks like part of it is included here already, may as well have it all.
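For illustration only, the inlined version might look roughly like the sketch below. The extra keys (`dataset_path`, `test_split`, `output_type`, `doc_to_text`, `doc_to_choice`, `doc_to_target`, `metric_list`) are standard lm-eval task-config fields, but the values shown are placeholders - the real ones would need to be copied from whatever `_default_mmlu_pr_template_yaml` defines today.

```python
# Sketch of building the full task config inline instead of pointing at a
# base template via "include". task_name and eval_data_file_path come from
# the surrounding function; all other values are placeholders, not the
# actual contents of _default_mmlu_pr_template_yaml.
task_yaml = {
    "task": task_name,
    "dataset_kwargs": {"data_files": {"test": eval_data_file_path}},
    "dataset_path": "json",
    "test_split": "test",
    "output_type": "multiple_choice",
    "doc_to_text": "{{question}}",  # placeholder prompt template
    "doc_to_choice": "choices",
    "doc_to_target": "answer",
    "metric_list": [{"metric": "acc"}],
}
```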
The code as-is at least runs and produces a JSONL file.
Looking at the example mentioned in instructlab/eval#35, some differences between what this PR produces and the example:

- this version includes a number of key/value pairs that aren't in the example (and I assume not required), e.g. mmlubench_question, mmlubench_answer, output, icl_*, task_description
- the example has "content" where we have "document"
- the example has keys that we don't have: origin_branch_name, pull_request, input, targets, row_idx, path

@nathan-weinberg here is an example of what the PR produces currently (I've formatted the individual lines into multiple lines)
Reading https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md#writing-a-prompt-template and looking at examples, that suggests only a subset of the columns actually matters, so I would imagine any differences in the extra columns won't cause a problem. Personally, I'd prefer to trim the final dataset to the minimum that's needed for lm-eval, but that can be a follow-up after we merge the PR IMO. I've filed #183 for that.
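As a rough idea of what that trimming might look like - assuming the generated samples end up as a Hugging Face `datasets.Dataset` and that `question`/`choices`/`answer` are the columns lm-eval actually reads, both of which would need checking:

```python
from datasets import Dataset

def trim_to_eval_columns(ds: Dataset, keep=("question", "choices", "answer")) -> Dataset:
    """Drop every column lm-eval doesn't need; the column names here are assumptions."""
    extra = [col for col in ds.column_names if col not in keep]
    return ds.remove_columns(extra)
```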
Ok, if you merge the …
Oh, and update the …
Will do. There is still an outstanding question about the format of the output mmlubench file (see #180 (comment)). I think we need to get an answer on that first, as some of it needs to change - or should we just follow up with any required changes?
@derekhiggins @markmc WRT the discussion around Eval compatibility - @khaledsulayman and @alinaryan are going to run a quick test with the sample data you provided here https://goodsquishy.com/upload/d7bc7c5bc21b5b2d2cbc with MMLUBranch to ensure compatibility - if there are no issues with that we should be good from an Eval POV, cc @alimaredia @danmcp
This pull request has merge conflicts that must be resolved before it can be merged.
Rebased from aakankshaduggal#15 and aakankshaduggal#16.

Changed by me (synth data wasn't the correct format without it, with merlinite-7b-lab) in src/instructlab/sdg/pipelines/simple/mmlu_bench.yaml:

```diff
- temperature: 0
+ temperature: 0.7
```

Refactor MMLU bench code into eval_data.py

The MMLU bench pipeline is different from the training samples generation pipeline - instead it is generating a dataset that can be used to evaluate the model performance. Let's assume there could be multiple eval sdg pipelines in future, and they could be used with any of the training data pipelines, so put mmlu_bench.yaml in instructlab.sdg.pipelines.eval. Also encapsulate all the relevant code into a new sdg.eval_data Python module whose main interface is:

```
mmlu_bench_pipe = eval_data.mmlubench_pipe_init(ctx)
eval_data.generate_eval_task_data(mmlu_bench_pipe, task_name, samples, ...)
```

Closes: instructlab#170

Co-authored-by: shiv <shivchander.s30@gmail.com>
Co-authored-by: abhi1092 <abhi1092@gmail.com>
Co-authored-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Derek Higgins <derekh@redhat.com>
This makes more sense if we want to use PipelineContext to separately initialize the mmlu-bench pipeline. Suggestion from @bbrowning in instructlab#163.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Sorry, I hit merge and remembered this is what I was waiting for - let us know if you hit any issues and @derekhiggins or I will resolve them quickly 👍
Np @markmc, we will follow up if any changes are needed 👍
@markmc @derekhiggins it seems we are missing the necessary yaml files to put in the tasks directory in order to test the data @nathan-weinberg linked. Is this just the …
See #180 (comment)
But do let us know if something specific was missed.
@markmc I'm sorry, I seem to be missing the task yaml you're referring to. Is it linked here or is it somewhere in the repository? The goodsquishy link in Nathan's comment appears to be JSONL.
Here's where it gets generated: src/instructlab/sdg/eval_data.py, lines 120 to 125 in 842eb04.
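Based on the task_yaml dict discussed above, those lines presumably do something along these lines - a rough sketch, not the literal code; `output_dir` and the `task.yaml` file name are assumptions:

```python
import os
import yaml

# Sketch of the write-out step: serialize the task config next to the
# generated data so lm-eval can pick it up. Path and file name are guesses.
task_yaml_file = os.path.join(output_dir, "task.yaml")
with open(task_yaml_file, "w", encoding="utf-8") as f:
    yaml.safe_dump(task_yaml, f)
```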
commit d280df9
Generate mmlubench data
commit 54c113a (HEAD -> main, me/mmlubench)
sdg_init() refactor - take a PipelineContext rather than returning one