sasarkar/qwen finetuning bucketing #1130
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@ssarkar2 Will this only work for Qwen, or is it also applicable to other models?
Haven't tried other models, but it should work for them too, since the change is in the SFT trainer.
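(For context, a minimal sketch of the general length-bucketing idea, not this PR's actual code: pad each batch up to the next bucket boundary instead of the per-batch maximum, so only a handful of input shapes, and hence graph recompilations, occur. The function name, bucket size, and cap below are illustrative placeholders.)
import math

def bucket_length(seq_len: int, bucket_size: int = 128, max_len: int = 2048) -> int:
    """Round a sequence length up to the next multiple of bucket_size, capped at max_len."""
    return min(max_len, math.ceil(seq_len / bucket_size) * bucket_size)

# Lengths 37, 130 and 200 collapse onto just two padded shapes: 128 and 256.
print([bucket_length(n) for n in (37, 130, 200)])  # [128, 256, 256]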
8x cmd:
{'train_runtime': 324.0838, 'train_samples_per_second': 30.12, 'train_steps_per_second': 0.471, 'train_loss': 1.5799240708351134, 'epoch': 0.45, 'memory_allocated (GB)': 39.92, 'max_memory_allocated (GB)': 84.57, 'total_memory_available (GB)': 94.62}
***** eval metrics *****
time to run: 6m40.188s
without bucket: run time = 48 min
TODO: The command I was using earlier didn't have "--deepspeed"; maybe that's why?
tests/test_sft_train.py
Outdated
We already have some tests for sft.py
in https://github.com/huggingface/optimum-habana/blob/main/tests/test_examples.py. Is it possible to integrate this one there?
Some extra command-line args that I have:
--subset
--max_steps (to keep the test short)
--lora_target_modules "q_proj" "v_proj" "k_proj" "o_proj"
--lr_scheduler_type
--use_peft
--packing False
etc.
Not sure how to fit these into test_examples.py
You should be able to add any specific extra arguments in the extra_arguments field of the JSON baseline files. Here is an example for Qwen:
"extra_arguments": [
Let me know if that's good enough, otherwise we'll just keep the current version.
@regisss
OK, I think I can almost make test_examples.py work.
I have a variation in the cmd line:
if "72" in model_name:
command += [
"--max_steps",
"50",
"--gradient_checkpointing",
"True",
"--pipelining_fwd_bwd",
"True",
]
else:
command += ["--max_steps", "100"]
I also have a certain DeepSpeed config file that I use for the Qwen-72B model; I create it as a temporary file, dump the config into it, and then use it:
if "72" in model_name:
command += ["--deepspeed", fp.name]
Any suggestions on how these might be incorporated into test_examples.py? For the if-else in the cmd line based on the 7B or 72B model, I could enter the Qwen2-7B settings in Qwen2_7B.json, and for the 72B model I could create a new JSON, Qwen2-72B.json? However, MODELS_TO_TEST_MAPPING has the entry "qwen2": [("Qwen/Qwen2-7B", "Habana/qwen")], so maybe I'd have to change that as well?
Just checking if you have some easy/obvious solution; otherwise I'll stick to the test file I have now.
For the DeepSpeed config, you can also store it here and pass an argument like this one:
"--deepspeed tests/configs/deepspeed_zero_3_gaudi1.json" |
But that's up to you, using a temporary file is fine too.
For the if-else in the cmd line based on the 7B or 72B model, I could enter the Qwen2-7B settings in Qwen2_7B.json, and for the 72B model I could create a new JSON, Qwen2-72B.json?
Yes, that sounds good!
However, MODELS_TO_TEST_MAPPING has the entry "qwen2": [("Qwen/Qwen2-7B", "Habana/qwen")], so maybe I'd have to change that as well?
Yep, you would probably need to change it to:
"qwen2": [("Qwen/Qwen2-7B", "Habana/qwen"), ("name_of_72B_checkpoint", "Habana/qwen")]
Then, we will also need to add a rule in this big conditional block to make sure the Qwen-72B is executed only for your test, but I can help with that at the end if you don't see how to do it.
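For what it's worth, a hypothetical shape for such a rule (the function and argument names below are placeholders, not the actual test_examples.py code):
# Hypothetical rule: run the 72B checkpoint only for the SFT test and skip it elsewhere.
def should_test(model_name: str, example_dir: str) -> bool:
    if "72B" in model_name and "sft" not in example_dir:
        return False
    return True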
Force-pushed from 24c8941 to ea98a17
Trying with llama2-7b:
without bucket:
bucket=8:
What does this PR do?
Add bucketing to Qwen training
Earlier version: #1128
Results:
Numbers without this branch: