# Tensor parallel distributed strategy without using deepspeed #1121
## Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I left a few comments. Additionally, can you:
- run `make style`?
- add an example command in the README of the text-generation example?
- add a test for it in https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py?
- add a link to the original implementation in all files that are inspired from it?
- check if there are any copyrights to cite?
Done
Force-pushed from 8ecceca to 2b5f46e.
@regisss Addressed all your comments. Please review the changes. Thank you!
Can you update your main branch and merge it into this PR, to get the whole CI working again?
examples/text-generation/README.md (outdated):

```md
You will also need to add `--torch_compile` in your command.

### Running with Tesor parallel strategy
```
Suggested change:

```diff
-### Running with Tesor parallel strategy
+### Running with tensor-parallel strategy
```
Done
examples/text-generation/README.md (outdated):

```md
### Running with Tesor parallel strategy
#### Attribution

This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
```
Suggested change:

```diff
-This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
+> [!NOTE]
+> This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
```
Done
examples/text-generation/README.md (outdated):

```md
You will also need to add `--torch_compile` in your command.

### Running with Tesor parallel strategy
#### Attribution
```
I think you can remove that line; let's put it in a "box" as suggested below.
I have added it in the box, but I am not sure whether this `[!WARNING]` syntax had to be preserved.
examples/text-generation/README.md (outdated):

```md
This repository includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.

torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models. To enable
```
Suggested change:

```diff
-torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models. To enable
+> [!WARNING]
+> torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
+To enable...
```
Done, added the note and warning in boxes.
@kalyanjk can you update based on the review comments?
I left a few more comments to address.
Also, the test fails on my instance with this error:
```
Traceback (most recent call last):
  File "/root/workspace/fork/examples/text-generation/run_generation.py", line 674, in <module>
    main()
  File "/root/workspace/fork/examples/text-generation/run_generation.py", line 317, in main
    model, assistant_model, tokenizer, generation_config = initialize_model(args, logger)
  File "/root/workspace/fork/examples/text-generation/utils.py", line 592, in initialize_model
    else setup_distributed_model_tp(args, model_dtype, model_kwargs, logger)
  File "/root/workspace/fork/examples/text-generation/utils.py", line 281, in setup_distributed_model_tp
    lazy_sd = serialization.load_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/distributed/serialization.py", line 191, in load_state_dict
    assert len(checkpoints) > 0, f"Can't find the requested checkpoint data at {model_path}"
AssertionError: Can't find the requested checkpoint data at meta-llama/Llama-2-7b-hf
```
Any idea about what's going on? It seems like a serialization issue. Or is it because it requires Synapse 1.17? I'm running 1.16.
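For reference, one way to make the Hub model id case work is to resolve the id to a local snapshot before loading the checkpoint. A minimal sketch, assuming `huggingface_hub` is installed (`resolve_checkpoint_path` is a hypothetical helper, not code from this PR):

```python
import os

from huggingface_hub import snapshot_download


def resolve_checkpoint_path(model_name_or_path: str) -> str:
    """Return a local directory with checkpoint files, given a path or a Hub model id."""
    if os.path.isdir(model_name_or_path):
        # Already a local checkout, use it directly.
        return model_name_or_path
    # Downloads the repo (or reuses the cached copy) and returns the snapshot directory.
    return snapshot_download(repo_id=model_name_or_path)
```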
examples/text-generation/README.md (outdated):

```md
You will also need to add `--torch_compile` in your command.

### Running with tesor-parallel strategy
```
Suggested change:

```diff
-### Running with tesor-parallel strategy
+### Running with tensor-parallel strategy
```
Done
examples/text-generation/README.md (outdated):

````md
```bash
NOTE: This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
```
````
Suggested change:

````diff
-```bash
-NOTE: This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
-```
+> [!NOTE]
+> This strategy includes code from the [foundation-model-stack](https://github.com/foundation-model-stack/foundation-model-stack) repository, which is licensed under the Apache License 2.0. See the `LICENSE` file for more details.
````
updated with the suggested format
examples/text-generation/README.md (outdated):

````md
```bash
WARNING: torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
```
To enable torch.compile with tensor parallel strategy, please set the following environment variables before running the
command: `PT_ENABLE_INT64_SUPPORT=1` and `PT_HPU_LAZY_MODE=0`. This will enable tensor parallel strategy without deepspeed.
````
Suggested change:

````diff
-```bash
-WARNING: torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
-```
-To enable torch.compile with tensor parallel strategy, please set the following environment variables before running the
-command: `PT_ENABLE_INT64_SUPPORT=1` and `PT_HPU_LAZY_MODE=0`. This will enable tensor parallel strategy without deepspeed.
+> [!WARNING]
+> torch.compile with tensor parallel strategy is an experimental feature. It has not been validated for all models.
+To enable torch.compile with tensor parallel strategy, please set the following environment variables before running the
+command: `PT_ENABLE_INT64_SUPPORT=1` and `PT_HPU_LAZY_MODE=0`. This will enable tensor parallel strategy without deepspeed.
````
updated with the suggested format
examples/text-generation/README.md (outdated):

````md
Here is an example:
```bash
python ../gaudi_spawn.py --world_size 8 run_generation.py \
````
Suggested change:

```diff
-python ../gaudi_spawn.py --world_size 8 run_generation.py \
+PT_ENABLE_INT64_SUPPORT=1 PT_HPU_LAZY_MODE=0 python ../gaudi_spawn.py --world_size 8 run_generation.py \
```
updated
Please run …
Can you provide the absolute path for meta-llama/Llama-2-7b-hf? All my testing is on 1.17; I will verify on 1.16 and update.
@regisss Successfully verified the sanity test for the 1.16 release using both the 7b and 70b models. Everything is working fine.
There is no absolute path; this is the Hub model id, and I really think this use case should work, as not everybody has the models stored locally. If the absolute path to the model is needed, there should be some code to find the model in the Transformers cache. You can get the default path to the cache with:
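A likely snippet, assuming a recent `huggingface_hub` (the exact code from the original comment is not preserved here):

```python
from huggingface_hub import constants

# Default Hub download cache, typically ~/.cache/huggingface/hub
print(constants.HF_HUB_CACHE)
```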
More information about the structure of the cache here: https://huggingface.co/docs/huggingface_hub/v0.24.2/en/guides/manage-cache#understand-caching

Also, I see I forgot to mention it: can you replace the arg `distributed_strategy` with `parallel_strategy`?
Added a test in tests/test_text_generation_example.py and added a link to the original implementation in the referenced files.
Updated: renamed `distributed_strategy` to `parallel_strategy`.
Updated the `cache_dir` setting for `parallel_strategy = tp`. @regisss can you please verify whether you are able to load the data now?
Thanks for the changes, it looks good to me!
One last thing, as written in the comment below, the test fails on my instance because the throughput I get is too low. Maybe due to a different version of Synapse?
Referenced commits:
- Tensor parallel distributed strategy without using deepspeed (huggingface#1121). Co-authored-by: Kalyan <kkumar@habana.ai>
- Revert "Tensor parallel distributed strategy without using deepspeed (#280) (#299)" (reverts commit 32c86d3); re-applies "Tensor parallel distributed strategy without using deepspeed" (huggingface#1121). Co-authored-by: Kalyan <kkumar@habana.ai>
- Revert "Tensor parallel distributed strategy without using deepspeed (#280)" (reverts commit c6e5f9c); re-applies "Tensor parallel distributed strategy without using deepspeed" (huggingface#1121). Co-authored-by: Kalyan <kkumar@habana.ai>
Tensor parallelism is implemented by extending `GaudiLlamaAttention` -> `TPGaudiLlamaAttention` and `GaudiLlamaMLP` -> `TPGaudiLlamaMLP`.

Use the parameter `--distributed_strategy="tp"` to invoke this code path.

Code design reference: https://github.com/foundation-model-stack/foundation-model-stack/tree/main
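For readers unfamiliar with the pattern, below is a rough sketch of the column-/row-parallel linear layers that tensor-parallel attention/MLP extensions are typically built from. This follows the general Megatron-style design; the class names are illustrative assumptions, and the actual `TPGaudiLlamaAttention`/`TPGaudiLlamaMLP` code in this PR follows the foundation-model-stack implementation instead:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F


class ColumnParallelLinear(nn.Module):
    """Each rank holds a slice of the output features (illustrative only)."""

    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0, "out_features must divide evenly across ranks"
        self.weight = nn.Parameter(torch.empty(out_features // world_size, in_features))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each rank computes only its shard of the activations.
        return F.linear(x, self.weight)


class RowParallelLinear(nn.Module):
    """Each rank holds a slice of the input features; partial outputs are summed."""

    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert in_features % world_size == 0, "in_features must divide evenly across ranks"
        self.weight = nn.Parameter(torch.empty(out_features, in_features // world_size))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        partial = F.linear(x_shard, self.weight)
        # All-reduce sums the partial products across ranks to rebuild the full output.
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)
        return partial
```

In a transformer block, the QKV/up projections use the column-parallel form and the output/down projections the row-parallel form, so each block needs only one all-reduce to reconstruct the full activations.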