Enable composable benchmark configs for flexible model+device+optimization scheduling #7349
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7349
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit c9f0156 with merge base 6ab4399: NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Core ML ANE job (iOS 17): https://github.com/pytorch/executorch/actions/runs/12403482481
@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@huydhn FYI, the Core ML ANE and Qualcomm HTP jobs are not going to block this PR. I will merge this PR tonight to unblock you, and will add those paths separately if they don't work out of the box.
Sounds good! Once this lands, I can start working on bringing the benchmark configs to the dashboard.
Core ML ANE and Qualcomm HTP are not ready yet and will be enabled in separate PRs. For now, we just hide these paths from
@kirklandsign FYI, the results of the newly added llama3 benchmarks (spinquant, qlora, bf16) are also missing from benchmark_results.json. See the upload job here: https://github.com/pytorch/executorch/actions/runs/12404651330/job/34631545501
ExecuTorch has replaced the backend field with a more generic benchmark configs concept, so the dashboard will display it instead.

* pytorch/executorch#7349
* pytorch/executorch#7433

### Testing

https://torchci-git-fork-huydhn-add-executorch-backend-fbopensource.vercel.app/benchmark/llms?startTime=Tue%2C%2017%20Dec%202024%2019%3A05%3A31%20GMT&stopTime=Tue%2C%2024%20Dec%202024%2019%3A05%3A31%20GMT&granularity=hour&lBranch=handle-benchmark-config-dashboard&lCommit=c48da2bd2c9a32705db9b1adf638344474c275a4&rBranch=handle-benchmark-config-dashboard&rCommit=c33f815eff17c8890f2c8527dc0f0dbca50b4397&repoName=pytorch%2Fexecutorch&modelName=All%20Models&backendName=All%20Backends&dtypeName=All%20DType&deviceName=All%20Devices
Not every combination of model and delegate works, which makes adding a new model to the continuous run difficult, because the workflow runs it across all delegates. In addition, each delegate may need a different configuration. For example, Llama 3.2 SpinQuant uses a prequantized checkpoint, so it does not use the recipe for the regular fp32 checkpoint. To support these combinations and optimizations, we are migrating to `benchmark_configs`: a set of predefined configs, each combining the optimizations that may be applied to a model, e.g. KV cache, embedding/activation quantization, dtype, delegation, SDPA, etc.

In this PR, given a model (either a Hugging Face model ID or an in-tree model name) and a target platform ("android" or "ios"), the workflow retrieves the list of supported benchmark configs from the script `gather_benchmark_configs.py` and schedules the benchmark jobs accordingly. From the workflow dispatcher, users only need to enter the model names; all supported benchmark configs for each model are discovered automatically.

Furthermore (not included in this PR), we could expose `config_args` (key-value pairs) from the script, if there is a way to store them in the DB and display them in the dashboard. This would make it clear exactly how a model was exported/lowered when discussing or debugging perf metrics.

Apple: https://github.com/pytorch/executorch/actions/runs/12404655922
Android: https://github.com/pytorch/executorch/actions/runs/12404651330
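To make the discovery step concrete, here is a minimal Python sketch of how a model name plus target platform could be mapped to a list of benchmark configs, which are then expanded into one benchmark job per (model, config) pair. This is not the actual `gather_benchmark_configs.py`. The config names (`xnnpack_q8`, `qnn_q8`, `coreml_fp16`, `mps`, `llama3_spinquant`, `llama3_qlora`), the helpers `gather_configs` and `build_matrix`, and the substring-based matching rules are all hypothetical. They only illustrate why a prequantized checkpoint such as SpinQuant gets its own config instead of the regular fp32 delegate recipes.

```python
import json
from typing import Dict, List

# Hypothetical per-platform configs applied to regular fp32 checkpoints.
# The names are illustrative only, not the real ExecuTorch config names.
DEFAULT_CONFIGS: Dict[str, List[str]] = {
    "android": ["xnnpack_q8", "qnn_q8"],
    "ios": ["xnnpack_q8", "coreml_fp16", "mps"],
}


def gather_configs(model: str, target_os: str) -> List[str]:
    """Return the benchmark configs a model supports on a platform.

    The matching rules below are an illustrative assumption: a model is
    either an in-tree name (e.g. "mv3") or a Hugging Face-style ID.
    """
    if target_os not in DEFAULT_CONFIGS:
        raise ValueError(f"unsupported target_os: {target_os!r}")
    name = model.lower()
    # Prequantized checkpoints carry their own recipe, so the regular
    # fp32 delegate configs are not scheduled for them.
    if "spinquant" in name:
        return ["llama3_spinquant"]
    if "qlora" in name:
        return ["llama3_qlora"]
    # Copy so callers cannot mutate the shared default table.
    return list(DEFAULT_CONFIGS[target_os])


def build_matrix(models: List[str], target_os: str) -> List[Dict[str, str]]:
    """Expand each model into one job entry per supported config."""
    return [
        {"model": model, "config": config}
        for model in models
        for config in gather_configs(model, target_os)
    ]


if __name__ == "__main__":
    matrix = build_matrix(
        ["mv3", "meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8"], "android"
    )
    # Emit the job list as JSON, one entry per (model, config) pair.
    print(json.dumps({"include": matrix}))
```

Emitting `{"include": [...]}` as JSON is one common way to drive a GitHub Actions `strategy.matrix` through `fromJSON(...)`; whether the real workflow wires it exactly this way is an assumption. Keeping the rules in a script rather than in the workflow YAML means that supporting a new model only requires updating the script's rules, not the workflow.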