Enable torchao quantization in framework and group_bench #2116
Conversation
This pull request was exported from Phabricator. Differential Revision: D52802534
Summary: Support torchao quantization code in the framework. Add a new config `torch_ao.yaml` in the group_bench userbenchmark. Differential Revision: D52802534
Correct me if I'm wrong: the dynamo benchmarks compare compiled vs. non-compiled models, and this change would compare quantized+compiled vs. quantized, right? We'd want to compare quantized+compiled vs. compiled to get something usable.

@HDCharles This change would be comparing quantized+compiled vs. compiled.
```yaml
model: "*"
test: eval
device: cuda
extra_args: --precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune
```
@HDCharles In this config, we define the baseline `extra_args` as `--precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune`, so it will apply to every test_group/subgroup defined below.
```yaml
test_batch_size_default:
  subgroup:
    - extra_args:
    - extra_args: --quantization int8dynamic
```
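Assembling the snippets quoted in this review, the full `torch_ao.yaml` presumably looks roughly like the sketch below. The `test_group` key and the two weight-only subgroups are assumptions inferred from the test-plan output, not a verbatim copy of the committed file:

```yaml
# Sketch of the group_bench config; top-level grouping keys are assumed.
model: "*"
test: eval
device: cuda
extra_args: --precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune
metrics:
  - latencies
test_group:
  test_batch_size_default:
    subgroup:
      - extra_args:                               # compiled baseline, no quantization
      - extra_args: --quantization int8dynamic
      - extra_args: --quantization int8weightonly
      - extra_args: --quantization int4weightonly
```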
As shown in the D52802534 test plan, here are the test results:

```
Running TorchBenchModelConfig(name='resnet50', test='eval', device='cuda', batch_size=None, extra_args=['--precision', 'bf16', '--torchdynamo', 'inductor', '--inductor-compile-mode', 'max-autotune', '--quantization', 'int8dynamic'], extra_env=None, output_dir=None) ... [done]
Running TorchBenchModelConfig(name='resnet50', test='eval', device='cuda', batch_size=None, extra_args=['--precision', 'bf16', '--torchdynamo', 'inductor', '--inductor-compile-mode', 'max-autotune', '--quantization', 'int8weightonly'], extra_env=None, output_dir=None) ... [done]
Running TorchBenchModelConfig(name='resnet50', test='eval', device='cuda', batch_size=None, extra_args=['--precision', 'bf16', '--torchdynamo', 'inductor', '--inductor-compile-mode', 'max-autotune', '--quantization', 'int4weightonly'], extra_env=None, output_dir=None) ... [done]
```

They are all running with the compiler enabled.
So looking at the run in the test plan, it looks really good. I see it's collecting latencies; we'd also like to collect peak memory usage and compare everything to the compiled baseline. Also, theoretically, run the test with bs=1 for the weight-only quantization types, though that's less important.
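The baseline comparison requested here can be done as a small post-processing step over the collected metrics. This is only an illustrative sketch: the dict layout and the metric names `latency_ms` and `gpu_peak_mem_gb` are hypothetical placeholders, not the userbenchmark's actual output schema:

```python
# Compare each quantized+compiled run against the compiled (non-quantized)
# baseline. All metric names and numbers are hypothetical placeholders.

def compare_to_baseline(results, baseline_key=""):
    """Return per-config speedup and peak-memory ratio vs. the baseline run."""
    base = results[baseline_key]
    report = {}
    for key, metrics in results.items():
        if key == baseline_key:
            continue
        report[key] = {
            # >1.0 means the quantized+compiled run is faster than compiled.
            "speedup": base["latency_ms"] / metrics["latency_ms"],
            # <1.0 means the quantized+compiled run uses less peak memory.
            "mem_ratio": metrics["gpu_peak_mem_gb"] / base["gpu_peak_mem_gb"],
        }
    return report

if __name__ == "__main__":
    # Hypothetical resnet50 numbers, keyed by the --quantization flag.
    results = {
        "": {"latency_ms": 4.0, "gpu_peak_mem_gb": 2.0},  # compiled baseline
        "int8dynamic": {"latency_ms": 3.2, "gpu_peak_mem_gb": 1.5},
    }
    print(compare_to_baseline(results))
```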
Looks good, though I would like to add a metric for peak CUDA memory usage and, if it can't do it already, a comparison to the baseline.
```yaml
device: cuda
extra_args: --precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune
metrics:
  - latencies
```
@HDCharles To add CPU/GPU peak memory, add `cpu_peak_mem` and `gpu_peak_mem` here.
@HDCharles To add CPU and GPU memory, simply add the `cpu_peak_mem` and `gpu_peak_mem` metrics. This PR is only a proof-of-concept of what the framework can do. We can leave further development to follow-up PRs.
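Applied to the metrics list in the config snippet quoted earlier, the change described here would presumably look like:

```yaml
metrics:
  - latencies
  - cpu_peak_mem
  - gpu_peak_mem
```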
This pull request has been merged in 52a4b44.