
Enable torchao quantization in framework and group_bench #2116

Closed
wants to merge 1 commit into from

Conversation


@xuzhao9 xuzhao9 commented Jan 16, 2024

Summary:
Support torchao quantization code in the framework.

Add a new config `torch_ao.yaml` in the group_bench userbenchmark.

Differential Revision: D52802534

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D52802534

xuzhao9 added a commit to xuzhao9/benchmark that referenced this pull request Jan 17, 2024
Summary:

Support torchao quantization code in the framework.

Add a new config `torch_ao.yaml` in the group_bench userbenchmark.

Differential Revision: D52802534

@HDCharles
Contributor

Correct me if I'm wrong: the dynamo benchmarks compare compiled vs. non-compiled models, and this change would compare quantized+compiled vs. quantized, right? We'd want to compare quantized+compiled vs. compiled to get something usable.

@xuzhao9
Contributor Author

xuzhao9 commented Jan 25, 2024

Correct me if I'm wrong: the dynamo benchmarks compare compiled vs. non-compiled models, and this change would compare quantized+compiled vs. quantized, right? We'd want to compare quantized+compiled vs. compiled to get something usable.

@HDCharles This change would be comparing quantized+compiled vs. compiled.

model: "*"
test: eval
device: cuda
extra_args: --precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune
xuzhao9 (Contributor Author)
@HDCharles In this config, we define the baseline extra_args as --precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune, which is applied to every test_group/subgroup defined below.

test_batch_size_default:
  subgroup:
    - extra_args:
    - extra_args: --quantization int8dynamic
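The combination rule described above can be illustrated with a small sketch. This is not the actual group_bench code; `merge_extra_args` is a hypothetical helper showing how the shared baseline string and a subgroup's extra_args could be merged into the final argument list for each test:

```python
# Hypothetical sketch (not the real group_bench implementation) of how a
# baseline extra_args string and a subgroup's extra_args combine.
import shlex


def merge_extra_args(baseline: str, subgroup: str) -> list:
    """Append subgroup-specific args after the shared baseline args."""
    return shlex.split(baseline) + shlex.split(subgroup)


baseline = "--precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune"
merged = merge_extra_args(baseline, "--quantization int8dynamic")
print(merged)
```

With an empty subgroup entry, the test runs with only the baseline args, which is what makes the first subgroup the compiled (unquantized) baseline.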
xuzhao9 (Contributor Author)
As shown in the D52802534 test plan, here are the test results:

Running TorchBenchModelConfig(name='resnet50', test='eval', device='cuda', batch_size=None, extra_args=['--precision', 'bf16', '--torchdynamo', 'inductor', '--inductor-compile-mode', 'max-autotune', '--quantization', 'int8dynamic'], extra_env=None, output_dir=None) ... [done]
Running TorchBenchModelConfig(name='resnet50', test='eval', device='cuda', batch_size=None, extra_args=['--precision', 'bf16', '--torchdynamo', 'inductor', '--inductor-compile-mode', 'max-autotune', '--quantization', 'int8weightonly'], extra_env=None, output_dir=None) ... [done]
Running TorchBenchModelConfig(name='resnet50', test='eval', device='cuda', batch_size=None, extra_args=['--precision', 'bf16', '--torchdynamo', 'inductor', '--inductor-compile-mode', 'max-autotune', '--quantization', 'int4weightonly'], extra_env=None, output_dir=None) ... [done]

All of them run with the compiler enabled.
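The three runs above differ only in the value passed to --quantization. A minimal sketch of how such a flag might be declared is below; the actual benchmark's parser may differ, and the parser object here is purely illustrative:

```python
# Hypothetical sketch of a --quantization flag accepting the three torchao
# modes exercised in the test plan above; the real parser may differ.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--quantization",
    choices=["int8dynamic", "int8weightonly", "int4weightonly"],
    help="torchao quantization mode to apply before compilation",
)
args = parser.parse_args(["--quantization", "int8weightonly"])
print(args.quantization)
```

Restricting the flag with `choices` means an unsupported mode fails fast at argument-parsing time rather than mid-benchmark.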

@HDCharles
Contributor

So looking at the run in the test plan, it looks really good.

I see it's collecting latencies; we'd also like to collect peak memory usage and compare everything to the compiled baseline.

Also, ideally run the test at bs=1 for the weight-only quantization types, though that's less important.

HDCharles (Contributor) left a comment

Looks good, though I would like to add a metric for peak CUDA memory usage and, if it can't do that already, compare to the baseline.

device: cuda
extra_args: --precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune
metrics:
- latencies
xuzhao9 (Contributor Author) Jan 26, 2024
To add CPU/GPU peak memory, add `cpu_peak_mem` and `gpu_peak_mem` here. @HDCharles

@xuzhao9
Contributor Author

xuzhao9 commented Jan 26, 2024

@HDCharles To add CPU and GPU memory, simply add `cpu_peak_mem` and `gpu_peak_mem` to the `metrics` section in the YAML file.
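Putting that together with the config fragments shown earlier, the extended file could look like the sketch below (assuming `cpu_peak_mem` and `gpu_peak_mem` are the metric names the framework accepts, as stated above):

```yaml
# Sketch of torch_ao.yaml extended with peak-memory metrics.
model: "*"
test: eval
device: cuda
extra_args: --precision bf16 --torchdynamo inductor --inductor-compile-mode max-autotune
metrics:
  - latencies
  - cpu_peak_mem
  - gpu_peak_mem
```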

This PR is only a proof-of-concept of what the framework can do. We can leave further development to follow-up PRs.

@facebook-github-bot
Contributor

This pull request has been merged in 52a4b44.

@xuzhao9 xuzhao9 deleted the export-D52802534 branch January 26, 2024 16:01