
Add sum reduction operator to TritonBench #2282

Closed
wants to merge 1 commit into from

Conversation

@jananisriram (Contributor) commented Jun 6, 2024

Summary:
Add a Triton reduction kernel for the `sum` operator where `dim=None` to TritonBench, following the [TritonBench guide](https://fb.workplace.com/notes/953949486404240). This implementation reduces matrices of any shape to a single scalar value.
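The `dim=None` reduction strategy can be modeled in plain Python. This is a hypothetical sketch of the kernel's control flow, not the PR's actual Triton code: the flattened input is split into fixed-size blocks, each Triton program instance sums one block, and the partial sums are accumulated into a single scalar output (in Triton this would be `tl.sum` per block followed by `tl.atomic_add` into the output buffer).

```python
BLOCK_SIZE = 1024  # illustrative block size; the real kernel may autotune this

def blocked_sum(x):
    """Model of a dim=None sum reduction: matrix -> scalar."""
    n = len(x)
    out = 0.0  # output is reset per run, as the PR notes, so stale
               # values from a previous benchmark iteration cannot leak in
    num_blocks = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # launch-grid size
    for pid in range(num_blocks):            # each iteration = one program id
        start = pid * BLOCK_SIZE
        block = x[start:start + BLOCK_SIZE]  # slicing models the masked tail load
        out += sum(block)                    # per-block reduce + accumulate
    return out
```

The slice at the tail models Triton's masked load: the final block may be shorter than `BLOCK_SIZE`, and masked-out lanes contribute zero to the sum.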

To measure the accuracy of the Triton reduction kernel, add an accuracy metric to the sum operator in TritonBench that checks the Triton implementation against the baseline PyTorch implementation, referencing [`torchbenchmark/operators/gemm/operator.py`](https://www.internalfb.com/code/fbsource/[767bb6faa353685b84f08a39f36fdcf6ca170c85]/fbcode/pytorch/benchmark/torchbenchmark/operators/gemm/operator.py?lines=236). The output registers are reset on each run of the Triton kernel so that stale accumulator values do not corrupt the measured output.
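A scalar version of such an accuracy check might look like the following. This is illustrative only: the function name and tolerance are assumptions, and the PR's actual metric compares tensors (typically via PyTorch's tolerance-based comparison) rather than Python floats.

```python
import math

def accuracy(output: float, baseline: float, rel_tol: float = 1e-2) -> bool:
    # Pass if the Triton result is within a relative tolerance of the
    # PyTorch baseline. A tolerance (rather than exact equality) is needed
    # because blocked/atomic accumulation reorders floating-point adds.
    return math.isclose(output, baseline, rel_tol=rel_tol)
```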

To measure the performance of the Triton reduction kernel against PyTorch, add a `gbps` (memory-bandwidth) metric, referencing [`torchbenchmark/operators/vector_add/operator.py`](https://www.internalfb.com/code/fbsource/[858eda681c7618f9427ba55cef8d4aba712cb26e]/fbcode/pytorch/benchmark/torchbenchmark/operators/vector_add/operator.py?lines=19).
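For a full reduction, the bandwidth metric counts one read per input element; the single scalar write is negligible. A minimal sketch, assuming latency is reported in milliseconds (the signature and names here are illustrative, not the PR's exact code):

```python
def gbps(num_elements: int, element_size_bytes: int, ms: float) -> float:
    # GB/s = bytes moved / seconds / 1e9. A dim=None sum reads every
    # input element exactly once and writes a single scalar, so memory
    # traffic is dominated by the input read.
    return num_elements * element_size_bytes / (ms * 1e-3) / 1e9
```

For example, summing 1M fp32 elements (4 MB) in 1 ms corresponds to roughly 4 GB/s of achieved read bandwidth.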

Referenced the existing [vector_add](https://www.internalfb.com/code/fbsource/fbcode/pytorch/benchmark/torchbenchmark/operators/vector_add/) and [grouped_gemm](https://www.internalfb.com/code/fbsource/fbcode/pytorch/benchmark/torchbenchmark/operators/grouped_gemm/) TritonBench operators as templates for this implementation.

See the [TritonBench Operator Coverage Tracker](https://docs.google.com/spreadsheets/d/1091POOPSPsUnlNVEKaz2X_DQXdIwFv-fGOH_g9by-Zo/edit#gid=0) for current operator coverage in TritonBench.

Reviewed By: xuzhao9, davidberard98

Differential Revision: D58048782

@facebook-github-bot commented

This pull request was exported from Phabricator. Differential Revision: D58048782

jananisriram added a commit to jananisriram/benchmark that referenced this pull request Jun 6, 2024


@facebook-github-bot commented

This pull request has been merged in 2d8999b.
