[tritonbench] Benchmark int4 gemm implementations #2261

bertmaher · 2024-05-10T17:27:55Z

Summary: Focusing on llama2-70b inference, this compares tinygemm (_weight_int4pack_mm) to a Triton implementation.

Note that the two implementations are not yet numerically equivalent, because the Triton implementation does not apply the scale and zero point. TODO!
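To illustrate why skipping the scale and zero point changes the result, here is a minimal NumPy sketch. It is not code from this PR. It assumes the common group-wise affine form `w ≈ (q - zero_point) * scale`; the exact convention used by tinygemm may differ, and `dequantize_int4`, the shapes, and the group size are hypothetical names and values chosen for illustration.

```python
import numpy as np


def dequantize_int4(q, scale, zero_point, group_size):
    """Group-wise affine dequantization (assumed form): (q - zero_point) * scale.

    q:          (K, N) integer codes in [0, 15]
    scale:      (K // group_size, N) per-group scales
    zero_point: (K // group_size, N) per-group zero points
    """
    # Expand per-group parameters so each row of q has a matching scale/zero point.
    s = np.repeat(scale, group_size, axis=0)
    z = np.repeat(zero_point, group_size, axis=0)
    return (q.astype(np.float32) - z) * s


rng = np.random.default_rng(0)
M, K, N, group_size = 4, 64, 8, 32  # hypothetical sizes, not llama2-70b shapes

x = rng.standard_normal((M, K)).astype(np.float32)
q = rng.integers(0, 16, size=(K, N))  # unpacked int4 codes
scale = rng.uniform(0.01, 0.1, size=(K // group_size, N)).astype(np.float32)
zero_point = np.full((K // group_size, N), 8.0, dtype=np.float32)

# GEMM against dequantized weights vs. GEMM against the raw integer codes.
with_qparams = x @ dequantize_int4(q, scale, zero_point, group_size)
without_qparams = x @ q.astype(np.float32)

print("max abs difference:", np.abs(with_qparams - without_qparams).max())
```

The two outputs only agree in the degenerate case where every scale is 1 and every zero point is 0, so an accuracy check between the two kernels is not meaningful until the TODO is addressed.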

Test Plan:

```
pytorch run_benchmark.py triton --op int4_gemm
```

facebook-github-bot · 2024-05-10T17:28:23Z

@bertmaher has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-05-16T16:19:32Z

@bertmaher merged this pull request in 43621bc.

facebook-github-bot added the cla signed label May 10, 2024

facebook-github-bot closed this in 43621bc May 16, 2024

facebook-github-bot added the Merged label May 16, 2024
