-
Notifications
You must be signed in to change notification settings - Fork 486
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Summary: Pull Request resolved: #2611 This diff extends the `fp8fp8bf16_rowwise` gemm operation to AMD through a new CK kernel. The new kernel requires new stride support that is only available in developer branches of CK, so we must rely on the `ai_codesign/gen_ai` CK repo. I also extend the fp8 benchmarking suite to include rowwise measurements. I'll soon add detailed benchmarking results but the quick summary is that performance looks quite good, typically inline with tensorwise quantization and sometimes faster, presumably due to using the latest and greatest CK pipelines. Full performance results can be found on the rowwise sheet linked below. Overall, the rowwise kernel is similarly performant to tensorwise scaling and actually does quite a bit better in some cases. https://docs.google.com/spreadsheets/d/1hK62EJIt9mZQJdZBxf5rGZ3u0hjMng7AkpwKnIjyn3k/edit#gid=881561326 Reviewed By: jianyuh, jiawenliu64 Differential Revision: D57600068 fbshipit-source-id: c5c533f1d332de293f788792816e535ab1131cec
- Loading branch information
1 parent
f739e0a
commit 8e335d3
Showing
5 changed files
with
334 additions
and
49 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.