MoE BMM FP8 rowwise
Summary:
Enable MoE BMM FP8 rowwise:
- MoE BMM FP8 rowwise achieves **up to 4.5x (2.1x on average) speedup over BF16 BMM**
- End-to-end with MoE 16b x 16, FP8 with BMM achieves a **1.2x speedup over BF16**
- Integrated into the E2E flow and verified that generations match BF16 output
- More results are in the [data sheet](https://docs.google.com/spreadsheets/d/1OLdz4MlzWS9pdgTBq4Jjy0-9_nPn-NmdrMolY0jZOXE/edit?gid=0#gid=0)
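The idea behind FP8 rowwise batched matmul can be sketched in plain NumPy: each row of the left operand and each row of the (transposed) right operand gets its own scale so that the largest magnitude maps to the FP8 e4m3 dynamic range, the low-precision product is accumulated in higher precision, and the per-row scales are applied as an outer product on the output. This is a hypothetical illustration of the rowwise-scaling math only, not the CUTLASS kernel added in this commit; the rounding to actual e4m3 values is approximated by clipping.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8 e4m3

def rowwise_quantize(x, axis):
    # One scale per row: the row's absolute max maps to FP8_E4M3_MAX.
    amax = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = amax / FP8_E4M3_MAX
    # Approximate the FP8 cast by clipping; a real kernel rounds to e4m3.
    xq = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return xq, scale

def bmm_fp8_rowwise(a, b):
    # a: (B, M, K), b: (B, N, K) -> out: (B, M, N), scales applied rowwise.
    aq, a_scale = rowwise_quantize(a, axis=2)  # scale per (batch, row of a)
    bq, b_scale = rowwise_quantize(b, axis=2)  # scale per (batch, row of b)
    acc = np.einsum("bmk,bnk->bmn", aq, bq)    # low-precision multiply, fp32 accumulate
    # Outer product of the two per-row scale vectors rescales the output.
    return acc * a_scale * np.swapaxes(b_scale, 1, 2)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8, 16)).astype(np.float32)
b = rng.standard_normal((4, 8, 16)).astype(np.float32)
out = bmm_fp8_rowwise(a, b)
ref = np.einsum("bmk,bnk->bmn", a, b)
```

Because each scale is chosen per row rather than per tensor, outlier rows do not force the whole operand into a coarse quantization grid, which is what makes rowwise scaling accurate enough to match BF16 generations.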


Differential Revision: D63681109
jiawenliu64 authored and facebook-github-bot committed Oct 2, 2024
1 parent c24a72d commit e3d6b1e
Showing 5 changed files with 560 additions and 1 deletion.
1 change: 1 addition & 0 deletions fbgemm_gpu/experimental/gen_ai/CMakeLists.txt
```diff
@@ -43,6 +43,7 @@ else()
     src/quantize/cutlass_extensions/f8f8bf16_blockwise.cu
     src/quantize/cutlass_extensions/f8f8bf16_cublas.cu
     src/quantize/cutlass_extensions/f8f8bf16_rowwise.cu
+    src/quantize/cutlass_extensions/f8f8bf16_rowwise_batched.cu
     src/quantize/cutlass_extensions/f8f8bf16_tensorwise.cu
     src/quantize/cutlass_extensions/i8i8bf16.cu
     src/quantize/cutlass_extensions/f8i4bf16_rowwise.cu
```
