Commit
Summary:
Enable MoE BMM FP8 rowwise:
- MoE BMM FP8 rowwise achieves **up to 4.5x (2.1x on average) speedup compared to BF16 BMM**
- In E2E with MoE 16b x 16, FP8 with BMM achieves a **1.2x speedup over BF16**
- Integrated in E2E and verified correctness: FP8 generations match the BF16 generations
- More results are in the [data sheet](https://docs.google.com/spreadsheets/d/1OLdz4MlzWS9pdgTBq4Jjy0-9_nPn-NmdrMolY0jZOXE/edit?gid=0#gid=0)

{F1903027122}

Differential Revision: D63681109
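
For readers unfamiliar with the term, the following is a minimal NumPy sketch of what "FP8 rowwise" scaling means for a batched matmul. It is not the kernel from this commit: the function names (`fake_fp8_e4m3`, `quantize_rowwise`, `bmm_fp8_rowwise`) are hypothetical, FP8 is only simulated by rounding to an E4M3-like grid, and the `(B, N, K)` weight layout is an assumption.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def fake_fp8_e4m3(x):
    """Simulate FP8 by rounding to an E4M3-like grid (3 mantissa bits)."""
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    mag = np.abs(x)
    # Exponent of each value, floored at -6 (smallest normal exponent in E4M3).
    exp = np.floor(np.log2(np.maximum(mag, 2.0 ** -6)))
    step = 2.0 ** (exp - 3)  # spacing between neighbouring representable values
    return np.round(x / step) * step


def quantize_rowwise(t):
    """One scale per row (last axis). t: (B, R, K) -> (quantized, scale (B, R, 1))."""
    amax = np.max(np.abs(t), axis=-1, keepdims=True)
    scale = np.maximum(amax, 1e-12) / FP8_E4M3_MAX  # guard against all-zero rows
    return fake_fp8_e4m3(t / scale), scale


def bmm_fp8_rowwise(x, w):
    """x: (B, M, K), w: (B, N, K) -> (B, M, N), approximating x @ w^T per batch."""
    xq, xs = quantize_rowwise(x)
    wq, ws = quantize_rowwise(w)
    # Accumulate in high precision, then undo both rowwise scales.
    acc = np.einsum("bmk,bnk->bmn", xq, wq)
    return acc * xs * np.transpose(ws, (0, 2, 1))
```

Because each row of `x` and each row of `w` carries its own scale, the output element `(b, m, n)` is rescaled by `xs[b, m] * ws[b, n]`, which is why the scales can be applied after the low-precision matmul instead of inside it.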