Rowwise F8F8BF16 GEMMs - Auto-generate kernel library, auto-generated heuristics cache, add to FBGEMM quantize_bench

Summary:
# Summary
- Auto-generated F8F8BF16 rowwise scaled kernels.
- Auto-generation of the heuristics cache.
- Add to quantize_bench.

# Performance Improvements
## DisaggBench

| Prefill (B=1) | CUTLASS elapsed (ms) | CUTLASS throughput (TF/s) | CUTLASS extensions elapsed (ms) | CUTLASS extensions throughput (TF/s) |
|---|---|---|---|---|
| T=2048 | 109.13 | 333.74 | 108.83 | 334.66 |
| T=4928 | 272.55 | 338.62 | 260.46 | 354.34 |
| T=6336 | 354.93 | 342.55 | 336.39 | 361.43 |
| T=8192 | 468.64 | 346.06 | 442.64 | 366.39 |

Differential Revision: D63744054
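As a quick check on the DisaggBench prefill numbers above, the per-shape gain of CUTLASS extensions over CUTLASS can be computed directly from the reported elapsed times and throughputs. This is a minimal sketch that only re-does the arithmetic on the figures in this commit message; the dictionary names and the `speedups` helper are illustrative and not part of FBGEMM or quantize_bench.

```python
# DisaggBench prefill results (B=1), copied from the numbers in this commit message.
# Maps sequence length T -> (elapsed ms, throughput in TF/s).
# CUTLASS / CUTLASS_EXT / speedups are hypothetical names used only for illustration.
CUTLASS = {
    2048: (109.13, 333.74),
    4928: (272.55, 338.62),
    6336: (354.93, 342.55),
    8192: (468.64, 346.06),
}
CUTLASS_EXT = {
    2048: (108.83, 334.66),
    4928: (260.46, 354.34),
    6336: (336.39, 361.43),
    8192: (442.64, 366.39),
}


def speedups(baseline, candidate):
    """Return {T: (latency speedup, throughput ratio)} for each T in baseline."""
    out = {}
    for t in sorted(baseline):
        base_ms, base_tfs = baseline[t]
        cand_ms, cand_tfs = candidate[t]
        # Latency speedup: how many times faster the candidate finishes.
        # Throughput ratio: candidate TF/s relative to baseline TF/s.
        out[t] = (base_ms / cand_ms, cand_tfs / base_tfs)
    return out


if __name__ == "__main__":
    for t, (lat, thr) in speedups(CUTLASS, CUTLASS_EXT).items():
        print(f"T={t}: {lat:.3f}x lower latency, {thr:.3f}x TF/s")
```

Running it gives roughly 1.003x at T=2048, rising to about 1.059x at T=8192. The two ratios agree for each shape because the FLOP count for a given prefill length is fixed, so elapsed time multiplied by TF/s is constant across backends.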