Summary: The biggest movement is 4% on HF inference and 9% on TIMM inference. Note that this is max-autotune mode, so we are more tolerant of compilation-time increases. We could improve compilation time by limiting:

```
# How many of the top triton kernels to benchmark epilogues on
max_epilogue_benchmarked_choices = 3
```

There is an hf_Whisper failure, which you can reproduce on main without this stack with:

```
TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS=TRITON TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --backend inductor --amp --accuracy --training --only hf_Whisper
```

Turning off epilogue fusion fixes the accuracy. I bisected the failure to a single epilogue; however, when you compare the results of that epilogue with the corresponding separate kernels, the outputs are equivalent.

Inference:

<img width="1686" alt="image" src="https://github.com/pytorch/pytorch/assets/11477974/0b240080-cd33-4c08-89d3-583103b1fb0c">

Training:

<img width="1329" alt="Screenshot 2024-04-16 at 6 16 30 PM" src="https://github.com/pytorch/pytorch/assets/11477974/db0afcc9-7288-4c27-84ce-4fc1a5690788">

X-link: pytorch/pytorch#124031
Approved by: https://github.com/Chillee, https://github.com/shunting314
ghstack dependencies: #124030, #122642, #123229, #122825

Reviewed By: jeanschmidt

Differential Revision: D56379580

Pulled By: eellison

fbshipit-source-id: 8e11d1636a2f48bb8c8d0380dad3a2ac76294422