Add coefficient of variation to the benchmark report. #1554

Merged
pbchekin merged 1 commit into llvm-target from chengjun/llvm-target-add-cv-in-microbench on Jul 8, 2024

Conversation

chengjunlu
Contributor

Add the coefficient of variation (CV) to the micro-benchmark report.
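For readers unfamiliar with the metric: the coefficient of variation is the standard deviation of the measurements divided by their mean, so it expresses run-to-run noise as a fraction of the typical value. The sketch below is illustrative only. The function name is hypothetical, and the use of the population standard deviation is an assumption; the PR description does not say which estimator the benchmark code uses.

```python
import statistics


def coefficient_of_variation(samples):
    """Return std / mean for a sequence of measurements.

    Assumption: population standard deviation (pstdev). The actual
    benchmark code may use a different estimator.
    """
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(samples) / mean


# Identical measurements have no spread, so the CV is 0.
print(coefficient_of_variation([1.0, 1.0, 1.0]))
# Measurements 2 and 4 have mean 3 and population std 1, so the CV is 1/3.
print(coefficient_of_variation([2.0, 4.0]))
```

A CV reported as 0.02 would then mean the spread is about 2% of the mean.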

@pbchekin
Contributor

pbchekin commented Jul 3, 2024

Please test with the actual workflow (you can run "Triton benchmarks" for your branch). Currently the changes do not work:
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/9781720957/job/27006440292

chengjunlu force-pushed the chengjun/llvm-target-add-cv-in-microbench branch from 277cfe3 to c0f624d on July 8, 2024 01:22
@pbchekin
Contributor

pbchekin commented Jul 8, 2024

Successful run: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/9833683619/job/27144280954

softmax-performance:
         N  Triton-GB/s  XeTLA-GB/s  Triton-GB/s-min  XeTLA-GB/s-min  Triton-GB/s-max  XeTLA-GB/s-max  Triton-TFlops  XeTLA-TFlops  Triton-TFlops-min  XeTLA-TFlops-min  Triton-TFlops-max  XeTLA-TFlops-max  Triton-CV  XeTLA-CV
0    256.0   666.959247  751.912338       639.375598      476.625457       708.497308      873.813292       0.666959      0.751912           0.639376          0.476625           0.708497          0.873813   0.020683  0.112134
1   1024.0   852.178476  871.008855       845.625798      794.375734       866.591724     1205.259785       0.852178      0.871009           0.845626          0.794376           0.866592          1.205260   0.006625  0.057388
2   2048.0  1326.681649  924.058975      1152.281316      822.412594      1407.484513     1327.311359       1.326682      0.924059           1.152281          0.822413           1.407485          1.327311   0.047909  0.077342
3   4096.0   777.372987  774.463015       718.202711      716.975062       812.849658     1158.647559       0.777373      0.774463           0.718203          0.716975           0.812850          1.158648   0.030980  0.068119
4   8192.0   797.892135  746.956533       772.431690      724.404855       870.187483      812.062733       0.797892      0.746957           0.772432          0.724405           0.870187          0.812063   0.020727  0.021783
5  16384.0   771.169338  753.176996       761.908050      745.654015       794.752057      782.519359       0.771169      0.753177           0.761908          0.745654           0.794752          0.782519   0.010341  0.009802
6  32768.0   840.465288  839.332881       834.064957      832.409566       848.834653      852.284270       0.840465      0.839333           0.834065          0.832410           0.848835          0.852284   0.005450  0.006801

XeTLA-CV for N=256 is 11%.
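The 11% figure is the XeTLA-CV value for N=256 (0.112134) read as a percentage. As a rough sketch of how such outliers could be picked out automatically, the snippet below scans the XeTLA-CV column from the run above. The 10% threshold and the helper name `noisy_sizes` are assumptions for illustration, not something the PR defines.

```python
# XeTLA-CV values copied from the softmax-performance table above.
xetla_cv = {
    256: 0.112134,
    1024: 0.057388,
    2048: 0.077342,
    4096: 0.068119,
    8192: 0.021783,
    16384: 0.009802,
    32768: 0.006801,
}


def noisy_sizes(cv_by_n, threshold=0.10):
    """Return the problem sizes whose CV exceeds the threshold.

    The default threshold of 10% is a hypothetical cutoff.
    """
    return [n for n, cv in cv_by_n.items() if cv > threshold]


for n in noisy_sizes(xetla_cv):
    print(f"N={n}: CV = {xetla_cv[n] * 100:.1f}%")
```

With this cutoff only N=256 is flagged; the next-highest value, 0.077342 at N=2048, stays below it.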

@chengjunlu
Contributor Author


Let's use issue #1566 to track the outlier and the other issue found here.

pbchekin merged commit 69eba17 into llvm-target on Jul 8, 2024
7 checks passed
pbchekin deleted the chengjun/llvm-target-add-cv-in-microbench branch on July 8, 2024 14:49
Development

Successfully merging this pull request may close these issues.

[softmax] Investigate performance variation / degradation from c141986 to 93d168c
3 participants