Add a better performing config in the matmul example #1139
Closed
It has been reported in #1122 that the performance of the matmul tutorial is well below `torch.matmul` performance. After playing with the parameters, I found that the current grid search does not seem well suited to the Max Series GPU.
Adding this set of parameters to the grid search (basically changing `num_warps` from `2` to `16` in the config that I found is selected as the best config) gives a large (about 3x) speedup on the 512 * 512 matmul over what I get from this tutorial on the current main branch.
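For readers who want to see the shape of the change, here is a minimal sketch of adding a `num_warps` variant of an existing entry to a search space. It is plain Python rather than the tutorial code: `MatmulConfig` and `add_num_warps_variant` are hypothetical stand-ins, and the block sizes and `num_stages` are placeholder values, not the config this PR touches. In the tutorial itself the equivalent would be one more `triton.Config(...)` entry in the `configs` list passed to `@triton.autotune`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MatmulConfig:
    """Hypothetical stand-in for one autotuner entry (not a Triton class)."""
    block_size_m: int
    block_size_n: int
    block_size_k: int
    group_size_m: int
    num_stages: int
    num_warps: int


def add_num_warps_variant(configs, base, num_warps):
    """Return a new list: `configs` plus a copy of `base` with a different num_warps.

    The original entry is kept, so the autotuner can still pick it on
    hardware where the lower warp count wins.
    """
    variant = replace(base, num_warps=num_warps)
    if variant in configs:
        return list(configs)  # already present, nothing to add
    return list(configs) + [variant]


# Placeholder values for illustration only; NOT the config from this PR.
base = MatmulConfig(64, 64, 32, 8, num_stages=4, num_warps=2)
search_space = [base]

# Same tile shape, but with num_warps raised from 2 to 16.
search_space = add_num_warps_variant(search_space, base, num_warps=16)
```

Keeping the original entry alongside the new one means the search space only grows by one candidate, so the extra autotuning cost should stay small.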
I didn't dig any further into changing the grid search. This is just a single change that I noticed greatly improves performance on the Max Series. The grid search can probably be tweaked further to get additional speedups.