Xp/decompse matmul split or matmul gather #25196
base: master
Changes from 27 commits
I understand that this transformation is trying to match a very specific pattern from LLM models, but shouldn't we have some heuristic for the weight sizes or something?
I mean, do we expect any model with any weight sizes to benefit from this transformation?
Also, please describe in the commit message / PR description the motivation for having this transformation, i.e. why we expect it to speed up LLMs in the first place.
Yes, I want to match models with a ViT-like structure, and I also added some heuristics: checking the rank, decompose_num, and a specific transpose order. Do you think these are not enough?
Probably this will be enough most of the time.
But, again, this is mostly about the reason we are getting the speed-ups.
I assume we observe speed-ups not because of the ranks, decompose_num and transpose order, but because we become less memory-bound. But maybe I am wrong.
Yes, I think it is related to the input data size, but since the shapes are dynamic it is hard to express that as a precise condition; I just tried my best.
It looks like an unnecessary check: we already check that while filling the gathers vector.
Actually, it also checks indices_val[0], covering the 3 cases (0, 1, 2).
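The check being debated could be sketched roughly like this. This is a self-contained illustration, not the PR's actual code: the function name and the idea that the first index of each Gather must together cover exactly {0, 1, 2} (the three decomposed Q/K/V slices) are assumptions drawn from the comment above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch: verify that the scalar indices collected from the
// matched Gather nodes (indices_val[0] of each) cover exactly the three
// expected slices 0, 1 and 2 — i.e. the Gathers together pick all three
// parts of the decomposed MatMul output, each exactly once.
bool indices_cover_all_three(const std::vector<int64_t>& first_indices) {
    if (first_indices.size() != 3)
        return false;
    std::vector<int64_t> sorted = first_indices;
    std::sort(sorted.begin(), sorted.end());
    return sorted == std::vector<int64_t>{0, 1, 2};
}
```

If the loop that fills the gathers vector already guarantees this property, the reviewer's point stands that a second check is redundant; if it only validates each index in isolation, a coverage check like this still adds information.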
Still don't understand this check: I don't see any indices-related check in this any_of, and as for the type check, that was already done a few lines above.
We need to check somewhere above that matmul->get_transpose_a() == false.
I don't think so.
1: The real cases do not include transpose_a being true.
2: It makes the code more complicated.
Although there are no known cases where such patterns have transpose_a == true, it doesn't mean there will be no such cases in the future (not necessarily cases from real models; we should also take into account synthetic cases, e.g. from tests). And I would prefer to avoid debugging a MatMul shape-inference failure if such patterns do appear. Moreover, this is a one-line check that can be done right after the matmul variable is created: almost no code is spent on that. So please do that.