[Snippets] Support Fully Connected tokenization #26498
Conversation
@@ -249,14 +218,10 @@ bool TokenizeSnippets::AppropriateForSubgraph(const std::shared_ptr<const Node>

TokenizeSnippets::TokenizeSnippets(const SnippetsTokenization::Config& config) {
    MATCHER_SCOPE(TokenizeSnippets);
    enum continuation_strategy {
Note: the tokenization code is moved without changes to the tokenization_utils.cpp file, with one exception: the enum continuation_strategy is removed, since it was used with only one value: continuation_strategy::reset
if (transformation_callback(matmul)) {
    return false;
}
return ov::snippets::utils::tokenize_node(matmul, config);
Do we really need to extract the common tokenization logic to utilities just to tokenize one op?
Can we use the following code here?
auto subgraph = op::Subgraph::wrap_node_as_subgraph(matmul);
subgraph->get_rt_info()["originalLayersNames"] = matmul->get_friendly_name();
ov::replace_node(matmul, subgraph);
op::update_out_tensor_name(subgraph);
Probably I missed something 🤔 Could you please elaborate on your decision to use ov::snippets::utils::tokenize_node?
The only thought I have is that it connects to the existing Subgraphs on the first input 🤔
But why not?
It promotes code reuse and thus simplifies maintenance. We would also be able to extend this pass to cover MLP patterns more easily.
I agree that tokenize_node itself is quite complex, but that's because we have a bunch of checks there (like the ones related to Converts or the number of registers) that shouldn't really be in this function.
The only thought I have is that it connects to the existing Subgraphs on the first input

You are right, that is one of the reasons why I reused the helper for FC tokenization. Thanks to that, we cover e.g. "Transposes on inputs + FC" cases with tests.
Also, moving the tokenization logic into a separate helper is a good prerequisite for the tokenization refactoring. Currently, the tokenization looks as follows (I cut the code a bit):
auto tokenization_passes = manager.register_pass<ov::pass::GraphRewrite>();
tokenization_passes->add_matcher<TokenizeGNSnippets>();
tokenization_passes->add_matcher<TokenizeFCSnippets>(m_config);
tokenization_passes->add_matcher<TokenizeSnippets>(m_config);
TokenizeSnippets here is responsible for the tokenization of ops that were not tokenized by the previous passes. The pass contains a lot of checks related to different ops in one place (the is_supported_op lambda); as a result, it looks quite cumbersome.
Ideally, I would want to separate this large pass into a sequence of small matcher passes, each matching only specific ops and containing only the checks needed for the matched ops:
auto tokenization_passes = manager.register_pass<ov::pass::GraphRewrite>();
tokenization_passes->add_matcher<TokenizeGNSnippets>();
tokenization_passes->add_matcher<TokenizeFCSnippets>(m_config);
tokenization_passes->add_matcher<TokenizeUnaryEltwise>(m_config);
tokenization_passes->add_matcher<TokenizeBinaryEltwise>(m_config);
tokenization_passes->add_matcher<TokenizeTranspose>(m_config);
tokenization_passes->add_matcher<TokenizeMatmul>(m_config);
...
I believe this will improve the readability of the tokenization code, and could allow us to configure the pipeline more precisely (e.g. TokenizeTranspose and TokenizeMatmul are used only in tests, so they can be disabled by the plugin in production code).
And each of these new matchers can reuse ov::snippets::utils::tokenize_node -- that's another reason why I moved the tokenization code into a separate helper.
You are right, that is one of the reasons why I reused the helper for FC tokenization. Thanks to that, we cover e.g. "Transposes on inputs + FC" cases with tests

Cool! Thank you for the explanation! I just wanted to check that I fully got the goal of this PR 😊
Yeah, I saw some tests like FC->Eltwise->FC.

Ideally, I would want to separate this large pass into a sequence of small matcher passes, each matching only specific ops and containing only the checks needed for the matched ops

This is a really good idea! 🤔
* @param config tokenization config which regulates
* @return whether the node was tokenized or not
*/
bool tokenize_node(const std::shared_ptr<ov::Node>& node, const ov::snippets::pass::SnippetsTokenization::Config& config);
Probably we should create a static public method in the TokenizeSnippets pass instead of creating a new file with one function. What do you think?
I decided to move this into a separate file because:
- This is a common helper which can be reused in numerous tokenization passes in the future
- The helper is large, so it's a bit easier to have it in a separate file from a readability perspective (at least that's how it seems to me 😄 )
- This file may be extended with a bool tokenize_nodes(const ov::NodeVector& nodes) helper which tokenizes a node sequence. Now this is a part of the TokenizeMHASnippets pass, but this code can be easily extracted and reused (e.g. for MLP tokenization)
We can also move AppropriateForSubgraph here 😁
In general, whether we want it or not, we are moving from one universal tokenization pass towards several transformations responsible for particular pattern families (eltwise/MHA/MLP).
From this perspective, this attempt to derive and reuse some common logic seems like a step in the right direction.
I think we should go even further and create a dedicated class for tokenization (SnippetsNodeTokenizer or something similar). This class should have 3 methods: private bool check_necessary(node), public virtual bool check_sufficient(node), and public bool fuse(node).
- check_necessary will perform all the fundamental plugin-independent checks (like cyclic dependencies, num_results_children etc) and can potentially also gather some aggregated info (internal/external inputs, body parameters etc). If possible, we should separate the check from aggregated-info gathering (another method analyze).
- check_sufficient will be overridden by the plugin and will contain backend-specific checks, possibly partially based on the gathered aggregated info (hidden_data_count, unique_buffer_count) etc
- Finally, fuse will call if (check_necessary() && check_sufficient()) and create the subgraph based on the aggregated info.

So the idea is that the plugin creates its own instance of this SnippetsNodeTokenizer and passes it to all the tokenization transformations. The transformations can use this class to do all the dirty work, plus they can impose their own limitations of course. For example, the eltwise tokenization pass will additionally check op types and precisions.
@a-sidorova, @v-Golubev what do you think? I'm not saying we should refactor it now, but let's try to outline a more convenient and scalable design.
@IvanNovoselov your idea looks interesting, but I need some time to think about whether the proposed architecture is sufficient. Can we allocate a time slot in the next team meeting to discuss it?
Seems like we really need to discuss how the current logic of the tokenization can be refactored and improved.
Yes, sure, let's discuss it at the sync.
// We cannot add new node, that is not Convert, after Convert (that is start node) to avoid arithmetic problems with conversion
// We can add any new node in Subgraph after Convert (because after Input)
//    Parameter
//       |
//    Convert
//
// We cannot add new node, that isn't Convert, in Subgraph after existing Convert
//    Parameter
//      Relu
//    Convert
//
// But we can add new Convert in Subgraph after existing Convert
//    Parameter
//      Relu
//    Convert
//    Convert
//
It looks like this logic is obsolete, because we should be able to support Converts everywhere. Do you think we can remove it?
CC @a-sidorova, we discussed it offline
I think that we can indeed remove this logic. However, since this code is sensitive and affects the tokenization of all models, I deliberately tried to avoid any changes in this helper.
I suggest merging this PR without the proposed changes and doing them in a separate PR which will contain only this change: that would save us time on root-cause investigation in case of unexpected degradations.
Ok, fair enough
The PR has been created: #26843
👍🏼