
[Snippets] Support Fully Connected tokenization #26498

Merged

Conversation


@v-Golubev v-Golubev commented Sep 9, 2024

Details:

  • Support FullyConnected tokenization in tests
  • Matmul tests: added MatMul with constant instances

Tickets:

@v-Golubev v-Golubev requested review from a team as code owners September 9, 2024 16:41
@github-actions github-actions bot added the 'category: IE Tests' (OpenVINO Test: plugins and common) and 'category: CPU' (OpenVINO CPU plugin) labels Sep 9, 2024
@v-Golubev v-Golubev force-pushed the vg/snippets/fc_tokenization branch 3 times, most recently from 77f3cee to 241ed9a Compare September 10, 2024 20:22
@v-Golubev v-Golubev requested a review from a team as a code owner September 10, 2024 20:22
@v-Golubev v-Golubev requested review from itikhono and removed request for a team September 10, 2024 20:22
@github-actions github-actions bot added the 'category: transformations' (OpenVINO Runtime library - Transformations) label Sep 10, 2024
@github-actions github-actions bot removed the 'category: transformations' (OpenVINO Runtime library - Transformations) label Sep 11, 2024
@v-Golubev v-Golubev force-pushed the vg/snippets/fc_tokenization branch 10 times, most recently from ded748e to c86c9ae Compare September 18, 2024 14:42
@v-Golubev v-Golubev added this to the 2024.5 milestone Sep 18, 2024
@@ -249,14 +218,10 @@ bool TokenizeSnippets::AppropriateForSubgraph(const std::shared_ptr<const Node>

TokenizeSnippets::TokenizeSnippets(const SnippetsTokenization::Config& config) {
MATCHER_SCOPE(TokenizeSnippets);
enum continuation_strategy {
Contributor Author

Note: the tokenization code is moved without changes to the tokenization_utils.cpp file, with one exception: the enum continuation_strategy is removed, since it was only ever used with one value: continuation_strategy::reset

if (transformation_callback(matmul)) {
return false;
}
return ov::snippets::utils::tokenize_node(matmul, config);
Contributor


Do we really need to extract the common tokenization logic to utilities just to tokenize one op?
Can we use the following code here?

        auto subgraph = op::Subgraph::wrap_node_as_subgraph(matmul);
        subgraph->get_rt_info()["originalLayersNames"] = matmul->get_friendly_name();
        ov::replace_node(matmul, subgraph);
        op::update_out_tensor_name(subgraph);

Probably I missed something 🤔 Could you please elaborate on your decision to use ov::snippets::utils::tokenize_node?

My only thought is about connecting to the existing Subgraphs on the first input 🤔

Contributor


But why not?
It promotes code reuse and thus simplifies maintenance. We would also be able to extend this pass to cover MLP patterns more easily.
I agree that tokenize_node itself is quite complex, but that's because we have a bunch of checks there (like the ones related to Converts or the number of registers) that shouldn't really be in this function.

Contributor Author


> My only thought is about connecting to the existing Subgraphs on the first input

You are right, that is one of the reasons why I reused the helper for FC tokenization. Thanks to that, we cover e.g. "Transposes on inputs + FC" cases with tests.

Also, moving the tokenization logic into a separate helper is a good prerequisite for the tokenization refactoring. Currently, the tokenization looks as follows (I cut the code a bit):

    auto tokenization_passes = manager.register_pass<ov::pass::GraphRewrite>();
    tokenization_passes->add_matcher<TokenizeGNSnippets>();
    tokenization_passes->add_matcher<TokenizeFCSnippets>(m_config);
    tokenization_passes->add_matcher<TokenizeSnippets>(m_config);

TokenizeSnippets here is responsible for the tokenization of ops which were not tokenized by the previous passes. The pass contains a lot of checks related to different ops in one place (the is_supported_op lambda); as a result, it looks quite cumbersome.

Ideally, I would want to split this large pass into a sequence of small matcher passes, each of which matches only specific ops and contains only the checks needed for those ops:

    auto tokenization_passes = manager.register_pass<ov::pass::GraphRewrite>();
    tokenization_passes->add_matcher<TokenizeGNSnippets>();
    tokenization_passes->add_matcher<TokenizeFCSnippets>(m_config);
    tokenization_passes->add_matcher<TokenizeUnaryEltwise>(m_config);
    tokenization_passes->add_matcher<TokenizeBinaryEltwise>(m_config);
    tokenization_passes->add_matcher<TokenizeTranspose>(m_config);
    tokenization_passes->add_matcher<TokenizeMatmul>(m_config);
    ...

I believe this will improve the readability of the tokenization code, and it could allow us to configure the pipeline more precisely (e.g. TokenizeTranspose and TokenizeMatmul are used only in tests, so they could be disabled by the plugin in product code).

And each of these new matchers can reuse ov::snippets::utils::tokenize_node -- that's another reason why I moved the tokenization code into a separate helper.

Contributor


> You are right, that is one of the reasons why I reused the helper for FC tokenization. Thanks to that, we cover e.g. "Transposes on inputs + FC" cases by tests

Cool! Thank you for the explanation! I just wanted to check that I fully got the goal of this PR 😊
Yeah, I saw some tests like FC->Eltwise->FC.

> Ideally, I would want to separate this large pass into sequence of small matcher passes, which will match only on the specific ops, and contain only the checks which are needed for the matched ops

This is a really good idea! 🤔

src/common/snippets/src/pass/fc_tokenization.cpp (outdated review thread, resolved)
* @param config tokenization config which regulates the tokenization process
* @return whether the node was tokenized or not
*/
bool tokenize_node(const std::shared_ptr<ov::Node>& node, const ov::snippets::pass::SnippetsTokenization::Config& config);
Contributor


Probably we need to create a static public method in the "TokenizeSnippets" pass instead of creating a new file with one function. What do you think?

Contributor Author


I decided to move this into a separate file because

  1. This is a common helper which can be reused in numerous tokenization passes in the future
  2. The helper is large, so it's a bit easier to have it in a separate file from a readability perspective (at least that's how it seems to me 😄 )
  3. This file may be extended with a bool tokenize_nodes(const ov::NodeVector& nodes) helper which tokenizes a sequence of nodes. Currently this is part of the TokenizeMHASnippets pass, but the code can be easily extracted and reused (e.g. for MLP tokenization)

Contributor


We can also move AppropriateForSubgraph here 😁
In general, whether we want it or not, we are moving from one universal tokenization pass towards several transformations responsible for particular pattern families (eltwise/MHA/MLP).
From this perspective, this attempt to derive and reuse some common logic seems like a step in the right direction.

Contributor


I think we should go even further and create a dedicated class for tokenization (SnippetsNodeTokenizer or smth). This class should have 3 methods: private bool check_necessary(node), public virtual bool check_sufficient(node), and public bool fuse(node).

  1. check_necessary will perform all the fundamental plugin-independent checks (like cyclic dependencies, num_results_children etc.) and can potentially also gather some aggregated info (internal/external inputs, body parameters etc.). If possible, we should separate the check from the aggregated info gathering (another method, analyze).
  2. check_sufficient will be overridden by the plugin and will contain backend-specific checks, possibly partially based on the gathered aggregated info (hidden_data_count, unique_buffer_count etc.)
  3. Finally, fuse will check if (check_necessary() && check_sufficient()) and create the subgraph based on the aggregated info.

So the idea is that the plugin creates its own instance of this SnippetsNodeTokenizer and passes it to all the tokenization transformations. The transformations can use this class to do all the dirty work, plus they can impose their own limitations of course. For example, the eltwise tokenization pass will additionally check op types and precisions.

@a-sidorova, @v-Golubev what do you think? I'm not saying we should refactor it now, but let's try to outline a more convenient and scalable design.
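The three-method design sketched above could look roughly like this. This is a minimal self-contained sketch of the proposed interface only: Node, CpuTokenizer, and both predicates are toy stand-ins invented for illustration (the real checks on ov::Node are far more involved), not actual OpenVINO code:

```cpp
#include <cassert>
#include <string>

// Toy stand-in for ov::Node, just enough to exercise the checks.
struct Node {
    std::string type;
    std::size_t num_results_children;
};

class SnippetsNodeTokenizer {
public:
    virtual ~SnippetsNodeTokenizer() = default;

    // Backend-specific sufficient checks; overridden by the plugin.
    virtual bool check_sufficient(const Node&) const { return true; }

    // Fuses the node into a subgraph only if both check stages pass.
    bool fuse(const Node& node) {
        if (!check_necessary(node) || !check_sufficient(node))
            return false;
        // ... a real implementation would build the subgraph here,
        //     using the aggregated info gathered by check_necessary ...
        return true;
    }

private:
    // Fundamental, plugin-independent checks (toy stand-in for
    // cyclic-dependency checks, num_results_children, etc.).
    bool check_necessary(const Node& node) const {
        return node.num_results_children == 1;
    }
};

// A plugin would subclass the tokenizer with its own restrictions,
// e.g. op types, precisions, or register/buffer counts.
class CpuTokenizer : public SnippetsNodeTokenizer {
public:
    bool check_sufficient(const Node& node) const override {
        return node.type != "Unsupported";  // toy backend-specific check
    }
};
```

A tokenization matcher pass would then receive such a tokenizer instance and simply call fuse on matched nodes, keeping only its pattern-specific checks local.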

Contributor Author


@IvanNovoselov your idea looks interesting, but I need some time to think about whether the proposed architecture is enough. Can we allocate a time slot in the next team meeting to discuss it?

Contributor


Seems like we really need to discuss how the current logic of the tokenization can be refactored and improved.

Contributor


Yes, sure, let's discuss it on the sync

src/common/snippets/src/pass/tokenization.cpp (outdated review thread, resolved)
src/common/snippets/src/utils/tokenization_utils.cpp (outdated review thread, resolved)
src/tests/functional/plugin/shared/src/snippets/matmul.cpp (outdated review thread, resolved)
src/common/snippets/src/pass/fc_tokenization.cpp (outdated review thread, resolved)
Comment on lines +233 to +249
// We cannot add a new node that is not a Convert after a Convert (that is a start node), to avoid arithmetic problems with the conversion
// We can add any new node in the Subgraph after a Convert that directly follows an Input
// Parameter
//    |
// Convert
//
// We cannot add a new node that isn't a Convert in the Subgraph after an existing Convert
// Parameter
//    |
//   Relu
//    |
// Convert
//
// But we can add a new Convert in the Subgraph after an existing Convert
// Parameter
//    |
//   Relu
//    |
// Convert
//    |
// Convert
//
Contributor


It looks like this logic is obsolete, because we should be able to support Converts everywhere. Do you think we can remove it?
CC @a-sidorova, we discussed it offline

Contributor Author


I think we can indeed remove this logic. However, since this code is sensitive and affects the tokenization of all models, I especially tried to avoid any changes in this helper.
I suggest merging this PR without the proposed changes and making them in a separate PR which will contain only this change: this would save us time on root-cause investigation in case of unexpected degradations.

Contributor


Ok, fair enough

Contributor Author


The PR has been created: #26843

Contributor

@a-sidorova left a comment


👍🏼

@IvanNovoselov IvanNovoselov added this pull request to the merge queue Sep 27, 2024
Merged via the queue into openvinotoolkit:master with commit 44866a7 Sep 27, 2024
157 checks passed