[Snippets] Support Fully Connected tokenization #26498
**`snippets/pass/fc_tokenization.hpp`** (new file):

```cpp
// Copyright (C) 2018-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "openvino/pass/matcher_pass.hpp"
#include "snippets/pass/tokenization.hpp"

namespace ov {
namespace snippets {
namespace pass {

/**
 * @interface TokenizeFCSnippets
 * @brief The pass tokenizes FullyConnected-like MatMuls (MatMuls with a constant path on the B input)
 * @ingroup snippets
 */
class TokenizeFCSnippets : public ov::pass::MatcherPass {
public:
    OPENVINO_RTTI("TokenizeFCSnippets", "0");
    TokenizeFCSnippets(const SnippetsTokenization::Config& config);
};

}  // namespace pass
}  // namespace snippets
}  // namespace ov
```
**`snippets/utils/tokenization_utils.hpp`** (new file):

```cpp
// Copyright (C) 2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

/**
 * @brief A file that contains tokenization-related utilities.
 * @file tokenization_utils.hpp
 */
#pragma once

#include "snippets/op/subgraph.hpp"
#include "snippets/pass/tokenization.hpp"

namespace ov {
namespace snippets {
namespace utils {
/**
 * @brief Tokenizes a node into a Subgraph. Two options are possible (depending on the config's values and internal logic):
 *        1. The node is wrapped in a trivial Subgraph which contains only this node
 *        2. The node is fused into a parent's Subgraph
 * @param node the node which should be tokenized
 * @param config the tokenization config which regulates the tokenization process
 * @return whether the node was tokenized or not
 */
bool tokenize_node(const std::shared_ptr<ov::Node>& node, const ov::snippets::pass::SnippetsTokenization::Config& config);
}  // namespace utils
}  // namespace snippets
}  // namespace ov
```
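The two-option contract described in the `tokenize_node` doc comment can be sketched with a tiny standalone model. `Node` and the logic below are illustrative stand-ins for `ov::Node` and `ov::snippets::op::Subgraph`, not the real OpenVINO implementation:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for ov::Node and Subgraph:
// a Subgraph is modeled as a node that owns a list of "body" op names.
struct Node {
    std::string name;
    std::vector<std::shared_ptr<Node>> inputs;  // producer nodes
    bool is_subgraph = false;
    std::vector<std::string> body;              // ops captured inside a Subgraph
};

// Sketch of the tokenize_node contract: either fuse the node into an
// existing parent Subgraph (option 2) or, if no tokenized parent exists,
// wrap it into a fresh trivial Subgraph (option 1).
bool tokenize_node(const std::shared_ptr<Node>& node) {
    for (auto& parent : node->inputs) {
        if (parent->is_subgraph) {
            // Option 2: attach the op to the parent's body.
            parent->body.push_back(node->name);
            return true;
        }
    }
    // Option 1: create a trivial Subgraph containing only this node.
    node->is_subgraph = true;
    node->body = {node->name};
    return true;
}
```

The real helper additionally consults the tokenization config and performs many validity checks; this sketch only captures the wrap-vs-fuse decision.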
**`fc_tokenization.cpp`** (new file):

```cpp
// Copyright (C) 2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "snippets/pass/fc_tokenization.hpp"

#include "openvino/pass/graph_rewrite.hpp"
#include "openvino/pass/pattern/op/wrap_type.hpp"
#include "snippets/itt.hpp"
#include "snippets/op/subgraph.hpp"
#include "snippets/utils/tokenization_utils.hpp"

ov::snippets::pass::TokenizeFCSnippets::TokenizeFCSnippets(const SnippetsTokenization::Config& config) {
    MATCHER_SCOPE(TokenizeFCSnippets);
    // TODO: extend constant path coverage:
    // 1. Transpose support
    // 2. Convert support
    // 3. Decompression subgraphs support (and all the possible compressed-weights-related precisions)
    auto constant = ov::pass::pattern::wrap_type<ov::op::v0::Constant>();
    auto m_matmul = ov::pass::pattern::wrap_type<ov::opset1::MatMul>({ov::pass::pattern::any_input(), constant});

    auto callback = [OV_CAPTURE_CPY_AND_THIS](ov::pass::pattern::Matcher& m) {
        OV_ITT_SCOPED_TASK(ov::pass::itt::domains::SnippetsTransform, "Snippets::op::TokenizeFCSnippets")
        const auto matmul = m.get_match_root();
        if (transformation_callback(matmul)) {
            return false;
        }
        return ov::snippets::utils::tokenize_node(matmul, config);
    };

    auto matcher = std::make_shared<ov::pass::pattern::Matcher>(m_matmul, matcher_name);
    register_matcher(matcher, callback);
}
```

Review discussion on this file:

**Comment:** Do we really need to extract the common tokenization logic to utilities just to tokenize one op? Probably I missed something 🤔 Could you please elaborate on your decision? The only thought I have is that it lets the op be connected to the existing Subgraphs on the first input 🤔

**Comment:** But why not?

**Reply:** You are right, that is one of the reasons why I reused the helper for FC tokenization. Thanks to that, we cover e.g. "Transposes on inputs + FC" cases by tests. Also, moving the tokenization logic into a separate helper is a good prerequisite for a tokenization refactoring: currently, the whole tokenization lives in one large pass. Ideally, I would want to separate this large pass into a sequence of small matcher passes which match only the specific ops and contain only the checks needed for the matched ops. I believe this will improve the readability of the tokenization code and could allow us to configure the pipeline more precisely. And each of these new matchers can reuse the helper.

**Reply:** Cool! Thank you for the explanation! I just wanted to check that I fully got the goal of this PR 😊 This is a really good idea! 🤔
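The matching rule encoded by the pattern in `TokenizeFCSnippets` (a MatMul is "FullyConnected-like" when its B input is produced by a Constant) can be illustrated with a tiny standalone model. `Op` and `is_fc_like` below are illustrative stand-ins, not OpenVINO API:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical mini op model: just a type name and input producers.
struct Op {
    std::string type;
    std::vector<std::shared_ptr<Op>> inputs;
};

// Sketch of the FC-like predicate: a two-input MatMul whose second (B)
// input comes from a Constant. The real pass expresses this with
// ov::pass::pattern::wrap_type instead of an explicit check.
bool is_fc_like(const std::shared_ptr<Op>& op) {
    return op->type == "MatMul" && op->inputs.size() == 2 &&
           op->inputs[1]->type == "Constant";
}
```

Per the TODO in the source, the real constant path may later also admit Transpose, Convert, and decompression subgraphs between the Constant and the MatMul, which this one-hop check does not model.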
**Comment:** Probably we need to create a static public method in the `TokenizeSnippets` pass instead of creating a new file with one function. What do you think?
**Reply:** I decided to move this into a separate file because there is also a `bool tokenize_nodes(const ov::NodeVector& nodes)` helper which tokenizes a node sequence. Currently it is part of the `TokenizeMHASnippets` pass, but this code can be easily extracted and reused (e.g. for MLP tokenization).

**Comment:** We can also move `AppropriateForSubgraph` here 😁 In general, whether we want it or not, we are moving from one universal tokenization pass towards several transformations responsible for particular pattern families (eltwise/MHA/MLP). From this perspective, this attempt to derive and reuse some common logic seems like a step in the right direction.
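The decomposition described in this thread, several small per-pattern passes sharing one tokenization helper, could be sketched as a toy pipeline. All names here are illustrative assumptions, not the real OpenVINO transformation infrastructure:

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Toy op model: a type name and a flag set once the op is tokenized.
struct Op {
    std::string type;
    bool tokenized = false;
};

// Shared helper (stand-in for ov::snippets::utils::tokenize_node).
bool tokenize(const std::shared_ptr<Op>& op) {
    op->tokenized = true;
    return true;
}

// A "pass" is reduced to a name plus a predicate deciding which ops
// it is responsible for (eltwise / MHA / MLP / FC families).
struct TokenizationPass {
    std::string name;
    std::function<bool(const std::shared_ptr<Op>&)> matches;
};

// Run the per-pattern passes in sequence; each one reuses the common helper.
void run_pipeline(const std::vector<TokenizationPass>& passes,
                  const std::vector<std::shared_ptr<Op>>& ops) {
    for (const auto& pass : passes)
        for (const auto& op : ops)
            if (!op->tokenized && pass.matches(op))
                tokenize(op);
}
```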
**Comment:** I think we should go even further and create a dedicated class for tokenization (`SnippetsNodeTokenizer` or something similar). This class should have three methods: a private `bool check_necessary(node)`, a public virtual `bool check_sufficient(node)`, and a public `bool fuse(node)`.

- `check_necessary` will perform all the fundamental plugin-independent checks (like cyclic dependencies, num_results_children etc.) and can potentially also gather some aggregated info (internal/external inputs, body parameters etc.). If possible, we should separate the check from the aggregated info gathering (another method, `analyze`).
- `check_sufficient` will be overridden by the plugin and will contain backend-specific checks, possibly partially based on the gathered aggregated info (hidden_data_count, unique_buffer_count etc.).
- `fuse` will call `check_necessary() && check_sufficient()` and create a subgraph based on the aggregated info.

So the idea is that the plugin creates its own instance of this `SnippetsNodeTokenizer` and passes it to all the tokenization transformations. The transformations can use this class to do all the dirty work, plus they can impose their own limitations of course. For example, the eltwise tokenization pass will additionally check op types and precisions.

@a-sidorova, @v-Golubev what do you think? I'm not saying we should refactor it now, but let's try to outline a more convenient and scalable design.
**Reply:** @IvanNovoselov your idea looks interesting, but I need some time to think about whether the proposed architecture is sufficient. Can we allocate a time slot in the next team meeting to discuss it?
**Comment:** It seems like we really need to discuss how the current tokenization logic can be refactored and improved.
**Reply:** Yes, sure, let's discuss it on the sync.