Xp/decompse matmul split or matmul gather #25196

xipingyan · 2024-06-25T03:03:09Z

Details:

Decompose MatMul + some nodes + 3 Gather -> 3 (MatMul + nodes).
It can speed up a ViT int8 model by about 10% in throughput mode.
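
The equivalence behind this decomposition can be sketched in plain C++, without any OpenVINO types. This is a minimal illustration, assuming the fused weight matrix stores its three output blocks as contiguous column ranges; `Matrix`, `matmul`, `slice_cols`, `decomposition_matches` and `make_demo` are hypothetical helpers for this sketch, not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense integer matrix; a hypothetical helper type for this sketch only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<int> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}
    int& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    int at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Plain triple-loop product: (M x K) * (K x N) -> (M x N).
Matrix matmul(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix out(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k)
            for (std::size_t j = 0; j < b.cols; ++j)
                out.at(i, j) += a.at(i, k) * b.at(k, j);
    return out;
}

// Copy columns [first, first + count) of m. In this sketch it stands in both for
// the Gather on the fused output and for slicing the constant weights.
Matrix slice_cols(const Matrix& m, std::size_t first, std::size_t count) {
    assert(first + count <= m.cols);
    Matrix out(m.rows, count);
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < count; ++c)
            out.at(r, c) = m.at(r, first + c);
    return out;
}

// True when "fused MatMul, then slice" equals "slice weights, then MatMul"
// for each of the `parts` column blocks.
bool decomposition_matches(const Matrix& x, const Matrix& w, std::size_t parts) {
    assert(parts != 0 && w.cols % parts == 0);
    const std::size_t n = w.cols / parts;
    const Matrix fused = matmul(x, w);
    for (std::size_t p = 0; p < parts; ++p) {
        const Matrix from_fused = slice_cols(fused, p * n, n);
        const Matrix separate = matmul(x, slice_cols(w, p * n, n));
        if (from_fused.data != separate.data)
            return false;
    }
    return true;
}

// Deterministic small test data in the range [-5, 5].
Matrix make_demo(std::size_t rows, std::size_t cols, unsigned seed) {
    Matrix m(rows, cols);
    for (std::size_t i = 0; i < m.data.size(); ++i)
        m.data[i] = static_cast<int>((i * 7 + seed) % 11) - 5;
    return m;
}
```

Calling `decomposition_matches(make_demo(4, 6, 1), make_demo(6, 9, 2), 3)` compares one fused product against three smaller ones. The sketch only shows that the results agree; it says nothing about where the reported speed-up comes from.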

Tickets:

133080


yuxu42 · 2024-08-19T06:45:12Z

Hi @dmitry-gorokhov could you please review the PR? Thanks!


EgorDuplensky · 2024-09-19T13:29:40Z

src/tests/test_utils/functional_test_utils/src/check_node_type.cpp

+    return std::string("{") + ov::util::join(s) + "}";
+}
+
+void CheckNumberOfNodesWithTypeImpl(std::shared_ptr<const ov::Model> function,


Just wondering, why was it required to move those functions to a separate file?

Thanks @EgorDuplensky for reviewing this PR.
1: The paths src/tests/functional/plugin/shared/include/subgraph_tests/ and src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/ share this infrastructure. To avoid duplicating code, I moved it here, as @iefode suggested.
2: I have a special reason to split out CheckNumberOfNodesWithTypeImpl; its body is just a copy of the original implementation.

EgorDuplensky · 2024-09-19T13:50:55Z

...mmon/transformations/src/transformations/common_optimizations/matmul_split_decomposition.cpp

+    return true;
+}
+
+pass::MatmulGatherDecomposition::MatmulGatherDecomposition() {


I understand that this transformation is trying to match a very specific pattern from LLM models, but shouldn't we have some heuristic for the weights size or something?
I mean, do we expect any model with any weights sizes to benefit from this transformation?
Also, please describe in the commit message / PR description the motivation for having this transformation, and why we expect it to speed up LLMs in the first place.

Yes, I want to match models with a ViT-like structure, and I also added some heuristics: checks on the rank, decompose_num, and a specific transpose order. Do you think these are not enough?

Probably this will be enough most of the time.
But, again, this is mostly about the reason we are getting the speed-ups.
I assume we observe speed-ups not because of the ranks, decompose_num and transpose order, but because we become less memory bound. But maybe I am wrong.

Yes, I think it is related to the input data size. Because the shape is dynamic, it is hard to describe with a custom heuristic, so I just did my best.
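
The structural checks mentioned in this exchange (rank, decompose_num, transpose order) could look roughly like the sketch below. Everything here is hypothetical: `CandidateInfo` and `passes_structural_checks` are illustrative names, the expected rank of 5 and the transpose order {2, 0, 3, 1, 4} are assumptions that do not come from this thread, and only the count of 3 follows from the "3Gather" in the PR description.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of what a matcher callback might read off the matched nodes.
struct CandidateInfo {
    std::size_t reshape_rank;                   // rank of the Reshape output
    std::size_t gather_consumers;               // number of Gather nodes consuming the Transpose
    std::vector<std::int64_t> transpose_order;  // constant order input of the Transpose
};

// Purely structural gate: it does not look at weight or input sizes at all.
bool passes_structural_checks(const CandidateInfo& c) {
    const std::size_t expected_rank = 5;                            // assumption
    const std::size_t decompose_num = 3;                            // "3Gather" in the PR description
    const std::vector<std::int64_t> expected_order{2, 0, 3, 1, 4};  // assumption
    return c.reshape_rank == expected_rank &&
           c.gather_consumers == decompose_num &&
           c.transpose_order == expected_order;
}
```

As the review points out, checks of this kind only identify the subgraph shape. Whether a given model benefits would depend on sizes that such a gate never inspects, and with dynamic shapes those sizes may not be known when the transformation runs.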

.../transformations/include/transformations/common_optimizations/matmul_split_decomposition.hpp

itikhono · 2024-09-20T13:25:29Z

...mmon/transformations/src/transformations/common_optimizations/matmul_split_decomposition.cpp

+    auto transpose_pattern =
+        wrap_type<opset1::Transpose>({reshape_pattern, ov::pass::pattern::wrap_type<ov::opset1::Constant>()},
+                                     ov::pass::pattern::consumers_count(decompose_num));
+    auto reshape2_pattern =


probably we can use pattern::optional (https://github.com/openvinotoolkit/openvino/blob/master/src/core/include/openvino/pass/pattern/op/optional.hpp) here to simplify the pattern logic
auto reshape_pattern = wrap_type<opset1::Reshape>(...)
auto optional_transpose = pattern::optional<opset1::Transpose>(...)
auto reshape2_pattern = wrap_type<opset1::Reshape>({optional_transpose, ...}...)

example:

openvino/src/core/tests/pattern.cpp

Lines 686 to 720 in 7cf0564

// complex pattern matching with `optional` and `wrap_type`
TEST(pattern, optional_complex_pattern_matching) {
    auto model_param = make_shared<op::v0::Parameter>(element::f32, ov::Shape{2, 3, 4});
    auto model_constant = make_shared<op::v0::Constant>(element::i32, ov::Shape{3}, std::vector<int>{2, 0, 1});
    auto model_abs = make_shared<op::v0::Abs>(model_param);
    auto model_transpose_negative = std::make_shared<op::v1::Transpose>(model_abs, model_constant);
    auto model_negative = std::make_shared<op::v0::Relu>(model_transpose_negative);

    auto model_relu = make_shared<op::v0::Relu>(model_param);
    auto model_transpose_positive = std::make_shared<op::v1::Transpose>(model_relu, model_constant);
    auto model_positive = std::make_shared<op::v0::Relu>(model_transpose_positive);

    auto pattern_param = ov::pass::pattern::any_input();
    auto pattern_constant = ov::pass::pattern::wrap_type<ov::op::v0::Constant>();
    auto pattern_relu = ov::pass::pattern::wrap_type<ov::op::v0::Relu>({pattern_param});
    auto pattern_transpose = ov::pass::pattern::optional<op::v1::Transpose>({pattern_relu, pattern_constant});
    auto pattern = ov::pass::pattern::wrap_type<op::v0::Relu>({pattern_transpose});

    TestMatcher matcher;
    ASSERT_FALSE(matcher.match(pattern, model_negative));
    ASSERT_TRUE(matcher.match(pattern, model_positive));
}

TEST(pattern, optional_full_match) {
    Shape shape{};
    auto model_input = std::make_shared<op::v0::Parameter>(element::i32, shape);
    auto model_relu = std::make_shared<op::v0::Relu>(model_input);
    auto model_relu1 = std::make_shared<op::v0::Relu>(model_relu->output(0));

    auto pattern_relu = ov::pass::pattern::optional<op::v0::Relu>();
    auto pattern_relu1 = std::make_shared<op::v0::Relu>(pattern_relu->output(0));

    TestMatcher tm;

    ASSERT_TRUE(tm.match(pattern_relu1, model_relu1));

That is a good option, but my case is slightly different. It has two alternative branches:

Reshape->Transpose->Others
Reshape->Reshape2->Others

pattern::op::Or seems to be a better fit. @itikhono
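
The difference between the two shapes can be shown with a toy matcher over a linear chain of op names. This is not the OpenVINO pattern API; `Chain`, `matches_optional_transpose` and `matches_or_branch` are hypothetical helpers, and the sketch assumes that Reshape2 is an ordinary Reshape node.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A linear chain of op type names, e.g. {"Reshape", "Transpose", "Gather"}.
using Chain = std::vector<std::string>;

// Optional-style shape: Reshape -> [Transpose] -> Reshape.
// The Transpose may be skipped, but the trailing Reshape is always required.
bool matches_optional_transpose(const Chain& ops) {
    std::size_t i = 0;
    if (i >= ops.size() || ops[i] != "Reshape")
        return false;
    ++i;
    if (i < ops.size() && ops[i] == "Transpose")
        ++i;  // consume the optional node when present
    return i < ops.size() && ops[i] == "Reshape";
}

// Or-style shape: Reshape -> (Transpose | Reshape).
// Exactly one of the two alternatives follows the first Reshape.
bool matches_or_branch(const Chain& ops) {
    if (ops.size() < 2 || ops[0] != "Reshape")
        return false;
    return ops[1] == "Transpose" || ops[1] == "Reshape";
}
```

In this toy model, a chain such as Reshape, Transpose, Gather is accepted by the Or-style matcher and rejected by the optional-style one, because the latter still expects a second Reshape after the Transpose.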

...mmon/transformations/src/transformations/common_optimizations/matmul_split_decomposition.cpp


ceciliapeng2011 and others added 16 commits June 12, 2024 15:33

init

6d67fc9

support fp16

2541834

support fp16

a39065f

Add chrome trace

361b151

fix

fix accuracy issue, matmul transpose_B

9e2855f

fork initial version from cecilia

6a7f280

remove profilier

ccadbc5

tmp version, phi verify pass.

6fbaf4e

Signed-off-by: xipingya <xiping.yan@intel.com>

refactor test code.

6ea2baf

Add test for MatMulSplit

25c5e17

Signed-off-by: xipingya <xiping.yan@intel.com>

Improve test.

43dea54

Signed-off-by: xipingya <xiping.yan@intel.com>

Support without bias

23665d0

Signed-off-by: xipingya <xiping.yan@intel.com>

B=1,L=1 can't match pattern, why?

7153b31

scalar input test pass.

b37ca44

Fix "TransposeToReshape" trigger's new problem. Signed-off-by: xipingya <xiping.yan@intel.com>

Test pass, ready to review.

f0b38a7

Signed-off-by: xipingya <xiping.yan@intel.com>

dynamic shape should be got from partial shape.

350f8a8

Signed-off-by: xipingya <xiping.yan@intel.com>

xipingyan added do_not_review do_not_merge labels Jun 25, 2024

github-actions bot added category: IE Tests OpenVINO Test: plugins and common category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations labels Jun 25, 2024

xipingyan added 5 commits June 25, 2024 08:27

compitalbe int8 quantize model.

93c3e8a

Signed-off-by: xipingya <xiping.yan@intel.com>

fix replace fq node error.

57e8b56

Signed-off-by: xipingya <xiping.yan@intel.com>

clang format

537bb39

Signed-off-by: xipingya <xiping.yan@intel.com>

Add enable_fq to test.

f35f8b3

Signed-off-by: xipingya <xiping.yan@intel.com>

clange issue.

218027b

xipingyan requested a review from ceciliapeng2011 June 27, 2024 08:18

xipingyan added 2 commits July 4, 2024 13:45

Merge branch 'master' into xp/decompse_matmul_split_or_matmul_gather

c7528d2

remove debug code, and remvoe MatMulVariadicSplitDecomposition

efd55e7

xipingyan marked this pull request as ready for review July 4, 2024 06:29

Fix build error. Remove namespace CPUTestUtils::

26f9314

Signed-off-by: xipingya <xiping.yan@intel.com>

xipingyan requested review from praasz and v-Golubev August 14, 2024 07:07

regist pass to CPU_REGISTER_PASS_COMMON, work for ARM

621a4dc

praasz approved these changes Aug 19, 2024


wenjiew added the Code Freeze label Aug 20, 2024

wenjiew requested a review from dmitry-gorokhov August 21, 2024 01:32

Merge commit '54f58b86' into xp/decompse_matmul_split_or_matmul_gather

c8a16f0

xipingyan force-pushed the xp/decompse_matmul_split_or_matmul_gather branch from 185b4ce to c8a16f0 Compare August 26, 2024 07:26

Move "CheckNumberOfNodesWithType" to src/tests/test_utils/functional_…

a7b978a

…test_utils Signed-off-by: xipingya <xiping.yan@intel.com>

wenjiew removed the Code Freeze label Aug 30, 2024

wenjiew modified the milestones: 2024.4, 2024.5 Aug 30, 2024

v-Golubev approved these changes Sep 6, 2024


xipingyan added 2 commits September 10, 2024 15:36

Merge branch 'master' into xp/decompse_matmul_split_or_matmul_gather

cbb4fc2

Fix merge master conflict issue.

4d5f03d

Signed-off-by: xipingya <xiping.yan@intel.com>

EgorDuplensky reviewed Sep 19, 2024

itikhono requested changes Sep 20, 2024

.../transformations/include/transformations/common_optimizations/matmul_split_decomposition.hpp (outdated)

.../transformations/include/transformations/common_optimizations/matmul_split_decomposition.hpp (outdated)

itikhono reviewed Sep 20, 2024

...mmon/transformations/src/transformations/common_optimizations/matmul_split_decomposition.cpp (outdated)

itikhono reviewed Sep 20, 2024

...mmon/transformations/src/transformations/common_optimizations/matmul_split_decomposition.cpp (outdated)

xipingyan added 2 commits September 24, 2024 09:37

1:Move decompose_num private

121e6cb

2:Remove opset1.hpp, replace with op::v1 format. Signed-off-by: xipingya <xiping.yan@intel.com>

Move "matmul_split_decomposition.cpp" to CPU:

cb15e8e

src/plugins/intel_cpu/src/transformations/cpu_opset/common/pass/matmul_split_decomposition.hpp src/plugins/intel_cpu/src/transformations/cpu_opset/common/pass/matmul_split_decomposition.cpp Signed-off-by: xipingya <xiping.yan@intel.com>

github-actions bot removed the category: transformations OpenVINO Runtime library - Transformations label Sep 24, 2024

xipingyan requested review from itikhono and EgorDuplensky September 24, 2024 03:09

xipingyan added 2 commits September 25, 2024 14:12

Merge branch 'master' into xp/decompse_matmul_split_or_matmul_gather

cde2f32

Merge branch 'master' into xp/decompse_matmul_split_or_matmul_gather

6f9ebad

itikhono approved these changes Oct 1, 2024



