[CPU]whisper readvalue optimize #26130

xipingyan · 2024-08-20T08:33:41Z

Details:

Add a new ReadValueWithSubgraph node.
Move ReadValue's initial subgraph nodes into ReadValueWithSubgraph.
Mirror ReadValueWithSubgraph to MemoryInput.
Replace MemoryOutput with MemoryOutputStub.
Call the new Init and Activate interfaces of ov::intel_cpu::Graph to avoid memory copies. See: [CPU] Introduce SubModel op and Composite node #25385
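
The idea behind the changes listed above can be illustrated with a toy model: the initial-value subgraph only needs to run when the state has been reset, and otherwise the stored state is read directly. This is a minimal sketch, not the OpenVINO implementation; ToyReadValue and all of its members are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model, NOT the real ReadValueWithSubgraph / MemoryInput classes.
class ToyReadValue {
public:
    using Tensor = std::vector<float>;

    // init_subgraph stands in for the nodes that compute the initial state value.
    explicit ToyReadValue(std::function<Tensor()> init_subgraph)
        : m_init(std::move(init_subgraph)) {}

    // Drop the stored state; the next read must recompute the initial value.
    void reset() { m_has_state = false; }

    // Stand-in for the Assign side: store a new state value.
    void assign(Tensor value) {
        m_state = std::move(value);
        m_has_state = true;
    }

    // Run the init subgraph only if there is no valid state (i.e. after a reset).
    const Tensor& read() {
        if (!m_has_state) {
            ++m_init_runs;
            m_state = m_init();
            m_has_state = true;
        }
        return m_state;
    }

    // Number of times the init subgraph was actually executed.
    int init_runs() const { return m_init_runs; }

private:
    std::function<Tensor()> m_init;
    Tensor m_state;
    bool m_has_state = false;
    int m_init_runs = 0;
};
```

In this toy, repeated calls to read() between resets never re-run the init subgraph, which is the kind of saving the PR is aiming for.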

Tickets:

128743

xipingyan · 2024-09-18T03:04:42Z

src/plugins/intel_cpu/src/graph.cpp

+                if (memInp && memInp->haveSubgraph()) {
+                    // Since the ReadValueWithSubgraph is middle node, just add this branch in order to use
+                    // ProxyMemoryBlock to share memory
+                    edge->getParent()->resolveInPlaceEdges(Edge::LOOK_UP);


Hi @maxnick,
For example:
MemoryInput(ReadValueWithSubgraph)->compute nodes
MemoryInput(ReadValueWithSubgraph)->MemoryOutputStub
MemoryInput(ReadValueWithSubgraph)->Stateful
I had to add this branch to call MemoryInput::resolveInPlaceEdges so that the Stateful, MemoryOutputStub, and compute nodes share memory. I think this is reasonable.

Without this call, these nodes' outputs would be inPlace, but MemoryOutputStub would allocate its own memory, so those outputs might end up with an empty pointer. Just to clarify.

maxnick · 2024-09-24

The problem is in the MemoryInput::selectOptimalPrimitiveDescriptor implementation you developed for the subgraph scenario. There you:

Lose the inPlace port tag when copying the output port config from the internal subgraph. As a result, the output edge's in-place look-up property is lost and resolveInPlaceEdges(Edge::LOOK_UP) isn't called for such a node.

It doesn't make sense to force the output port memory descriptor to mimic the internal subgraph's output node memory descriptor, as the data are read from the internal subgraph rarely (only once per state reset). So it makes sense to stick with the external memory representation (the default behavior of the MemoryInput node), as the data will be read from the state via the proxy memory manager far more often.
Thus I would recommend the following:

Don't redefine selectPrimitiveDescriptor, as it's more important to preserve the external graph memory representation to avoid extra reorders at runtime.

Provide the MemoryInput node's output memory descriptors to the subgraph output config, so that the subgraph does all the necessary memory layout alignment for us.

xipingyan · 2024-09-26

1: Removed the redefinition of supportedPrimitiveDescriptors.
2: The internal subgraph provides an InputConfig interface for configuring inputs, but there is no corresponding OutputConfig interface.
@maxnick
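
The issue discussed in this thread can be sketched with a simplified model: copying the subgraph's output port config wholesale drops the outer node's inPlace tag, whereas keeping the outer config and pushing its layout into the subgraph preserves it. The types and functions below (ToyPortConfig, copyFromSubgraph, alignSubgraphToOuter) are hypothetical stand-ins, not the CPU plugin's actual port-config API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical simplified port config, for illustration only.
struct ToyPortConfig {
    std::string layout;  // stand-in for a memory descriptor
    int inPlace;         // -1: not in-place; >= 0: index of the port whose memory is shared

    ToyPortConfig(std::string l, int ip) : layout(std::move(l)), inPlace(ip) {}
};

// Problematic pattern: take the subgraph's output config as-is.
// The outer node's inPlace tag is silently lost.
ToyPortConfig copyFromSubgraph(const ToyPortConfig& /*outer*/, const ToyPortConfig& subgraphOut) {
    return subgraphOut;
}

// Direction suggested in the review: keep the outer config untouched and
// hand its layout to the subgraph, so the subgraph aligns itself to the outer graph.
ToyPortConfig alignSubgraphToOuter(const ToyPortConfig& outer, ToyPortConfig& subgraphOut) {
    subgraphOut.layout = outer.layout;
    return outer;
}
```

With the second function the outer node keeps both its layout and its inPlace tag, so an in-place look-up on its output edge would still be possible in this toy model.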

src/plugins/intel_cpu/src/transformations/cpu_opset/common/op/read_value_with_subgraph.hpp

src/plugins/intel_cpu/src/nodes/memory.cpp

maxnick · 2024-09-23T10:50:57Z

src/plugins/intel_cpu/src/graph_optimizer.cpp

+        auto memInput = std::dynamic_pointer_cast<node::MemoryInput>(node);
+        if (memInput) {
+            return memInput->haveSubgraph();
+        }


To my understanding, whenever Assign is attached directly to the ReadValue node, we should replace it with a stub, since the Assign node is practically useless. A ReadValue->Assign pair means that the Assign node doesn't actually change the state values, so it looks like it can always be safely replaced with a stub.

xipingyan · 2024-09-25

Yes, that was my first understanding too.
But it also depends on another factor: whether the memory is VariableStateDoubleBuffer or VariableStateSingleBuffer (which I introduced).
For VariableStateSingleBuffer it is OK; I can replace the Assign with a stub for a ReadValue->Assign pair.
For VariableStateDoubleBuffer it is not safe.
Whether VariableStateSingleBuffer or VariableStateDoubleBuffer is used still depends on if (haveSubgraph()) in my code.
@maxnick

maxnick · 2024-10-02

To my understanding, when we have a direct ReadValue->Assign pair, we should definitely use a single buffer, as nothing new will be written to the state during the assign stage. Maybe we can check for a MemoryOutput child in the MemoryInput node to select the appropriate state type (i.e. single buffer or double buffer).
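
The last suggestion in this thread, choosing the state type by looking for a direct MemoryOutput child, might look roughly like the following. This is only a sketch on a toy graph: ToyNode, ToyStateKind, and selectStateKind are hypothetical, and a real check would presumably also need to confirm that the child belongs to the same variable.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the single-buffer and double-buffer state types.
enum class ToyStateKind { SingleBuffer, DoubleBuffer };

// Hypothetical minimal graph node: a type name plus its direct children.
struct ToyNode {
    std::string type;
    std::vector<const ToyNode*> children;
};

// If a MemoryOutput is attached directly to the MemoryInput, nothing new is
// written to the state during the assign stage, so one buffer is enough.
ToyStateKind selectStateKind(const ToyNode& memoryInput) {
    const bool directAssign = std::any_of(
        memoryInput.children.begin(), memoryInput.children.end(),
        [](const ToyNode* child) { return child != nullptr && child->type == "MemoryOutput"; });
    return directAssign ? ToyStateKind::SingleBuffer : ToyStateKind::DoubleBuffer;
}
```

A MemoryInput whose only path to MemoryOutput goes through some other node falls back to the double buffer in this sketch, since the state value may then change between read and assign.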

src/plugins/intel_cpu/src/graph_optimizer.cpp

src/plugins/intel_cpu/src/memory_state.h

src/plugins/intel_cpu/src/memory_state.cpp

src/plugins/intel_cpu/src/transformations/cpu_opset/common/op/read_value_with_subgraph.hpp

src/plugins/intel_cpu/src/nodes/memory.hpp

src/plugins/intel_cpu/src/nodes/memory.cpp

maxnick · 2024-09-24T15:33:46Z

src/plugins/intel_cpu/src/graph.cpp

+                if (memInp && memInp->haveSubgraph()) {
+                    // Since the ReadValueWithSubgraph is middle node, just add this branch in order to use
+                    // ProxyMemoryBlock to share memory
+                    edge->getParent()->resolveInPlaceEdges(Edge::LOOK_UP);


2: ReadValueWithSubgraphNode->ReadValueWithSubgraph
3: ReadValueWithSubgraph add new inheritance public ov::op::util::VariableExtension
4: Remove private variable: m_variable
5: Change MAX_RECURSIVE_DEEP_CHECK_NODE to constexpr
6: Removed: using MemoryInputBase::MemoryInputBase
Signed-off-by: xipingya <xiping.yan@intel.com>

yuxu42 · 2024-09-29T00:16:02Z

Hi @maxnick could you please review the PR again? Thanks!

maxnick · 2024-09-30T08:21:05Z

@EgorDuplensky, do you have any further comments from your side?

xipingyan added 3 commits August 19, 2024 02:18

Add profiler for CPU plugin.

2916414

Profile each node's execution time. Supports static and dynamic inference. Signed-off-by: xipingya <xiping.yan@intel.com>

Mark ReadValue's inputs and corresponding Assign.

451c76d

If reset is not called, these marked nodes don't need to be executed either. Signed-off-by: xipingya <xiping.yan@intel.com>

Only mark: ReadValue->Assign pairs.

137beee

Signed-off-by: xipingya <xiping.yan@intel.com>

xipingyan requested a review from maxnick August 20, 2024 08:33

github-actions bot added the category: Core, category: CPU, category: transformations, and category: CPP API labels Aug 20, 2024

xipingyan requested review from yuxu42 and ceciliapeng2011 August 20, 2024 08:34

xipingyan added 2 commits August 21, 2024 06:18

Optimize pattern match.

737fe5c

Signed-off-by: xipingya <xiping.yan@intel.com>

transformation test pass.

6b05005

Signed-off-by: xipingya <xiping.yan@intel.com>

github-actions bot removed the category: transformations label Sep 3, 2024

xipingyan added 11 commits September 6, 2024 02:10

Test pass.

58d9f6f

Signed-off-by: xipingya <xiping.yan@intel.com>

Fix error: one param link to multiple ReadValueWithSubgraphNode

d54dc25

Signed-off-by: xipingya <xiping.yan@intel.com>

Add submodel infer to MemoryInput::runDynamic

a533d73

Signed-off-by: xipingya <xiping.yan@intel.com>

Debug code

f7339e3

Merge remote-tracking branch 'origin/master' into xp/whisper_readvalue_optimize

e142a06

fix merge error

4a2dba0

Dynamic shape test pass

bf7e493

Signed-off-by: xipingya <xiping.yan@intel.com>

test whisper pass

d90144e

Disable debug log to test performance. Got expected result:

5a98e7b

decoder network: 20ms -> 5 ms. Signed-off-by: xipingya <xiping.yan@intel.com>

Add test.

577721d

Remove stateName in ov::Node

c23062e

Signed-off-by: xipingya <xiping.yan@intel.com>

github-actions bot removed the category: Core and category: CPP API labels Sep 10, 2024

xipingyan added 4 commits September 10, 2024 01:27

Add env: ENABLE_RV for comparison test.

592919b

Merge branch 'master' into xp/whisper_readvalue_optimize

f134307

Merge branch 'xp/whisper_readvalue_optimize' of https://github.com/xipingyan/openvino into xp/whisper_readvalue_optimize

258c3c8

rm debug log

5c771b0

Merge branch 'master' into xp/whisper_readvalue_optimize

87664d4

xipingyan marked this pull request as ready for review September 18, 2024 02:19

xipingyan requested review from a team as code owners September 18, 2024 02:19

xipingyan added 2 commits September 18, 2024 02:20

Revert unchanged code.

153b4b8

Simplify codes.

e188276

Signed-off-by: xipingya <xiping.yan@intel.com>

xipingyan commented Sep 18, 2024

xipingyan added 2 commits September 18, 2024 05:49

Add judge whether subGraph can be called.

2833602

Signed-off-by: xipingya <xiping.yan@intel.com>

Fix test failure: ReadValue has no inputs.

0a6f13f

Signed-off-by: xipingya <xiping.yan@intel.com>

EgorDuplensky reviewed Sep 18, 2024

src/plugins/intel_cpu/src/nodes/memory.cpp (outdated, resolved)

xipingyan added 2 commits September 19, 2024 02:10

Remove get_body, m_subgraph, and update haveSubgraph.

7ae318f

Signed-off-by: xipingya <xiping.yan@intel.com>

Replace set_body with base class set_function

79b8272

Signed-off-by: xipingya <xiping.yan@intel.com>

xipingyan requested a review from EgorDuplensky September 19, 2024 02:52

remove debug env

9c0989d

maxnick assigned EgorDuplensky and maxnick Sep 23, 2024

maxnick reviewed Sep 24, 2024

xipingyan added 5 commits September 25, 2024 06:50

1: Add check memoryNode null

3eaea83

2: cast to MemoryNode. Signed-off-by: xipingya <xiping.yan@intel.com>

Remove getSupportedDescriptors in cpp.

266cf38

1. Removed const MemoryPtr& prime_mem() const

e94e67f

2. MemoryInputBase::isSupportedOperation(op, errorMessage) also should be called. Signed-off-by: xipingya <xiping.yan@intel.com>

1: Remove redefine supportedPrimitiveDescriptors

e5402f8

Signed-off-by: xipingya <xiping.yan@intel.com>

xipingyan requested a review from maxnick September 26, 2024 05:51

Merge branch 'master' into xp/whisper_readvalue_optimize

74147b7

This was referenced Sep 27, 2024

[CPU] Fuse SDPA before/after Reshape+Transpose Node to SDPA xipingyan/openvino#4 (Draft)

[CPU] Fuse SDPA before/after Reshape+Transpose Node to SDPA #26819 (Open)

EgorDuplensky approved these changes Oct 2, 2024
