Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CPU]whisper readvalue optimize #26130

Open
wants to merge 47 commits into
base: master
Choose a base branch
from

Conversation

xipingyan
Copy link
Contributor

@xipingyan xipingyan commented Aug 20, 2024

Details:

  • New ReadValueWithSubgraph node.
  • Move ReadValue's initial subgraph nodes to ReadValueWithSubgraph
  • Mirror ReadValueWithSubgraph to MemoryInput
  • Replace MemoryOutput to MemoryOutputStub
  • Call new interface Init and Activate of ov::intel_cpu::Graph, avoid to memory copy. Refer: [CPU] Introduce SubModel op and Composite node #25385

Tickets:

  • 128743

Profile each node execute time.
Support Static and Dynamic infer.

Signed-off-by: xipingya <xiping.yan@intel.com>
If reset is not called, these marked nodes also desn't need to be executed.

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
@xipingyan xipingyan requested a review from maxnick August 20, 2024 08:33
@github-actions github-actions bot added category: Core OpenVINO Core (aka ngraph) category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations category: CPP API OpenVINO CPP API bindings labels Aug 20, 2024
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
@github-actions github-actions bot removed the category: transformations OpenVINO Runtime library - Transformations label Sep 3, 2024
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
decoder network: 20ms -> 5 ms.

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
@github-actions github-actions bot removed category: Core OpenVINO Core (aka ngraph) category: CPP API OpenVINO CPP API bindings labels Sep 10, 2024
@xipingyan xipingyan marked this pull request as ready for review September 18, 2024 02:19
@xipingyan xipingyan requested review from a team as code owners September 18, 2024 02:19
Signed-off-by: xipingya <xiping.yan@intel.com>
if (memInp && memInp->haveSubgraph()) {
// Since the ReadValueWithSubgraph is middle node, just add this branch in order to use
// ProxyMemoryBlock to share memory
edge->getParent()->resolveInPlaceEdges(Edge::LOOK_UP);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @maxnick ,
For example:
MemoryInput(ReadValueWithSubgraph)->computer nodes
MemoryInput(ReadValueWithSubgraph)->MemoryOutputStub
MemoryInput(ReadValueWithSubgraph)->Stateful
I have to add this branch to call MemoryInput::resolveInPlaceEdges, it will let Stateful and MemoryOutputStub, computer nodes, share memory. It is reasonable.

If not call, these nodes's output will be inPlace, but MemoryOutputStub will
allocate memory, so maybe these output will empty ptr. Just clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is in the MemoryInput::selectOptimalPrimitiveDescriptor implementation you developed for the subgraph scenario. There you:

  1. Lose the inPlace port tag, when copy the output port config from the internal subgraph. Therefore the output edge inplace look up property is lost and the resolveInPlaceEdges(Edge::LOOK_UP) isn't being called for such a node.
  2. It doesn't make sense to enforce the output port memory descriptor to mimic the internal subgraph output node memory descriptor, as the data are being read from the internal subgraph pretty rarely (only per reset state). So that it makes sense to stick with the external memory representation (just a default behavior of the the MemoryInput node), as the data will be read from the state via proxy memory manager more often.
    Thus I would recommend the following:
  3. Don't redefine selectPrimitiveDescriptor as it's more important to preserve the external graph memory representation to avoid extra reorders in runtime
  4. Provide the MemoryInput node output memory descriptors to the subgraph output config, so that the subgraph does all the necessary memory layout alignment for us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1: Removed redefine supportedPrimitiveDescriptors
2: Internal subgraph provide input setting InputConfig interface. but there is no OutputConfig setting interface.
@maxnick

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Comment on lines +3078 to +3081
auto memInput = std::dynamic_pointer_cast<node::MemoryInput>(node);
if (memInput) {
return memInput->haveSubgraph();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding, in all the cases when Assign is directly attached to the ReadValue node, we should replace it with a stub, since the assign node is practically useless. ReadValue->Assign pair means that the state values aren't really changed by the assign node. So it looks like it can always be safely replaced with a stub.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Same to my first understanding.
But it also depends on another factor, the memory is VariableStateDoubleBuffer or VariableStateSingleBuffer(I introduced).
For VariableStateSingleBuffer, it is OK, I can replace with Stub for ReadValue->Assign pair.
For VariableStateDoubleBuffer, it is not safe.
About using VariableStateSingleBuffer or VariableStateDoubleBuffer, it still depends on if (haveSubgraph()) in my codes.
@maxnick

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding, when we have a direct ReadValue->Assign pair, we most definitely should use a single buffer, as nothing new will be written to the state during the assign stage. May be we can check for the MemoryOutput child in the MemoryInput node to select an appropriate state type (i.e. single buffer or double buffer).

src/plugins/intel_cpu/src/graph_optimizer.cpp Outdated Show resolved Hide resolved
src/plugins/intel_cpu/src/memory_state.h Outdated Show resolved Hide resolved
src/plugins/intel_cpu/src/memory_state.cpp Outdated Show resolved Hide resolved
src/plugins/intel_cpu/src/nodes/memory.hpp Show resolved Hide resolved
src/plugins/intel_cpu/src/nodes/memory.cpp Outdated Show resolved Hide resolved
src/plugins/intel_cpu/src/nodes/memory.cpp Outdated Show resolved Hide resolved
src/plugins/intel_cpu/src/nodes/memory.cpp Outdated Show resolved Hide resolved
if (memInp && memInp->haveSubgraph()) {
// Since the ReadValueWithSubgraph is middle node, just add this branch in order to use
// ProxyMemoryBlock to share memory
edge->getParent()->resolveInPlaceEdges(Edge::LOOK_UP);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is in the MemoryInput::selectOptimalPrimitiveDescriptor implementation you developed for the subgraph scenario. There you:

  1. Lose the inPlace port tag, when copy the output port config from the internal subgraph. Therefore the output edge inplace look up property is lost and the resolveInPlaceEdges(Edge::LOOK_UP) isn't being called for such a node.
  2. It doesn't make sense to enforce the output port memory descriptor to mimic the internal subgraph output node memory descriptor, as the data are being read from the internal subgraph pretty rarely (only per reset state). So that it makes sense to stick with the external memory representation (just a default behavior of the the MemoryInput node), as the data will be read from the state via proxy memory manager more often.
    Thus I would recommend the following:
  3. Don't redefine selectPrimitiveDescriptor as it's more important to preserve the external graph memory representation to avoid extra reorders in runtime
  4. Provide the MemoryInput node output memory descriptors to the subgraph output config, so that the subgraph does all the necessary memory layout alignment for us.

2: cast to MemoryNode.

Signed-off-by: xipingya <xiping.yan@intel.com>
2: ReadValueWithSubgraphNode->ReadValueWithSubgraph
3: ReadValueWithSubgraph add new inheritance public ov::op::util::VariableExtension
4: Remove private variable: m_variable;
5: Change MAX_RECURSIVE_DEEP_CHECK_NODE to constexpr
6: Removed: using MemoryInputBase::MemoryInputBase

Signed-off-by: xipingya <xiping.yan@intel.com>
2. MemoryInputBase::isSupportedOperation(op, errorMessage) also should be called.

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
@yuxu42
Copy link
Contributor

yuxu42 commented Sep 29, 2024

Hi @maxnick could you please review the PR again? Thanks!

@maxnick
Copy link
Contributor

maxnick commented Sep 30, 2024

@EgorDuplensky, do you have any further comments from your side?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: CPU OpenVINO CPU plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants