-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CPU]whisper readvalue optimize #26130
base: master
Are you sure you want to change the base?
[CPU]whisper readvalue optimize #26130
Conversation
Profile each node execute time. Support Static and Dynamic infer. Signed-off-by: xipingya <xiping.yan@intel.com>
If reset is not called, these marked nodes also desn't need to be executed. Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
decoder network: 20ms -> 5 ms. Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
if (memInp && memInp->haveSubgraph()) { | ||
// Since the ReadValueWithSubgraph is middle node, just add this branch in order to use | ||
// ProxyMemoryBlock to share memory | ||
edge->getParent()->resolveInPlaceEdges(Edge::LOOK_UP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @maxnick ,
For example:
MemoryInput(ReadValueWithSubgraph)->computer nodes
MemoryInput(ReadValueWithSubgraph)->MemoryOutputStub
MemoryInput(ReadValueWithSubgraph)->Stateful
I have to add this branch to call MemoryInput::resolveInPlaceEdges, it will let Stateful
and MemoryOutputStub
, computer nodes
, share memory. It is reasonable.
If not call, these nodes's output will be inPlace, but MemoryOutputStub will
allocate memory, so maybe these output will empty ptr. Just clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is in the MemoryInput::selectOptimalPrimitiveDescriptor
implementation you developed for the subgraph scenario. There you:
- Lose the inPlace port tag, when copy the output port config from the internal subgraph. Therefore the output edge inplace look up property is lost and the
resolveInPlaceEdges(Edge::LOOK_UP)
isn't being called for such a node. - It doesn't make sense to enforce the output port memory descriptor to mimic the internal subgraph output node memory descriptor, as the data are being read from the internal subgraph pretty rarely (only per reset state). So that it makes sense to stick with the external memory representation (just a default behavior of the the MemoryInput node), as the data will be read from the state via proxy memory manager more often.
Thus I would recommend the following: - Don't redefine selectPrimitiveDescriptor as it's more important to preserve the external graph memory representation to avoid extra reorders in runtime
- Provide the MemoryInput node output memory descriptors to the subgraph output config, so that the subgraph does all the necessary memory layout alignment for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1: Removed redefine supportedPrimitiveDescriptors
2: Internal subgraph provide input setting InputConfig
interface. but there is no OutputConfig
setting interface.
@maxnick
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
src/plugins/intel_cpu/src/transformations/cpu_opset/common/op/read_value_with_subgraph.hpp
Outdated
Show resolved
Hide resolved
src/plugins/intel_cpu/src/transformations/cpu_opset/common/op/read_value_with_subgraph.hpp
Show resolved
Hide resolved
src/plugins/intel_cpu/src/transformations/cpu_opset/common/op/read_value_with_subgraph.hpp
Outdated
Show resolved
Hide resolved
src/plugins/intel_cpu/src/transformations/cpu_opset/common/op/read_value_with_subgraph.hpp
Outdated
Show resolved
Hide resolved
Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
auto memInput = std::dynamic_pointer_cast<node::MemoryInput>(node); | ||
if (memInput) { | ||
return memInput->haveSubgraph(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my understanding, in all the cases when Assign is directly attached to the ReadValue node, we should replace it with a stub, since the assign node is practically useless. ReadValue->Assign pair means that the state values aren't really changed by the assign node. So it looks like it can always be safely replaced with a stub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Same to my first understanding.
But it also depends on another factor, the memory is VariableStateDoubleBuffer
or VariableStateSingleBuffer
(I introduced).
For VariableStateSingleBuffer
, it is OK, I can replace with Stub for ReadValue->Assign pair.
For VariableStateDoubleBuffer
, it is not safe.
About using VariableStateSingleBuffer
or VariableStateDoubleBuffer
, it still depends on if (haveSubgraph())
in my codes.
@maxnick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my understanding, when we have a direct ReadValue->Assign pair, we most definitely should use a single buffer, as nothing new will be written to the state during the assign stage. May be we can check for the MemoryOutput child in the MemoryInput node to select an appropriate state type (i.e. single buffer or double buffer).
src/plugins/intel_cpu/src/transformations/cpu_opset/common/op/read_value_with_subgraph.hpp
Outdated
Show resolved
Hide resolved
if (memInp && memInp->haveSubgraph()) { | ||
// Since the ReadValueWithSubgraph is middle node, just add this branch in order to use | ||
// ProxyMemoryBlock to share memory | ||
edge->getParent()->resolveInPlaceEdges(Edge::LOOK_UP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is in the MemoryInput::selectOptimalPrimitiveDescriptor
implementation you developed for the subgraph scenario. There you:
- Lose the inPlace port tag, when copy the output port config from the internal subgraph. Therefore the output edge inplace look up property is lost and the
resolveInPlaceEdges(Edge::LOOK_UP)
isn't being called for such a node. - It doesn't make sense to enforce the output port memory descriptor to mimic the internal subgraph output node memory descriptor, as the data are being read from the internal subgraph pretty rarely (only per reset state). So that it makes sense to stick with the external memory representation (just a default behavior of the the MemoryInput node), as the data will be read from the state via proxy memory manager more often.
Thus I would recommend the following: - Don't redefine selectPrimitiveDescriptor as it's more important to preserve the external graph memory representation to avoid extra reorders in runtime
- Provide the MemoryInput node output memory descriptors to the subgraph output config, so that the subgraph does all the necessary memory layout alignment for us.
2: cast to MemoryNode. Signed-off-by: xipingya <xiping.yan@intel.com>
2: ReadValueWithSubgraphNode->ReadValueWithSubgraph 3: ReadValueWithSubgraph add new inheritance public ov::op::util::VariableExtension 4: Remove private variable: m_variable; 5: Change MAX_RECURSIVE_DEEP_CHECK_NODE to constexpr 6: Removed: using MemoryInputBase::MemoryInputBase Signed-off-by: xipingya <xiping.yan@intel.com>
2. MemoryInputBase::isSupportedOperation(op, errorMessage) also should be called. Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xipingya <xiping.yan@intel.com>
Hi @maxnick could you please review the PR again? Thanks! |
@EgorDuplensky, do you have any further comments from your side? |
Details:
ReadValueWithSubgraph
node.ReadValue
's initial subgraph nodes toReadValueWithSubgraph
ReadValueWithSubgraph
toMemoryInput
MemoryOutput
toMemoryOutputStub
Init
andActivate
of ov::intel_cpu::Graph, avoid to memory copy. Refer: [CPU] Introduce SubModel op and Composite node #25385Tickets: