NPUW: Adding new config option to reshape weights #25691
Conversation
promoting to something new
Where in the code are the weights actually transposed?
How is the unpack supposed to work now? I believe the closure tensor we have stored in the closure still has the old dims?
* Tranpose input weight tensors before the decompression procedure.
* Works only with function "NPUW_FOLD"ing.
I believe the right wording here is "before passing as inputs". And this option applies to FOLD and CWAI modes.
Changed!
@@ -1614,8 +1616,12 @@ void Partitioner::decompressionCutOff(const std::string& func_name) {
    ->build();

// ChatGLM (GPTQ) and New LLaMa-v2 patterns (Symmetric)
rewr.add_matcher<ov::npuw::patterns::SymmZP::DCOFFPassReshape1>(dcoff_mode, dcoff_type, std::ref(params_to))
    ->build();
auto dcoffPassReshape1 = std::make_shared<ov::npuw::patterns::SymmZP::DCOFFPassReshape1>(dcoff_mode,
@AsyaPronina is right here: it should be added to the base class (like you actually did) and set in the constructor, which every particular pattern derives from the base.
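A minimal sketch of that arrangement, assuming a MatcherPass-derived base; the type and member names below (DCOffMode, m_transpose_weights) and the trimmed argument list are illustrative, not the PR's actual declarations:

// Sketch only: the flag lives in the shared base and is set through the constructor
// that every concrete pattern (DCOFFPassReshape1, DCOFFPassMatMul, ...) calls.
class DCOFFPassBase : public ov::pass::MatcherPass {
public:
    DCOFFPassBase(DCOffMode dcoff_mode, ov::element::Type dcoff_type, bool transpose_weights)
        : m_dcoff_mode(dcoff_mode),
          m_dcoff_type(dcoff_type),
          m_transpose_weights(transpose_weights) {}

protected:
    DCOffMode m_dcoff_mode;
    ov::element::Type m_dcoff_type;
    bool m_transpose_weights = false;  // read directly in matcher_callback(), no getter needed
};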
@@ -329,6 +329,14 @@ bool DCOFFPassBase::matcher_callback(ov::pass::pattern::Matcher& m) {
NPUW_ASSERT(ov::op::util::is_parameter(matched_nodeC));

auto matched_paramA = std::static_pointer_cast<ov::op::v0::Parameter>(matched_nodeA);
// Transpose weights specifically for QWEN as of now.
if (getTransposeWeights()) {
Not sure a getter is required here; in other cases the passes just refer to their member variables.
Changed!
ov::PartialShape current_shape = matched_paramA->get_partial_shape();
ov::Shape static_shape = current_shape.to_shape();
ov::Shape new_order = {static_shape[1], static_shape[2], static_shape[0]};
ov::PartialShape new_shape(new_order);
matched_paramA->set_partial_shape(new_shape);
This is a base class pattern. It works for all the other patterns.
Here you make a strong assumption that you have group-quant weights with three dimensions. But what if you don't?
Also, the recommendation given was to ensure that the longest dimension is the last one.
At least please check that this contract is satisfied in your new_order.
How would we handle other cases then? What should we do if the shape is dynamic and not 3D?
Write a proper if.
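A hedged sketch of what such a guard could look like, reusing the names from the diff above (matched_paramA, the {1, 2, 0} order) plus the hypothetical m_transpose_weights member; <algorithm> is assumed for std::max_element, and shapes that are dynamic or not 3D are simply left untouched:

// Sketch: reshape only when the contract can be checked and is satisfied.
const auto& pshape = matched_paramA->get_partial_shape();
if (m_transpose_weights && pshape.is_static()) {
    ov::Shape shape = pshape.to_shape();
    // Expect 3D group-quant weights, and only move dim 0 to the back if it really is
    // the longest one, so the "longest dimension last" contract holds afterwards.
    if (shape.size() == 3 && shape[0] == *std::max_element(shape.begin(), shape.end())) {
        matched_paramA->set_partial_shape(ov::PartialShape{ov::Shape{shape[1], shape[2], shape[0]}});
        matched_paramA->validate_and_infer_types();
    }
}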
src/plugins/intel_npu/src/plugin/npuw/partitioning/patterns/dcoff.cpp
Outdated
}
auto matched_valueB = std::static_pointer_cast<ov::op::v0::Constant>(matched_nodeB);
auto matched_paramC = std::static_pointer_cast<ov::op::v0::Parameter>(matched_nodeC);
So this change alone is not enough.
If you reshape the weights here, you will also need to reshape the Scale coefficient parameters if the Scale parameters stay in the model (no DCOFF).
Yes, reshaping both weights and scales, i.e., if they are reshapable.
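Roughly, a sketch of keeping both in sync could look like the following; the helper is hypothetical, and whether matched_paramC really is the scale Parameter depends on the concrete pattern:

// Hypothetical helper: apply the same dim permutation to any 3D Parameter, so weight
// and scale layouts stay consistent when the scales remain model inputs (no DCOFF).
static void transpose_param(const std::shared_ptr<ov::op::v0::Parameter>& p) {
    const ov::Shape s = p->get_partial_shape().to_shape();  // assumes a static 3D shape
    p->set_partial_shape(ov::PartialShape{ov::Shape{s[1], s[2], s[0]}});
    p->validate_and_infer_types();
}

// ...inside the matcher callback:
transpose_param(matched_paramA);  // weights
transpose_param(matched_paramC);  // scales, when they stay in the model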
…ub.com/ujjayant-kadian/openvino into uk/adding-new-option-change-layout-npw
The unpack is processing the original tensors, yes.
/**
 * @brief
 * Type: bool.
 * Tranpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs.
* Tranpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs.
* Transpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs, if required |
// Old LLaMa-v2 patterns (Symmetric)
rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to))
rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to), enable_transpose)
As the configuration parameters go first and the "output" remapping goes last, it should probably be:
rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to), enable_transpose)
rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, enable_transpose, std::ref(params_to))
@@ -1606,29 +1606,32 @@ void Partitioner::decompressionCutOff(const std::string& func_name) {
{
LOG_BLOCK();

bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();
bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();
const bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();
while (current_node) {
    // Check if the current node is a MatMul
    if (auto matmul = std::dynamic_pointer_cast<ov::op::v0::MatMul>(current_node)) {
        return matmul;
    }
    // Move to the next node in the path if there is one
    if (!current_node->outputs().empty()) {
        auto output = current_node->outputs().at(0);
        if (!output.get_target_inputs().empty()) {
            current_node = output.get_target_inputs().begin()->get_node()->shared_from_this();
        } else {
            // No further outputs, end the search
            break;
        }
    } else {
        // No outputs, end the search
        break;
    }
}
There should be no loop, it must be a straight direct link from your start_node.
It might not be true for every case. After the root node (for example, the reshape), there may be a Convert present.
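A possible middle ground, sketched under the assumption that at most a single Convert sits between the start node (which has at least one output) and the MatMul, so no open-ended loop is needed; the function name mirrors the one in the PR, the body is illustrative:

// Sketch: follow the single consumer once, optionally skipping a Convert,
// instead of walking the graph in an unbounded while-loop.
static std::shared_ptr<ov::op::v0::MatMul> find_matmul_downwards(const std::shared_ptr<ov::Node>& start_node) {
    auto next = [](const std::shared_ptr<ov::Node>& n) -> std::shared_ptr<ov::Node> {
        const auto& consumers = n->output(0).get_target_inputs();
        return consumers.empty() ? nullptr : consumers.begin()->get_node()->shared_from_this();
    };
    auto node = next(start_node);
    if (auto convert = std::dynamic_pointer_cast<ov::op::v0::Convert>(node)) {
        node = next(convert);  // tolerate one intermediate Convert, as noted above
    }
    return std::dynamic_pointer_cast<ov::op::v0::MatMul>(node);
}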
if (!matmul_node) {
    LOG_DEBUG("NOT a MATMUL NODE!");
}
Just place an assert that the pointer is not null. Also, who'd pass a null pointer here?
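For example, following the NPUW_ASSERT style already used in the diff above:

// Fail fast instead of logging: a null result here indicates a broken pattern match.
NPUW_ASSERT(matmul_node != nullptr);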
if (input_shape.size() >= 2) {
    size_t max_dim = *std::max_element(input_shape.begin(), input_shape.end());
    if (input_shape[0] != max_dim) {
        return true;  // Transpose is required
    }
}
I believe this is not the check that's supposed to be here.
auto partial_shape = param->get_partial_shape();

// Ensure the shape is static before proceeding
if (partial_shape.is_static()) {
Replace this with an ASSERT.
auto shape = partial_shape.to_shape();

// Check if the shape is 2D or 3D and needs transposing
if (shape.size() == 2 && shape[0] < shape[1]) {
I believe we shouldn't look at < or max_element here. The requirement is not to place the max dim into the proper dimension. Please check the issue description.
(Stopped review at this point)
Rectified!
// Check if the index is marked for transposition
if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), i) != m.transpose_indices.end()) {
    // Transpose the tensor before adding it to new_closure
    new_closure.push_back(pattern_utils::transpose_tensor(fcall._closure[i]));
} else {
    // Add the original tensor to new_closure
    new_closure.push_back(fcall._closure[i]);
}

// Handle scale remap
auto scale_iter = m.scale_remap.find(i);
new_scales.push_back(scale_iter != m.scale_remap.end() ? fcall._closure[scale_iter->second] : ov::Tensor());
// Check for asymmetric zero points and add them to new_zerops
if (scale_iter != m.scale_remap.end()) {
    // Check if the scale index is marked for transposition
    if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), scale_iter->second) != m.transpose_indices.end()) {
        // Transpose the tensor before adding it to new_scales
        new_scales.push_back(pattern_utils::transpose_tensor(fcall._closure[scale_iter->second]));
    } else {
        // Add the original tensor to new_scales
        new_scales.push_back(fcall._closure[scale_iter->second]);
    }
} else {
    new_scales.push_back(ov::Tensor());
}

// Handle zero point remap
auto zerop_iter = m.zerop_remap.find(i);
const auto& zerop = zerop_iter != m.zerop_remap.end() ? fcall._closure[zerop_iter->second] : m.zero_points[i];
new_zerops.push_back(zerop);
if (zerop_iter != m.zerop_remap.end()) {
    // Check if the zero point index is marked for transposition
    if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), zerop_iter->second) != m.transpose_indices.end()) {
        // Transpose the tensor before adding it to new_zerops
        new_zerops.push_back(pattern_utils::transpose_tensor(fcall._closure[zerop_iter->second]));
    } else {
        // Add the original tensor to new_zerops
        new_zerops.push_back(fcall._closure[zerop_iter->second]);
    }
} else {
    // Add the zero point tensor from the closure remap
    const auto& zerop = m.zero_points[i];
    new_zerops.push_back(zerop);
}
}

Looks like too much; you can form the tensors first and then transpose only the affected ones. It may be one more separate loop. That will make the code clearer.
Changed!
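For reference, a sketch of the restructuring the comment above describes, assuming new_closure stays index-aligned with fcall._closure (which the original loop suggests):

// Sketch: build new_closure / new_scales / new_zerops with the plain remap logic first,
// then transpose only the affected entries in one dedicated follow-up loop.
for (std::size_t idx : m.transpose_indices) {
    new_closure[idx] = pattern_utils::transpose_tensor(new_closure[idx]);
}
// A similar follow-up pass over scale_remap / zerop_remap would cover new_scales and
// new_zerops for the entries whose source closure index is marked for transposition.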
namespace pattern_utils {

std::shared_ptr<ov::op::v0::MatMul> find_matmul_downwards(const std::shared_ptr<ov::Node>& start_node);
std::shared_ptr<ov::op::v0::MatMul> get_root_matmul(ov::pass::pattern::Matcher& m);
bool transpose_required(const std::shared_ptr<ov::op::v0::MatMul>& matmul_node);
ov::Tensor transpose_tensor(const ov::Tensor& tensor);

}  // namespace pattern_utils

You don't need these declarations here if only dcoff.cpp is using them.
Removed!
…jjayant-kadian/openvino into uk/adding-new-option-change-layout-npw
This PR will be closed in a week because of 2 weeks of no activity.
This PR was closed because it has been stalled for 2 weeks with no activity.
Details:
Adding a new config option, "NPUW_RESHAPE_WEIGHTS", which is not applicable to all the patterns right now. If the option is true, the input weights will be reshaped to improve performance. No other pattern (for other LLMs) is impacted by this new option, since each pattern has its own pattern-matcher class. This transpose occurs just before the weights are decompressed into f16.
For example, the shapes of the input weight tensors will be changed as follows:
Input weight shapes: [4096,32,128]
New input weight shapes: [32,128,4096]
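For illustration only (not code from the PR), the permutation that example implies:

// [4096, 32, 128] -> [32, 128, 4096]: dim 0 moves to the back, so the longest
// dimension ends up last.
ov::Shape in{4096, 32, 128};
ov::Shape out{in[1], in[2], in[0]};  // {32, 128, 4096}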
Tickets: