
NPUW: Adding new config option to reshape weights #25691

Conversation

@ujjayant-kadian (Contributor) commented Jul 23, 2024

Details:

Adds a new config option, "NPUW_RESHAPE_WEIGHTS", which does not yet apply to all patterns. When the option is true, the input weights are reshaped to improve performance. No other pattern (for other LLMs) is affected by the new option, since each pattern has its own pattern-matcher class. The transpose occurs just before the weights are decompressed into f16.
For example, an input weight tensor's shape changes as follows:
Input weight shape: [4096,32,128]
New input weight shape: [32,128,4096]
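
For illustration, a minimal standalone C++ sketch of that layout change (a hypothetical helper, not the PR's code; K, G, S stand for the 4096, 32, 128 dimensions above):

#include <cstddef>
#include <vector>

// Permute a row-major 3-D weight tensor from [K, G, S] to [G, S, K]
// (axis order {1, 2, 0}), matching [4096,32,128] -> [32,128,4096].
std::vector<float> permute_120(const std::vector<float>& src,
                               std::size_t K, std::size_t G, std::size_t S) {
    std::vector<float> dst(src.size());
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t g = 0; g < G; ++g)
            for (std::size_t s = 0; s < S; ++s)
                // element src[k][g][s] moves to dst[g][s][k]
                dst[(g * S + s) * K + k] = src[(k * G + g) * S + s];
    return dst;
}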

Tickets:

  • 126327

@dmatveev changed the title from NPUW: Adding new config option particularly "NPUW_TRANSPOSE_WEIGHTS" to NPUW: Adding new config option to transpose weights on Aug 7, 2024
@dmatveev (Contributor) left a comment:

Where in the code are the weights actually transposed?

@dmatveev (Contributor) left a comment:

How is the unpack supposed to work now? I believe the closure tensor we have stored in the closure stays with the older dims?

Comment on lines 183 to 184
* Tranpose input weight tensors before the decompression procedure.
* Works only with function "NPUW_FOLD"ing.
Contributor:

I believe the right wording here is "before passing as inputs". And this option applies to the FOLD and CWAI modes.

Contributor Author:

Changed!

@@ -1614,8 +1616,12 @@ void Partitioner::decompressionCutOff(const std::string& func_name) {
->build();

// ChatGLM (GPTQ) and New LLaMa-v2 patterns (Symmetric)
- rewr.add_matcher<ov::npuw::patterns::SymmZP::DCOFFPassReshape1>(dcoff_mode, dcoff_type, std::ref(params_to))
-     ->build();
+ auto dcoffPassReshape1 = std::make_shared<ov::npuw::patterns::SymmZP::DCOFFPassReshape1>(dcoff_mode,
Contributor:

@AsyaPronina is right here: it should be added to the base class (like you actually did) and in the constructor (which all the particular patterns derive).
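
A minimal sketch of that suggestion (illustrative names and a trimmed signature, not the PR's exact code): keep the flag in DCOFFPassBase and accept it in the base constructor, so every derived pattern picks it up:

// Sketch only: the real constructors also take dcoff_mode, dcoff_type, etc.
class DCOFFPassBase {
public:
    explicit DCOFFPassBase(bool transpose_weights)
        : m_transpose_weights(transpose_weights) {}
    virtual ~DCOFFPassBase() = default;

protected:
    bool m_transpose_weights = false;  // matcher callbacks read this directly
};

class DCOFFPassReshape1 : public DCOFFPassBase {
public:
    using DCOFFPassBase::DCOFFPassBase;  // derived patterns reuse the base ctor
};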

@@ -329,6 +329,14 @@ bool DCOFFPassBase::matcher_callback(ov::pass::pattern::Matcher& m) {
NPUW_ASSERT(ov::op::util::is_parameter(matched_nodeC));

auto matched_paramA = std::static_pointer_cast<ov::op::v0::Parameter>(matched_nodeA);
// Transpose weights specifically for QWEN as of now.
if (getTransposeWeights()) {
Contributor:

Not sure a getter is required here; in other cases, passes just refer to their member variables.

Contributor Author:

Changed!

Comment on lines 334 to 338
ov::PartialShape current_shape = matched_paramA->get_partial_shape();
ov::Shape static_shape = current_shape.to_shape();
ov::Shape new_order = {static_shape[1], static_shape[2], static_shape[0]};
ov::PartialShape new_shape(new_order);
matched_paramA->set_partial_shape(new_shape);
@dmatveev (Contributor) commented Aug 22, 2024:

This is a base class pattern. It works for all other patterns.

Here you make a strong assumption that you have group-quant weights with three dimensions. But what if you don't?

Also, the recommendation given was to ensure that the longest dimension is the last one.
At least please check that this contract is satisfied in your new_order.

Contributor Author:

How would we handle other cases then? What should we do if the shape is dynamic or the rank is not 3?

Contributor:

write a proper if
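
One way the guarded version could look (a sketch under the 3-D/static assumptions discussed above, not the PR's final code; requires <algorithm>):

// Only act on static, 3-D shapes, and only when the permutation actually
// puts the longest dimension last, per the ticket's recommendation.
const ov::PartialShape pshape = matched_paramA->get_partial_shape();
if (pshape.is_static() && pshape.rank().get_length() == 3) {
    const ov::Shape s = pshape.to_shape();
    const ov::Shape permuted{s[1], s[2], s[0]};
    if (permuted.back() == *std::max_element(s.begin(), s.end())) {
        matched_paramA->set_partial_shape(ov::PartialShape(permuted));
    }
}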

Comment on lines 339 to 341
}
auto matched_valueB = std::static_pointer_cast<ov::op::v0::Constant>(matched_nodeB);
auto matched_paramC = std::static_pointer_cast<ov::op::v0::Parameter>(matched_nodeC);
Contributor:

So this change alone is not enough.

If you reshape the weights here, you will also need to reshape the Scale coefficient parameters if the Scale parameters stay in the model (no DCOFF).

Contributor Author:

Yes, reshaping both weights and scales - i.e., if they are reshapable.

@ujjayant-kadian changed the title from NPUW: Adding new config option to transpose weights to NPUW: Adding new config option to reshape weights on Aug 26, 2024
@ujjayant-kadian (Contributor Author):

> How is the unpack supposed to work now? I believe the closure tensor we have stored in the closure stays with the older dims?

Yes, the unpack processes the original tensors.

/**
* @brief
* Type: bool.
* Tranpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs.
Contributor:

Suggested change:
- * Tranpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs.
+ * Transpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs, if required

// Old LLaMa-v2 patterns (Symmetric)
- rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to))
+ rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to), enable_transpose)
Contributor:

As the configuration parameters go first and the "output" remapping goes last, it should probably be:

Suggested change:
- rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to), enable_transpose)
+ rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, enable_transpose, std::ref(params_to))

@@ -1606,29 +1606,32 @@ void Partitioner::decompressionCutOff(const std::string& func_name) {
{
LOG_BLOCK();

bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();
Contributor:

Suggested change:
- bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();
+ const bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();

Comment on lines +28 to +46
while (current_node) {
// Check if the current node is a MatMul
if (auto matmul = std::dynamic_pointer_cast<ov::op::v0::MatMul>(current_node)) {
return matmul;
}
// Move to the next node in the path if there is one
if (!current_node->outputs().empty()) {
auto output = current_node->outputs().at(0);
if (!output.get_target_inputs().empty()) {
current_node = output.get_target_inputs().begin()->get_node()->shared_from_this();
} else {
// No further outputs, end the search
break;
}
} else {
// No outputs, end the search
break;
}
}
Contributor:

There should be no loop; it must be a straight, direct link from your start_node.

Contributor Author:

It might not be true for every case. After the root node (the Reshape), for example, there may be a Convert present.
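
A loop-free compromise given that constraint (sketch only; next_consumer and matmul_after are hypothetical helpers): follow exactly one consumer link from the start node and step over at most one Convert:

// Assumes <openvino/op/convert.hpp> and <openvino/op/matmul.hpp>.
static std::shared_ptr<ov::Node> next_consumer(const std::shared_ptr<ov::Node>& n) {
    const auto& targets = n->output(0).get_target_inputs();
    return targets.empty() ? nullptr
                           : targets.begin()->get_node()->shared_from_this();
}

static std::shared_ptr<ov::op::v0::MatMul> matmul_after(const std::shared_ptr<ov::Node>& start) {
    auto node = next_consumer(start);
    if (node && ov::is_type<ov::op::v0::Convert>(node.get())) {
        node = next_consumer(node);  // step over the optional Convert
    }
    return node ? ov::as_type_ptr<ov::op::v0::MatMul>(node) : nullptr;
}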

Comment on lines 66 to 68
if (!matmul_node) {
LOG_DEBUG("NOT a MATMUL NODE!");
}
Contributor:

Just place an assert that the pointer is not null. Also, who'd pass a null pointer here?
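
For instance (using the NPUW_ASSERT macro already used elsewhere in NPUW):

NPUW_ASSERT(matmul_node != nullptr);  // the matched pattern guarantees a MatMul here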

Comment on lines 74 to 79
if (input_shape.size() >= 2) {
size_t max_dim = *std::max_element(input_shape.begin(), input_shape.end());
if (input_shape[0] != max_dim) {
return true; // Transpose is required
}
}
Contributor:

I believe this is not the check that's supposed to be here.

auto partial_shape = param->get_partial_shape();

// Ensure the shape is static before proceeding
if (partial_shape.is_static()) {
Contributor:

replace with ASSERT.

auto shape = partial_shape.to_shape();

// Check if the shape is 2D or 3D and needs transposing
if (shape.size() == 2 && shape[0] < shape[1]) {
Contributor:

I believe we shouldn't look at < or max_element here. The requirement is not just to place the max dim in the proper dimension. Please check the issue description.

Contributor:

(Stopped review at this point)

Contributor Author:

Rectified!

Comment on lines 220 to 261
// Check if the index is marked for transposition
if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), i) != m.transpose_indices.end()) {
// Transpose the tensor before adding it to new_closure
new_closure.push_back(pattern_utils::transpose_tensor(fcall._closure[i]));
} else {
// Add the original tensor to new_closure
new_closure.push_back(fcall._closure[i]);
}

// Handle scale remap
auto scale_iter = m.scale_remap.find(i);
- new_scales.push_back(scale_iter != m.scale_remap.end() ? fcall._closure[scale_iter->second] : ov::Tensor());
// Check for asymmetric zero points and add them to new_zerops
if (scale_iter != m.scale_remap.end()) {
// Check if the scale index is marked for transposition
if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), scale_iter->second) != m.transpose_indices.end()) {
// Transpose the tensor before adding it to new_scales
new_scales.push_back(pattern_utils::transpose_tensor(fcall._closure[scale_iter->second]));
} else {
// Add the original tensor to new_scales
new_scales.push_back(fcall._closure[scale_iter->second]);
}
} else {
new_scales.push_back(ov::Tensor());
}

// Handle zero point remap
auto zerop_iter = m.zerop_remap.find(i);
- const auto& zerop = zerop_iter != m.zerop_remap.end() ? fcall._closure[zerop_iter->second] : m.zero_points[i];
- new_zerops.push_back(zerop);
if (zerop_iter != m.zerop_remap.end()) {
// Check if the zero point index is marked for transposition
if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), zerop_iter->second) != m.transpose_indices.end()) {
// Transpose the tensor before adding it to new_zerops
new_zerops.push_back(pattern_utils::transpose_tensor(fcall._closure[zerop_iter->second]));
} else {
// Add the original tensor to new_zerops
new_zerops.push_back(fcall._closure[zerop_iter->second]);
}
} else {
// Add the zero point tensor from the closure remap
const auto& zerop = m.zero_points[i];
new_zerops.push_back(zerop);
}
}

Contributor:

This looks like too much; you can form the tensors first and then transpose only the affected ones. It could be one more separate loop. That will make the code clearer.
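
A sketch of that restructure (illustrative; assumes new_closure is already filled by the plain remapping loop, with analogous passes for new_scales and new_zerops via scale_remap/zerop_remap):

// Separate pass: transpose only the entries marked in transpose_indices.
for (std::size_t i = 0; i < new_closure.size(); ++i) {
    if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), i) !=
        m.transpose_indices.end()) {
        new_closure[i] = pattern_utils::transpose_tensor(new_closure[i]);
    }
}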

Contributor Author:

Changed!

Comment on lines 25 to 33
namespace pattern_utils {

std::shared_ptr<ov::op::v0::MatMul> find_matmul_downwards(const std::shared_ptr<ov::Node>& start_node);
std::shared_ptr<ov::op::v0::MatMul> get_root_matmul(ov::pass::pattern::Matcher& m);
bool transpose_required(const std::shared_ptr<ov::op::v0::MatMul>& matmul_node);
ov::Tensor transpose_tensor(const ov::Tensor& tensor);

} // namespace pattern_utils

Contributor:

You don't need these definitions here if only dcoff.cpp is using them.

Contributor Author:

Removed!

@dmatveev added this to the 2024.5 milestone on Sep 3, 2024
@github-actions (bot):

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions (bot) added the Stale label on Sep 21, 2024
@github-actions (bot):

This PR was closed because it has been stalled for 2 weeks with no activity.

@github-actions (bot) closed this on Sep 28, 2024
Labels: category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin), Stale