
NPUW: Adding new config option to reshape weights #25691

Conversation

@ujjayant-kadian (Contributor) commented Jul 23, 2024

Details:

Adds a new config option, "NPUW_RESHAPE_WEIGHTS", which does not yet apply to all patterns. When the option is true, the input weights are reshaped to improve performance. No other pattern (for other LLMs) is affected by the new option, since each pattern has its own pattern-matcher class. The transpose occurs just before the weights are decompressed into f16.
For example, an input weight tensor's shape changes as follows:
Input weight shape: [4096,32,128]
New input weight shape: [32,128,4096]
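
For illustration, a minimal standalone C++ sketch of that layout change (a hypothetical helper, not the PR's code; K, G, S stand for the 4096, 32, 128 dimensions above):

#include <cstddef>
#include <vector>

// Permute a row-major 3-D weight tensor from [K, G, S] to [G, S, K]
// (axis order {1, 2, 0}), matching [4096,32,128] -> [32,128,4096].
std::vector<float> permute_120(const std::vector<float>& src,
                               std::size_t K, std::size_t G, std::size_t S) {
    std::vector<float> dst(src.size());
    for (std::size_t k = 0; k < K; ++k)
        for (std::size_t g = 0; g < G; ++g)
            for (std::size_t s = 0; s < S; ++s)
                // element src[k][g][s] moves to dst[g][s][k]
                dst[(g * S + s) * K + k] = src[(k * G + g) * S + s];
    return dst;
}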

Tickets:

  • 126327

@dmatveev changed the title from NPUW: Adding new config option particularly "NPUW_TRANSPOSE_WEIGHTS" to NPUW: Adding new config option to transpose weights on Aug 7, 2024
@dmatveev (Contributor) left a comment:

Where in the code are the weights actually transposed?

@dmatveev (Contributor) left a comment:

How is the unpack supposed to work now? I believe the closure tensor we have stored in the closure stays with the older dims?

Comment on lines 183 to 184
* Tranpose input weight tensors before the decompression procedure.
* Works only with function "NPUW_FOLD"ing.
Contributor:

I believe the right wording here is "before passing as inputs". And this option applies to the FOLD and CWAI modes.

Contributor Author:

Changed!

@@ -1614,8 +1616,12 @@ void Partitioner::decompressionCutOff(const std::string& func_name) {
->build();

// ChatGLM (GPTQ) and New LLaMa-v2 patterns (Symmetric)
- rewr.add_matcher<ov::npuw::patterns::SymmZP::DCOFFPassReshape1>(dcoff_mode, dcoff_type, std::ref(params_to))
-     ->build();
+ auto dcoffPassReshape1 = std::make_shared<ov::npuw::patterns::SymmZP::DCOFFPassReshape1>(dcoff_mode,
Contributor:

@AsyaPronina is right here: it should be added to the base class (like you actually did) and in the constructor (which all the particular patterns derive).
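
A minimal sketch of that suggestion (illustrative names and a trimmed signature, not the PR's exact code): keep the flag in DCOFFPassBase and accept it in the base constructor, so every derived pattern picks it up:

// Sketch only: the real constructors also take dcoff_mode, dcoff_type, etc.
class DCOFFPassBase {
public:
    explicit DCOFFPassBase(bool transpose_weights)
        : m_transpose_weights(transpose_weights) {}
    virtual ~DCOFFPassBase() = default;

protected:
    bool m_transpose_weights = false;  // matcher callbacks read this directly
};

class DCOFFPassReshape1 : public DCOFFPassBase {
public:
    using DCOFFPassBase::DCOFFPassBase;  // derived patterns reuse the base ctor
};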

@@ -329,6 +329,14 @@ bool DCOFFPassBase::matcher_callback(ov::pass::pattern::Matcher& m) {
NPUW_ASSERT(ov::op::util::is_parameter(matched_nodeC));

auto matched_paramA = std::static_pointer_cast<ov::op::v0::Parameter>(matched_nodeA);
// Transpose weights specifically for QWEN as of now.
if (getTransposeWeights()) {
Contributor:

Not sure a getter is required here; in other cases, passes just refer to their member variables.

Contributor Author:

Changed!

Comment on lines 334 to 338
ov::PartialShape current_shape = matched_paramA->get_partial_shape();
ov::Shape static_shape = current_shape.to_shape();
ov::Shape new_order = {static_shape[1], static_shape[2], static_shape[0]};
ov::PartialShape new_shape(new_order);
matched_paramA->set_partial_shape(new_shape);
@dmatveev (Contributor) commented Aug 22, 2024:

This is a base class pattern. It works for all other patterns.

Here you make a strong assumption that you have group-quant weights with three dimensions. But what if you don't?

Also, the recommendation given was to ensure that the longest dimension is the last one.
At least please check that this contract is satisfied in your new_order.

Contributor Author:

How would we handle other cases then? What should we do if the shape is dynamic or the rank is not 3?

Contributor:

write a proper if
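
One way the guarded version could look (a sketch under the 3-D/static assumptions discussed above, not the PR's final code; requires <algorithm>):

// Only act on static, 3-D shapes, and only when the permutation actually
// puts the longest dimension last, per the ticket's recommendation.
const ov::PartialShape pshape = matched_paramA->get_partial_shape();
if (pshape.is_static() && pshape.rank().get_length() == 3) {
    const ov::Shape s = pshape.to_shape();
    const ov::Shape permuted{s[1], s[2], s[0]};
    if (permuted.back() == *std::max_element(s.begin(), s.end())) {
        matched_paramA->set_partial_shape(ov::PartialShape(permuted));
    }
}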

Comment on lines 339 to 341
}
auto matched_valueB = std::static_pointer_cast<ov::op::v0::Constant>(matched_nodeB);
auto matched_paramC = std::static_pointer_cast<ov::op::v0::Parameter>(matched_nodeC);
Contributor:

So this change alone is not enough.

If you reshape the weights here, you will also need to reshape the Scale coefficient parameters if the Scale parameters stay in the model (no DCOFF).

Contributor Author:

Yes, reshaping both weights and scales - i.e., if they are reshapable.

@ujjayant-kadian changed the title from NPUW: Adding new config option to transpose weights to NPUW: Adding new config option to reshape weights on Aug 26, 2024
@ujjayant-kadian (Contributor Author):

> How is the unpack supposed to work now? I believe the closure tensor we have stored in the closure stays with the older dims?

Yes, the unpack processes the original tensors.

/**
* @brief
* Type: bool.
* Tranpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs.
Contributor:

Suggested change:
- * Tranpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs.
+ * Transpose input weight and the corresponding scale and zero tensors (if any) before passing as inputs, if required

// Old LLaMa-v2 patterns (Symmetric)
- rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to))
+ rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to), enable_transpose)
Contributor:

As the configuration parameters go first and the "output" remapping goes last, it should probably be:

Suggested change:
- rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, std::ref(params_to), enable_transpose)
+ rewr.add_matcher<ov::npuw::patterns::SymmNoZP::DCOFFPassMatMul>(dcoff_mode, dcoff_type, enable_transpose, std::ref(params_to))

@@ -1606,29 +1606,32 @@ void Partitioner::decompressionCutOff(const std::string& func_name) {
{
LOG_BLOCK();

bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();
Contributor:

Suggested change:
- bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();
+ const bool enable_transpose = cfg.get<::intel_npu::NPUW_TRANSPOSE_WEIGHTS>();

Comment on lines +28 to +46
while (current_node) {
// Check if the current node is a MatMul
if (auto matmul = std::dynamic_pointer_cast<ov::op::v0::MatMul>(current_node)) {
return matmul;
}
// Move to the next node in the path if there is one
if (!current_node->outputs().empty()) {
auto output = current_node->outputs().at(0);
if (!output.get_target_inputs().empty()) {
current_node = output.get_target_inputs().begin()->get_node()->shared_from_this();
} else {
// No further outputs, end the search
break;
}
} else {
// No outputs, end the search
break;
}
}
Contributor:

There should be no loop; it must be a straight, direct link from your start_node.

Contributor Author:

It might not be true for every case. After the root node (the Reshape), for example, there may be a Convert present.
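
A loop-free compromise given that constraint (sketch only; next_consumer and matmul_after are hypothetical helpers): follow exactly one consumer link from the start node and step over at most one Convert:

// Assumes <openvino/op/convert.hpp> and <openvino/op/matmul.hpp>.
static std::shared_ptr<ov::Node> next_consumer(const std::shared_ptr<ov::Node>& n) {
    const auto& targets = n->output(0).get_target_inputs();
    return targets.empty() ? nullptr
                           : targets.begin()->get_node()->shared_from_this();
}

static std::shared_ptr<ov::op::v0::MatMul> matmul_after(const std::shared_ptr<ov::Node>& start) {
    auto node = next_consumer(start);
    if (node && ov::is_type<ov::op::v0::Convert>(node.get())) {
        node = next_consumer(node);  // step over the optional Convert
    }
    return node ? ov::as_type_ptr<ov::op::v0::MatMul>(node) : nullptr;
}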

Comment on lines 66 to 68
if (!matmul_node) {
LOG_DEBUG("NOT a MATMUL NODE!");
}
Contributor:

Just place an assert that the pointer is not null. Also, who'd pass a null pointer here?
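
For instance (using the NPUW_ASSERT macro already used elsewhere in NPUW):

NPUW_ASSERT(matmul_node != nullptr);  // the matched pattern guarantees a MatMul here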

Comment on lines 74 to 79
if (input_shape.size() >= 2) {
size_t max_dim = *std::max_element(input_shape.begin(), input_shape.end());
if (input_shape[0] != max_dim) {
return true; // Transpose is required
}
}
Contributor:

I believe this is not the check that's supposed to be here.

auto partial_shape = param->get_partial_shape();

// Ensure the shape is static before proceeding
if (partial_shape.is_static()) {
Contributor:

replace with ASSERT.

auto shape = partial_shape.to_shape();

// Check if the shape is 2D or 3D and needs transposing
if (shape.size() == 2 && shape[0] < shape[1]) {
Contributor:

I believe we shouldn't look at < or max_element here. The requirement is not just to place the max dim in the proper dimension. Please check the issue description.

Contributor:

(Stopped review at this point)

Contributor Author:

Rectified!

Comment on lines 220 to 261
// Check if the index is marked for transposition
if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), i) != m.transpose_indices.end()) {
// Transpose the tensor before adding it to new_closure
new_closure.push_back(pattern_utils::transpose_tensor(fcall._closure[i]));
} else {
// Add the original tensor to new_closure
new_closure.push_back(fcall._closure[i]);
}

// Handle scale remap
auto scale_iter = m.scale_remap.find(i);
- new_scales.push_back(scale_iter != m.scale_remap.end() ? fcall._closure[scale_iter->second] : ov::Tensor());
// Check for asymmetric zero points and add them to new_zerops
if (scale_iter != m.scale_remap.end()) {
// Check if the scale index is marked for transposition
if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), scale_iter->second) != m.transpose_indices.end()) {
// Transpose the tensor before adding it to new_scales
new_scales.push_back(pattern_utils::transpose_tensor(fcall._closure[scale_iter->second]));
} else {
// Add the original tensor to new_scales
new_scales.push_back(fcall._closure[scale_iter->second]);
}
} else {
new_scales.push_back(ov::Tensor());
}

// Handle zero point remap
auto zerop_iter = m.zerop_remap.find(i);
- const auto& zerop = zerop_iter != m.zerop_remap.end() ? fcall._closure[zerop_iter->second] : m.zero_points[i];
- new_zerops.push_back(zerop);
if (zerop_iter != m.zerop_remap.end()) {
// Check if the zero point index is marked for transposition
if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), zerop_iter->second) != m.transpose_indices.end()) {
// Transpose the tensor before adding it to new_zerops
new_zerops.push_back(pattern_utils::transpose_tensor(fcall._closure[zerop_iter->second]));
} else {
// Add the original tensor to new_zerops
new_zerops.push_back(fcall._closure[zerop_iter->second]);
}
} else {
// Add the zero point tensor from the closure remap
const auto& zerop = m.zero_points[i];
new_zerops.push_back(zerop);
}
}

Contributor:

This looks like too much; you can form the tensors first and then transpose only the affected ones. It could be one more separate loop. That will make the code clearer.
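
A sketch of that restructure (illustrative; assumes new_closure is already filled by the plain remapping loop, with analogous passes for new_scales and new_zerops via scale_remap/zerop_remap):

// Separate pass: transpose only the entries marked in transpose_indices.
for (std::size_t i = 0; i < new_closure.size(); ++i) {
    if (std::find(m.transpose_indices.begin(), m.transpose_indices.end(), i) !=
        m.transpose_indices.end()) {
        new_closure[i] = pattern_utils::transpose_tensor(new_closure[i]);
    }
}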

Contributor Author:

Changed!

Comment on lines 25 to 33
namespace pattern_utils {

std::shared_ptr<ov::op::v0::MatMul> find_matmul_downwards(const std::shared_ptr<ov::Node>& start_node);
std::shared_ptr<ov::op::v0::MatMul> get_root_matmul(ov::pass::pattern::Matcher& m);
bool transpose_required(const std::shared_ptr<ov::op::v0::MatMul>& matmul_node);
ov::Tensor transpose_tensor(const ov::Tensor& tensor);

} // namespace pattern_utils

Contributor:

You don't need these definitions here if only dcoff.cpp is using them.

Contributor Author:

Removed!

@dmatveev added this to the 2024.5 milestone on Sep 3, 2024
@github-actions (bot):

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions (bot) added the Stale label on Sep 21, 2024
@github-actions (bot):

This PR was closed because it has been stalled for 2 weeks with no activity.

@github-actions (bot) closed this on Sep 28, 2024
Labels: category: NPU (OpenVINO NPU plugin), category: NPUW (NPUW plugin), Stale