
[GPU] Weightless caching #25731

Open · wants to merge 25 commits into master from private/tkrupa/weightless_caching
Conversation

@tkrupa-intel (Contributor) commented Jul 25, 2024

No description provided.

@tkrupa-intel requested review from a team as code owners July 25, 2024 14:53
@github-actions bot added the category: GPU (OpenVINO GPU plugin) and category: IR FE (OpenVINO IR v10 / v11 FrontEnd) labels Jul 25, 2024
@sys-openvino-ci added the ExternalIntelPR (External contributor from Intel) label Jul 25, 2024
@tkrupa-intel force-pushed the private/tkrupa/weightless_caching branch from 3716066 to 48ba923 on August 14, 2024 08:40
@tkrupa-intel requested review from a team as code owners August 14, 2024 08:40
@github-actions bot added the category: IE Tests, category: CPU, category: Python API, category: ONNX FE, category: dependency_changes, and category: NPU labels Aug 14, 2024
@tkrupa-intel force-pushed the private/tkrupa/weightless_caching branch from 48ba923 to ffcc3f5 on August 14, 2024 08:56
@github-actions bot removed the category: IE Tests, category: CPU, category: Python API, category: ONNX FE, category: dependency_changes, and category: NPU labels Aug 14, 2024
@tkrupa-intel force-pushed the private/tkrupa/weightless_caching branch from ffcc3f5 to a1441b7 on August 28, 2024 14:51
@tkrupa-intel force-pushed the private/tkrupa/weightless_caching branch from 87f7219 to 02aa4a1 on September 4, 2024 09:02
@tkrupa-intel requested a review from a team as a code owner September 4, 2024 09:02
@github-actions bot added the category: Core (OpenVINO Core, aka ngraph) label Sep 4, 2024
@p-durandin (Contributor):
build_jenkins

@p-durandin (Contributor):
build_jenkins

@@ -304,6 +304,16 @@ void ProgramBuilder::add_primitive(const ov::Node& op, std::shared_ptr<cldnn::pr
prim->origin_op_name = op.get_friendly_name();
prim->origin_op_type_name = op.get_type_name();

if (auto data_prim = dynamic_cast<cldnn::data*>(prim.get())) {
auto rt_info = op.get_rt_info();
auto offset = rt_info.find("bin_offset");
A reviewer (Contributor) commented:

Do you somehow ensure that the original constant values are not changed? Given the WA for strided slice in the program::save implementation, it seems that you don't, but maybe I'm missing something.

@tkrupa-intel (Author) replied Sep 13, 2024:

I ensure it partially in two ways:
1. I save bin_offset in the Node's rt_info before the transformation pipeline is launched. This way, if a Node is replaced by a new one during the transformation pipeline, this information is not retained and its values are cached in the traditional way.
2. When saving to the cache, I check whether the byte size of the data has changed since the model was loaded - this mostly targets constants affected by the precision transformations.

I understand that this is a limited approach and there is potential for more situations akin to the one with strided_slice in topologies I have not tested. I asked about it in the email thread you were just added to and received a response that it looks satisfactory. If you think it's not - do you have any suggestions on how to achieve it? I couldn't think of a 100% foolproof solution that does not involve direct comparison of the data (which would defeat the purpose).
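The two eligibility checks described above can be sketched as a single predicate. This is an illustrative sketch only, not the PR's actual code: `RtInfo` and `is_weightless_eligible` are hypothetical names, and the `SIZE_MAX` comparison models the explicit exclusion applied to strided_slice inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a node's rt_info map (real rt_info is richer).
using RtInfo = std::map<std::string, std::size_t>;

// Check 1: a node replaced during the transformation pipeline loses its
// rt_info, so the "bin_offset" key is absent and the constant falls back
// to traditional (value-embedding) caching.
// Check 2: if the byte size changed since the model was loaded (e.g. a
// precision transformation), the bytes in the original weights file no
// longer match, so the constant is excluded as well.
bool is_weightless_eligible(const RtInfo& rt_info,
                            std::size_t original_byte_size,
                            std::size_t current_byte_size) {
    auto it = rt_info.find("bin_offset");
    if (it == rt_info.end())
        return false;                 // check 1: rt_info not retained
    if (it->second == SIZE_MAX)
        return false;                 // explicitly excluded (strided_slice WA)
    return original_byte_size == current_byte_size;  // check 2
}
```

A constant that fails either check is simply serialized with its values embedded in the blob, as in traditional caching.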

Resolved review threads:
src/plugins/intel_gpu/src/plugin/plugin.cpp
src/plugins/intel_gpu/src/plugin/compiled_model.cpp (outdated)
src/plugins/intel_gpu/src/graph/program.cpp (outdated)

// Constants used as inputs of strided_slice nodes cannot be loaded from the original weights file
// because strided_slice undergoes transformation(s) altering their values.
// Setting their bin_offset fields to SIZE_MAX excludes them from the weightless caching mechanism.
A reviewer (Contributor) commented:

As mentioned above, we need a more robust way to track whether a Constant op (and the data primitive later) still has its original values.

@p-durandin (Contributor):
build_jenkins

@tkrupa-intel tkrupa-intel changed the title [DRAFT][DO NOT MERGE] Weightless caching PoC [GPU] Weightless caching Sep 17, 2024
@@ -79,7 +79,8 @@ void ExecutionConfig::set_default() {
std::make_tuple(ov::intel_gpu::allow_new_shape_infer, false),
std::make_tuple(ov::intel_gpu::use_only_static_kernels_for_dynamic_shape, false),
std::make_tuple(ov::intel_gpu::buffers_preallocation_ratio, 1.1f),
A reviewer (Contributor) commented:

[random spot] please add some tests

@tkrupa-intel (Author) replied:

What should they test specifically? There are no tests for traditional caching which I could use for reference. I wanted to compare constants' data directly between the original and imported models, but I couldn't find a way to access that data using the standard API.

A reviewer (Contributor) replied:

You can find many test cases that end with '_cached', as below:

TEST_P(GemmGPUTest, basic_cached) {
    ASSERT_NO_FATAL_FAILURE(test(true));
}

These tests are for the model caching feature.
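At their core, such cached tests run the same inference twice - once directly and once through a cache blob - and compare the outputs. A minimal sketch of the comparison step, outside the gtest harness; `outputs_match` is a hypothetical helper, while the real tests use gtest assertions inside the plugin's test infrastructure:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compare a reference (non-cached) inference result against the result
// produced from an imported cache blob, elementwise within a tolerance.
bool outputs_match(const std::vector<float>& reference,
                   const std::vector<float>& cached,
                   float tolerance = 1e-5f) {
    if (reference.size() != cached.size())
        return false;                          // shape mismatch
    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (std::fabs(reference[i] - cached[i]) > tolerance)
            return false;                      // accuracy mismatch
    }
    return true;
}
```

For weightless caching specifically, this kind of check would have caught the resnet-18 accuracy mismatch reported later in this thread.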

@vladimir-paramuzov added the pr: needs tests (PR needs tests updating) label Sep 19, 2024
@e-ddykim (Contributor) commented:
Did you check accuracy? When I tested this PR with the resnet-18 static model, the outputs are different between non-caching and weightless-caching runs.
Additionally, I temporarily commented out the below two lines for weightless cache blob loading:

if (config.get_property(ov::cache_mode) == ov::CacheMode::OPTIMIZE_SIZE)
return nullptr;
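The two lines quoted above gate cache-blob loading on the cache mode, which is why they had to be commented out to exercise weightless loading at all. A minimal sketch of that gating; `CacheMode` and `try_load_blob` are illustrative stand-ins for the plugin's actual types:

```cpp
#include <cassert>

// Illustrative stand-in for ov::CacheMode.
enum class CacheMode { OPTIMIZE_SPEED, OPTIMIZE_SIZE };

// Sketch of the early return being discussed: when the cache was written in
// OPTIMIZE_SIZE (weightless) mode, blob loading is skipped entirely, so the
// model is recompiled instead of imported from the cache.
const char* try_load_blob(CacheMode mode) {
    if (mode == CacheMode::OPTIMIZE_SIZE)
        return nullptr;   // weightless blobs were rejected here
    return "blob";        // a regular cache blob would be loaded
}
```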

@tkrupa-intel (Author) replied:

Hi, thanks for letting me know about issues with this topology! I checked accuracy only for Stable Diffusion v1.5 and Llama-3-8b. I'm aware that there may be mismatches in other topologies (see discussion here: #25731 (comment)).

I'm also aware that this check prevents correct import; I'll push the fix soon.

Labels
category: Core (OpenVINO Core, aka ngraph) · category: GPU (OpenVINO GPU plugin) · category: IR FE (OpenVINO IR v10 / v11 FrontEnd) · ExternalIntelPR (External contributor from Intel) · pr: needs tests (PR needs tests updating)
5 participants