
[GPU] Weightless caching #25731

Open · wants to merge 25 commits into master from private/tkrupa/weightless_caching
Conversation

@tkrupa-intel (Contributor) commented Jul 25, 2024

No description provided.

@tkrupa-intel requested review from a team as code owners July 25, 2024 14:53
@github-actions bot added the category: GPU (OpenVINO GPU plugin) and category: IR FE (OpenVINO IR v10 / v11 FrontEnd) labels Jul 25, 2024
@sys-openvino-ci added the ExternalIntelPR (External contributor from Intel) label Jul 25, 2024
@tkrupa-intel force-pushed the private/tkrupa/weightless_caching branch from 3716066 to 48ba923 on August 14, 2024 08:40
@tkrupa-intel requested review from a team as code owners August 14, 2024 08:40
@github-actions bot added the category: IE Tests, category: CPU, category: Python API, category: ONNX FE, category: dependency_changes, and category: NPU labels Aug 14, 2024
@tkrupa-intel force-pushed the private/tkrupa/weightless_caching branch from 48ba923 to ffcc3f5 on August 14, 2024 08:56
@github-actions bot removed the category: IE Tests, category: CPU, category: Python API, category: ONNX FE, category: dependency_changes, and category: NPU labels Aug 14, 2024
@tkrupa-intel force-pushed the private/tkrupa/weightless_caching branch from ffcc3f5 to a1441b7 on August 28, 2024 14:51
@tkrupa-intel force-pushed the private/tkrupa/weightless_caching branch from 87f7219 to 02aa4a1 on September 4, 2024 09:02
@tkrupa-intel requested a review from a team as a code owner September 4, 2024 09:02
@github-actions bot added the category: Core (OpenVINO Core, aka ngraph) label Sep 4, 2024
@p-durandin (Contributor):
build_jenkins

@p-durandin (Contributor):
build_jenkins

@@ -304,6 +304,16 @@ void ProgramBuilder::add_primitive(const ov::Node& op, std::shared_ptr<cldnn::pr
prim->origin_op_name = op.get_friendly_name();
prim->origin_op_type_name = op.get_type_name();

if (auto data_prim = dynamic_cast<cldnn::data*>(prim.get())) {
auto rt_info = op.get_rt_info();
auto offset = rt_info.find("bin_offset");
A reviewer (Contributor) commented:

Do you somehow ensure that the original constant values are not changed? Given the WA for strided slice in the program::save implementation, it seems that you don't, but maybe I'm missing something.

@tkrupa-intel (Author) replied Sep 13, 2024:

I ensure it partially in two ways:
1. I save bin_offset in the Node's rt_info before the transformation pipeline is launched. This way, if a Node is replaced by a new one during the transformation pipeline, this information is not retained and its values are cached in the traditional way.
2. When saving to the cache, I check whether the byte size of the data has changed since the model was loaded - this mostly targets constants affected by the precision transformations.

I understand that this is a limited approach and there is potential for more situations akin to the one with strided_slice in topologies I have not tested. I asked about it in the email thread you were just added to and received a response that it looks satisfactory. If you think it's not - do you have any suggestions on how to achieve it? I couldn't think of a 100% foolproof solution that does not involve direct comparison of the data (which would defeat the purpose).
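The two eligibility checks described above can be sketched as a single predicate. This is an illustrative sketch only, not the PR's actual code: `RtInfo` and `is_weightless_eligible` are hypothetical names, and the `SIZE_MAX` comparison models the explicit exclusion applied to strided_slice inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a node's rt_info map (real rt_info is richer).
using RtInfo = std::map<std::string, std::size_t>;

// Check 1: a node replaced during the transformation pipeline loses its
// rt_info, so the "bin_offset" key is absent and the constant falls back
// to traditional (value-embedding) caching.
// Check 2: if the byte size changed since the model was loaded (e.g. a
// precision transformation), the bytes in the original weights file no
// longer match, so the constant is excluded as well.
bool is_weightless_eligible(const RtInfo& rt_info,
                            std::size_t original_byte_size,
                            std::size_t current_byte_size) {
    auto it = rt_info.find("bin_offset");
    if (it == rt_info.end())
        return false;                 // check 1: rt_info not retained
    if (it->second == SIZE_MAX)
        return false;                 // explicitly excluded (strided_slice WA)
    return original_byte_size == current_byte_size;  // check 2
}
```

A constant that fails either check is simply serialized with its values embedded in the blob, as in traditional caching.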

Resolved review threads:
src/plugins/intel_gpu/src/plugin/plugin.cpp
src/plugins/intel_gpu/src/plugin/compiled_model.cpp (outdated)
src/plugins/intel_gpu/src/graph/program.cpp (outdated)

// Constants used as inputs of strided_slice nodes cannot be loaded from the original weights file
// because strided_slice undergoes transformation(s) altering their values.
// Setting their bin_offset fields to SIZE_MAX excludes them from the weightless caching mechanism.
A reviewer (Contributor) commented:

As mentioned above, we need a more robust way to track whether a Constant op (and the data primitive later) still has its original values.

@p-durandin (Contributor):
build_jenkins

@tkrupa-intel tkrupa-intel changed the title [DRAFT][DO NOT MERGE] Weightless caching PoC [GPU] Weightless caching Sep 17, 2024
@@ -79,7 +79,8 @@ void ExecutionConfig::set_default() {
std::make_tuple(ov::intel_gpu::allow_new_shape_infer, false),
std::make_tuple(ov::intel_gpu::use_only_static_kernels_for_dynamic_shape, false),
std::make_tuple(ov::intel_gpu::buffers_preallocation_ratio, 1.1f),
A reviewer (Contributor) commented:

[random spot] please add some tests

@tkrupa-intel (Author) replied:

What should they test specifically? There are no tests for traditional caching which I could use for reference. I wanted to compare constants' data directly between the original and imported models, but I couldn't find a way to access that data using the standard API.

A reviewer (Contributor) replied:

You can find many test cases that end with '_cached', as below:

TEST_P(GemmGPUTest, basic_cached) {
    ASSERT_NO_FATAL_FAILURE(test(true));
}

These tests are for the model caching feature.
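At their core, such cached tests run the same inference twice - once directly and once through a cache blob - and compare the outputs. A minimal sketch of the comparison step, outside the gtest harness; `outputs_match` is a hypothetical helper, while the real tests use gtest assertions inside the plugin's test infrastructure:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compare a reference (non-cached) inference result against the result
// produced from an imported cache blob, elementwise within a tolerance.
bool outputs_match(const std::vector<float>& reference,
                   const std::vector<float>& cached,
                   float tolerance = 1e-5f) {
    if (reference.size() != cached.size())
        return false;                          // shape mismatch
    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (std::fabs(reference[i] - cached[i]) > tolerance)
            return false;                      // accuracy mismatch
    }
    return true;
}
```

For weightless caching specifically, this kind of check would have caught the resnet-18 accuracy mismatch reported later in this thread.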

@vladimir-paramuzov added the pr: needs tests (PR needs tests updating) label Sep 19, 2024
@e-ddykim (Contributor) commented:
Did you check accuracy? When I tested this PR with the resnet-18 static model, the outputs are different between non-caching and weightless-caching runs.
Additionally, I temporarily commented out the below two lines for weightless cache blob loading:

if (config.get_property(ov::cache_mode) == ov::CacheMode::OPTIMIZE_SIZE)
return nullptr;
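The two lines quoted above gate cache-blob loading on the cache mode, which is why they had to be commented out to exercise weightless loading at all. A minimal sketch of that gating; `CacheMode` and `try_load_blob` are illustrative stand-ins for the plugin's actual types:

```cpp
#include <cassert>

// Illustrative stand-in for ov::CacheMode.
enum class CacheMode { OPTIMIZE_SPEED, OPTIMIZE_SIZE };

// Sketch of the early return being discussed: when the cache was written in
// OPTIMIZE_SIZE (weightless) mode, blob loading is skipped entirely, so the
// model is recompiled instead of imported from the cache.
const char* try_load_blob(CacheMode mode) {
    if (mode == CacheMode::OPTIMIZE_SIZE)
        return nullptr;   // weightless blobs were rejected here
    return "blob";        // a regular cache blob would be loaded
}
```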

@tkrupa-intel (Author) replied:

Hi, thanks for letting me know about issues with this topology! I checked accuracy only for Stable Diffusion v1.5 and Llama-3-8b. I'm aware that there may be mismatches in other topologies (see discussion here: #25731 (comment)).

I'm also aware that this check prevents correct import; I'll push the fix soon.

Labels
category: Core (OpenVINO Core, aka ngraph) · category: GPU (OpenVINO GPU plugin) · category: IR FE (OpenVINO IR v10 / v11 FrontEnd) · ExternalIntelPR (External contributor from Intel) · pr: needs tests (PR needs tests updating)
5 participants