-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GPU] Weightless caching #25731
base: master
Are you sure you want to change the base?
[GPU] Weightless caching #25731
Conversation
3716066
to
48ba923
Compare
48ba923
to
ffcc3f5
Compare
ffcc3f5
to
a1441b7
Compare
…g data from Constant
87f7219
to
02aa4a1
Compare
build_jenkins |
build_jenkins |
@@ -304,6 +304,16 @@ void ProgramBuilder::add_primitive(const ov::Node& op, std::shared_ptr<cldnn::pr | |||
prim->origin_op_name = op.get_friendly_name(); | |||
prim->origin_op_type_name = op.get_type_name(); | |||
|
|||
if (auto data_prim = dynamic_cast<cldnn::data*>(prim.get())) { | |||
auto rt_info = op.get_rt_info(); | |||
auto offset = rt_info.find("bin_offset"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you somehow ensure that the original constant values are not changed?
Given WA for strided slice in program::save impl it seems that you don't, but maybe I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ensure it partially in two ways:
1. I save bin_offset in Node's rt_info before the transformation pipeline is launched. This way if a Node is replaced by a new one during the trasformation pipeline, this information is not retained and its values are cached in the traditional way.
2. During saving to cache I check if the byte size of the data has changed since the model was loaded - this mostly targets constants affected by the precision transformations.
I understand that this is a limited approach and there is a potential for more situations akin to the one with strided_slice in topologies I have not tested. I asked about it in the email thread you were just added to and received a response that it looks satisfactory. If you think it's not - do you have any suggestions on how to achieve it? I couldn't think of a 100% foolproof solution that does not involve direct comparison of the data (which would defeat the purpose).
|
||
// Constants used as inputs of strided_slice nodes cannot be loaded from the original weights file | ||
// because strided_slice undergoes transformation(s) altering their values. | ||
// Setting their bin_offset fields to SIZE_MAX excludes them from weightless caching mechanism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, we need more robust way to track if Constant op (and data primitive later) still has original values.
build_jenkins |
@@ -79,7 +79,8 @@ void ExecutionConfig::set_default() { | |||
std::make_tuple(ov::intel_gpu::allow_new_shape_infer, false), | |||
std::make_tuple(ov::intel_gpu::use_only_static_kernels_for_dynamic_shape, false), | |||
std::make_tuple(ov::intel_gpu::buffers_preallocation_ratio, 1.1f), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[random spot] please add some tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should they test specifically? There are no tests for traditional caching which I could use for reference. I wanted to compare constants' data directly in an original & imported models but I couldn't find a way to access that data using the standard API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can find many test cases that end with '_cached' as below:
openvino/src/plugins/intel_gpu/tests/unit/test_cases/gemm_gpu_test.cpp
Lines 3331 to 3333 in 4758030
TEST_P(GemmGPUTest, basic_cached) { | |
ASSERT_NO_FATAL_FAILURE(test(true)); | |
} |
These tests are for the model caching feature.
Did you check accuracy? When I tested this PR with the resnet-18 static model, the outputs are different between non-caching and weightless-caching runs. openvino/src/plugins/intel_gpu/src/plugin/plugin.cpp Lines 311 to 312 in 7cf0564
|
Hi, thanks for letting me know about issues with this topology! I checked accuracy only for Stable Diffusion v1.5 and Llama-3-8b. I'm aware that there may be mismatches in other topologies (see discussion here: #25731 (comment)). I'm aware that this check prevents correct import, I'll push the fix soon. |
No description provided.