-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reintroduce query plan reuse during warmup #5255
base: dev
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
CI performance tests
|
this makes sure there will be no possible collision by extension (example: hashing `ab` then `cd` VS hashing `a` then `bcd`)
✅ Docs Preview ReadyNo new or changed pages found. |
.update(serde_json::to_vec(&self.config_mode).expect("serialization should not fail")); | ||
hasher.update(&*self.schema_id); | ||
let mut hasher = StructHasher::new(); | ||
self.metadata.hash(&mut hasher); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should hash key names, not only values
Fix #5160 This splits part of the work from #5255 to make it easier to merge. This covers improvements and fixes to the query planner cache key from changes related to the query hashing algorithm and query plan reuse during warmup. Fixed: * use prefixes for each part of the redis cache key so they become self describing * remove the custom Hash implementation for the cache key * remove JSON serialization * hash the Rust planner's config only once, not on every cache query Co-authored-by: Ivan Goncharov <ivan.goncharov.ua@gmail.com> Co-authored-by: Jeremy Lempereur <jeremy.lempereur@iomentum.com> Co-authored-by: Gary Pennington <gary@apollographql.com> Co-authored-by: Jesse Rosenberger <git@jro.cc> Co-authored-by: Renée <renee.kooi@apollographql.com>
Fix apollographql#5160 This splits part of the work from apollographql#5255 to make it easier to merge. This covers improvements and fixes to the query planner cache key from changes related to the query hashing algorithm and query plan reuse during warmup. Fixed: * use prefixes for each part of the redis cache key so they become self describing * remove the custom Hash implementation for the cache key * remove JSON serialization * hash the Rust planner's config only once, not on every cache query Co-authored-by: Ivan Goncharov <ivan.goncharov.ua@gmail.com> Co-authored-by: Jeremy Lempereur <jeremy.lempereur@iomentum.com> Co-authored-by: Gary Pennington <gary@apollographql.com> Co-authored-by: Jesse Rosenberger <git@jro.cc> Co-authored-by: Renée <renee.kooi@apollographql.com>
Converting to draft due to internal discussion regarding the risks introduced by this PR. For example, adding new directives that influence Query planning without updating this code can result in a hash collision where a schema update changes the query plan, but the hash remains the same. |
Fix #5160
TL;DR: This reintroduces the schema aware query hashing algorithm with a configuration option to enable it for query plan reuse in warmup, and some performance and reliability fixes around the cache key generation.
Problem
The router implements a query hashing algorithm that takes the schema into account, so that if the parts of the schema relevant to the query do not change across a schema update (example: a field was added to an unrelated type), then the hash will not change.
This is useful in 2 ways:
supergraph.query_planning.warmed_up_queries
option to pregenerate query plans before switching traffic to the new schema. But that can result in a lot of queries being planned, a lot of CPU usage, and schema updates taking a long time. This query hashing algorithm can be used to detect that a schema update would not affect a query, so we can reuse its query plan, and avoid regenrating most of the query plans. This is the goal of theexperimental_reuse_query_plans
option.Proposed solution
An earlier version of this algorithm shipped in a router release with some significant issues that broke query plan caching, so it was promptly deactivated. This PR reintroduces the algorithm, with the
experimental_reuse_query_plans
set by default todo_not_reuse
, making its use in the query planner cache optional. It will generate the query hash in all cases, and use it with the schema hash (a hash of the schema SDL as bytes), but whenexperimental_reuse_query_plans
is set to "reuse", only the query hash will be employed. When the option is set tomeasure
, it measures the number of query plans that could have been reused by comparing old and new query plans, in the same way we do with JS and Rust planners.It also brings significant improvements to the query planner cache handling, using prefixes to make the Redis key self describing, reducing the amount of data serialization when generating the cache key, and making sure the rust planner's configuration is used in the cache key.
TODO:
configuration option to reactivate the schema hashthe hash is now always generated, but the cache key will by default use that query hash and the schema hash. If theexperimental_reuse_query_plans
option is enabled, then it will not use the schema hashcheck with the federation version for the in memory cachewe don't need to check it because a running router instance will only have one federation versionexperimental_reuse_query_plans
option from a boolean to an enumChecklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩