Reintroduce query plan reuse during warmup #5255

Draft · wants to merge 87 commits into base: dev
Conversation

@Geal (Contributor) commented May 27, 2024

Fix #5160

TL;DR: This PR reintroduces the schema-aware query hashing algorithm, adds a configuration option to enable it for query plan reuse during warmup, and includes performance and reliability fixes around cache key generation.

Problem

The router implements a query hashing algorithm that takes the schema into account: if the parts of the schema relevant to a query do not change across a schema update (for example, a field is added to an unrelated type), the hash does not change either.
This is useful in two ways:

  • when updating the schema, query plans must be regenerated for existing queries. We can use the supergraph.query_planning.warmed_up_queries option to pregenerate query plans before switching traffic to the new schema, but that can mean planning a lot of queries, heavy CPU usage, and slow schema updates. The schema-aware hash lets us detect that a schema update does not affect a query, so we can reuse its query plan and avoid regenerating most of the query plans. This is the goal of the experimental_reuse_query_plans option.
  • in entity caching, we hash the query as part of the cache key, but that is not enough: we also need to check that the relevant parts of the schema did not change. For example, if the type of a field changes, we do not want to keep serving data with the wrong type. At the same time, we do not want to rebuild the entire cache from scratch on every schema update. With this query hashing algorithm, we can reuse most of the entity cache across schema updates.
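The intent of schema-aware hashing can be illustrated with a small sketch (all names here are hypothetical; the router's real implementation walks the GraphQL schema, not a flat map): only the schema definitions a query actually uses feed into the hash, so unrelated schema edits leave it unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Hypothetical sketch: hash the query together with only the schema
// definitions it uses, so an unrelated schema edit does not change
// the hash. Field resolution is simplified to a name lookup.
fn schema_aware_hash(query: &str, schema: &BTreeMap<&str, &str>, used_fields: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    query.hash(&mut hasher);
    for field in used_fields {
        // Only definitions relevant to this query feed the hash.
        if let Some(def) = schema.get(field) {
            field.hash(&mut hasher);
            def.hash(&mut hasher);
        }
    }
    hasher.finish()
}

fn main() {
    let query = "{ me { id } }";
    let mut schema = BTreeMap::new();
    schema.insert("me.id", "ID!");
    schema.insert("product.price", "Float");

    let before = schema_aware_hash(query, &schema, &["me.id"]);
    // An unrelated field changes: the hash stays stable.
    schema.insert("product.price", "Int");
    assert_eq!(before, schema_aware_hash(query, &schema, &["me.id"]));

    // A field the query uses changes: the hash changes.
    schema.insert("me.id", "String!");
    assert_ne!(before, schema_aware_hash(query, &schema, &["me.id"]));
    println!("ok");
}
```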

Proposed solution

An earlier version of this algorithm shipped in a router release with significant issues that broke query plan caching, so it was promptly deactivated. This PR reintroduces the algorithm with experimental_reuse_query_plans defaulting to do_not_reuse, making its use in the query planner cache opt-in. The query hash is generated in all cases and combined with the schema hash (a hash of the schema SDL bytes); when experimental_reuse_query_plans is set to reuse, only the query hash is used. When the option is set to measure, the router counts the query plans that could have been reused by comparing old and new query plans, the same way we compare the JS and Rust planners.
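The option has three states as described above; a configuration sketch might look like this (the exact YAML location under supergraph.query_planning is assumed from the warmed_up_queries option mentioned earlier, and the value descriptions are taken from the PR text):

```yaml
supergraph:
  query_planning:
    # Pre-plan this many recent queries when a new schema arrives.
    warmed_up_queries: 100
    # Assumed values, per the PR description:
    #   do_not_reuse (default) - always re-plan during warmup
    #   reuse                  - reuse plans whose schema-aware hash is unchanged
    #   measure                - count how many plans could have been reused
    experimental_reuse_query_plans: measure
```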
It also brings significant improvements to query planner cache handling: prefixes make the Redis key self-describing, less data is serialized when generating the cache key, and the Rust planner's configuration is now part of the cache key.
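A self-describing, prefixed cache key could look like the following sketch (the layout and prefix names are illustrative, not the router's actual key format): each component carries a label naming what it is, so a key can be inspected, and prefixes versioned, without deserializing anything.

```rust
// Hypothetical sketch of a self-describing cache key: every part is
// labeled, so "what went into this key" is visible in Redis itself.
fn cache_key(version: &str, schema_hash: &str, query_hash: &str, config_hash: &str) -> String {
    format!("plan:{version}:schema:{schema_hash}:query:{query_hash}:config:{config_hash}")
}

fn main() {
    let key = cache_key("1", "abc123", "def456", "789fff");
    assert_eq!(key, "plan:1:schema:abc123:query:def456:config:789fff");
    println!("{key}");
}
```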

TODO:

  • configuration option to reactivate the schema hash: the hash is now always generated, and by default the cache key uses both the query hash and the schema hash. If the experimental_reuse_query_plans option is enabled, the schema hash is left out
  • metric to measure the number of query plans that could have been reused during warmup
  • measure how many of the query plans that could have been reused are actually the same
  • check the federation version: for the in-memory cache we don't need to check it, because a running router instance only ever has one federation version
  • remove the custom Hash implementation for the caching key
  • use prefixes for each part of the redis cache key so they become self describing
  • remove JSON serialization
  • hash the Rust planner's config once a new version has been released (it implements Hash now)
  • configuration migration for the experimental_reuse_query_plans option from a boolean to an enum
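The last TODO item, migrating experimental_reuse_query_plans from a boolean to an enum, could be sketched as follows (the enum variants come from the PR description; the exact mapping from the old boolean is an assumption):

```rust
// Hypothetical sketch of the boolean-to-enum configuration migration:
// old configs used true/false; the migrated option has three states.
#[derive(Debug, PartialEq)]
enum ReuseQueryPlans {
    DoNotReuse,
    Reuse,
    Measure,
}

fn migrate(old: Option<bool>) -> ReuseQueryPlans {
    match old {
        // Assumed mapping: the old `true` opts into reuse.
        Some(true) => ReuseQueryPlans::Reuse,
        // `false` and "unset" both map to the default.
        Some(false) | None => ReuseQueryPlans::DoNotReuse,
    }
}

fn main() {
    assert_eq!(migrate(Some(true)), ReuseQueryPlans::Reuse);
    assert_eq!(migrate(None), ReuseQueryPlans::DoNotReuse);
    println!("ok");
}
```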

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.


@router-perf (bot) commented May 27, 2024

CI performance tests

  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • reload - Reload test over a long period of time at a constant rate of users
  • large-request - Stress test with a 1 MB request payload
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • const - Basic stress test that runs with a constant number of users
  • step - Basic stress test that steps up the number of users over time
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • no-graphos - Basic stress test, no GraphOS.
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled

@svc-apollo-docs (Collaborator) commented Oct 18, 2024

✅ Docs Preview Ready

No new or changed pages found.

@abernix abernix requested a review from IvanGoncharov October 21, 2024 09:43
```rust
.update(serde_json::to_vec(&self.config_mode).expect("serialization should not fail"));
hasher.update(&*self.schema_id);
let mut hasher = StructHasher::new();
self.metadata.hash(&mut hasher);
```
@Geal (Contributor Author) commented on this diff:

should hash key names, not only values

Geal added a commit that referenced this pull request Nov 5, 2024
Fix #5160

This splits part of the work from #5255 to make it easier to merge.
This covers improvements and fixes to the query planner cache key from changes related to the query hashing algorithm and query plan reuse during warmup.

Fixed:
* use prefixes for each part of the redis cache key so they become self describing
* remove the custom Hash implementation for the cache key
* remove JSON serialization
* hash the Rust planner's config only once, not on every cache query

Co-authored-by: Ivan Goncharov <ivan.goncharov.ua@gmail.com>
Co-authored-by: Jeremy Lempereur <jeremy.lempereur@iomentum.com>
Co-authored-by: Gary Pennington <gary@apollographql.com>
Co-authored-by: Jesse Rosenberger <git@jro.cc>
Co-authored-by: Renée <renee.kooi@apollographql.com>
@Geal (Contributor Author) commented Nov 5, 2024

Now that #6206 is merged into dev, 59d1e2c merges those changes back here.

@Geal Geal changed the title Fix cache key hashing algorithm Reintroduce query plan reuse during warmup Nov 5, 2024
LongLiveCHIEF pushed a commit to StateFarmIns/router that referenced this pull request Nov 21, 2024
(same commit message as above, with apollographql#-prefixed issue references)
@IvanGoncharov IvanGoncharov marked this pull request as draft November 26, 2024 19:53
@IvanGoncharov (Member) commented:
Converting to draft due to internal discussion about the risks introduced by this PR.
We want to add "query hash stable under incremental schema changes" as a Router feature.
However, in its current form it is implemented outside of the Query Planner, with no mechanism to determine which parts of the schema the Query Planner actually uses.

For example, adding new directives that influence query planning without updating this code could produce a hash collision: a schema update changes the query plan, but the hash stays the same.

Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close issue "Query plan cache key follow up"
9 participants