Move heavy computation to a thread pool with a priority queue #6247
Conversation
These components can take non-trivial amounts of CPU time:

* GraphQL parsing
* GraphQL validation
* Query planning
* Schema introspection

In order to avoid blocking threads that execute asynchronous code, they are now run (in their respective Rust implementations) in a new pool of as many threads as CPU cores are available. Previously we used Tokio’s [`spawn_blocking`] for this purpose, but it appears to be intended for blocking I/O and uses up to 512 threads, so it isn’t a great fit for computation tasks.

[`spawn_blocking`]: https://docs.rs/tokio/latest/tokio/task/fn.spawn_blocking.html

This PR supersedes and closes #6122.

The ageing priority algorithm is based on @garypen’s work in https://github.com/apollographql/ageing
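To make the shape of the change concrete, here is a minimal sketch of the idea (names like `ComputePool` and `execute` are illustrative; the router's actual `compute_job` module differs and adds an ageing priority queue plus backpressure on top of this): CPU-heavy closures are handed to a small, fixed pool of worker threads and awaited from async code, instead of going through `spawn_blocking`.

```rust
use std::num::NonZeroUsize;
use std::sync::{mpsc, Arc, Mutex};

type Job = Box<dyn FnOnce() + Send + 'static>;

struct ComputePool {
    sender: mpsc::Sender<Job>,
}

impl ComputePool {
    fn new(size: NonZeroUsize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        for _ in 0..size.get() {
            let receiver = Arc::clone(&receiver);
            std::thread::spawn(move || loop {
                // Take the next queued job; the lock is released before running it.
                let job = receiver.lock().unwrap().recv();
                match job {
                    Ok(job) => job(),
                    Err(_) => break, // all senders dropped: shut the worker down
                }
            });
        }
        Self { sender }
    }

    /// Run a CPU-heavy closure on the pool and await its result from async code.
    async fn execute<T, F>(&self, f: F) -> T
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = tokio::sync::oneshot::channel();
        let job: Job = Box::new(move || {
            // Ignore the error if the caller stopped waiting for the result.
            let _ = tx.send(f());
        });
        self.sender.send(job).expect("worker threads have shut down");
        rx.await.expect("compute job panicked or was dropped")
    }
}
```

From async request-handling code a caller would then write something like `pool.execute(move || plan_query(doc)).await`, where `plan_query` stands in for any of the CPU-heavy steps listed above.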
✅ Docs Preview Ready: No new or changed pages found.
CI performance tests
apollo-router/src/compute_job.rs
Outdated
/// We generate backpressure in tower `poll_ready` when reaching this many queued items
// TODO: what’s a good number? should it be configurable?
const QUEUE_SOFT_CAPACITY: usize = 100;

// TODO: should this be configurable?
fn thread_pool_size() -> NonZeroUsize {
    std::thread::available_parallelism().expect("available_parallelism() failed")
}
Some open questions here
My opinion is that if we make something configurable just because we don’t know what a good value would be, most users won’t know either.
I agree. Better to try and think of a good default and only make it configurable (if ever) later.
Should it be `available - 1`, so we keep 1 core free to handle traffic? Or is it fine to rely on the OS scheduler to still let traffic go through the router?
My thinking with the initial PR was to rely on the OS scheduler, but minus one might be OK too. The downside is that, for example, minus one out of 2 available cores has a much bigger impact than minus one out of 32 cores.
Here's my proposal:
size: max(1, available - (ceiling(available/8)))
WORKINGS:
AVAILABLE: 1 POOL SIZE: 1
AVAILABLE: 2 POOL SIZE: 1
AVAILABLE: 3 POOL SIZE: 2
AVAILABLE: 4 POOL SIZE: 3
AVAILABLE: 5 POOL SIZE: 4
...
AVAILABLE: 8 POOL SIZE: 7
AVAILABLE: 9 POOL SIZE: 7
...
AVAILABLE: 16 POOL SIZE: 14
AVAILABLE: 17 POOL SIZE: 14
...
AVAILABLE: 32 POOL SIZE: 28
Tweaks on the basic approach are welcome, but it seems to offer reasonable scaling for query planning. We can always refine it later.
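For concreteness, here is a sketch of that sizing rule in Rust (a hypothetical `proposed_pool_size` helper; the merged code may differ):

```rust
use std::num::NonZeroUsize;

/// Proposed sizing: keep roughly 1 core in 8 free for request handling,
/// but never drop below one worker thread.
/// size = max(1, available - ceil(available / 8))
fn proposed_pool_size() -> NonZeroUsize {
    let available = std::thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    let reserved = available.div_ceil(8); // ceil(available / 8)
    NonZeroUsize::new(available.saturating_sub(reserved).max(1))
        .expect("max(1, ..) is non-zero")
}
```

Plugging in the workings above: 2 cores reserve 1 and keep 1, 9 cores reserve 2 and keep 7, 32 cores reserve 4 and keep 28.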
Because each task in the pool (by default just one) takes queries to plan from a queue in sequence, it is a parallelism bottleneck even in "new" mode.
meter_provider()
    .meter("apollo/router")
    .u64_observable_gauge("apollo.router.compute_jobs.queued")
    .with_description(
        "Number of computation jobs (parsing, planning, …) waiting to be scheduled",
    )
    .with_callback(move |m| m.observe(queue().queued_count() as u64, &[]))
    .init()
This is a new gauge metric. `apollo.router.query_planning.queued` will only exist when the legacy planner is used. It is somewhat replaced by the new metric, but not exactly, since the new queue also contains parsing + validation jobs and introspection jobs.
This new metric should be documented.
I’ve added a short description in docs/source/reference/router/telemetry/instrumentation/standard-instruments.mdx. Are there other places to add it to?
I think that's the correct location.
This looks great in general. Maybe we can find some time over the next couple of days to discuss how we figure out good default values for queue/pool sizes.
apollo-router/src/compute_job.rs
Outdated
/// We generate backpressure in tower `poll_ready` when reaching this many queued items
// TODO: what’s a good number? should it be configurable?
const QUEUE_SOFT_CAPACITY: usize = 100;
I don't think it should be configurable initially. I think this queue represents the tradeoff between memory and losing time here vs being re-scheduled onto a different router, if one is available. If we are rejected from the queue here, then we know at least we have to spend the time/work to move the job elsewhere.
That's hard to quantify, but it's likely in the order of milliseconds. Perhaps we can workshop up a rough calculation based on this thinking?
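A very rough version of such a calculation, where every number is an assumption rather than a router measurement, might look like this:

```rust
// Back-of-the-envelope only: both constants are placeholders, not router measurements.
const ASSUMED_JOB_MS: u64 = 10; // hypothetical average cost of one compute job
const QUEUE_SOFT_CAPACITY: u64 = 100; // the current hard-coded soft capacity

/// Rough upper bound on how long a job admitted to a full queue waits before it
/// starts, assuming the pool drains the queue in parallel across its workers.
fn worst_case_queue_wait_ms(pool_size: u64) -> u64 {
    QUEUE_SOFT_CAPACITY * ASSUMED_JOB_MS / pool_size.max(1)
}

// e.g. with 4 workers: 100 * 10 ms / 4 = 250 ms of queueing delay, which can be
// weighed against the cost of rejecting the job and re-planning it elsewhere.
```

Comparing that worst-case wait against the cost of moving the job to another router instance is exactly the tradeoff described above.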
The remaining decision about QUEUE_SOFT_CAPACITY is problematic. It might be best understood as a function of `thread_pool_size()` for now.
Proposal: `20 * thread_pool_size()` as a starting point.
I'm approving this as is, but I'd like to know what you think of my sizing suggestions.
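A sketch of that suggestion (the multiplier 20 is only the proposed starting point; the helper name is illustrative):

```rust
use std::num::NonZeroUsize;

fn thread_pool_size() -> NonZeroUsize {
    std::thread::available_parallelism().expect("available_parallelism() failed")
}

/// Soft capacity expressed as a multiple of the pool size rather than a fixed 100,
/// so machines with more workers tolerate a deeper queue before `poll_ready`
/// starts signalling backpressure. The factor 20 is the suggested starting point.
fn queue_soft_capacity() -> usize {
    20 * thread_pool_size().get()
}
```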
apollo-router/src/compute_job.rs
Outdated
#[tokio::test]
async fn test_parallelism() {
    if thread_pool_size().get() < 2 {
:)
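The quoted hunk is truncated by the diff view. Purely as an illustration of how such a parallelism check can work, and reusing the hypothetical `ComputePool` from the earlier sketch rather than the router's real `compute_job` API:

```rust
// Hypothetical sketch only, not the router's actual test body. Two jobs that each
// wait on a shared barrier can only both complete if they run on different threads.
#[tokio::test]
async fn sketch_test_parallelism() {
    use std::num::NonZeroUsize;
    use std::sync::{Arc, Barrier};

    let pool = ComputePool::new(NonZeroUsize::new(2).unwrap());
    let barrier = Arc::new(Barrier::new(2));
    let (b1, b2) = (Arc::clone(&barrier), Arc::clone(&barrier));

    // If jobs ran sequentially on a single worker, the first closure would block
    // forever on the barrier and this test would hang instead of passing.
    let job1 = pool.execute(move || { b1.wait(); });
    let job2 = pool.execute(move || { b2.wait(); });
    let _ = tokio::join!(job1, job2);
}
```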
I’ve updated the changelog and enabled auto-merge.
These components can take non-trivial amounts of CPU time:

* GraphQL parsing
* GraphQL validation
* Query planning
* Schema introspection

In order to avoid blocking threads that execute asynchronous code, they are now run (in their respective Rust implementations) in a new pool of as many threads as CPU cores are available. Previously we used Tokio’s `spawn_blocking` for this purpose, but it appears to be intended for blocking I/O and uses up to 512 threads, so it isn’t a great fit for computation tasks.

Additionally, `QueryPlannerPool` is bypassed for the new planner since the new thread pool and queue fulfill the same role. This makes the new planner parallel by default (whereas the old pool defaults to size 1, making planning effectively sequential).

The first commit supersedes and closes #6122. The second (non-merge) commit supersedes and closes #6142.

The ageing priority algorithm is based on @garypen’s work in https://github.com/apollographql/ageing
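The linked repository is the authoritative reference for the ageing algorithm; as a rough illustration of the general idea only (none of this is code from apollographql/ageing or from the router), an ageing priority queue keeps per-priority FIFO queues and periodically promotes waiting low-priority items so they cannot starve:

```rust
use std::collections::VecDeque;

const LEVELS: usize = 3; // e.g. High = 0, Medium = 1, Low = 2
const AGE_EVERY: u32 = 8; // promote lower-priority items every 8 pops

struct AgeingQueue<T> {
    levels: [VecDeque<T>; LEVELS],
    pops: u32,
}

impl<T> AgeingQueue<T> {
    fn new() -> Self {
        Self {
            levels: std::array::from_fn(|_| VecDeque::new()),
            pops: 0,
        }
    }

    fn push(&mut self, priority: usize, item: T) {
        // Clamp unknown priorities to the lowest level.
        self.levels[priority.min(LEVELS - 1)].push_back(item);
    }

    fn pop(&mut self) -> Option<T> {
        self.pops = self.pops.wrapping_add(1);
        // Ageing step: every AGE_EVERY pops, move the oldest item of each
        // lower level up one level so it eventually reaches the front.
        if self.pops % AGE_EVERY == 0 {
            for level in 1..LEVELS {
                if let Some(item) = self.levels[level].pop_front() {
                    self.levels[level - 1].push_back(item);
                }
            }
        }
        // Serve the highest-priority non-empty level first.
        self.levels.iter_mut().find_map(|q| q.pop_front())
    }
}
```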
Fixes ROUTER-827
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
Configuration is an important part of many changes. Where applicable please try to document configuration examples.
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.