Move heavy computation to a thread pool with a priority queue #6247

---
I don't think it should be configurable initially. I think this queue represents the tradeoff between memory and losing time here vs being re-scheduled onto a different router, if one is available. If we are rejected from the queue here, then we know at least we have to spend the time/work to move the job elsewhere.
That's hard to quantify, but it's likely on the order of milliseconds. Perhaps we can workshop a rough calculation based on this thinking?
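As a starting point for that rough calculation, here's a minimal sketch. Everything in it is an illustrative assumption (the function name, the parameter names, and the example numbers are mine, not from this PR): queueing only pays off while the expected wait in the queue is cheaper than the cost of moving the job to another router.

```rust
// Hypothetical back-of-envelope, not actual router code: all names and
// numbers below are illustrative assumptions.
fn worth_queueing(queue_len: usize, avg_job_ms: f64, reschedule_ms: f64) -> bool {
    // Expected wait grows roughly linearly with queue depth.
    let expected_wait_ms = queue_len as f64 * avg_job_ms;
    expected_wait_ms < reschedule_ms
}

fn main() {
    // With ~1 ms planning jobs and a ~5 ms cost to re-schedule the job
    // onto another router, a queue bound of about 5 falls out.
    assert!(worth_queueing(4, 1.0, 5.0));
    assert!(!worth_queueing(6, 1.0, 5.0));
}
```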

---
Some open questions here
My opinion is that if we make something configurable just because we don’t know what a good value would be, most users won’t know either.

---
I agree. Better to try and think of a good default and only make it configurable (if ever) later.

---
Should it be `available - 1`, so we keep 1 core free to handle traffic? Or is it fine to rely on the OS scheduler to still let traffic go through the router?

---
My thinking with the initial PR was to rely on the OS scheduler, but minus one might be OK too. The downside is that, for example, minus one out of 2 available cores has a much bigger impact (half the pool) than minus one out of 32 cores.

---
Here's my proposal:

`size = max(1, available - ceiling(available / 8))`

Workings:

available: 1 → pool size: 1
available: 2 → pool size: 1
available: 3 → pool size: 2
available: 4 → pool size: 3
available: 5 → pool size: 4
...
available: 8 → pool size: 7
available: 9 → pool size: 7
...
available: 16 → pool size: 14
available: 17 → pool size: 14
...
available: 32 → pool size: 28
Tweaks on the basic approach are welcome, but it seems to offer reasonable scaling for query planning. We can always refine it later.
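A minimal sketch of that heuristic in Rust (the function name is hypothetical, not from this PR): reserve roughly one core in eight for serving traffic, but never shrink the pool below a single worker.

```rust
use std::cmp::max;

// Sketch of the proposed heuristic:
//   size = max(1, available - ceiling(available / 8))
fn pool_size(available: usize) -> usize {
    let reserved = (available + 7) / 8; // ceiling(available / 8)
    max(1, available - reserved)
}

fn main() {
    // Matches the workings above.
    assert_eq!(pool_size(1), 1);
    assert_eq!(pool_size(2), 1);
    assert_eq!(pool_size(9), 7);
    assert_eq!(pool_size(17), 14);
    assert_eq!(pool_size(32), 28);
}
```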

---
This is a new gauge metric. `apollo.router.query_planning.queued` will only exist when the legacy planner is used. It is somewhat replaced by the new metric, but not exactly, since the new queue also contains parsing+validation jobs and introspection jobs.

---
This new metric should be documented.

---
I've added a short description in `docs/source/reference/router/telemetry/instrumentation/standard-instruments.mdx`. Are there other places to add it to?

---
I think that's the correct location.

---
:)