-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce TaskExecutor overhead #13861
Reduce TaskExecutor overhead #13861
Conversation
The `TaskGroup` class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only). The tasks can be made a little more lightweight too by using a var handle instead of an atomic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left one comment about the var handle, the rest looks good to me.
static { | ||
try { | ||
STARTED_OR_CANCELLED = | ||
MethodHandles.lookup().findVarHandle(Task.class, "startedOrCancelled", boolean.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the one bit that makes me wonder if it's necessary and what the right trade-off is. We had an atomic boolean before. Perhaps it deserves a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I created this PR was to basically:
- shave some overhead
- benchmark
- shave some more
- benchmark more
- ...
- Like benchmark results and open PR
I'm not sure this one in isolation will have a lot of effect and it's hard to justify in isolation. What it does is save some bytes of heap and more importantly CPU cache. If you do enough of these changes in aggregate you get a couple % improvement in the benchmark. But each individual one obviously is trivial :)
That said, the JDK itself uses this pattern in e.g. the FutureTask
and to me the tradeoff seems worth it. It simply makes it more predictable what the code will compile to and at least we some guarantee that the byte for the boolean will be located right near the other mutable state fields in the object which is a good thing when they are so closely related logically and saves some cache thrashing :)
For the basic building blocks of our concurrency logic like this one I'd always err on the side of performance. Effectively it's 5 extra lines of code traded for the guarantee that the flag is flattened into the task instance and won't be moved away from it by the GC even if the task waits in the queue for 5s ever :) Seems like a good deal to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to put this in a separate PR though, but maybe then it'd be worthwhile to make this kind of adjustment in multiple places at once? (I haven't checked though, maybe there aren't that many actually)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your flow makes sense from a benchmarking perspective but there's also the code style perspective, what makes sense in a single PR, what parts of the code it touches. I find that this particular change could be highlighted and get more attention. That also helps because the rest of changes in this PR are low hanging fruit mechanical changes that are easily reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing backed this one out for now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please remember to backport to 10x
Thanks Luca! And sure thing will backport :)! |
The `TaskGroup` class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only).
The
TaskGroup
class is redundant, the futures list can be a local variable shared by the tasks (this also removes any need for making it read-only). Also, the tasks can be made a little more lightweight too by using a var handle instead of an atomic and being explicit about the fact that they need a reference to the list of other tasks.Similar to #13860 a small but not insignificant improvement here + reducing the noise in this code now sets up cleaner benchmarking for potential solutions to the small task overhead problems we ran into here and there: