-
Notifications
You must be signed in to change notification settings - Fork 261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: More optimal IterateHierarchyV2 and iterateChildrenV2 [#600] #601
feat: More optimal IterateHierarchyV2 and iterateChildrenV2 [#600] #601
Conversation
b0f2923
to
6279c76
Compare
Testing with ArgoCD argoproj/argo-cd#18972 |
…j#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
6279c76
to
d777c9a
Compare
Here are some perf views of the system collected following argoproj/argo-cd#13534 (comment). The build is from master on 2024/07/07 including argoproj/argo-cd#18972, argoproj/argo-cd#18694, #601, #603. |
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
…j#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
d777c9a
to
335ff88
Compare
…hy-and-iterate-children' into iterate-improvements Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
335ff88
to
905c87e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #601 +/- ##
==========================================
+ Coverage 55.91% 58.38% +2.46%
==========================================
Files 42 42
Lines 4900 5008 +108
==========================================
+ Hits 2740 2924 +184
+ Misses 1937 1840 -97
- Partials 223 244 +21 ☔ View full report in Codecov by Sentry. |
…j#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
905c87e
to
af08910
Compare
Could you take a look at this PR? andrii-korotkov-verkada#1 I'll rebase, the force-push messed up the diff. |
…hy-and-iterate-children' into iterate-improvements Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Pushed. |
improvements to graph building
@crenshaw-dev, looks good! Thanks for improving the code. I've merged the changes. |
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
@andrii-korotkov-verkada thanks! I may have a few more as I continue digging through the code. Bear with me. I plan to stick with it this week until we get it merged, as long as you have time to keep working on it! |
More fun: andrii-korotkov-verkada#2 |
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
discard unneeded copies of child resources as we go
Sounds good! Thanks for the help. I'd be pretty active on this. |
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
pkg/cache/cluster.go
Outdated
func buildGraph(nsNodes map[kube.ResourceKey]*Resource) (map[kube.ResourceKey][]kube.ResourceKey, map[kube.ResourceKey]map[types.UID]*Resource) { | ||
// Prepare to construct a graph | ||
nodesByUID := make(map[types.UID][]*Resource, len(nsNodes)) | ||
nodeByGraphKey := make(map[graphKey]*Resource, len(nsNodes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need nodeByGraphKey
, or could we just use nsNodes
, since all the resources should be in the same namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using nsNodes passes gitops-engine unit tests and saves a lot of memory, but I'm skeptical. I can put up a PoC tomorrow to analyze.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nodeByGraphKey
is for efficient node lookup during uid backfill. I don't know if we avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's slightly different comparing to the kube resource key, e.g. uses api version instead of group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the original reasoning for this distinction, but left it for backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah of all the memory optimizations I looked at, this one would worry me the most. But its 25MB -> 17MB, which gets it pretty close to the memory footprint of IterateHierarchy v1.
I'll put up the PoC to look at and run Argo CD unit tests, but let's assume we're sticking with nodeByGraphKey unless we're super satisfied with the idea of switching to nsNodes.
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Probably the last graph building optimization: andrii-korotkov-verkada#3 |
make childrenByUID sparse
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
@andrii-korotkov-verkada up for considering this one? andrii-korotkov-verkada#4 I think we do risk missing parent relationships due to Resources lacking the correct namespace field. But so far unit tests, Argo CD e2e tests, and running in an Intuit system looks okay. I think it would be worth the risk for cutting the memory use in half and saving ~30% execution time. |
In general it'd make sense to me since all other places use group. I'll merge it. |
use nsNodes instead of dupe map
Test results from Intuit: Before, IterateHierarchy was taking about 10% of the application controller's time while refreshing apps. Now it takes around 1%. So roughly 6s before, now less than 1s out of a 60s profile. Heap use has gone up by about 2x, but it wasn't memory-heavy before, and it hasn't really increased in a problematic way. Steady-state CPU and memory is basically the same as before. I used log metrics to measure 95th percentile reconciliation, getting the resource tree, and setting managed resources. Reconciliation time is about the same, maaaaybe 25% faster. Getting the resource tree is about the same amount of time. Setting the managed resources now takes ~one fifth the time it did before, down to 200ms from 1000ms. I think the vast majority of that improvement doesn't actually come from the graph pre-build optimization, but instead comes from avoiding iterating over all the managed resources (an optimization which doesn't actually depend on pre-building the graph). In summary: my test showed no performance regressions, no functionality regressions, and maybe a slight performance improvement related to the new algorithm. I suspect the relatively small improvement is because we have relatively few resources per-namespace. Bigger namespaces will have a bigger CPU win and (probably) a higher memory cost. |
Thanks for all the testing and support! Yeah, I guess we just have quite large default namespace (which isn't best practice, but here we are), hence I saw larger wins. Longest running processor in the largest cluster went down from 30-60min to 1-2min. |
…#600] (argoproj#601) * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * improvements to graph building Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use old name Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * finish merge Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * discard unneeded copies of child resources as we go Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * remove unnecessary comment Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * make childrenByUID sparse Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * eliminate duplicate map Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * fix comment Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * add useful comment back Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use nsNodes instead of dupe map Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * remove unused struct Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * skip invalid APIVersion Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
…#600] (argoproj#601) * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * improvements to graph building Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use old name Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * finish merge Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * discard unneeded copies of child resources as we go Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * remove unnecessary comment Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * make childrenByUID sparse Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * eliminate duplicate map Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * fix comment Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * add useful comment back Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use nsNodes instead of dupe map Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * remove unused struct Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * skip invalid APIVersion Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
…#600] (argoproj#601) * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * improvements to graph building Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use old name Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * finish merge Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * discard unneeded copies of child resources as we go Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * remove unnecessary comment Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * make childrenByUID sparse Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * eliminate duplicate map Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * fix comment Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * add useful comment back Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use nsNodes instead of dupe map Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * remove unused struct Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * skip invalid APIVersion Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
…#600] (argoproj#601) * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * improvements to graph building Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use old name Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * finish merge Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * chore: More optimal IterateHierarchyV2 and iterateChildrenV2 [argoproj#600] Closes argoproj#600 The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking `isParentOf`, which can give `O(tree_size * namespace_resources_count)` time complexity. The v2 algorithms construct the graph and have `O(namespace_resources_count)` time complexity. See more details in the linked issues. Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> * discard unneeded copies of child resources as we go Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * remove unnecessary comment Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * make childrenByUID sparse Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * eliminate duplicate map Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * fix comment Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * add useful comment back Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use nsNodes instead of dupe map Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * remove unused struct Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * skip invalid APIVersion Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com> Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Closes #600
The existing (effectively v1) implementations are suboptimal since they don't construct a graph before the iteration. They search for children by looking at all namespace resources and checking
isParentOf
, which can giveO(tree_size * namespace_resources_count)
time complexity. The v2 algorithms construct the graph and haveO(namespace_resources_count)
time complexity. See more details in the linked issues.