cleanup and streamline status computation #1032
base: main
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ffromani The full list of commands accepted by this bot can be found here. The pull request process is described here
/hold need to wait for all the HCP work to go in anyway
4cb9131 to 75c5334
75c5334 to c425e08
c425e08 to 1ed3294
1ed3294 to 153f5c7
153f5c7 to e223f7c
e223f7c to e25b479
/cc @shajmakh
/hold cancel
@shajmakh hey! let's discuss the cleanups in the last 2 commits to see if they can help you
Thanks for this!
reviews are related to the last 2 commits
controllers/controllers.go
Outdated
return conditionInfo{
	Type:    status.ConditionDegraded,
	Message: messageFromError(err),
}
should we add reason here?
Reason: reasonFromError(err)
we probably should, yes. It should be helpful and harmless.
	if ok {
		instance.Status.Conditions = conditions
	}
}

func (r *NUMAResourcesOperatorReconciler) degradeStatus(ctx context.Context, instance *nropv1.NUMAResourcesOperator, reason string, stErr error) (ctrl.Result, error) {
	message := messageFromError(stErr)
	info := degradedConditionInfoFromError(stErr)
	info.Reason = reason
if `reason` is empty it will override `info.Reason` (keep it as "InternalError"?) once we set `Reason: reasonFromError(err)` above
indeed, good catch. I think I fixed it in the later commits by rearranging the flow.
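One way to avoid the override is sketched below. This is not necessarily how the later commits rearranged the flow; `withReason` is a hypothetical helper and the `conditionInfo` fields are assumed.

```go
package main

import "fmt"

// conditionInfo is a simplified stand-in for the struct under review.
type conditionInfo struct {
	Type, Reason, Message string
}

// withReason (hypothetical helper) returns a copy of info whose Reason is
// replaced only when the caller supplies a non-empty reason, so a default
// such as "InternalError" survives an empty argument.
func withReason(info conditionInfo, reason string) conditionInfo {
	if reason != "" {
		info.Reason = reason
	}
	return info
}

func main() {
	base := conditionInfo{Type: "Degraded", Reason: "InternalError", Message: "boom"}
	fmt.Println(withReason(base, "").Reason)                // prints "InternalError"
	fmt.Println(withReason(base, "ValidationError").Reason) // prints "ValidationError"
}
```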
@@ -207,25 +208,27 @@ func (r *NUMAResourcesOperatorReconciler) degradeStatus(ctx context.Context, ins
 	return ctrl.Result{}, nil
 }
 
-func (r *NUMAResourcesOperatorReconciler) reconcileResourceAPI(ctx context.Context, instance *nropv1.NUMAResourcesOperator, trees []nodegroupv1.Tree) (bool, ctrl.Result, string, error) {
+func (r *NUMAResourcesOperatorReconciler) reconcileResourceAPI(ctx context.Context, instance *nropv1.NUMAResourcesOperator, trees []nodegroupv1.Tree) (bool, ctrl.Result, conditionInfo, error) {
I don't see a reason to keep the returned error, it is already part of the returned conditionInfo
same below in the other reconciliation substeps
that's a good point. What should go is not the top-level error though, which is used to report to the upper layers, but the inner error in `ConditionInfo`. Let's see what I can do.
scratch that, fixed in the following commit
Inline status update in the happy path, if the reconciliation loop completed all the expected steps. This is an intermediate step towards the final cleanup, and should cause no changes in behavior. Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Instead of relying on the chain of helpers to report whether something changed in the status, and thus warrants an update, run a full diff of our status explicitly. If something worthy (whose definition depends on the helper implemented in `pkg/status`) changed, then we will push a status update. This is probably slower than relying on the reports from subfunctions, but simplifies and streamlines the code significantly. Signed-off-by: Francesco Romani <fromani@redhat.com>
The only reason why we update the status conditions outside reconcileResources, while we update everything else related to status inside, is historical. We are now able to close this gap and streamline the code further. Signed-off-by: Francesco Romani <fromani@redhat.com>
we never use the return value, good riddance. Signed-off-by: Francesco Romani <fromani@redhat.com>
instead of passing condition types, possibly a message, maybe an error, and then deriving the full condition data in many places, factor all the data into a condition info struct, to be used as the basis for creating the real metav1.Condition. This cleans things up and unlocks further cleanups. Signed-off-by: Francesco Romani <fromani@redhat.com>
/hold
edbe1bc to e2ecbe6
the reconciliation steps are returning a common (and growing) set of values, let's pack them in a struct, since we always want to return the same tuple anyway for consistency. Signed-off-by: Francesco Romani <fromani@redhat.com>
e2ecbe6 to b4f8b44
known controller test failure. It's legit. I'll have a look and fix ASAP
The way we computed the `NUMAResourcesOperator` status was messy: it relied on inner functions and helpers to somehow report whether something changed in the object, abused the names of the conditions (leading to awkward function signatures), did and undid checks, and so forth. Besides producing messy and unnecessarily hard-to-read code, the outcome was that we sometimes failed to update the status, which has not led to major bugs (yet) but does make for a less-than-ideal experience.
To untangle this mess, the new approach is to just mutate the status freely in the reconciliation sub-steps during the reconciliation loop. Every step is free to mutate the status and only reports whether an error occurred; the top-level reconciliation code then detects semantic differences (e.g. if only timestamps changed and nothing else, the difference is not semantically relevant, so no actual update should be sent).
Detecting changes this way needs nested comparisons of objects, but it's a major simplification; if we implement the comparison code carefully (coming soon), benchmarks show good numbers with respect to the current implementation (there are still some theoretical slowdowns, but the overall state is not terrible and better than expected), so win-win.