-
Notifications
You must be signed in to change notification settings - Fork 952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to pause reconciliation of Provider/Configuration/Function instances #4820
Add ability to pause reconciliation of Provider/Configuration/Function instances #4820
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -468,6 +469,20 @@ func (r *Reconciler) Reconcile(ctx context.Context, req reconcile.Request) (reco | |||
return reconcile.Result{Requeue: false}, nil | |||
} | |||
|
|||
// Check the pause annotation and return if it has the value "true" | |||
// after logging, publishing an event and updating the SYNC status condition | |||
if meta.IsPaused(pr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this up before meta.wasDeleted
check so that the pause will be effective for package revisins being deleted. This also aligns with how the feature was implemented in the managed reconciler and claim reconciler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning that we cannot remove paused objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, deleted objects would still need reconciliation to finalize their deletion.
One can remove a paused object after removing the paused annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that really what users expect to happen? In the docs we do not say anything about being impossible to remove paused objects.
Imagine the following situation: a user attempts to remove a namespace containing a paused claim. This operation would hangs forever, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that really what users expect to happen?
As said, deletion is also part of reconciliation, the controller needs some operation / clean-up to finalize deletion. I would be surprised that part of reconciliation was not "paused". And that could be more desirable if I want to debug before the resource is deleted.
Another point is, one can go and find why their resource not deleted if that is the behavior they are looking for and remove the annotation to proceed. But if we do the opposite, there wouldn't be a way to revert for a user expecting the deletion to hang on with paused annotation, as the resource would already be gone.
Imagine the following situation: a user attempts to remove a namespace containing a paused claim. This operation would hangs forever, right?
They can see why it hangs and remove the paused annotation to proceed.
I don't think this is a feature that people would end up having paused resources around. It is more like a deliberate action where people would revert after they get what they are after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @turkenh, pausing should pause everything, it would be surprising otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to be on @pedjak's side here. Broken deletion seems to be unexpected and strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tried the following:
- created namespace
example
- created claim in it
- paused claim
- tried to remove namespace with
kubectl delete ns example
kubectl
hangs with the messagenamespace "example" deleted
but it does not return to the prompt
How a user can know that there is a claim in the namespace holding the delete?
Even if I do kubectl describe
on that claim, there is no trace saying that deletion is blocked and can be resumed when pause annotation is removed.
Users do not need to know anything about how things are working under the hood, how many controllers are processing objects, in what order, etc. However, when users want to remove an object, we should either protect against that with a proper error message, or allow him to do that. If an object reconciliation is paused, we can still allow removal, cleaning up what is needed, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me link the discussion on slack: https://crossplane.slack.com/archives/CEF5N8X08/p1697618937115119?thread_ts=1697555426.091559&cid=CEF5N8X08
Even if I do
kubectl describe
on that claim, there is no trace saying that deletion is blocked and can be resumed when pause annotation is removed.
This is something different. There should be. Shouldn't this output some events/logs:
crossplane/internal/controller/apiextensions/claim/reconciler.go
Lines 371 to 379 in af2496e
// Check the pause annotation and return if it has the value "true" | |
// after logging, publishing an event and updating the SYNC status condition | |
if meta.IsPaused(cm) { | |
r.record.Event(cm, event.Normal(reasonPaused, "Reconciliation is paused via the pause annotation")) | |
cm.SetConditions(xpv1.ReconcilePaused()) | |
// If the pause annotation is removed, we will have a chance to reconcile again and resume | |
// and if status update fails, we will reconcile again to retry to update the status | |
return reconcile.Result{}, errors.Wrap(r.client.Status().Update(ctx, cm), errUpdateClaimStatus) | |
} |
As I stated in the thread, it feels like you are trying to optimize user experience for a very specific scenario while ignoring others who would expect the reconciliation to be paused, including the deletion.
FWIW, it is also implemented in the same way in cluster API, which could be considered as an indication that this is what most people expect from such a functionality, see: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/b90810b70864664a00314f40a3f8a31d4215f1b5/controllers/awscluster_controller.go#L[…]4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something different. There should be. Shouldn't this output some events/logs:
The events are gone, because when you try to remove a namespace, k8s starts removing all of them in the given namespace, including events. Given that there is no problems to remove them (no finalizers, no blockers), the events are all gone.
it feels like you are trying to optimize user experience for a very specific scenario while ignoring others who would expect the reconciliation to be paused, including the deletion.
Who are the others, i.e. what is the usecase we want to support?
We indicate that reconciliation is paused via condition of type Synced
like this:
reason: ReconcilePaused
status: "False"
type: Synced
when False
it means that we are not syncing claim changes with the counterpart composite, i.e. we are not passing claim updates down the line. However, object deletion is not an update, at least not in CRUD/REST context, and therefore an user should be very surprised to discover that deletion of paused objects or higher order objects get stuck.
In order to explore/learn more about the use case mentioned by @turkenh I have updated the PR to match the behavior already available for claims/composites. Furthermore, I have added to the pause status condition a human readable message explain the consequences, and in the case of deletion we generate an additional k8s event and a log entry on info level.
We should certainly document this after it gets merged. /cc @plumbis |
06fd5c3
to
393d3fa
Compare
cm.SetConditions(v1.ReconcilePaused()) | ||
if meta.WasDeleted(cm) { | ||
log.Info("Deletion requested, but blocked by the pause annotation, remove it to unblock the request", "annotation", meta.AnnotationKeyReconciliationPaused, "value", "true") | ||
r.record.Event(cm, event.Normal(reasonPaused, "Deletion requested, but blocked by the pause annotation, remove it to unblock the request")) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cm.SetConditions(v1.ReconcilePaused()) | |
if meta.WasDeleted(cm) { | |
log.Info("Deletion requested, but blocked by the pause annotation, remove it to unblock the request", "annotation", meta.AnnotationKeyReconciliationPaused, "value", "true") | |
r.record.Event(cm, event.Normal(reasonPaused, "Deletion requested, but blocked by the pause annotation, remove it to unblock the request")) | |
} | |
r.record.Event(cm, event.Normal(reasonPaused, "Reconciliation (including deletion) is paused via the pause annotation")) | |
cm.SetConditions(v1.ReconcilePaused()) |
It should be enough to update the message as above, no need for special handling for meta.wasDeleted
.
After this message, it is clear that:
- Reconciliation is paused.
- This includes deletion.
- It is via pause annotation, so, you can remove it to resume. It is the same for both resuming deleted and non-deleted resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the PR with the suggested changes. I assumed I should apply it to all places in this PR. Please let me know if my understanding was correct.
It should be enough to update the message as above, no need for special handling for meta.wasDeleted.
It should be enough, but this degrades the previously attempted user experience improvements. Namely:
- the logs are generated on DEBUG level, making
Reconciliation (including deletion) is paused via the pause annotation"
message invisible by default. The previous attempt wanted to bring them to INFO level, so that users can quickly spot them in the logs - the current message is rather vague, refers to
pause annotation
. IMHO, we should be as much as explicit to users, so that they do not need to checks docs to understand what pause annotation is. Writing the exact annotation in the message is super helpful when one person pauses reconciliation, but another one performs some cluster maintenance/cleanup. In large orgs that is a common case. That person does not even need to know what Crossplane is, but the right message would unblock them quickly. - Having an extra message that deletion is attempted on paused object would further help by debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
- the logs are generated on DEBUG level, making Reconciliation (including deletion) is paused via the pause annotation" message invisible by default. The previous attempt wanted to bring them to INFO level, so that users can quickly spot them in the logs
I agree that INFO would be better here 👍
- the current message is rather vague, refers to pause annotation. IMHO, we should be as much as explicit to users, so that they do not need to checks docs to understand what pause annotation is. Writing the exact annotation in the message is super helpful when one person pauses reconciliation, but another one performs some cluster maintenance/cleanup. In large orgs that is a common case. That person does not even need to know what Crossplane is, but the right message would unblock them quickly.
Users can just check the annotations on the resource and figure out which is the pause annotation without necessarily checking the documentation. However, being more explicit wouldn't hurt.
- Having an extra message that deletion is attempted on paused object would further help by debugging.
Just like the above, I would agree with this in the context of being more explicit. However, I don't think this is worth introducing an additional branch (i.e. if
) to the flow.
393d3fa
to
6238129
Compare
cm.SetConditions(v1.ReconcilePaused()) | ||
if meta.WasDeleted(cm) { | ||
log.Info("Deletion requested, but blocked by the pause annotation, remove it to unblock the request", "annotation", meta.AnnotationKeyReconciliationPaused, "value", "true") | ||
r.record.Event(cm, event.Normal(reasonPaused, "Deletion requested, but blocked by the pause annotation, remove it to unblock the request")) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
- the logs are generated on DEBUG level, making Reconciliation (including deletion) is paused via the pause annotation" message invisible by default. The previous attempt wanted to bring them to INFO level, so that users can quickly spot them in the logs
I agree that INFO would be better here 👍
- the current message is rather vague, refers to pause annotation. IMHO, we should be as much as explicit to users, so that they do not need to checks docs to understand what pause annotation is. Writing the exact annotation in the message is super helpful when one person pauses reconciliation, but another one performs some cluster maintenance/cleanup. In large orgs that is a common case. That person does not even need to know what Crossplane is, but the right message would unblock them quickly.
Users can just check the annotations on the resource and figure out which is the pause annotation without necessarily checking the documentation. However, being more explicit wouldn't hurt.
- Having an extra message that deletion is attempted on paused object would further help by debugging.
Just like the above, I would agree with this in the context of being more explicit. However, I don't think this is worth introducing an additional branch (i.e. if
) to the flow.
…n instances The reconciliation can be paused by adding `crossplane.io/paused: "true"` to the requested object. Removing the annotation resumes its reconciliation. As with claims and composites, when paused, object removal are not possible. We do not explicitly return errors to callers, instead the delete requests are stuck until the pause annotation is removed. In order to improve UX, we add to the pause status condition a human readable message explain the consequences, and in case of deletion we generate an additional k8s event and a log entry on info level. Signed-off-by: Predrag Knezevic <predrag.knezevic@upbound.io>
6238129
to
96c78b9
Compare
Description of your changes
The reconciliation can be paused by adding
crossplane.io/paused: "true"
to the requested object. Removing the annotation resumes its reconciliation.
Fixes #4619
I have:
make reviewable
to ensure this PR is ready for review.backport release-x.y
labels to auto-backport this PR, if necessary.