Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to pause reconciliation of Provider/Configuration/Function instances #4820

Merged

Conversation

pedjak
Copy link
Contributor

@pedjak pedjak commented Oct 16, 2023

Description of your changes

The reconciliation can be paused by adding crossplane.io/paused: "true"
to the requested object. Removing the annotation resumes its reconciliation.

Fixes #4619

I have:

  • Read and followed Crossplane's contribution process.
  • Added or updated unit and E2E tests for my change.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR, if necessary.
  • Opened a PR updating the docs, if necessary.

Copy link
Member

@turkenh turkenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pedjak!

Do you already have plans for documenting this? I think we need something similar to this but for packages. Please also note, there is an ongoing work with package documentation and may be we can include there (or at least build on top of that)

internal/controller/pkg/manager/reconciler.go Show resolved Hide resolved
@@ -468,6 +469,20 @@ func (r *Reconciler) Reconcile(ctx context.Context, req reconcile.Request) (reco
return reconcile.Result{Requeue: false}, nil
}

// Check the pause annotation and return if it has the value "true"
// after logging, publishing an event and updating the SYNC status condition
if meta.IsPaused(pr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this up before meta.wasDeleted check so that the pause will be effective for package revisins being deleted. This also aligns with how the feature was implemented in the managed reconciler and claim reconciler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning that we cannot remove paused objects?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, deleted objects would still need reconciliation to finalize their deletion.
One can remove a paused object after removing the paused annotation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that really what users expect to happen? In the docs we do not say anything about being impossible to remove paused objects.

Imagine the following situation: a user attempts to remove a namespace containing a paused claim. This operation would hangs forever, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that really what users expect to happen?

As said, deletion is also part of reconciliation, the controller needs some operation / clean-up to finalize deletion. I would be surprised that part of reconciliation was not "paused". And that could be more desirable if I want to debug before the resource is deleted.

Another point is, one can go and find why their resource not deleted if that is the behavior they are looking for and remove the annotation to proceed. But if we do the opposite, there wouldn't be a way to revert for a user expecting the deletion to hang on with paused annotation, as the resource would already be gone.

Imagine the following situation: a user attempts to remove a namespace containing a paused claim. This operation would hangs forever, right?

They can see why it hangs and remove the paused annotation to proceed.
I don't think this is a feature that people would end up having paused resources around. It is more like a deliberate action where people would revert after they get what they are after.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @turkenh, pausing should pause everything, it would be surprising otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to be on @pedjak's side here. Broken deletion seems to be unexpected and strange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried the following:

  • created namespace example
  • created claim in it
  • paused claim
  • tried to remove namespace with kubectl delete ns example
  • kubectl hangs with the message namespace "example" deleted but it does not return to the prompt

How a user can know that there is a claim in the namespace holding the delete?

Even if I do kubectl describe on that claim, there is no trace saying that deletion is blocked and can be resumed when pause annotation is removed.

Users do not need to know anything about how things are working under the hood, how many controllers are processing objects, in what order, etc. However, when users want to remove an object, we should either protect against that with a proper error message, or allow him to do that. If an object reconciliation is paused, we can still allow removal, cleaning up what is needed, etc.

Copy link
Member

@turkenh turkenh Oct 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me link the discussion on slack: https://crossplane.slack.com/archives/CEF5N8X08/p1697618937115119?thread_ts=1697555426.091559&cid=CEF5N8X08

Even if I do kubectl describe on that claim, there is no trace saying that deletion is blocked and can be resumed when pause annotation is removed.

This is something different. There should be. Shouldn't this output some events/logs:

// Check the pause annotation and return if it has the value "true"
// after logging, publishing an event and updating the SYNC status condition
if meta.IsPaused(cm) {
r.record.Event(cm, event.Normal(reasonPaused, "Reconciliation is paused via the pause annotation"))
cm.SetConditions(xpv1.ReconcilePaused())
// If the pause annotation is removed, we will have a chance to reconcile again and resume
// and if status update fails, we will reconcile again to retry to update the status
return reconcile.Result{}, errors.Wrap(r.client.Status().Update(ctx, cm), errUpdateClaimStatus)
}

As I stated in the thread, it feels like you are trying to optimize user experience for a very specific scenario while ignoring others who would expect the reconciliation to be paused, including the deletion.

FWIW, it is also implemented in the same way in cluster API, which could be considered as an indication that this is what most people expect from such a functionality, see: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/b90810b70864664a00314f40a3f8a31d4215f1b5/controllers/awscluster_controller.go#L[…]4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something different. There should be. Shouldn't this output some events/logs:

The events are gone, because when you try to remove a namespace, k8s starts removing all of them in the given namespace, including events. Given that there is no problems to remove them (no finalizers, no blockers), the events are all gone.

it feels like you are trying to optimize user experience for a very specific scenario while ignoring others who would expect the reconciliation to be paused, including the deletion.

Who are the others, i.e. what is the usecase we want to support?

We indicate that reconciliation is paused via condition of type Synced like this:

reason: ReconcilePaused
status: "False"
type: Synced

when False it means that we are not syncing claim changes with the counterpart composite, i.e. we are not passing claim updates down the line. However, object deletion is not an update, at least not in CRUD/REST context, and therefore an user should be very surprised to discover that deletion of paused objects or higher order objects get stuck.

In order to explore/learn more about the use case mentioned by @turkenh I have updated the PR to match the behavior already available for claims/composites. Furthermore, I have added to the pause status condition a human readable message explain the consequences, and in the case of deletion we generate an additional k8s event and a log entry on info level.

@pedjak
Copy link
Contributor Author

pedjak commented Oct 17, 2023

Do you already have plans for documenting this? I think we need something similar to this but for packages. Please also note, crossplane/docs#566 is an ongoing work with package documentation and may be we can include there (or at least build on top of that)

We should certainly document this after it gets merged. /cc @plumbis

@pedjak pedjak force-pushed the paused-annotations-on-package-types branch 2 times, most recently from 06fd5c3 to 393d3fa Compare October 18, 2023 14:41
@pedjak pedjak self-assigned this Oct 18, 2023
apis/apiextensions/v1/conditions.go Outdated Show resolved Hide resolved
apis/apiextensions/v1/conditions.go Outdated Show resolved Hide resolved
Comment on lines 377 to 381
cm.SetConditions(v1.ReconcilePaused())
if meta.WasDeleted(cm) {
log.Info("Deletion requested, but blocked by the pause annotation, remove it to unblock the request", "annotation", meta.AnnotationKeyReconciliationPaused, "value", "true")
r.record.Event(cm, event.Normal(reasonPaused, "Deletion requested, but blocked by the pause annotation, remove it to unblock the request"))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cm.SetConditions(v1.ReconcilePaused())
if meta.WasDeleted(cm) {
log.Info("Deletion requested, but blocked by the pause annotation, remove it to unblock the request", "annotation", meta.AnnotationKeyReconciliationPaused, "value", "true")
r.record.Event(cm, event.Normal(reasonPaused, "Deletion requested, but blocked by the pause annotation, remove it to unblock the request"))
}
r.record.Event(cm, event.Normal(reasonPaused, "Reconciliation (including deletion) is paused via the pause annotation"))
cm.SetConditions(v1.ReconcilePaused())

It should be enough to update the message as above, no need for special handling for meta.wasDeleted.

After this message, it is clear that:

  • Reconciliation is paused.
  • This includes deletion.
  • It is via pause annotation, so, you can remove it to resume. It is the same for both resuming deleted and non-deleted resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR with the suggested changes. I assumed I should apply it to all places in this PR. Please let me know if my understanding was correct.

It should be enough to update the message as above, no need for special handling for meta.wasDeleted.

It should be enough, but this degrades the previously attempted user experience improvements. Namely:

  • the logs are generated on DEBUG level, making Reconciliation (including deletion) is paused via the pause annotation" message invisible by default. The previous attempt wanted to bring them to INFO level, so that users can quickly spot them in the logs
  • the current message is rather vague, refers to pause annotation. IMHO, we should be as much as explicit to users, so that they do not need to checks docs to understand what pause annotation is. Writing the exact annotation in the message is super helpful when one person pauses reconciliation, but another one performs some cluster maintenance/cleanup. In large orgs that is a common case. That person does not even need to know what Crossplane is, but the right message would unblock them quickly.
  • Having an extra message that deletion is attempted on paused object would further help by debugging.

Copy link
Member

@turkenh turkenh Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

  • the logs are generated on DEBUG level, making Reconciliation (including deletion) is paused via the pause annotation" message invisible by default. The previous attempt wanted to bring them to INFO level, so that users can quickly spot them in the logs

I agree that INFO would be better here 👍

  • the current message is rather vague, refers to pause annotation. IMHO, we should be as much as explicit to users, so that they do not need to checks docs to understand what pause annotation is. Writing the exact annotation in the message is super helpful when one person pauses reconciliation, but another one performs some cluster maintenance/cleanup. In large orgs that is a common case. That person does not even need to know what Crossplane is, but the right message would unblock them quickly.

Users can just check the annotations on the resource and figure out which is the pause annotation without necessarily checking the documentation. However, being more explicit wouldn't hurt.

  • Having an extra message that deletion is attempted on paused object would further help by debugging.

Just like the above, I would agree with this in the context of being more explicit. However, I don't think this is worth introducing an additional branch (i.e. if) to the flow.

internal/controller/pkg/manager/reconciler.go Outdated Show resolved Hide resolved
internal/controller/pkg/revision/reconciler.go Outdated Show resolved Hide resolved
apis/apiextensions/v1/conditions.go Outdated Show resolved Hide resolved
Comment on lines 377 to 381
cm.SetConditions(v1.ReconcilePaused())
if meta.WasDeleted(cm) {
log.Info("Deletion requested, but blocked by the pause annotation, remove it to unblock the request", "annotation", meta.AnnotationKeyReconciliationPaused, "value", "true")
r.record.Event(cm, event.Normal(reasonPaused, "Deletion requested, but blocked by the pause annotation, remove it to unblock the request"))
}
Copy link
Member

@turkenh turkenh Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

  • the logs are generated on DEBUG level, making Reconciliation (including deletion) is paused via the pause annotation" message invisible by default. The previous attempt wanted to bring them to INFO level, so that users can quickly spot them in the logs

I agree that INFO would be better here 👍

  • the current message is rather vague, refers to pause annotation. IMHO, we should be as much as explicit to users, so that they do not need to checks docs to understand what pause annotation is. Writing the exact annotation in the message is super helpful when one person pauses reconciliation, but another one performs some cluster maintenance/cleanup. In large orgs that is a common case. That person does not even need to know what Crossplane is, but the right message would unblock them quickly.

Users can just check the annotations on the resource and figure out which is the pause annotation without necessarily checking the documentation. However, being more explicit wouldn't hurt.

  • Having an extra message that deletion is attempted on paused object would further help by debugging.

Just like the above, I would agree with this in the context of being more explicit. However, I don't think this is worth introducing an additional branch (i.e. if) to the flow.

…n instances

The reconciliation can be paused by adding `crossplane.io/paused: "true"`
to the requested object. Removing the annotation resumes its reconciliation.

As with claims and composites, when paused, object removal are not possible.
We do not explicitly return errors to callers, instead the delete requests
are stuck until the pause annotation is removed.

In order to improve UX, we add to the pause status condition a human readable
message explain the consequences, and in case of deletion we generate
an additional k8s event and a log entry on info level.

Signed-off-by: Predrag Knezevic <predrag.knezevic@upbound.io>
@pedjak pedjak force-pushed the paused-annotations-on-package-types branch from 6238129 to 96c78b9 Compare October 19, 2023 13:43
@turkenh turkenh merged commit c2e206f into crossplane:master Oct 19, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

consider supporting "paused" annotation for core types
4 participants