📖 Proposal: MachineDrainRules #11241

Merged


sbueringer
Member

Signed-off-by: Stefan Büringer <buringerst@vmware.com>

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part of #11240

Signed-off-by: Stefan Büringer <buringerst@vmware.com>
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 30, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/needs-area PR is missing an area label label Sep 30, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 30, 2024
@sbueringer
Member Author

@sbueringer sbueringer added the area/machine Issues or PRs related to machine lifecycle management label Sep 30, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Sep 30, 2024
@kreeuwijk

Loving this solution for CAPI-11024, great work!

@sftim
Contributor

sftim commented Oct 1, 2024

For more on the label vs. annotation thinking, see kubernetes/kubernetes#127247 (comment)

@sftim
Contributor

sftim commented Oct 2, 2024

> In general in CAPI I think we can only define annotations in our namespace.

Kubernetes is open source; anyone can propose registering an annotation (all it needs is a PR against https://kubernetes.io/docs/reference/labels-annotations-taints/)

@sbueringer
Member Author

> In general in CAPI I think we can only define annotations in our namespace.
>
> Kubernetes is open source; anyone can propose registering an annotation (all it needs is a PR against https://kubernetes.io/docs/reference/labels-annotations-taints/)

I'm aware it's open source 😀. I assumed that these would typically be defined through KEPs.

@kreeuwijk

kreeuwijk commented Oct 7, 2024

There is one more use case that we've seen with a customer, I was wondering if this could be facilitated by MachineDrainRules as well. We had a customer that ran gaming sessions (Minecraft) in Kubernetes, so a pod would represent an active game in progress. When the customer wanted to e.g. upgrade Kubernetes, they needed to change the upgrade strategy in the MachineDeployment from RollingUpdate to OnDelete, to prevent CAPI from just draining a node at random, killing the game sessions that were in progress on that node. With OnDelete, they could manually cordon a node, wait for game sessions on it to naturally end and then manually delete the Machine object to deprovision the node. CAPI would then install a fresh one. The customer had to go through all their 100 nodes this way, which was not a fun exercise.

If MachineDrainRules supported something like a "WaitForCompletion" mode of draining (for pods matching certain labels), a customer like this could configure a rule that made CAPI wait until pods matching the "wait" labels are naturally terminated. This would enable a node drain to function as follows:

  • Cordon the node so it doesn't receive new workloads
  • Optionally drain certain pods that are allowed to be evicted at this time (based on priority)
  • Wait until pods matching the "wait" labels are naturally terminated
  • Drain remaining pods (based on priority)
  • Drain complete, node can be deprovisioned.

While this could make repaves take a long time when such a wait rule is configured, it is still much preferable over having to manage the drains entirely manually. There could even be a wait timeout setting that sets a maximum amount of time that the system will wait for such pods to terminate naturally, before draining them anyway.
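The ordering described above can be sketched as a small simulation. This is only an illustration of the idea, not anything MachineDrainRules implements: `Pod`, `simulate_drain`, the `wait_selector` argument and the tick-based timeout are all made-up names, and the priority-based ordering of evictions is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    # Hypothetical stand-in for a pod running on the node being drained.
    name: str
    labels: dict
    # Ticks until the pod finishes on its own; None means it never exits unaided.
    ticks_left: Optional[int] = None


def matches(pod: Pod, selector: dict) -> bool:
    """True if every key/value pair in the selector is present in the pod's labels."""
    return all(pod.labels.get(k) == v for k, v in selector.items())


def simulate_drain(pods, wait_selector, wait_timeout):
    """Simulate the proposed drain order; returns a log of (tick, event, name)."""
    # Step 1: cordon the node.
    log = [(0, "cordon", "node")]
    waiting = [p for p in pods if matches(p, wait_selector)]

    # Step 2: evict every pod that is not covered by the wait rule.
    for p in pods:
        if not matches(p, wait_selector):
            log.append((0, "evict", p.name))

    # Step 3: wait for matching pods to finish naturally, up to the timeout.
    tick = 0
    while waiting and tick < wait_timeout:
        tick += 1
        for p in list(waiting):
            if p.ticks_left is not None:
                p.ticks_left -= 1
                if p.ticks_left <= 0:
                    waiting.remove(p)
                    log.append((tick, "completed", p.name))

    # Step 4: the timeout was reached, so evict whatever is still running.
    for p in waiting:
        log.append((tick, "evict-after-timeout", p.name))

    # Step 5: nothing is left on the node.
    log.append((tick, "drain-complete", "node"))
    return log
```

Calling `simulate_drain` with one ordinary pod and two pods labelled for waiting shows the ordinary pod evicted immediately, while the labelled pods either complete on their own or are evicted once the timeout is hit.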

@chrischdi
Member

chrischdi commented Oct 7, 2024

> There is one more use case that we've seen with a customer, I was wondering if this could be facilitated by MachineDrainRules as well. We had a customer that ran gaming sessions (Minecraft) in Kubernetes, so a pod would represent an active game in progress. When the customer wanted to e.g. upgrade Kubernetes, they needed to change the upgrade strategy in the MachineDeployment from RollingUpdate to OnDelete, to prevent CAPI from just draining a node at random, killing the game sessions that were in progress on that node. With OnDelete, they could manually cordon a node, wait for game sessions on it to naturally end and then manually delete the Machine object to deprovision the node. CAPI would then install a fresh one. The customer had to go through all their 100 nodes this way, which was not a fun exercise.
>
> If MachineDrainRules supported something like a "WaitForCompletion" mode of draining (for pods matching certain labels), a customer like this could configure a rule that made CAPI wait until pods matching the "wait" labels are naturally terminated. This would enable a node drain to function as follows:
>
>   • Cordon the node so it doesn't receive new workloads
>   • Optionally drain certain pods that are allowed to be evicted at this time (based on priority)
>   • Wait until pods matching the "wait" labels are naturally terminated
>   • Drain remaining pods (based on priority)
>   • Drain complete, node can be deprovisioned.
>
> While this could make repaves take a long time when such a wait rule is configured, it is still much preferable over having to manage the drains entirely manually. There could even be a wait timeout setting that sets a maximum amount of time that the system will wait for such pods to terminate naturally, before draining them anyway.

Doesn't this already work for upgrades?

  • Start the upgrade
  • The feature that taints outdated nodes (CAPI controller should taint outdated nodes with PreferNoSchedule #7043) adds a taint, so new pods should get scheduled on new nodes
  • The drain runs; use PDBs to prevent eviction of the "gaming session" pods, while the other pods get evicted right away.
  • Drain succeeds when the pods are completed (because they don't need to get evicted anymore)
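As a rough sketch of the PDB part of this flow, the snippet below builds a `policy/v1` PodDisruptionBudget with `maxUnavailable: 0`, which denies voluntary evictions for the selected pods so the drain keeps retrying until they finish. The name, namespace and label are placeholders, and it assumes the session pods are managed by a workload controller such as a Deployment or StatefulSet; for bare pods an integer `minAvailable` would be needed instead.

```python
import json


def blocking_pdb(name: str, namespace: str, match_labels: dict) -> dict:
    """Build a policy/v1 PodDisruptionBudget that allows zero voluntary evictions."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # With 0 allowed disruptions, eviction requests for matching pods are rejected.
            "maxUnavailable": 0,
            "selector": {"matchLabels": match_labels},
        },
    }


# Placeholder values; adjust to the labels the session pods actually carry.
manifest = blocking_pdb("game-sessions", "games", {"app": "game-session"})
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped into `kubectl apply -f -`. Note that the drain only waits indefinitely if no node drain timeout cuts it short.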

@kreeuwijk

I suppose using PDBs would indeed work to make the drain process wait for the pods to terminate naturally. Thanks for the tip 👍🏻

@sbueringer sbueringer force-pushed the pr-machine-drain-rules-proposal branch from 511ddd2 to aa3ea2b Compare October 8, 2024 17:28
@sbueringer
Member Author

All findings should be addressed via aa3ea2b

PTAL :)

@fabriziopandini
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: c816aab3eef91dbcb5aa0b278bf841bf06e5fa66

@fabriziopandini
Member

As per Oct 9th office hours, lazy consensus deadline set for next Friday (Oct 18th)

@enxebre
Member

enxebre commented Oct 17, 2024

/lgtm

Signed-off-by: Stefan Büringer <buringerst@vmware.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 18, 2024
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@sbueringer sbueringer added lgtm "Looks good to me", indicates that a PR is ready to be merged. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. labels Oct 18, 2024
@JoelSpeed
Contributor

/lgtm

@fabriziopandini
Member

Great to see we reached consensus for this first iteration!
/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2024
@k8s-ci-robot k8s-ci-robot merged commit 9c06a95 into kubernetes-sigs:main Oct 18, 2024
17 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Oct 18, 2024
@sbueringer sbueringer deleted the pr-machine-drain-rules-proposal branch October 18, 2024 13:15
Labels
  • approved - Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/machine - Issues or PRs related to machine lifecycle management
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • lgtm - "Looks good to me", indicates that a PR is ready to be merged.
  • size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.
  • tide/merge-method-squash - Denotes a PR that should be squashed by tide when it merges.
10 participants