🌱 machine: prevent error spamming for NodeOutdatedTaint if objects are not found #11148

Merged

chrischdi
Member

What this PR does / why we need it:

This PR reduces spammy error output, especially during background deletion of objects such as a whole cluster.

Example from https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/11146/pull-cluster-api-e2e-blocking-main/1832066840088547328/artifacts/clusters/bootstrap/logs/capi-system/capi-controller-manager/capi-controller-manager-5f5f6bcf4c-4zpjc/manager.log :

Example: the MachineSet owner reference is gone:

{"ts":1725634615581.657,"caller":"controller/controller.go:316","msg":"Reconciler error","controller":"machine","controllerGroup":"cluster.x-k8s.io","controllerKind":"Machine","Machine":{"name":"quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt","namespace":"quick-start-ev3dh5"},"namespace":"quick-start-ev3dh5","name":"quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt","reconcileID":"5a5d16c2-944d-4bbb-ae38-670497f71998","err":"failed to reconcile Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt: failed to check if Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt is outdated: failed to find MachineSet owner reference for Machine quick-start-ev3dh5/quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt","errCauses":[{"error":"failed to reconcile Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt: failed to check if Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt is outdated: failed to find MachineSet owner reference for Machine quick-start-ev3dh5/quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt","errorVerbose":"failed to find MachineSet owner reference for Machine 
quick-start-ev3dh5/quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt\nsigs.k8s.io/cluster-api/internal/controllers/machine.getOwnerMachineSetObjectKey\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:363\nsigs.k8s.io/cluster-api/internal/controllers/machine.shouldNodeHaveOutdatedTaint\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:323\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).patchNode\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:299\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcileNode\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:139\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:316\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).Reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224\nruntime.goexit\n\truntime/asm_amd64.s:1695\nfailed to check if Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt is 
outdated\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).patchNode\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:301\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcileNode\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:139\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:316\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).Reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224\nruntime.goexit\n\truntime/asm_amd64.s:1695\nfailed to reconcile Node 
quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcileNode\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:140\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:316\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).Reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224\nruntime.goexit\n\truntime/asm_amd64.s:1695"}]}

Example: the MachineDeployment is gone:

{"ts":1725634762385.2092,"caller":"controller/controller.go:316","msg":"Reconciler error","controller":"machine","controllerGroup":"cluster.x-k8s.io","controllerKind":"Machine","Machine":{"name":"quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt","namespace":"quick-start-ev3dh5"},"namespace":"quick-start-ev3dh5","name":"quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt","reconcileID":"42367c14-daab-42f7-9a3f-df199c620093","err":"failed to reconcile Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt: failed to check if Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt is outdated: MachineDeployment.cluster.x-k8s.io \"quick-start-0qi0bi-md-0-kskml\" not found","errCauses":[{"error":"failed to reconcile Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt: failed to check if Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt is outdated: MachineDeployment.cluster.x-k8s.io \"quick-start-0qi0bi-md-0-kskml\" not found","errorVerbose":"MachineDeployment.cluster.x-k8s.io \"quick-start-0qi0bi-md-0-kskml\" not found\nfailed to check if Node quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt is 
outdated\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).patchNode\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:301\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcileNode\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:139\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:316\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).Reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224\nruntime.goexit\n\truntime/asm_amd64.s:1695\nfailed to reconcile Node 
quick-start-0qi0bi-md-0-kskml-96tqh-jg4tt\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcileNode\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller_noderef.go:140\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:316\nsigs.k8s.io/cluster-api/internal/controllers/machine.(*Reconciler).Reconcile\n\tsigs.k8s.io/cluster-api/internal/controllers/machine/machine_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.19.0/pkg/internal/controller/controller.go:224\nruntime.goexit\n\truntime/asm_amd64.s:1695"}]}
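
The two log excerpts above share one pattern: during background deletion, the Machine's MachineSet owner reference or its MachineDeployment is already gone, and the failed lookup surfaces as a reconciler error on every retry. Below is a minimal Go sketch of the general idea of treating "owner not found" as "nothing to do". It is not the diff from this PR: `outdatedTaintNeeded`, `owner`, `getter`, `errNotFound`, and `errTransient` are hypothetical stand-ins, and the revision comparison is an assumption made for illustration. A real controller would detect not-found errors from the API client (for example with `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`) rather than a local sentinel error.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for API client errors.
var (
	errNotFound  = errors.New("not found")
	errTransient = errors.New("transient API error")
)

// owner is a hypothetical, trimmed-down view of a MachineSet or MachineDeployment.
type owner struct {
	revision string // assumed revision marker used for the comparison
	parent   string // name of the owning object, "" if there is none
}

// getter is a hypothetical lookup of an owner object by name.
type getter func(name string) (owner, error)

// outdatedTaintNeeded reports whether a Node should carry an "outdated" taint.
// Missing owners are treated as "no taint needed" instead of as errors, so a
// cluster that is being deleted does not produce repeated reconciler errors.
func outdatedTaintNeeded(machineSetName string, getMS, getMD getter) (bool, error) {
	if machineSetName == "" {
		// The owner reference is already gone; there is nothing to compare.
		return false, nil
	}
	ms, err := getMS(machineSetName)
	if errors.Is(err, errNotFound) {
		return false, nil
	}
	if err != nil {
		return false, fmt.Errorf("failed to get MachineSet %q: %w", machineSetName, err)
	}
	if ms.parent == "" {
		return false, nil
	}
	md, err := getMD(ms.parent)
	if errors.Is(err, errNotFound) {
		return false, nil
	}
	if err != nil {
		return false, fmt.Errorf("failed to get MachineDeployment %q: %w", ms.parent, err)
	}
	// Assumption: the Node is outdated when the two revisions differ.
	return ms.revision != md.revision, nil
}
```

With this shape, a missing owner yields `(false, nil)` and the reconcile continues quietly, while any other lookup failure is still returned and logged.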

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

/area machine

@k8s-ci-robot k8s-ci-robot added area/machine Issues or PRs related to machine lifecycle management cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 6, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 6, 2024
@chrischdi
Member Author

/assign sbueringer fabriziopandini

@chrischdi chrischdi changed the title from "machine: add test coverage for shouldNodeHaveOutdatedTaint" to "🌱 machine: add test coverage for shouldNodeHaveOutdatedTaint" Sep 6, 2024
@sbueringer
Member

The PR title is a bit misleading :)

@chrischdi chrischdi changed the title from "🌱 machine: add test coverage for shouldNodeHaveOutdatedTaint" to "🌱 machine: prevent error spamming for NodeOutdatedTaint if objects are not found" Sep 6, 2024
@chrischdi
Member Author

The PR title is a bit misleading :)

Damn.. renamed it -> rebase -> refresh page -> lost my title...

Member

@fabriziopandini fabriziopandini left a comment

Nice!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 0b18acee362a0e507069bed629192386e81d2dec

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 9, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 9, 2024
@sbueringer sbueringer added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 10, 2024
Member

@sbueringer sbueringer left a comment

Last findings - I think

Member

@sbueringer sbueringer left a comment

Last nit from my side. Otherwise lgtm

Thx for improving this!!

@sbueringer
Member

sbueringer commented Sep 12, 2024

/assign @fabriziopandini

(for a final review as well)

@chrischdi
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2024
@sbueringer
Member

Thank you!!

/lgtm
/test pull-cluster-api-e2e-main

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5493554a929abbbcdf750017665daf46ba015447

@fabriziopandini
Member

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 19, 2024
@k8s-ci-robot k8s-ci-robot merged commit f9372a1 into kubernetes-sigs:main Sep 19, 2024
19 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.9 milestone Sep 19, 2024
@sbueringer
Member

/cherry-pick release-1.8

@k8s-infra-cherrypick-robot

@sbueringer: new pull request created: #11199

In response to this:

/cherry-pick release-1.8

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
area/machine Issues or PRs related to machine lifecycle management
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
lgtm "Looks good to me", indicates that a PR is ready to be merged.
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
5 participants