Unbound resource creation within managed topologies #10275

Closed
mnaser opened this issue Mar 14, 2024 · 9 comments · Fixed by #10277
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@mnaser

mnaser commented Mar 14, 2024

What steps did you take and what happened?

When using managed topologies, we've run into an issue: if the user creates a Cluster with an invalid value that fails validation later in reconciliation, CAPI keeps creating resources non-stop.

For example:

status:
  conditions:
  - lastTransitionTime: "2024-02-29T14:07:45Z"
    message: 'error reconciling the Cluster topology: failed to create KubeadmControlPlane.controlplane.cluster.x-k8s.io:
      FieldValueForbidden: spec.kubeadmConfigSpec.format: Forbidden: can be set only
      if the KubeadmBootstrapFormatIgnition feature gate is enabled FieldValueForbidden:
      spec.kubeadmConfigSpec.ignition: Forbidden: can be set only if the KubeadmBootstrapFormatIgnition
      feature gate is enabled'
    reason: TopologyReconcileFailed
    severity: Error
    status: "False"
    type: TopologyReconciled
  observedGeneration: 1
  phase: Pending

Because it keeps retrying, it keeps spawning more InfrastructureCluster resources; in my case, the count got up to ~5k and ~11k at some point.

What did you expect to happen?

CAPI should check whether an InfrastructureCluster tied to this Cluster already exists and, if so, re-use it instead of creating a new one.
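
The expected behavior amounts to a get-or-create lookup. Below is a minimal sketch in Python, not Cluster API code: the in-memory `store`, the `owner` field, and `get_or_create_infra_cluster` are hypothetical names used only for illustration.

```python
import itertools

_suffix = itertools.count(1)  # stands in for a generated-name suffix (assumption)


def get_or_create_infra_cluster(store, cluster_name):
    """Return the InfrastructureCluster owned by cluster_name, creating one only if none exists.

    `store` is a hypothetical in-memory list of dicts standing in for the API server.
    """
    for obj in store:
        if obj["owner"] == cluster_name:
            return obj  # re-use the object left over from an earlier attempt
    obj = {"name": f"{cluster_name}-{next(_suffix)}", "owner": cluster_name}
    store.append(obj)
    return obj


store = []
first = get_or_create_infra_cluster(store, "my-cluster")
second = get_or_create_infra_cluster(store, "my-cluster")  # simulated retry
print(len(store), first is second)
```

With a lookup like this, a retry finds the object created by the previous attempt, so the number of objects stays at one per Cluster no matter how often reconciliation fails.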

Cluster API version

1.6.0

Kubernetes version

1.28

Anything else you would like to add?

Happy to assist with fixing it if we're pointed to where to look.

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 14, 2024
@sbueringer
Member

@fabriziopandini could this be a bad edge case around the cluster shim?

@fabriziopandini
Member

fabriziopandini commented Mar 15, 2024

Not sure, but I will try to reproduce next week

@fabriziopandini
Member

/triage accepted
/assign

The issue isn't in the cluster shim. It is that we reconcile the Cluster object (with the infrastructureRef) only if both the infra cluster and the control plane are created successfully. As a result, the next time the desired state is read, it concludes there is no infra cluster and re-creates it.

I'm going to send a PR to fix this

Note: the cluster shim is going to clean up all those objects as soon as both the infra cluster and the control plane are reconciled successfully (as per design).
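
The failure mode described in this comment can be reproduced with a small simulation. This is only a sketch under stated assumptions: `reconcile`, `run`, the `persist_ref_early` flag, and the dict-based `cluster` are hypothetical, and the "persist early" variant is one possible way to avoid the loop, not necessarily what the fix PR does.

```python
import itertools


class ValidationError(Exception):
    """Stands in for the FieldValueForbidden rejection of the control plane."""


def reconcile(cluster, store, counter, persist_ref_early):
    # Create an infra cluster only when the Cluster object has no reference to one.
    if cluster["infrastructureRef"] is None:
        pending_ref = f"infra-{next(counter)}"
        store.append(pending_ref)
        if persist_ref_early:
            # Hypothetical variant: record the reference before the control plane step.
            cluster["infrastructureRef"] = pending_ref
    else:
        pending_ref = cluster["infrastructureRef"]

    # Control plane creation fails validation when controlPlaneValid is False.
    if not cluster["controlPlaneValid"]:
        raise ValidationError("spec.kubeadmConfigSpec.format: Forbidden")

    # Reached only when both steps succeed, as described in the comment above.
    cluster["infrastructureRef"] = pending_ref


def run(persist_ref_early, attempts=5):
    cluster = {"infrastructureRef": None, "controlPlaneValid": False}
    store, counter = [], itertools.count(1)
    for _ in range(attempts):
        try:
            reconcile(cluster, store, counter, persist_ref_early)
        except ValidationError:
            pass  # a controller would requeue and retry
    return len(store)


print(run(persist_ref_early=False))  # one new object per failed attempt
print(run(persist_ref_early=True))   # a single object, re-used on retry
```

In the first run the reference is never written because the control plane step fails first, so every retry sees no infra cluster and creates another one. In the second run the reference survives the failure and the existing object is re-used.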

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 18, 2024
@fabriziopandini
Member

@mnaser it would be great if you could help validate the fix PR

@chrischdi
Member

/reopen

Let's close this after cherry-picks to release-1.5 and release-1.6 are done.

@k8s-ci-robot k8s-ci-robot reopened this Mar 27, 2024
@k8s-ci-robot
Contributor

@chrischdi: Reopened this issue.


@fabriziopandini
Member

1.6 backport #10326

@fabriziopandini
Member

1.5 backport #10347

@killianmuldoon
Contributor

I think this can now be closed - backports are merged.

6 participants