From c04a9d7dea046f3755a902e278a279f018e9e734 Mon Sep 17 00:00:00 2001
From: fabriziopandini
Date: Wed, 17 Jul 2024 17:11:41 +0200
Subject: [PATCH 01/22] First draft

---
 .../improve-status-in-CAPI-resources.md | 921 ++++++++++++++++++
 1 file changed, 921 insertions(+)
 create mode 100644 docs/proposals/improve-status-in-CAPI-resources.md

diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md
new file mode 100644
index 000000000000..e17791bdfa4d
--- /dev/null
+++ b/docs/proposals/improve-status-in-CAPI-resources.md
@@ -0,0 +1,921 @@

---
title: Proposal Template
authors:
- "@fabriziopandini"
reviewers:
- "add"
creation-date: 2024-07-17
last-updated: 2024-07-17
status: provisional
see-also:
- ...
---

# Improving status in CAPI resources

## Table of Contents

- [Improving status in CAPI resources](#improving-status-in-capi-resources)
  - [Table of Contents](#table-of-contents)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals/Future Work](#non-goalsfuture-work)
  - [Proposal](#proposal)
    - [Readiness and Availability](#readiness-and-availability)
    - [Transition to K8s API conventions aligned conditions](#transition-to-k8s-api-conventions-aligned-conditions)
    - [Changes to Machine resource](#changes-to-machine-resource)
      - [Machine Status](#machine-status)
        - [Machine (New)Conditions](#machine-newconditions)
      - [Machine Spec](#machine-spec)
      - [Machine Print columns](#machine-print-columns)
    - [Changes to MachineSet resource](#changes-to-machineset-resource)
      - [MachineSet Status](#machineset-status)
      - [MachineSet (New)Conditions](#machineset-newconditions)
      - [MachineSet Print columns](#machineset-print-columns)
    - [Changes to MachineDeployment resource](#changes-to-machinedeployment-resource)
      - [MachineDeployment Status](#machinedeployment-status)
      - [MachineDeployment (New)Conditions](#machinedeployment-newconditions)
      - [MachineDeployment Print columns](#machinedeployment-print-columns)
    - [Changes to Cluster resource](#changes-to-cluster-resource)
      - [Cluster Status](#cluster-status)
      - [Cluster (New)Conditions](#cluster-newconditions)
      - [Cluster Spec](#cluster-spec)
      - [Cluster Print columns](#cluster-print-columns)
    - [Changes to KubeadmControlPlane (KCP) resource](#changes-to-kubeadmcontrolplane-kcp-resource)
      - [KubeadmControlPlane Status](#kubeadmcontrolplane-status)
      - [KubeadmControlPlane (New)Conditions](#kubeadmcontrolplane-newconditions)
      - [KubeadmControlPlane Print columns](#kubeadmcontrolplane-print-columns)
    - [Changes to MachinePool resource](#changes-to-machinepool-resource)
    - [Changes to Cluster API contract](#changes-to-cluster-api-contract)

# Summary

This document defines how the status of CAPI resources is going to evolve in the v1beta2 API version, with the goal of
improving usability and consistency across different resources in CAPI and with the rest of the ecosystem.

# Motivation

The Cluster API community recognizes that nowadays Cluster API and Kubernetes users are rightfully focused on
building higher-level systems and great applications on top of those platforms.

However, as a consequence of this shift in focus, most users don’t have time to become deep experts in Cluster API
like the first wave of adopters, and Cluster API maintainers would like them not to have to.

The effect of this trend is the blurring of the lines not only between different Cluster API components, but also
between Cluster API, core Kubernetes and a few other broadly adopted tools like Helm or Flux (and, to some extent,
many other awesome tools in the ecosystem).

This is why Cluster API status must become simpler for users to understand, and more consistent not only across
different CAPI resources, but also with core Kubernetes and, ideally, with the entire ecosystem.

### Goals

- Review and standardize the usage of the concept of readiness across Cluster API resources.
  - Drop or amend improper usage of readiness.
  - Make the concept of Machine readiness extensible, thus allowing providers or external systems to inject their readiness checks.
- Review and standardize the usage of the concept of availability across Cluster API resources.
  - Make the concept of Cluster availability extensible, thus allowing providers or external systems to inject their availability checks.
- Bubble up more information about both control plane (CP) and worker Machines, ensuring consistency across Cluster API resources.
  - Standardize replica counters and bubble them up to the Cluster resource.
  - Standardize control plane, MachineDeployment, and MachinePool availability, and bubble them up to the Cluster resource.
- Introduce missing signals about connectivity to workload clusters, thus making it possible to set all the conditions
  that depend on such connectivity to status Unknown after a certain amount of time.
- Introduce cleaner signals about lifecycle transitions of Cluster API resources, e.g. scaling up or updating.
- Ensure everything in status can be used as a signal informing monitoring tools/automation on top of Cluster API
  about lifecycle transitions and the state of the Cluster and its underlying components.

### Non-Goals/Future Work

- Resolving all the idiosyncrasies that exist in Cluster API, core Kubernetes, and the rest of the ecosystem.
  (Let’s stay focused on Cluster API and keep improving incrementally.)

## Proposal

This proposal groups a set of changes to status fields in Cluster API resources.

Some of those changes could be considered straightforward, e.g.:

- K8s API conventions suggest deprecating and removing `phase` fields from status; Cluster API is going to align with this recommendation
  (and improve Conditions to provide similar or even better information as a replacement).
- K8s resources do not have a concept similar to the "terminal failure" that exists in Cluster API resources, and users approaching
  the project struggle with this idea; in some cases, provider implementers struggle with it as well.
  Accordingly, Cluster API resources are dropping the `FailureReason` and `FailureMessage` fields (terminal failures should be surfaced using
  conditions, like any other error/warning/message).
- Bubble up more information about both CP and worker Machines to the Cluster level.

Some other changes require a little more context, which is provided in the following paragraphs:

- Review and standardize the usage of the concepts of readiness and availability to align with K8s API conventions and
  the conditions used in core K8s objects like `Pod`, `Node`, `Deployment`, `ReplicaSet`, etc.
- Transition to condition types and condition management fully aligned with K8s API conventions (and thus deprecate
  the Cluster API "custom" guidelines for conditions).

The last set of changes is a consequence of the above changes, or consists of small improvements addressing feedback received
over time; changes in this group are detailed case by case in the following paragraphs. A few examples:

- Change the semantics of the ReadyReplicas counters to use the Machine's Ready condition instead of the Node's Ready condition
  (so everywhere Ready is used for a Machine, it always means the same thing).
- Add a new condition monitoring the status of the connectivity to workload clusters (`RemoteConnectionProbe`).

In order to keep making progress on this proposal, the first iteration will be focused on:

- Machines
- MachineSets
- MachineDeployments
- MachinePools
- KubeadmControlPlane (ControlPlanes)
- Clusters

Other resources will be added as soon as there is agreement on the general direction.

Overall, the union of all those changes is expected to greatly improve status fields, conditions, replica counters,
and print columns.

The improvements proposed here are expected to benefit users who interact with the system, use monitoring tools, or
build higher-level systems or products on top of Cluster API.

### Readiness and Availability

The [conditions CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md) in Cluster API introduced very strict requirements about the “Ready” condition, mandating it
to exist on all resources and also mandating that Ready must be computed as the summary of all the other existing
conditions.

However, over time Cluster API maintainers recognized several limitations of this strict, “one size fits all” approach.

For example, higher-level abstractions in Cluster API are designed to remain operational during lifecycle operations;
a MachineDeployment, for instance, is operational even while it is rolling out.

This use case was hard to reconcile with the strict requirement to have all the conditions true, and
as a result Cluster API resources today barely have conditions surfacing that lifecycle operations are happening, or, where
such conditions are defined, their semantics are not easy to understand, e.g. `Resized` or `MachinesSpecUpToDate`.

Similarly, readiness can be confusing for higher-level abstractions in Cluster API like Clusters, MachineDeployments and ControlPlanes,
because those resources usually accept a certain degree of unreadiness, e.g. MachineDeployments are
usually OK even if a few Machines are not ready (up to MaxUnavailable).

In order to address this problem, Cluster API is going to align with K8s API conventions. As a consequence, the “Ready”
condition will no longer be required to exist on all resources, nor, when it exists, will it be required to include
all the existing conditions in the ready summary.

Going forward, the Ready condition will be used *only* where it makes sense, and with well-defined
semantics that convey important information to the users (instead of "blindly" applying the same formula everywhere).

The most important effect of this change is the definition of new semantics for the Machine's “Ready” condition, which
will now clearly represent that the "machine can host workloads" (prior art: Kubernetes Nodes are ready when the "node can host pods").
To maximize the benefit of this change:

- This proposal ensures that whenever Machine Ready is used, it always means the same thing (e.g. in replica counters).
- This proposal also changes contract fields where ready was used improperly to represent
  initial provisioning (K8s API conventions suggest using ready only for long-running processes).

All in all, Machine's Ready should be much clearer, more consistent and more intuitive after the proposed changes.
But there is more.

This proposal also drops the Ready condition from higher-level abstractions in Cluster API.

Instead, where not already present, this proposal introduces a new Available condition that better represents
the fact that those objects are operational even if there is a certain degree of unreadiness in the system
or if lifecycle operations are happening (prior art: the Available condition in K8s Deployments).

Last but not least:

- With the changes to the semantics of the Ready and Available conditions, it is now possible to add conditions
  surfacing that lifecycle operations are happening, e.g. scaling up.
- As suggested by K8s API conventions, this proposal also makes sure all conditions are consistent and have
  uniform meaning across all resource types.
- Additionally, we are enforcing the same consistency for replica counters and other status fields.
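The split between Ready and Available can be illustrated with a minimal Go sketch, under simplifying assumptions: replica counters are derived only from each Machine's own `Ready`, `Available` and `UpToDate` conditions (never from the Node), and a MachineDeployment using a rolling update strategy is considered available when `availableReplicas >= desired - maxUnavailable`, regardless of whether a rollout is in progress. The `Condition` and `Machine` types and all function names are hypothetical simplifications, not Cluster API types.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// Machine is a hypothetical, trimmed-down Machine that only carries its conditions.
type Machine struct {
	Name       string
	Conditions []Condition
}

// isTrue reports whether a condition of the given type exists and is "True".
func isTrue(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// Counters holds replica counters derived only from Machine conditions, so that
// "ready" means Machine Ready=True everywhere it is used.
type Counters struct {
	Replicas, Ready, Available, UpToDate int32
}

func countReplicas(machines []Machine) Counters {
	c := Counters{Replicas: int32(len(machines))}
	for _, m := range machines {
		if isTrue(m.Conditions, "Ready") {
			c.Ready++
		}
		if isTrue(m.Conditions, "Available") {
			c.Available++
		}
		if isTrue(m.Conditions, "UpToDate") {
			c.UpToDate++
		}
	}
	return c
}

// deploymentAvailable tolerates up to maxUnavailable unavailable Machines and ignores
// UpToDate, so an ongoing rollout does not make the MachineDeployment unavailable.
func deploymentAvailable(c Counters, desired, maxUnavailable int32) bool {
	return c.Available >= desired-maxUnavailable
}

func main() {
	machines := []Machine{
		{"m1", []Condition{{"Ready", "True"}, {"Available", "True"}, {"UpToDate", "True"}}},
		// Still running the old spec (rollout in progress), but ready and available.
		{"m2", []Condition{{"Ready", "True"}, {"Available", "True"}, {"UpToDate", "False"}}},
		// The Node is ready, but the Machine is not (e.g. a readiness gate is False).
		{"m3", []Condition{{"NodeReady", "True"}, {"Ready", "False"}}},
	}
	c := countReplicas(machines)
	// prints: {Replicas:3 Ready:2 Available:2 UpToDate:1} available=true
	fmt.Printf("%+v available=%v\n", c, deploymentAvailable(c, 3, 1))
}
```

Note how `m3` is not counted as ready even though its Node is: this is the behavioral change intended for `ReadyReplicas`.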

### Transition to K8s API conventions aligned conditions

K8s is undergoing an effort to standardize the usage of conditions across all resource types, and the transition to
the v1beta2 API version is a great opportunity for Cluster API to align with this effort.

The value of this transition is substantial, because the differences that exist today are really confusing for users;
those differences also make it harder for ecosystem tools to build on top of Cluster API, and in some cases
they even confuse new (and old) contributors.

With this proposal Cluster API will close the gap with K8s API conventions in regard to:
- Polarity: Condition type names should make sense for humans; neither positive nor negative polarity can be recommended
  as a general rule (already implemented by [#10550](https://github.com/kubernetes-sigs/cluster-api/pull/10550)).
- Use of the Reason field is required (currently, Cluster API adds a reason only when a condition is false).
- Controllers should apply their conditions to a resource the first time they visit the resource, even if the status is Unknown
  (currently, Cluster API controllers add conditions at different stages of the reconcile loops).
- Cluster API is also dropping its own Condition type and starting to use `metav1.Condition` from the Kubernetes API.

The last point also has another implication: the removal of the Severity field, which is currently used
to determine priority when merging conditions.

TODO Document how we are going to replace severity (currently prototyping)

### Changes to Machine resource

#### Machine Status

The following changes are made to the Machine's status:

- Disambiguate usage of the term ready by renaming fields used for the provisioning workflow.
- Align with K8s API conventions by deprecating `Phase` and the corresponding `LastUpdated`.
- Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures.
- Transition to new, improved conditions aligned with K8s API conventions.

Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state).
After the golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules.

```golang
type MachineStatus struct {

	// Initialization provides observations of the Machine initialization process.
	// NOTE: fields in this struct are part of the Cluster API contract and are used to orchestrate initial Machine provisioning.
	// The value of those fields is never updated after provisioning is completed.
	// Use conditions to monitor the operational state of the Machine.
	// +optional
	Initialization *MachineInitializationStatus `json:"initialization,omitempty"`

	// Represents the observations of a Machine's current state.
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// Other fields...
	// NOTE: `Phase`, `LastUpdated`, `FailureReason`, `FailureMessage` fields won't be there anymore
}

// MachineInitializationStatus provides observations of the Machine initialization process.
type MachineInitializationStatus struct {

	// BootstrapSecretCreated is true when the bootstrap provider reports that the Machine's bootstrap secret is created.
	// NOTE: this field is part of the Cluster API contract, and it is used to orchestrate initial Machine provisioning.
	// The value of this field is never updated after provisioning is completed.
	// Use conditions to monitor the operational state of the Machine's BootstrapSecret.
	// +optional
	BootstrapSecretCreated bool `json:"bootstrapSecretCreated"`

	// InfrastructureProvisioned is true when the infrastructure provider reports that the Machine's infrastructure is fully provisioned.
	// NOTE: this field is part of the Cluster API contract, and it is used to orchestrate initial Machine provisioning.
	// The value of this field is never updated after provisioning is completed.
	// Use conditions to monitor the operational state of the Machine's infrastructure.
	// +optional
	InfrastructureProvisioned bool `json:"infrastructureProvisioned"`
}
```

| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) |
|---|---|---|
| | `Initialization` (new) | `Initialization` |
| `BootstrapReady` | `Initialization.BootstrapSecretCreated` (renamed) | `Initialization.BootstrapSecretCreated` |
| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` |
| | `BackCompatibility` (new) | (removed) |
| `Phase` (deprecated) | `BackCompatibility.Phase` (renamed) (deprecated) | (removed) |
| `LastUpdated` (deprecated) | `BackCompatibility.LastUpdated` (renamed) (deprecated) | (removed) |
| `FailureReason` (deprecated) | `BackCompatibility.FailureReason` (renamed) (deprecated) | (removed) |
| `FailureMessage` (deprecated) | `BackCompatibility.FailureMessage` (renamed) (deprecated) | (removed) |
| `Conditions` | `BackCompatibility.Conditions` (renamed) (deprecated) | (removed) |
| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` |
| other fields... | other fields... | other fields... |

Notes:
- The `BackCompatibility` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer).
  Fields in this struct are used to support down-conversions, thus giving users relying on v1beta1 APIs additional buffer time to pick up the new changes.

##### Machine (New)Conditions

| Condition | Note |
|---|---|
| `Available` | True if the Machine has been Ready for at least MinReadySeconds, as defined by the Machine's owner resource |
| `Ready` | True if the Machine's `BootstrapConfigReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) conditions are true; if other conditions are defined in `spec.readinessGates`, those conditions must be true as well for the Machine to be ready |
| `UpToDate` | True if the Machine spec matches the spec of the Machine's owner resource, e.g. KubeadmControlPlane or MachineDeployment |
| `BootstrapConfigReady` | Mirrors the corresponding condition from the Machine's BootstrapConfig resource |
| `InfrastructureReady` | Mirrors the corresponding condition from the Machine's Infrastructure resource |
| `NodeReady` | True if the Machine's Node is ready |
| `NodeHealthy` | True if the Machine's Node is ready and it does not report MemoryPressure, DiskPressure or PIDPressure |
| `HealthCheckSucceeded` | True if MHC instances targeting this Machine report that the Machine is healthy according to the definition of healthy in the spec of the MachineHealthCheck object |
| `OwnerRemediated` | |
| `Deleted` | True if the Machine is deleted; Reason can be used to observe the cleanup progress when the resource is deleted |
| `Paused` | True if the Machine or the Cluster it belongs to is paused |

> To better evaluate proposed changes, below you can find the list of current Machine's conditions:
> Ready, InfrastructureReady, NodeHealthy, PreDrainDeleteHookSucceeded, VolumeDetachSucceeded, DrainingSucceeded.
> Additionally:
> - The MachineHealthCheck controller adds the HealthCheckSucceeded and the OwnerRemediated conditions.
> - The KubeadmControlPlane adds the ApiServerPodHealthy, ControllerManagerPodHealthy, SchedulerPodHealthy, EtcdPodHealthy, EtcdMemberHealthy conditions.

Notes:
- This proposal introduces a mechanism for extending the meaning of Machine readiness, `ReadinessGates` (see [changes to Machine.Spec](#machine-spec)).
- While `Ready` is the main signal for a Machine's operational state, higher-level abstractions in Cluster API, e.g.
  MachineDeployment, rely on the concept of Machine's `Availability`, which can be seen as readiness + stability.
  In order to standardize this concept across different higher-level abstractions, this proposal surfaces an `Available`
  condition at the Machine level and adds a new `MinReadySeconds` field (see [changes to Machine.Spec](#machine-spec))
  that will be used to compute this condition.
- Similarly, this proposal standardizes the concept of Machine's `UpToDate`; however, in this case it will be up to
  the Machine's owner controllers to set this condition.
- Conditions like `NodeReady` and `NodeHealthy`, which depend on the connection to the remote cluster, will benefit
  from the new `RemoteConnectionProbe` condition at the cluster level (see [Cluster (New)Conditions](#cluster-newconditions));
  more specifically, those conditions should be set to `Unknown` after the cluster probe fails
  (or after whatever period is defined in the `--remote-conditions-grace-period` flag).
- `HealthCheckSucceeded` and `OwnerRemediated` (or `ExternalRemediationRequestAvailable`) are set by the MachineHealthCheck controller when a MachineHealthCheck instance targets the Machine.
- KubeadmControlPlane also adds additional conditions to Machines, but those conditions are not included in the table above
  for the sake of simplicity (they are documented in the KubeadmControlPlane paragraph).

TODO: think carefully about remote conditions becoming unknown, this could block a few operations ...

#### Machine Spec

Machine's spec is going to be improved to allow 3rd parties to extend the semantics of the new Machine's `Ready` condition,
as well as to standardize the concept of Machine's `Availability`.

Below you can find the relevant fields in Machine Spec v1beta2, after v1beta1 removal (end state).
After the golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules.

```go
type MachineSpec struct {

	// MinReadySeconds is the minimum number of seconds for which a newly created Machine should be ready before considering the replica available.
	// Defaults to 0 (the Machine will be considered available as soon as the Machine is ready).
	// +optional
	MinReadySeconds int32 `json:"minReadySeconds,omitempty"`

	// If specified, all readiness gates will be evaluated for Machine readiness.
	// A Machine is ready when `BootstrapConfigReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are "True";
	// if other conditions are defined in this field, those conditions should be "True" as well for the Machine to be ready.
	// +optional
	// +listType=map
	// +listMapKey=conditionType
	ReadinessGates []MachineReadinessGate `json:"readinessGates,omitempty"`

	// Other fields...
}

// MachineReadinessGate contains the reference to a Machine condition to be used as a readiness gate.
type MachineReadinessGate struct {
	// ConditionType refers to a condition in the Machine's condition list with matching type.
	// Note: Both Cluster API conditions and conditions added by 3rd party controllers can be used as readiness gates.
	ConditionType string `json:"conditionType"`
}
```

| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) |
|---|---|---|
| `MinReadySeconds` (new) | `MinReadySeconds` | `MinReadySeconds` |
| `ReadinessGates` (new) | `ReadinessGates` | `ReadinessGates` |
| other fields... | other fields... | other fields... |

Notes:
- Both `MinReadySeconds` and `ReadinessGates` should be treated like other in-place propagated fields (changing them should not trigger rollouts).
- Similarly to Pod's `ReadinessGates`, Machine's `ReadinessGates` accept only conditions with positive polarity;
  the Cluster API project might revisit this in the future to stay aligned with Kubernetes or if there are use cases justifying this change.

#### Machine Print columns

| Current | To be |
|---|---|
| `NAME` | `NAME` |
| `CLUSTER` | `CLUSTER` |
| `NODE NAME` | `PAUSED` (new) (*) |
| `PROVIDER ID` | `NODE NAME` |
| `PHASE` (deleted) | `PROVIDER ID` |
| `AGE` | `READY` (new) |
| `VERSION` | `AVAILABLE` (new) |
| | `UP-TO-DATE` (new) |
| | `AGE` |
| | `OS-IMAGE` (new) (*) |
| | `KERNEL-VERSION` (new) (*) |
| | `CONTAINER-RUNTIME` (new) (*) |

TODO: figure out if we can add `INTERNAL-IP` (new) (*) and `EXTERNAL-IP` after `VERSION` / before `OS-IMAGE` (similar to Nodes).
Something like `$.status.addresses[?(@.type == 'InternalIP')].address` might work, but it is not clear what happens if there are 0 or more than one addresses...
  Stefan +1 if possible

(*) visible only when using `kubectl get -o wide`

### Changes to MachineSet resource

#### MachineSet Status

The following changes are made to the MachineSet's status:

- Update the `ReadyReplicas` counter to use the same semantics as the Machine's `Ready` condition (today it is computed as the number of Machines with Node Ready), and add the missing `UpToDateReplicas`.
- Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures.
- Transition to new, improved conditions aligned with K8s API conventions.

Below you can find the relevant fields in MachineSet Status v1beta2, after v1beta1 removal (end state).
After the golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules.

```golang
type MachineSetStatus struct {

	// The number of ready replicas for this MachineSet. A machine is considered ready when Machine's Ready condition is true.
	// +optional
	ReadyReplicas int32 `json:"readyReplicas"`

	// The number of up-to-date replicas for this MachineSet. A machine is considered up-to-date when Machine's UpToDate condition is true.
	// +optional
	UpToDateReplicas int32 `json:"upToDateReplicas"`

	// Represents the observations of a MachineSet's current state.
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// Other fields...
	// NOTE: `FailureReason`, `FailureMessage` fields won't be there anymore
}
```

| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) |
|---|---|---|
| | `BackCompatibility` (new) | (removed) |
| `ReadyReplicas` (deprecated) | `BackCompatibility.ReadyReplicas` (renamed) (deprecated) | (removed) |
| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` |
| `FailureReason` (deprecated) | `BackCompatibility.FailureReason` (renamed) (deprecated) | (removed) |
| `FailureMessage` (deprecated) | `BackCompatibility.FailureMessage` (renamed) (deprecated) | (removed) |
| `Conditions` | `BackCompatibility.Conditions` (renamed) (deprecated) | (removed) |
| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` |
| `UpToDateReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` |
| other fields... | other fields... | other fields... |

Notes:
- The `BackCompatibility` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer).
  Fields in this struct are used to support down-conversions, thus giving users relying on v1beta1 APIs additional buffer time to pick up the new changes.
- This proposal uses `UpToDateReplicas` instead of `UpdatedReplicas`; this is a deliberate choice to avoid
  confusion between update (any change) and upgrade (change of the Kubernetes version).
- `AvailableReplicas` will also determine Machine availability by reading the Machine's `Available` condition instead of
  computing availability as it does today; however, in this case the semantics of the field do not change.

TODO: check `FullyLabeledReplicas`, do we still need it?

#### MachineSet (New)Conditions

| Condition | Note |
|---|---|
| `ReplicaFailure` | This condition surfaces issues on creating Machines, if any. |
| `MachinesReady` | This condition surfaces details of issues on the controlled Machines, if any. |
| `ScalingUp` | True if available replicas < desired replicas |
| `ScalingDown` | True if replicas > desired replicas |
| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDate replicas) |
| `Remediating` | True if at least one Machine controlled by this MachineSet is not passing health checks |
| `Deleted` | True if the MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted |
| `Paused` | True if this MachineSet or the Cluster it belongs to is paused |

> To better evaluate proposed changes, below you can find the list of current MachineSet's conditions:
> Ready, MachinesCreated, Resized, MachinesReady.

Notes:
- MachineSet conditions are intentionally kept mostly consistent with MachineDeployment conditions to help users troubleshoot.
- MachineSet is considered a sort of implementation detail of MachineDeployments, so it doesn't have its own concept of availability.
  Similarly, this proposal drops the notion of MachineSet readiness because it is preferable to let users focus on Machine readiness.
- `Remediating` for older MachineSets will report that remediation will happen as part of the regular rollout.
- The `UpToDate` condition will initially be `false` for older MachineSets and `true` for the current MachineSet; however, in
  the future the latter might evolve if Cluster API starts supporting in-place upgrades.

#### MachineSet Print columns

| Current | To be |
|---|---|
| `NAME` | `NAME` |
| `CLUSTER` | `CLUSTER` |
| `DESIRED` (*) | `PAUSED` (new) (*) |
| `REPLICAS` | `DESIRED` |
| `READY` | `CURRENT` (renamed) (*) |
| `AVAILABLE` | `READY` (updated) |
| `AGE` | `AVAILABLE` (updated) |
| `VERSION` | `UP-TO-DATE` (new) |
| | `AGE` |
| | `VERSION` |

(*) visible only when using `kubectl get -o wide`

Notes:
- In K8s, Deployment and ReplicaSet have different print columns for replica counters; this proposal makes replica
  counter columns consistent across all resources.

### Changes to MachineDeployment resource

#### MachineDeployment Status

The following changes are made to the MachineDeployment's status:

- Align `UpdatedReplicas` to use the Machine's `UpToDate` condition (and rename it accordingly).
- Align with K8s API conventions by deprecating `Phase`.
- Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures.
- Transition to new, improved conditions aligned with K8s API conventions.

Below you can find the relevant fields in MachineDeployment Status v1beta2, after v1beta1 removal (end state).
After the golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules.

```golang
type MachineDeploymentStatus struct {

	// The number of up-to-date replicas targeted by this deployment.
	// +optional
	UpToDateReplicas int32 `json:"upToDateReplicas"`

	// Represents the observations of a MachineDeployment's current state.
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`

	// Other fields...
+ // NOTE: `Phase`, `FailureReason`, `FailureMessage` fields won't be there anymore +} +``` + +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | +|-------------------------------|----------------------------------------------------------|-------------------------------------| +| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExprimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | + +Notes: +- The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). + Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. + +#### MachineDeployment (New)Conditions + +| Condition | Note | +|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. If using RollingUpgrade strategy, availableReplicas must be greater or equal than desired replicas - MaxUnavailable replicas | +| `ReplicaFailure` | This condition surfaces issues on creating Machines controlled by this MachineDeployment, if any. 
 |
| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. |
| `ScalingUp` | True if available replicas < desired replicas |
| `ScalingDown` | True if replicas > desired replicas |
| `UpToDate` | True if all the Machines controlled by this MachineDeployment are up to date (replicas = upToDate replicas) |
| `Remediating` | True if at least one Machine controlled by this MachineDeployment is not passing health checks |
| `Deleted` | True if MachineDeployment is deleted; Reason can be used to observe the cleanup progress when the resource is deleted |
| `Paused` | True if this MachineDeployment or the Cluster it belongs to is paused |

> To better evaluate proposed changes, below you can find the list of current MachineDeployment's conditions:
> Ready, Available.

#### MachineDeployment Print columns

| Current | To be |
|-------------------------|------------------------|
| `NAME` | `NAME` |
| `CLUSTER` | `CLUSTER` |
| `DESIRED` (*) | `PAUSED` (new) (*) |
| `REPLICAS` | `DESIRED` |
| `READY` | `CURRENT` (*) |
| `UPDATED` (renamed) | `READY` |
| `UNAVAILABLE` (deleted) | `AVAILABLE` (new) |
| `PHASE` (deleted) | `UP-TO-DATE` (renamed) |
| `AGE` | `AGE` |
| `VERSION` | `VERSION` |

TODO: consider if to add Machine deployment `AVAILABLE`, but we should find a way to differentiate from `AVAILABLE` replicas
 Stefan +1 to have AVAILABLE, not sure if we can have two columns with the same header

(*) visible only when using `kubectl get -o wide`

### Changes to Cluster resource

#### Cluster Status

The following changes are implemented to Cluster's status:

- Disambiguate the usage of the term "ready" by renaming fields used for the provisioning workflow
- Align to K8s API conventions by deprecating `Phase` and the corresponding `LastUpdated`
- Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures
- Transition to new, improved, K8s API conventions aligned
conditions
- Add replica counters to surface the status of Machines belonging to this Cluster
- Surface information about the connection to the Cluster's control plane (see the new `RemoteConnectionProbe` condition)

Below you can find the relevant fields in Cluster Status v1beta2, after v1beta1 removal (end state).
After the golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules.

```golang
type ClusterStatus struct {

 // Initialization provides observations of the Cluster initialization process.
 // NOTE: fields in this struct are part of the Cluster API contract and are used to orchestrate initial Cluster provisioning.
 // The value of those fields is never updated after provisioning is completed.
 // Use conditions to monitor the operational state of the Cluster.
 // +optional
 Initialization *ClusterInitializationStatus `json:"initialization,omitempty"`

 // Represents the observations of a Cluster's current state.
 // +listType=map
 // +listMapKey=type
 Conditions []metav1.Condition `json:"conditions,omitempty"`

 // ControlPlane groups all the observations about Cluster's ControlPlane current state.
 // +optional
 ControlPlane ClusterControlPlaneStatus `json:"controlPlane,omitempty"`

 // Workers groups all the observations about Cluster's Workers current state.
 // +optional
 Workers WorkersPlaneStatus `json:"workers,omitempty"`

 // other fields
}

// ClusterInitializationStatus provides observations of the Cluster initialization process.
type ClusterInitializationStatus struct {

 // InfrastructureProvisioned is true when the infrastructure provider reports that Cluster's infrastructure is fully provisioned.
 // NOTE: this field is part of the Cluster API contract, and it is used to orchestrate provisioning.
 // The value of this field is never updated after provisioning is completed.
+ // +optional
 InfrastructureProvisioned bool `json:"infrastructureProvisioned"`

 // ControlPlaneInitialized denotes when the control plane is functional enough to accept requests.
 // This information is usually used as a signal for starting all the provisioning operations that depend on
 // a functional API server, but do not require a full HA control plane to exist, e.g. joining worker Machines or
 // installing core addons like CNI, CPI, CSI, etc.
 // NOTE: this field is part of the Cluster API contract, and it is used to orchestrate provisioning.
 // The value of this field is never updated after provisioning is completed.
 // +optional
 ControlPlaneInitialized bool `json:"controlPlaneInitialized"`
}

// ClusterControlPlaneStatus groups all the observations about the control plane's current state.
type ClusterControlPlaneStatus struct {
 // Total number of desired control plane machines in this cluster.
 // +optional
 DesiredReplicas int32 `json:"desiredReplicas"`

 // Total number of non-terminated control plane machines in this cluster.
 // +optional
 Replicas int32 `json:"replicas"`

 // The number of up-to-date control plane machines in this cluster.
 // +optional
 UpToDateReplicas int32 `json:"upToDateReplicas"`

 // Total number of ready control plane machines in this cluster.
 // +optional
 ReadyReplicas int32 `json:"readyReplicas"`

 // Total number of available control plane machines in this cluster.
 // +optional
 AvailableReplicas int32 `json:"availableReplicas"`

 // Total number of unavailable control plane machines in this cluster.
 // +optional
 UnavailableReplicas int32 `json:"unavailableReplicas"`
}

// WorkersPlaneStatus groups all the observations about the workers' current state.
type WorkersPlaneStatus struct {
 // Total number of desired worker machines in this cluster.
 // +optional
 DesiredReplicas int32 `json:"desiredReplicas"`

 // Total number of non-terminated worker machines in this cluster.
+ // +optional + Replicas int32 `json:"replicas"` + + // The number of up-to-date worker machines in this cluster. + // +optional + UpToDateReplicas int32 `json:"upToDateReplicas"` + + // Total number of ready worker machines in this cluster. + // +optional + ReadyReplicas int32 `json:"readyReplicas"` + + // Total number of available worker machines in this cluster. + // +optional + AvailableReplicas int32 `json:"availableReplicas"` + + // Total number of unavailable worker machines in this cluster. + // +optional + UnavailableReplicas int32 `json:"unavailableReplicas"` +} +``` + +// TODO: check about "non-terminated" for replicas fields. + +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | +|------------------------------------------|----------------------------------------------------------|--------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| `ControlPlaneReady` | `Initialization.ControlPlaneInitialized` (renamed) | `Initialization.ControlPlaneInitialized` | +| | `BackCompatibilty` (new) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| `ControlPlane` (new) | `ControlPlane` | `ControlPlane` | +| `ControlPlane.DesiredReplicas` (new) | `ControlPlane.DesiredReplicas` | `ControlPlane.DesiredReplicas` | +| `ControlPlane.Replicas` (new) | `ControlPlane.Replicas` | `ControlPlane.Replicas` | +| `ControlPlane.ReadyReplicas` (new) | 
`ControlPlane.ReadyReplicas` | `ControlPlane.ReadyReplicas` | +| `ControlPlane.UpToDateReplicas` (new) | `ControlPlane.UpToDateReplicas` | `ControlPlane.UpToDateReplicas` | +| `ControlPlane.AvailableReplicas` (new) | `ControlPlane.AvailableReplicas` | `ControlPlane.AvailableReplicas` | +| `ControlPlane.UnavailableReplicas` (new) | `ControlPlane.UnavailableReplicas` | `ControlPlane.UnavailableReplicas` | +| `Workers` (new) | `Workers` | `Workers` | +| `Workers.DesiredReplicas` (new) | `Workers.DesiredReplicas` | `Workers.DesiredReplicas` | +| `Workers.Replicas` (new) | `Workers.Replicas` | `Workers.Replicas` | +| `Workers.ReadyReplicas` (new) | `Workers.ReadyReplicas` | `Workers.ReadyReplicas` | +| `Workers.UpToDateReplicas` (new) | `Workers.UpToDateReplicas` | `Workers.UpToDateReplicas` | +| `Workers.AvailableReplicas` (new) | `Workers.AvailableReplicas` | `Workers.AvailableReplicas` | +| `Workers.UnavailableReplicas` (new) | `Workers.UnavailableReplicas` | `Workers.UnavailableReplicas` | +| other fields... | other fields... | other fields... | + +notes: +- The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). + Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. 
+ +##### Cluster (New)Conditions + +| Condition | Note | +|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` condition are true; if conditions are defined in `spec.availabilityGates`, those conditions should be true as well for the Cluster to be available. | +| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. This information is usually used as a signal for starting all the provisioning operations that depends on a functional API server, but do not require a full HA control plane to exists. 
 |
| `RemoteConnectionProbe` | True when the control plane can be reached; in case of connection problems, the condition turns to false only if the cluster cannot be reached for 40s after the first connection problem is detected (or for the period defined by the `--cluster-probe-grace-period` flag) |
| `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition |
| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` condition |
| `TopologyReconciled` | |
| `ScalingUp` | True if available replicas < desired replicas |
| `ScalingDown` | True if replicas > desired replicas |
| `UpToDate` | True if all the Machines controlled by this Cluster are up to date (replicas = upToDate replicas) |
| `Remediating` | True if at least one Machine controlled by this Cluster is not passing health checks |
| `Deleted` | True if Cluster is deleted; Reason can be used to observe the cleanup progress when the resource is deleted |
| `Paused` | True if Cluster and all the resources being part of it are paused |

> To better evaluate proposed changes, below you can find the list of current Cluster's conditions:
> Ready, InfrastructureReady, ControlPlaneReady, ControlPlaneInitialized, TopologyReconciled

Notes:
- `TopologyReconciled` exists only for classy clusters; this condition is managed by the topology reconciler.
- Cluster API is going to maintain a `lastRemoteConnectionProbeTime` and use it in combination with the
  `--cluster-probe-grace-period` flag to avoid flakes on `RemoteConnectionProbe`.
- Similarly to `lastHeartbeatTime`, which is not part of the Kubernetes `metav1.Condition` type, `lastRemoteConnectionProbeTime`
  will not surface on the API in order to avoid costly, continuous reconcile events.

#### Cluster Spec

Cluster's spec is going to be improved to allow third parties to extend the semantics of the new Cluster's `Available` condition.
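
To illustrate how availability gates could extend that semantic, the sketch below evaluates a Cluster's `Available` status following the `Available` row of the conditions table above, assuming the mirror `ControlPlaneAvailable` and summary `WorkersAvailable` conditions are used for the control plane and worker checks. It is a hypothetical helper, not Cluster API code; `Condition` is a reduced stand-in for `metav1.Condition`, and `example.com/NetworkReady` is a made-up third-party condition.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a reduced stand-in for metav1.Condition that keeps only the
// fields this sketch needs, so the example builds with the standard library.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// ClusterAvailabilityGate mirrors the type proposed in this document.
type ClusterAvailabilityGate struct {
	ConditionType string
}

// isTrue reports whether a condition with the given type exists and is True;
// a missing condition is treated as not True.
func isTrue(conditions []Condition, conditionType string) bool {
	for _, c := range conditions {
		if c.Type == conditionType {
			return c.Status == "True"
		}
	}
	return false
}

// computeAvailable is a hypothetical helper that returns whether the Cluster is
// available and, if not, which condition types are blocking availability.
func computeAvailable(conditions []Condition, gates []ClusterAvailabilityGate) (bool, []string) {
	// Built-in checks taken from the Available row of the conditions table.
	required := []string{"RemoteConnectionProbe", "ControlPlaneAvailable", "WorkersAvailable"}
	// Gates accept only positive polarity, so each one is simply another
	// condition type that must be True.
	for _, g := range gates {
		required = append(required, g.ConditionType)
	}
	var blocking []string
	for _, t := range required {
		if !isTrue(conditions, t) {
			blocking = append(blocking, t)
		}
	}
	return len(blocking) == 0, blocking
}

func main() {
	conditions := []Condition{
		{Type: "RemoteConnectionProbe", Status: "True"},
		{Type: "ControlPlaneAvailable", Status: "True"},
		{Type: "WorkersAvailable", Status: "True"},
		{Type: "example.com/NetworkReady", Status: "False"},
	}
	gates := []ClusterAvailabilityGate{{ConditionType: "example.com/NetworkReady"}}
	available, blocking := computeAvailable(conditions, gates)
	fmt.Printf("available=%t blocking=[%s]\n", available, strings.Join(blocking, ", "))
}
```

Returning the blocking condition types, rather than a bare boolean, is one way the `Available` condition's message could tell users which check or gate is holding the Cluster back.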
+

The summary table below shows how changes will be rolled out according to K8s deprecation rules; after it, you can find
the relevant fields in Cluster Spec v1beta2, after v1beta1 removal (end state).

| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) |
|---------------------------|-----------------------------|-------------------------------------|
| `AvailabilityGates` (new) | `AvailabilityGates` | `AvailabilityGates` |
| other fields... | other fields... | other fields... |

```golang
type ClusterSpec struct {
 // If specified, all availability gates will be evaluated for Cluster availability.
 // A Cluster is available when its `RemoteConnectionProbe` and `TopologyReconciled` conditions are true, the Cluster's
 // control plane `Available` condition is true, and all worker resources' `Available` conditions are true;
 // if conditions are defined in `spec.availabilityGates`, those conditions must be true as well.
 // +optional
 // +listType=map
 // +listMapKey=conditionType
 AvailabilityGates []ClusterAvailabilityGate `json:"availabilityGates,omitempty"`

 // Other fields...
}

// ClusterAvailabilityGate contains the reference to a Cluster condition to be used as an availability gate.
type ClusterAvailabilityGate struct {
 // ConditionType refers to a condition in the Cluster's condition list with matching type.
 // Note: Both Cluster API conditions and conditions added by third-party controllers can be used as availability gates.
 ConditionType string `json:"conditionType"`
}
```

Notes:
- Similarly to Pod's `ReadinessGates`, also Machine's `AvailabilityGates` accept only conditions with positive polarity;
  The Cluster API project might revisit this in future to stay aligned with Kubernetes or if there are use cases justifying this change.
+- In future the Cluster API project might consider ways to make `AvailabilityGates` configurable at ClusterClass level, but
  this can be implemented as a follow-up.

#### Cluster Print columns

| Current | To be |
|-------------------|-----------------------|
| `NAME` | `NAME` |
| `CLUSTER CLASS` | `CLUSTER CLASS` |
| `PHASE` (deleted) | `PAUSED` (new) (*) |
| `AGE` | `AVAILABLE` (new) |
| `VERSION` | `CP_DESIRED` (new) |
| | `CP_CURRENT` (new) (*) |
| | `CP_READY` (new) (*) |
| | `CP_AVAILABLE` (new) |
| | `CP_UP_TO_DATE` (new) |
| | `W_DESIRED` (new) |
| | `W_CURRENT` (new) (*) |
| | `W_READY` (new) (*) |
| | `W_AVAILABLE` (new) |
| | `W_UP_TO_DATE` (new) |
| | `AGE` |
| | `VERSION` |

(*) visible only when using `kubectl get -o wide`

### Changes to KubeadmControlPlane (KCP) resource

#### KubeadmControlPlane Status

The following changes are implemented to KubeadmControlPlane's status:

- TODO: figure out what to do with contract fields + conditions
- Update the `ReadyReplicas` counter to use the same semantic as the Machine's `Ready` condition, rename `UpdatedReplicas`
  to `UpToDateReplicas`, and add `AvailableReplicas`.
- Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures
- Transition to new, improved, K8s API conventions aligned conditions

Below you can find the relevant fields in KubeadmControlPlane Status v1beta2, after v1beta1 removal (end state).
After the golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules.

```golang
type KubeadmControlPlaneStatus struct {

 // The number of ready replicas for this ControlPlane. A Machine is considered ready when its Ready condition is true.
 // Note: In the v1beta1 API version a Machine was counted as ready when the node hosted on the Machine was ready, thus
 // generating confusion for users looking at the Machine.Ready condition.
+ // +optional + ReadyReplicas int32 `json:"readyReplicas"` + + // The number of available replicas targeted by this ControlPlane. + // +optional + AvailableReplicas int32 `json:"availableReplicas"` + + // The number of up-to-date replicas targeted by this ControlPlane. + // +optional + UpToDateReplicas int32 `json:"upToDateReplicas"` + + // Represents the observations of a ControlPlane's current state. + // +listType=map + // +listMapKey=type + Conditions []metav1.Condition `json:"conditions,omitempty"` + + // Other fields... + // NOTE: `Ready`, `FailureReason`, `FailureMessage` fields won't be there anymore +} +``` + +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | +|-----------------------------------|----------------------------------------------------------|-------------------------------------| +| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `AvailableReplicas` (new) | `AvailableReplicas` | `AvailableReplicas` | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | + +TODO: double check usages of status.ready. 
+ +#### KubeadmControlPlane (New)Conditions + +| Condition | Note | +|---------------------------------|-------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the control plane can be reached and there is etcd quorum, and `CertificatesAvailable` is true | +| `CertificatesAvailable` | True if all the cluster certificates exist. | +| `ReplicaFailure` | This condition surfaces issues on creating Machines controlled by this KubeadmControlPlane, if any. | +| `Initialized` | True ControlPlaneComponentsHealthy. | +| `ControlPlaneComponentsHealthy` | This condition surfaces detail of issues on the controlled machines, if any. | +| `EtcdClusterHealthy` | This condition surfaces detail of issues on the controlled machines, if any. | +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this ControlPlane are up to date | +| `Remediating` | True if there is at least one machine controlled by this KubeadmControlPlane is not passing health checks | +| `Deleted` | True if KubeadmControlPlane is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this resource or the Cluster it belongs to are paused | + +> To better evaluate proposed changes, below you can find the list of current KubeadmControlPlane's conditions: +> Ready, CertificatesAvailable, MachinesCreated, Available, MachinesSpecUpToDate, Resized, MachinesReady, +> ControlPlaneComponentsHealthy, EtcdClusterHealthy. 
+ +Notes: +- `ControlPlaneComponentsHealthy` and `EtcdClusterHealthy` have a very strict semantic: everything should be ok for the condition to be true; + This means it is expected those condition to flick while performing lifecycle operations; over time we might consider changes to make + those conditions to distinguish more accurately health issues vs "expected" temporary unavailability. + +#### KubeadmControlPlane Print columns + +| Current | To be | +|--------------------------|------------------------| +| `NAME` | `NAME` | +| `CLUSTER` | `CLUSTER` | +| `DESIRED` (*) | `PAUSED` (new) (*) | +| `REPLICAS` | `INITIALIZED` (new) | +| `READY` | `DESIRED` | +| `UPDATED` (renamed) | `CURRENT` (*) | +| ``UNAVAILABLE` (deleted) | `READY` | +| `PHASE` (deleted) | `AVAILABLE` (new) | +| `AGE` | `UP-TO-DATE` (renamed) | +| `VERSION` | `AGE` | +| | `VERSION` | + +(*) visible only when using `kubectl get -o wide` + +### Changes to MachinePool resource + +TODO + +### Changes to Cluster API contract + +TODO From dc8bd3166b147440d92761bcec7a7a3b92aa8d09 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Mon, 22 Jul 2024 14:19:49 +0200 Subject: [PATCH 02/22] First round of comments --- .../improve-status-in-CAPI-resources.md | 85 ++++++++++++------- 1 file changed, 52 insertions(+), 33 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index e17791bdfa4d..8d8a58bc12a3 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -52,23 +52,21 @@ see-also: # Summary -This documents defines how status in CAPI resources is going to evolve in the v1beta2 API version, with the goal of +This documents defines how status across all Cluster API resources is going to evolve in the v1beta2 API version, focusin on improving usability and consistency across different resources in CAPI and with the rest of the ecosystem. 
# Motivation -The Cluster API community recognize that nowadays Cluster API and Kubernetes users are rightfully focused on -building higher systems and great applications on top on those platforms, which is great. +The Cluster API community recognizes that nowadays most users are rightfully focused on +building higher level systems, offerings, and applications on these platforms. -However, as a consequence of this shifted focus, most of the users don’t have time to become deep expert of Cluster API -like the first wave of adopters, and also Cluster API maintainers would like they don’t have to. +However, as the focus shifted away, most of the users don’t have time to become deep experts on Cluster API. -The effect of the trend above is the blurring of the lines not only between different Cluster API components, but also -between Cluster API, core Kubernetes and a few other broadly adopted tools like Helm or Flux (and to some extents, -also with many others awesome tools in the ecosystem). +This trend is blurring the lines between different Cluster API components; between Cluster API and Kubernetes, and tools +like Helm, Flux, Argo, and so on. -This is why Cluster API status must become simpler to understand for users, and also more consistent not only across -different CAPI resources, but with Kubernetes core and ideally with the entire ecosystem. +This proposal focused on Cluster API's resource status which must become simpler to understand, more consistent with +Kubernetes, and ideally with the entire ecosystem. ### Goals @@ -77,9 +75,9 @@ different CAPI resources, but with Kubernetes core and ideally with the entire e - Make the concept of Machine readiness extensible, thus allowing providers or external system to inject their readiness checks. - Review and standardize the usage of the concept of availability across Cluster API resources. 
- Make the concept of Cluster Availability extensible, thus allowing providers or external system to inject their availability checks. -- Bubble up more information about both CP and worker Machines, ensuring consistent way across Cluster API resources. - - Standardize replica counters and bubble them up to the Cluster resource. - - Standardize control plane, MachineDeployment, Machine pool availability, and bubble them up to the Cluster resource. +- Bubble up more information about both CP and worker Machines, ensuring consistency across Cluster API resources. + - Standardize replica counters on control plane, MachineDeployment, Machine pool, and bubble them up to the Cluster resource. + - Bubble up conditions about machine readiness to control plane, MachineDeployment, Machine pool. - Introduce missing signals about connectivity to workload clusters, thus enabling to mark all the conditions depending on such connectivity being working with status Unknown after a certain amount of time. - Introduce a cleaner signal about Cluster API resources lifecycle transitions, e.g. scaling up or updating. @@ -101,9 +99,11 @@ Some of those changes could be considered straight forward, e.g. (and improve Conditions to provide similar or even a better info as a replacement). - K8s resources do not have a concept similar to "terminal feature" existing in Cluster API resources, and users approaching the project are struggling with this idea; in some cases also provider's implementers are struggling with it. - Accordingly, Cluster API resources are dropping `FailureReason` and `FailureMessage` fields (terminal failures should be surfaced using - conditions, like any other error/warning/message) -- Bubble up more information about both CP and worker Machines to the Cluster level. + Accordingly, Cluster API resources are dropping `FailureReason` and `FailureMessage` fields. 
+ Like in K8s objects, terminal failures should be surfaced using conditions, with a well documented type/reason representing + a "terminal failure"; it is up to a consumers to treat them accordingly. +- Bubble up more information about Machines to the owner resource (control plane, MachineSet, MachineDeployment, MachinePool) + and then to the Cluster. Some other changes requires a little bit more context, which is provided in following paragraphs: @@ -148,14 +148,14 @@ e.g. Higher level abstractions in Cluster API are designed to remain operational for instance a Machine deployment is operational even if is rolling out. But the use cases above where hard to combine with the strict requirement to have all the conditions true, and -as a result today Cluster APi resources barely have conditions surfacing that lifecycle operations are happening, or where -those condition are defined they have a semantic which is not easy to understand, like e.g. 'Resized' or 'MachinesSpecUpToDate'. +as a result today Cluster API resources barely have conditions surfacing that lifecycle operations are happening, or where +those conditions are defined they have a semantic which is not easy to understand, like e.g. 'Resized' or 'MachinesSpecUpToDate'. e.g. when you look at higher level abstractions in Cluster API like Clusters, MachineDeployments and ControlPlanes, readiness might be confusing, because those resources usually accept a certain degree of not readiness, e.g. MachineDeployments are usually ok even if a few machine is not ready (up to MaxUnavailable). -In order to address thi problem, Cluster API is going to align to K8s API conventions. As a consequence, the “Ready” +In order to address this problem, Cluster API is going to align to K8s API conventions. As a consequence, the “Ready” condition won't be required anymore to exists on all the resources, nor when it exists, it will be required to include all the existing conditions in the ready summary. 
@@ -205,9 +205,19 @@ With this proposal Cluster API will close the gap with K8s API conventions in re - Cluster API is also dropping its own Condition type and start using metav1.Conditions from the Kubernetes API. The last point have also another implication, which is the removal of the Severity field which is currently used -to determine priority when merging conditions. +to determine priority when merging conditions into the ready summary. -TODO Document how we are going to replace severity (currently prototyping) +However, considering all the work to clean up and improve readiness and availability, now dropping the severity field +is not an issue anymore. Let's clarify this with an example: + +When Cluster API will compute Machine read there will be a very limited set of conditions +to merge (see [next paragraph](#machine-newconditions)); considering this, it will be probably simpler and more informative +for the users if we surface all the relevant messages instead of arbitrarily dropping some of them as we are doing +today by inferring merge priority from severity. + +In case someone wants a more sophisticated control over the process of merging conditions, the new version of the +condition utils in Cluster API, will allow developers to plug in custom functions to compute merge priority +for a condition, e.g. by looking at status, reason, time since the condition transitioned, etc. ### Changes to Machine resource @@ -446,16 +456,16 @@ TODO: check `FullyLabeledReplicas`, do we still need it? #### MachineSet (New)Conditions -| Condition | Note | -|------------------|----------------------------------------------------------------------------------------------------------------| -| `ReplicaFailure` | This condition surfaces issues on creating Machines, if any. | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
| -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDate replicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachineSet is not passing health checks | -| `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | +| Condition | Note | +|------------------|------------------------------------------------------------------------------------------------------------------| +| `ReplicaFailure` | This condition surfaces issues on creating a Machine replica in Kubernetes, if any. e.g. due to resource quotas. | +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDate replicas) | +| `Remediating` | True if there is at least one Machine controlled by this MachineSet is not passing health checks | +| `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current MachineSet's conditions: > Ready, MachinesCreated, Resized, MachinesReady. 
@@ -539,7 +549,7 @@ Notes: | Condition | Note | |------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. If using RollingUpgrade strategy, availableReplicas must be greater or equal than desired replicas - MaxUnavailable replicas | -| `ReplicaFailure` | This condition surfaces issues on creating Machines controlled by this MachineDeployment, if any. | +| `ReplicaFailure` | This condition surfaces issues on creating a MachineSet replica in Kubernetes, if any. e.g. due to resource quotas. | | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | @@ -783,7 +793,7 @@ type ClusterAvailabilityGate struct { Notes: - Similarly to Pod's `ReadinessGates`, also Machine's `AvailabilityGates` accept only conditions with positive polarity; - The Cluster API project might revisit this in future to stay aligned with Kubernetes or if there are use cases justifying this change. + The Cluster API project might revisit this in the future to stay aligned with Kubernetes or if there are use cases justifying this change. - In future the Cluster API project might consider ways to make `AvailabilityGates` configurable at ClusterClass level, but this can be implemented as a follow-up. @@ -919,3 +929,12 @@ TODO ### Changes to Cluster API contract TODO + +## [WIP] Example use cases +NOTE: Let me know if you want to add more use cases. 
I will try to collect more too and add a brief explanation about how +each use case can be addressed with the improved status in CAPI resources + +As a cluster admin with MachineDeployment ownership I'd like to understand if my MD is performing a rolling upgrade and why by looking at the MD status/conditions +As a cluster admin with MachineDeployment ownership I'd like to understand if my MD rollout is blocked and why by looking at the MD status/conditions +As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are failing to be available by looking at the MD status/conditions +As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are stuck on deletion by looking at the MD status/conditions From bdf28482d707391a475347d1dd43345c84a311d5 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Wed, 24 Jul 2024 13:46:13 +0200 Subject: [PATCH 03/22] nits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Stefan Büringer buringerst@vmware.com --- .../improve-status-in-CAPI-resources.md | 221 +++++++++--------- 1 file changed, 113 insertions(+), 108 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index 8d8a58bc12a3..a1f4f5e37eac 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -52,7 +52,7 @@ see-also: # Summary -This documents defines how status across all Cluster API resources is going to evolve in the v1beta2 API version, focusin on +This document defines how status across all Cluster API resources is going to evolve in the v1beta2 API version, focusing on improving usability and consistency across different resources in CAPI and with the rest of the ecosystem. # Motivation @@ -72,22 +72,22 @@ Kubernetes, and ideally with the entire ecosystem.
- Review and standardize the usage of the concept of readiness across Cluster API resources. - Drop or amend improper usage of readiness - - Make the concept of Machine readiness extensible, thus allowing providers or external system to inject their readiness checks. + - Make the concept of Machine readiness extensible, thus allowing providers or external systems to inject their readiness checks. - Review and standardize the usage of the concept of availability across Cluster API resources. - - Make the concept of Cluster Availability extensible, thus allowing providers or external system to inject their availability checks. -- Bubble up more information about both CP and worker Machines, ensuring consistency across Cluster API resources. - - Standardize replica counters on control plane, MachineDeployment, Machine pool, and bubble them up to the Cluster resource. - - Bubble up conditions about machine readiness to control plane, MachineDeployment, Machine pool. + - Make the concept of Cluster availability extensible, thus allowing providers or external systems to inject their availability checks. +- Bubble up more information about both control plane and worker Machines, ensuring consistency across Cluster API resources. + - Standardize replica counters on control plane, MachineDeployment, MachinePool, and bubble them up to the Cluster resource. + - Bubble up conditions about Machine readiness to control plane, MachineDeployment, MachinePool. - Introduce missing signals about connectivity to workload clusters, thus enabling to mark all the conditions - depending on such connectivity being working with status Unknown after a certain amount of time. + depending on such connectivity with status Unknown after a certain amount of time. - Introduce a cleaner signal about Cluster API resources lifecycle transitions, e.g. scaling up or updating. 
-- Ensure everything in status can be used as a signal informing monitoring tools/automation on top of Cluster +- Ensure everything in status can be used as a signal informing monitoring tools/automation on top of Cluster API about lifecycle transitions/state of the Cluster and the underlying components as well. ### Non-Goals/Future Work - Resolving all the idiosyncrasies that exists in Cluster API, core Kubernetes, the rest of the ecosystem. - (Let’s stay focused in Cluster API and keep improving incrementally). + (Let’s stay focused on Cluster API and keep improving incrementally). ## Proposal @@ -96,16 +96,16 @@ This proposal groups a set of changes to status fields in Cluster API resources. Some of those changes could be considered straight forward, e.g. - K8s API conventions suggest to deprecate and remove `phase` fields from status, Cluster API is going to align to this recommendation - (and improve Conditions to provide similar or even a better info as a replacement). -- K8s resources do not have a concept similar to "terminal feature" existing in Cluster API resources, and users approaching - the project are struggling with this idea; in some cases also provider's implementers are struggling with it. + (and improve Conditions to provide similar or even better info as a replacement). +- K8s resources do not have a concept similar to "terminal failure" in Cluster API resources, and users approaching + the project are struggling with this idea. In some cases also provider's implementers are struggling with it. Accordingly, Cluster API resources are dropping `FailureReason` and `FailureMessage` fields. - Like in K8s objects, terminal failures should be surfaced using conditions, with a well documented type/reason representing - a "terminal failure"; it is up to a consumers to treat them accordingly. 
+ Like in K8s objects, "terminal failures" should be surfaced using conditions, with a well documented type/reason representing + a "terminal failure"; it is up to consumers to treat them accordingly. There is no special treatment for these conditions within Cluster API. - Bubble up more information about Machines to the owner resource (control plane, MachineSet, MachineDeployment, MachinePool) and then to the Cluster. -Some other changes requires a little bit more context, which is provided in following paragraphs: +Some other changes require a little bit more context, which is provided in following paragraphs: - Review and standardize the usage of the concept of readiness and availability to align to K8s API conventions / conditions used in core K8s objects like `Pod`, `Node`, `Deployment`, `ReplicaSet` etc. @@ -119,7 +119,7 @@ over time; changes in this group will be detailed case by case in the following (so everywhere Ready is used for a Machine it always means the same thing) - Add a new condition monitoring the status of the connectivity to workload clusters (`RemoteConnectionProbe`). -In order to keep making progress on this proposal, the fist iteration will be focused on +In order to keep making progress on this proposal, the first iteration will be focused on: - Machines - MachineSets @@ -128,41 +128,41 @@ In order to keep making progress on this proposal, the fist iteration will be fo - KubeadmControlPlane (ControlPlanes) - Clusters -Other resources will be added as soon as there will be agreement on the general direction. +Other resources will be added as soon as there is agreement on the general direction. Overall, the union of all those changes, is expected to greatly improve status fields, conditions, replica counters and print columns. 
-Those improvements are expected to provide benefit to users interacting with the systems, using monitoring tools, and +Those improvements are expected to provide benefit to users interacting with the system, using monitoring tools, and building higher level systems or products on top of Cluster API. ### Readiness and Availability -The [condition CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md) in Cluster API introduced very strict requirements about “Ready” condition, mandating it -to exists on all resources and also mandating that Ready must be computed as the summary of all the to other existing +The [condition CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md) in Cluster API introduced very strict requirements about the `Ready` condition, mandating it +to exist on all resources and also mandating that `Ready` must be computed as the summary of all other existing conditions. -However, over time Cluster API maintainers recognized several limitations of the “one fit all”, strict approach. +However, over time Cluster API maintainers recognized several limitations of the strict, “one size fits all” approach. -e.g. Higher level abstractions in Cluster API are designed to remain operational during lifecycle operations, -for instance a Machine deployment is operational even if is rolling out. +E.g., higher level abstractions in Cluster API are designed to remain operational during lifecycle operations, +for instance a MachineDeployment is operational even if it is rolling out.
-But the use cases above where hard to combine with the strict requirement to have all the conditions true, and +But the use cases above were hard to combine with the strict requirement to have all the conditions true, and as a result today Cluster API resources barely have conditions surfacing that lifecycle operations are happening, or where -those conditions are defined they have a semantic which is not easy to understand, like e.g. 'Resized' or 'MachinesSpecUpToDate'. +those conditions are defined they have a semantic which is not easy to understand, e.g. `Resized` or `MachinesSpecUpToDate`. -e.g. when you look at higher level abstractions in Cluster API like Clusters, MachineDeployments and ControlPlanes, readiness +E.g., when you look at higher level abstractions in Cluster API like Clusters, MachineDeployments and ControlPlanes, readiness might be confusing, because those resources usually accept a certain degree of not readiness, e.g. MachineDeployments are -usually ok even if a few machine is not ready (up to MaxUnavailable). +usually ok even if a few Machines are not ready (up to `MaxUnavailable`). -In order to address this problem, Cluster API is going to align to K8s API conventions. As a consequence, the “Ready” -condition won't be required anymore to exists on all the resources, nor when it exists, it will be required to include -all the existing conditions in the ready summary. +In order to address this problem, Cluster API is going to align to K8s API conventions. As a consequence, the `Ready` +condition won't have to exist on all resources anymore. Nor, when it exists, will it be required to include +all the existing conditions when calculating the `Ready` condition. As a consequence, we will continue to use the ready condition *only* where it makes sense, and with a well-defined semantic that conveys important information to the users (vs applying "blindly" the same formula everywhere).
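The gap between strict readiness and availability can be illustrated with the rule proposed for the MachineDeployment `Available` condition under a rolling update: `availableReplicas >= desired replicas - MaxUnavailable`. This is a minimal sketch, assuming `MaxUnavailable` has already been resolved to an absolute replica count (the real strategy field can also be a percentage). `mdAvailable` and `allReady` are hypothetical helpers.

```go
package main

import "fmt"

// mdAvailable sketches the MachineDeployment Available rule for a rolling update:
// availableReplicas >= desired replicas - MaxUnavailable.
// Assumption: maxUnavailable is already resolved to an absolute number of replicas.
func mdAvailable(desired, available, maxUnavailable int32) bool {
	return available >= desired-maxUnavailable
}

// allReady is the strict "every replica must be ready" check the proposal moves away from.
func allReady(desired, available int32) bool {
	return available >= desired
}

func main() {
	const desired, maxUnavailable int32 = 3, 1
	// During a rollout, an old Machine may be deleted before its replacement is available.
	for _, available := range []int32{3, 2, 1} {
		fmt.Printf("available=%d strictReady=%t Available=%t\n",
			available, allReady(desired, available), mdAvailable(desired, available, maxUnavailable))
	}
}
```

With 3 desired replicas and `MaxUnavailable` set to 1, the strict check fails as soon as one Machine is being replaced. `Available` stays True until a second Machine becomes unavailable, which matches the idea that a MachineDeployment remains operational during a rollout.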
-The most important effect of this change is the definition of a new semantic for the Machine's “Ready” condition, that +The most important effect of this change is the definition of a new semantic for the Machine's `Ready` condition, that will now clearly represent the "machine can host workloads" (prior art Kubernetes nodes are ready when "node can host pods"). To improve the benefit of this change: @@ -170,19 +170,19 @@ To improve the benefit of this change: - This proposal is also changing contract fields where ready was used improperly to represent initial provisioning (k8s API conventions suggest to use ready only for long-running process). -All in all, Machine's Ready should be much more clear, consistent, intuitive after proposed changes. +All in all, Machine's Ready concept should be much more clear, consistent, intuitive after proposed changes. But there is more. -This proposal is also dropping Ready condition from higher level abstractions in Cluster API. +This proposal is also dropping the `Ready` condition from higher level abstractions in Cluster API. -Instead, where not already present, this proposal is introducing a new Available condition that better represents -the fact that those objects are operational even if there is a certain degree of not readiness in the system -or if lifecycle operations are happening (prior art Available condition in K8s Deployments). +Instead, where not already present, this proposal is introducing a new `Available` condition that better represents +the fact that those objects are operational even if there is a certain degree of not readiness / disruption in the system +or if lifecycle operations are happening (prior art `Available` condition in K8s Deployments). Last but not least: -- With the changes to the semantic of Ready and Available conditions, it is now possible to add conditions about - surfacing that lifecycle operations are happening, e.g. scaling up. 
+- With the changes to the semantic of `Ready` and `Available` conditions, it is now possible to add conditions to + surface ongoing lifecycle operations, e.g. scaling up. - As suggested by K8s API conventions, this proposal is also making sure all conditions are consistent and have uniform meaning across all resource types - Additionally, we are enforcing the same consistency for replica counters and other status fields. @@ -199,24 +199,24 @@ even confusing new (and old) contributors. With this proposal Cluster API will close the gap with K8s API conventions in regard to: - Polarity: Condition type names should make sense for humans; neither positive nor negative polarity can be recommended as a general rule (already implemented by [#10550](https://github.com/kubernetes-sigs/cluster-api/pull/10550)) -- Use of the Reason field is required (currently in Cluster API reasons is added only when condition are false) -- Controllers should apply their conditions to a resource the first time they visit the resource, even if the status is Unknown. +- Use of the `Reason` field is required (currently in Cluster API a reason is added only when a condition is false) +- Controllers should apply their conditions to a resource the first time they visit the resource, even if the status is `Unknown`. (currently Cluster API controllers add conditions at different stages of the reconcile loops) -- Cluster API is also dropping its own Condition type and start using metav1.Conditions from the Kubernetes API. +- Cluster API is also dropping its own `Condition` type and will start using `metav1.Condition` from the Kubernetes API. -The last point have also another implication, which is the removal of the Severity field which is currently used +The last point also has another implication, which is the removal of the `Severity` field which is currently used to determine priority when merging conditions into the ready summary.
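Without a `Severity` field, conditions have to be merged some other way. The following minimal sketch shows one approach consistent with this proposal: every non-True message is surfaced, and an optional, pluggable function decides only the reporting order. `summarize`, `MergePriorityFunc`, `defaultPriority` and the `AllConditionsTrue` reason are hypothetical names. This is not the API of the Cluster API condition utils, and `Condition` is a stdlib-only stand-in for `metav1.Condition`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Condition is a minimal, stdlib-only stand-in for metav1.Condition (note: no Severity field).
type Condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string
	Message string
}

// MergePriorityFunc is a hypothetical hook: lower values are reported first.
type MergePriorityFunc func(c Condition) int

// defaultPriority reports False before Unknown before True.
func defaultPriority(c Condition) int {
	switch c.Status {
	case "False":
		return 0
	case "Unknown":
		return 1
	default:
		return 2
	}
}

// summarize merges conds into one condition of type condType: True only if every input is True,
// False if any input is False, Unknown otherwise. Instead of dropping messages by severity,
// it reports every non-True message, ordered by the priority function.
func summarize(condType string, conds []Condition, priority MergePriorityFunc) Condition {
	if priority == nil {
		priority = defaultPriority
	}
	sorted := append([]Condition(nil), conds...)
	sort.SliceStable(sorted, func(i, j int) bool { return priority(sorted[i]) < priority(sorted[j]) })

	status, reason := "True", "AllConditionsTrue"
	var msgs []string
	for _, c := range sorted {
		if c.Status == "True" {
			continue
		}
		if len(msgs) == 0 {
			reason = c.Reason // the highest-priority issue provides the Reason
		}
		if c.Status == "False" {
			status = "False"
		} else if status == "True" {
			status = "Unknown"
		}
		msgs = append(msgs, fmt.Sprintf("%s: %s", c.Type, c.Message))
	}
	return Condition{Type: condType, Status: status, Reason: reason, Message: strings.Join(msgs, "; ")}
}

func main() {
	conds := []Condition{
		{Type: "InfrastructureReady", Status: "True", Reason: "Provisioned"},
		{Type: "NodeHealthy", Status: "False", Reason: "DiskPressure", Message: "Node reports DiskPressure"},
		{Type: "HealthCheckSucceeded", Status: "Unknown", Reason: "Pending", Message: "MachineHealthCheck has not run yet"},
	}
	ready := summarize("Ready", conds, nil)
	fmt.Printf("%s=%s reason=%s message=%q\n", ready.Type, ready.Status, ready.Reason, ready.Message)
}
```

In this sketch a custom `MergePriorityFunc` (for example, one that looks at the condition type or reason) changes only the order of the reported messages and which Reason is surfaced. It never hides an issue.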
-However, considering all the work to clean up and improve readiness and availability, now dropping the severity field +However, considering all the work to clean up and improve readiness and availability, now dropping the `Severity` field is not an issue anymore. Let's clarify this with an example: -When Cluster API will compute Machine read there will be a very limited set of conditions -to merge (see [next paragraph](#machine-newconditions)); considering this, it will be probably simpler and more informative -for the users if we surface all the relevant messages instead of arbitrarily dropping some of them as we are doing -today by inferring merge priority from severity. +When Cluster API computes the Machine `Ready` condition, there will be a very limited set of conditions +to merge (see [next paragraph](#machine-newconditions)). Considering this, it will probably be simpler and more informative +for users if we surface all relevant messages instead of arbitrarily dropping some of them as we are doing +today by inferring merge priority from the `Severity` field. In case someone wants a more sophisticated control over the process of merging conditions, the new version of the -condition utils in Cluster API, will allow developers to plug in custom functions to compute merge priority +condition utils in Cluster API will allow developers to plug in custom functions to compute merge priority for a condition, e.g. by looking at status, reason, time since the condition transitioned, etc. ### Changes to Machine resource @@ -231,25 +231,26 @@ Following changes are implemented to Machine's status: - Transition to new, improved, K8s API conventions aligned conditions Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); -After golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules.
+Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang type MachineStatus struct { // Initialization provides observations of the Machine initialization process. - // NOTE: fields in this struct are part of the Cluster API contract and are used to orchestrate initial Machine provisioning. + // NOTE: Fields in this struct are part of the Cluster API contract and are used to orchestrate initial Machine provisioning. // The value of those fields is never updated after provisioning is completed. - // Use conditions to monitor the operational state of the Machine's BootstrapSecret. + // Use conditions to monitor the operational state of the Machine. // +optional Initialization *MachineInitializationStatus `json:"initialization,omitempty"` - // Represents the observations of a Machine's current state. + // Conditions represent the observations of a Machine's current state. + // +optional // +listType=map // +listMapKey=type Conditions []metav1.Condition `json:"conditions,omitempty"` // Other fields... - // NOTE: `Phase`, `LastUpdated`, `FailureReason`, `FailureMessage` fields won't be there anymore + // NOTE: `Phase`, `LastUpdated`, `FailureReason`, `FailureMessage`, `BootstrapReady`, `InfrastructureReady` fields won't be there anymore } // MachineInitializationStatus provides observations of the Machine initialization process. @@ -263,7 +264,7 @@ type MachineInitializationStatus struct { BootstrapSecretCreated bool `json:"bootstrapSecretCreated"` // InfrastructureProvisioned is true when the infrastructure provider reports that the Machine's infrastructure is fully provisioned. - // NOTE: this field is part of the Cluster API contract, and it is used to orchestrate initial Machine provisioning. + // NOTE: this field is part of the Cluster API contract, and it is used to orchestrate initial Machine provisioning. 
// The value of this field is never updated after provisioning is completed. // Use conditions to monitor the operational state of the Machine's infrastructure. // +optional @@ -271,19 +272,19 @@ type MachineInitializationStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | -|-------------------------------|----------------------------------------------------------|--------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `BootstrapReady` | `Initialization.BootstrapSecretCreated` (renamed) | `Initialization.BootstrapSecretCreated` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| | `BackCompatibilty` (new) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `LastUpdated` (deprecated) | `BackCompatibilty.LastUpdated` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExprimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | +|--------------------------------|----------------------------------------------------------|--------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `BootstrapReady` | `Initialization.BootstrapSecretCreated` (renamed) | `Initialization.BootstrapSecretCreated` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| | `BackCompatibilty` (new) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `LastUpdated` (deprecated) | `BackCompatibilty.LastUpdated` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). 
@@ -291,40 +292,41 @@ Notes: ##### Machine (New)Conditions -| Condition | Note | -|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if at the machine is Ready for at least MinReady seconds, as defined by the Machine's owner resource | -| `Ready` | True if Machine's `BootstrapSecretReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are true; if other conditions are defined in `spec.readinessGates` are defined, those conditions should be true as well for the Machine to be ready. | -| `UpToDate` | True if the Machine spec matches the spec of the Machine's owner resource, e.g KubeadmControlPlane or MachineDeployment | -| `BootstrapConfigReady` | Mirrors the corresponding condition from the Machine's BootstrapConfig resource | -| `InfrastructureReady` | Mirrors the corresponding condition from the Machine's Infrastructure resource | -| `NodeReady` | True if the Machine's Node is ready | -| `NodeHealthy` | True if the Machine's Node is ready and it does not report MemoryPressure, DiskPressure and PIDPressure | -| `HealthCheckSucceeded` | True if MHC instances targeting this machine reports the Machine is healthy according to the definition of healthy present in the spec of the Machine Health Check object | -| `OwnerRemediated` | | -| `Deleted` | True if Machine is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if the Machine or the Cluster it belongs to are paused | +| Condition | Note | 
+|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if at the machine is Ready for at least MinReady seconds, as defined by the Machine's minReadySeconds field | +| `Ready` | True if Machine's `BootstrapSecretReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are true; if other conditions are defined in `spec.readinessGates`, those conditions should be true as well for the Machine to be ready. | +| `UpToDate` | True if the Machine spec matches the spec of the Machine's owner resource, e.g KubeadmControlPlane or MachineDeployment | +| `BootstrapConfigReady` | Mirrors the corresponding condition from the Machine's BootstrapConfig resource | +| `InfrastructureReady` | Mirrors the corresponding condition from the Machine's Infrastructure resource | +| `NodeReady` | True if the Machine's Node is ready | +| `NodeHealthy` | True if the Machine's Node is ready and it does not report MemoryPressure, DiskPressure and PIDPressure | +| `HealthCheckSucceeded` | True if MHC instances targeting this machine report the Machine is healthy according to the definition of healthy present in the spec of the Machine Health Check object | +| `OwnerRemediated` | | +| `Deleted` | True if Machine is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if the Machine or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current Machine's conditions: -> Ready, InfrastructureReady, NodeHealthy, PreDrainDeleteHookSucceeded, VolumeDetachSucceeded, DrainingSucceeded. +> Ready, InfrastructureReady, BootstrapReady, NodeHealthy, PreDrainDeleteHookSucceeded, VolumeDetachSucceeded, DrainingSucceeded. 
> Additionally: > - The MachineHealthCheck controller adds the HealthCheckSucceeded and the OwnerRemediated conditions. -> - The KubeadmControlPlane adds the ApiServerPodHealthy, ControllerManagerPodHealthy, SchedulerPodHealthy, EtcdPodHealthy, EtcdMemberHealthy conditions. +> - The KubeadmControlPlane adds the APIServerPodHealthy, ControllerManagerPodHealthy, SchedulerPodHealthy, EtcdPodHealthy, EtcdMemberHealthy conditions. Notes: -- This proposal introduces a mechanism for extending the meaning of Machine Readiness, `ReadinessGates` (see [changes to Machine.Spec](#machine-spec)). +- This proposal introduces a mechanism for extending the meaning of Machine readiness, `ReadinessGates` (see [changes to Machine.Spec](#machine-spec)). - While `Ready` is the main signal for machines operational state, higher level abstractions in Cluster API like e.g. MachineDeployment are relying on the concept of Machine's `Availability`, which can be seen as readiness + stability. In order to standardize this concept across different higher level abstractions, this proposal is surfacing `Availability` condition at Machine level as well as adding a new `MinReadySeconds` field (see [changes to Machine.Spec](#machine-spec)) that will be used to compute this condition. -- Similarly, this proposal is standardizing the concept of Machine's `UpToDate`, however in this case it will be up to +- Similarly, this proposal is standardizing the concept of Machine's `UpToDate` condition, however in this case it will be up to the Machine's owner controllers to set this condition. 
-- Conditions like `NodeReady` and `NodeHealthy` which depends on the connection to the remote cluster will take benefit - of the new `RemoteConnectionProbe` condition at cluster level (see [Cluster (New)Conditions](#cluster-newconditions)); - more specifically those condition should be set to `Unknown` after the cluster Probe fails +- Conditions like `NodeReady` and `NodeHealthy` which depend on the connection to the remote cluster will benefit + from the new `RemoteConnectionProbe` condition at cluster level (see [Cluster (New)Conditions](#cluster-newconditions)); + more specifically those conditions should be set to `Unknown` after the cluster probe fails (or after whatever period is defined in the `--remote-conditions-grace-period` flag) -- `HealthCheckSucceeded` and `OwnerRemediated` (or `ExternalRemediationRequestAvailable`) are set by the MachineHealthCheck controller in case a resource instance targets the machine. +- `HealthCheckSucceeded` and `OwnerRemediated` (or `ExternalRemediationRequestAvailable`) conditions are set by the + MachineHealthCheck controller in case a MachineHealthCheck targets the machine. - KubeadmControlPlane also adds additional conditions to Machines, but those conditions are not included in the table above for sake of simplicity (however they are documented in the KubeadmControlPlane paragraph). @@ -332,23 +334,23 @@ TODO: think carefully at remote conditions becoming unknown, this could block a #### Machine Spec -Machine's spec is going to be improved to allow 3rd party to extend the semantic of the new Machine's `Ready` condition -as well to standardize the concept of Machine's `Availability`. +Machine's spec is going to be improved to allow 3rd party components to extend the semantic of the new Machine's `Ready` condition +as well as to standardize the concept of Machine's `Availability`.
Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); -After golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. +Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```go type MachineSpec struct { - - // MinReadySeconds is the minimum number of seconds for which a Node for a newly created machine should be ready before considering the replica available. + + // MinReadySeconds is the minimum number of seconds for which a Machine should be ready before considering the replica available. // Defaults to 0 (machine will be considered available as soon as the Node is ready) // +optional MinReadySeconds int32 `json:"minReadySeconds,omitempty"` // If specified, all readiness gates will be evaluated for Machine readiness. // A Machine is ready when `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are "True"; - // if other conditions are defined this field, those conditions should be "True" as well for the Machine to be ready. + // if other conditions are defined in this field, those conditions should be "True" as well for the Machine to be ready. // +optional // +listType=map // +listMapKey=conditionType @@ -409,7 +411,7 @@ Following changes are implemented to MachineSet's status: - Transition to new, improved, K8s API conventions aligned conditions Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); -After golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. +Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang type MachineSetStatus struct { @@ -463,7 +465,7 @@ TODO: check `FullyLabeledReplicas`, do we still need it? 
| `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | | `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDate replicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachineSet is not passing health checks | +| `Remediating` | True if there is at least one Machine controlled by this MachineSet that is not passing health checks | | `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | | `Paused` | True if this MachineSet or the Cluster it belongs to are paused | @@ -576,7 +578,7 @@ Notes: | `AGE` | `AGE` | | `VERSION` | `VERSION` | -TODO: consider if to add Machine deployment `AVAILABLE`, but we should find a way to differentiate from `AVAILABLE` replicas +TODO: consider if to add MachineDeployment `AVAILABLE`, but we should find a way to differentiate from `AVAILABLE` replicas Stefan +1 to have AVAILABLE, not sure if we can have two columns with the same header (*) visible only when using `kubectl get -o wide` @@ -595,7 +597,7 @@ Following changes are implemented to Cluster's status: - Surface information about ControlPlane connection heartbeat (see new conditions) Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); -After golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. +Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang type ClusterStatus struct { @@ -618,7 +620,7 @@ type ClusterStatus struct { // Workers groups all the observations about Cluster's Workers current state. 
// +optional - Workers ClusterControlPlaneStatus `json:"workers,omitempty"` + Workers *ClusterControlPlaneStatus `json:"workers,omitempty"` // other fields } @@ -736,7 +738,8 @@ notes: |---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` condition are true; if conditions are defined in `spec.availabilityGates`, those conditions should be true as well for the Cluster to be available. | | `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. This information is usually used as a signal for starting all the provisioning operations that depends on a functional API server, but do not require a full HA control plane to exists. 
| -| `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--cluster-probe-grace-period` flag) the cluster cannot be reached | +| `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) the cluster cannot be reached | +| `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | | `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | | `WorkersAvaiable` | Summary of MachineDeployment and MachinePool's `Available` condition | | `TopologyReconciled` | | @@ -753,8 +756,8 @@ notes: Notes: - `TopologyReconciled` exists only for classy clusters; this condition is managed by the topology reconciler. - Cluster API is going to maintain a `lastRemoteConnectionProbeTime` and use it in combination with the - `--cluster-probe-grace-period` flag to avoid flakes on `RemoteConnectionProbe`. -- Similarly to `lastHeartbeatTime` in Kubernetes conditions, also `lastControlPlaneProbeTime` will not surface on the + `--remote-connection-grace-period` flag to avoid flakes on `RemoteConnectionProbe`. +- Similarly to `lastHeartbeatTime` in Kubernetes conditions, also `lastRemoteConnectionProbeTime` will not surface on the API in order to avoid costly, continuous reconcile events. #### Cluster Spec @@ -771,10 +774,12 @@ After golang types, you can find a summary table that also shows how changes wil ```golang type ClusterSpec struct { - // If specified, all availability gates will be evaluated for Cluster readiness. 
- // A Cluster is available when True if Cluster `ControlHeartbeat` and `TopologyReconciled` are true, if Cluster's - // control plane `Available` condition is true, if all worker resource's `Available` condition are true; - // if conditions are defined in `spec.availabilityGates` are defined, those conditions should be true as well. + // AvailabilityGates specifies additional conditions to include when evaluating Cluster availability. + // A Cluster is available if: + // * Cluster's `RemoteConnectionProbe` and `TopologyReconciled` conditions are true and + // * the control plane `Available` condition is true and + // * all worker resource's `Available` conditions are true and + // * all conditions defined in AvailabilityGates are true as well. // +optional // +listType=map // +listMapKey=conditionType @@ -824,7 +829,7 @@ Notes: #### KubeadmControlPlane Status -Following changes are implemented to MachineSet's status: +Following changes are implemented to KubeadmControlPlane's status: - TODO: figure out what to do with contract fields + conditions - Update `ReadyReplicas` counter to use the same semantic Machine's `Ready` condition and add missing `UpToDateReplicas`. @@ -832,7 +837,7 @@ Following changes are implemented to MachineSet's status: - Transition to new, improved, K8s API conventions aligned conditions Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); -After golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. +Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. 
```golang type KubeadmControlPlaneStatus struct { From a147b122698bd5f55635c5bc23ce237479a8bdc8 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Wed, 24 Jul 2024 13:49:52 +0200 Subject: [PATCH 04/22] small improvements to the API and to replica counters transition MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Stefan Büringer buringerst@vmware.com --- .../improve-status-in-CAPI-resources.md | 83 +++++++++++-------- 1 file changed, 48 insertions(+), 35 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index a1f4f5e37eac..d5c40b08672b 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -419,12 +419,17 @@ type MachineSetStatus struct { // The number of ready replicas for this MachineSet. A machine is considered ready when Machine's Ready condition is true. // +optional ReadyReplicas int32 `json:"readyReplicas"` + + // The number of available replicas for this MachineSet. A machine is considered available when Machine's Available condition is true. + // +optional + AvailableReplicas int32 `json:"availableReplicas"` // The number of up-to-date replicas for this MachineSet. A machine is considered up-to-date when Machine's UpToDate condition is true. // +optional UpToDateReplicas int32 `json:"upToDateReplicas"` // Represents the observations of a MachineSet's current state. 
+ // +optional // +listType=map // +listMapKey=type Conditions []metav1.Condition `json:"conditions,omitempty"` @@ -434,17 +439,19 @@ type MachineSetStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | -|----------------------------------|----------------------------------------------------------|-------------------------------------| -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `ExprimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExprimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| `UpToDateReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | +|--------------------------------------|-------------------------------------------------------------|-------------------------------------| +| `ExprimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `ExprimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| `UpToDateReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). @@ -518,11 +525,20 @@ After golang types, you can find a summary table that also shows how changes wil ```golang type MachineDeploymentStatus struct { + // The number of ready replicas for this MachineDeployment. A machine is considered ready when Machine's Ready condition is true. + // +optional + ReadyReplicas int32 `json:"readyReplicas"` + + // The number of available replicas for this MachineDeployment. A machine is considered available when Machine's Available condition is true. 
+ // +optional + AvailableReplicas int32 `json:"availableReplicas"` + // The number of up-to-date replicas targeted by this deployment. // +optional UpToDateReplicas int32 `json:"upToDateReplicas"` // Represents the observations of a MachineDeployment's current state. + // +optional // +listType=map // +listMapKey=type Conditions []metav1.Condition `json:"conditions,omitempty"` @@ -532,15 +548,20 @@ type MachineDeploymentStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | -|-------------------------------|----------------------------------------------------------|-------------------------------------| -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExprimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | +|--------------------------------------|-------------------------------------------------------------|-------------------------------------| +| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `ExprimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `ExprimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). @@ -610,13 +631,14 @@ type ClusterStatus struct { Initialization *MachineInitializationStatus `json:"initialization,omitempty"` // Represents the observations of a Cluster's current state. + // +optional // +listType=map // +listMapKey=type Conditions []metav1.Condition `json:"conditions,omitempty"` // ControlPlane groups all the observations about Cluster's ControlPlane current state. 
// +optional - ControlPlane ClusterControlPlaneStatus `json:"controlPlane,omitempty"` + ControlPlane *ClusterControlPlaneStatus `json:"controlPlane,omitempty"` // Workers groups all the observations about Cluster's Workers current state. // +optional @@ -665,10 +687,6 @@ type ClusterControlPlaneStatus struct { // Total number of available control plane machines in this cluster. // +optional AvailableReplicas int32 `json:"availableReplicas"` - - // Total number of unavailable control plane machines in this cluster. - // +optional - UnavailableReplicas int32 `json:"unavailableReplicas"` } // WorkersPlaneStatus groups all the observations about workers current state. @@ -692,10 +710,6 @@ type WorkersPlaneStatus struct { // Total number of available worker machines in this cluster. // +optional AvailableReplicas int32 `json:"availableReplicas"` - - // Total number of unavailable worker machines in this cluster. - // +optional - UnavailableReplicas int32 `json:"unavailableReplicas"` } ``` @@ -718,14 +732,12 @@ type WorkersPlaneStatus struct { | `ControlPlane.ReadyReplicas` (new) | `ControlPlane.ReadyReplicas` | `ControlPlane.ReadyReplicas` | | `ControlPlane.UpToDateReplicas` (new) | `ControlPlane.UpToDateReplicas` | `ControlPlane.UpToDateReplicas` | | `ControlPlane.AvailableReplicas` (new) | `ControlPlane.AvailableReplicas` | `ControlPlane.AvailableReplicas` | -| `ControlPlane.UnavailableReplicas` (new) | `ControlPlane.UnavailableReplicas` | `ControlPlane.UnavailableReplicas` | | `Workers` (new) | `Workers` | `Workers` | | `Workers.DesiredReplicas` (new) | `Workers.DesiredReplicas` | `Workers.DesiredReplicas` | | `Workers.Replicas` (new) | `Workers.Replicas` | `Workers.Replicas` | | `Workers.ReadyReplicas` (new) | `Workers.ReadyReplicas` | `Workers.ReadyReplicas` | | `Workers.UpToDateReplicas` (new) | `Workers.UpToDateReplicas` | `Workers.UpToDateReplicas` | | `Workers.AvailableReplicas` (new) | `Workers.AvailableReplicas` | `Workers.AvailableReplicas` | -| 
`Workers.UnavailableReplicas` (new) | `Workers.UnavailableReplicas` | `Workers.UnavailableReplicas` | | other fields... | other fields... | other fields... | notes: @@ -776,9 +788,9 @@ After golang types, you can find a summary table that also shows how changes wil type ClusterSpec struct { // AvailabilityGates specifies additional conditions to include when evaluating Cluster availability. // A Cluster is available if: - // * Cluster's `RemoteConnectionProbe` and `TopologyReconciled` conditions are true and - // * the control plane `Available` condition is true and - // * all worker resource's `Available` conditions are true and + // * Cluster's `RemoteConnectionProbe` and `TopologyReconciled` conditions are true and + // * the control plane `Available` condition is true and + // * all worker resource's `Available` conditions are true and // * all conditions defined in AvailabilityGates are true as well. // +optional // +listType=map @@ -857,6 +869,7 @@ type KubeadmControlPlaneStatus struct { UpToDateReplicas int32 `json:"upToDateReplicas"` // Represents the observations of a ControlPlane's current state. + // +optional // +listType=map // +listMapKey=type Conditions []metav1.Condition `json:"conditions,omitempty"` @@ -889,7 +902,7 @@ TODO: double check usages of status.ready. | `Available` | True if the control plane can be reached and there is etcd quorum, and `CertificatesAvailable` is true | | `CertificatesAvailable` | True if all the cluster certificates exist. | | `ReplicaFailure` | This condition surfaces issues on creating Machines controlled by this KubeadmControlPlane, if any. | -| `Initialized` | True ControlPlaneComponentsHealthy. | +| `Initialized` | True if ControlPlaneComponentsHealthy. | | `ControlPlaneComponentsHealthy` | This condition surfaces detail of issues on the controlled machines, if any. | | `EtcdClusterHealthy` | This condition surfaces detail of issues on the controlled machines, if any. 
| | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | From ca54d2cda3e391c3933c68ba39d699a905e87f95 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Sat, 27 Jul 2024 15:49:03 +0200 Subject: [PATCH 05/22] Add changes to MachinePool resources --- .../improve-status-in-CAPI-resources.md | 133 +++++++++++++++++- 1 file changed, 132 insertions(+), 1 deletion(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index d5c40b08672b..ce489b2a06d3 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -942,7 +942,138 @@ Notes: ### Changes to MachinePool resource -TODO +#### MachinePool Status + +Following changes are implemented to MachinePool's status: + +- Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow +- Update `ReadyReplicas` counter to use the same semantic Machine's `Ready` condition and add missing `UpToDateReplicas`. +- Align Machine pools replica counters to other CAPI resources +- Align to K8s API conventions by deprecating `Phase` +- Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures +- Transition to new, improved, K8s API conventions aligned conditions + +Below you can find the relevant fields in MachinePool Status v1beta2, after v1beta1 removal (end state); +Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. + +```golang +type MachinePoolStatus struct { + + // The number of ready replicas for this MachinePool. A machine is considered ready when Machine's Ready condition is true. + // +optional + ReadyReplicas int32 `json:"readyReplicas"` + + // The number of available replicas for this MachinePool. A machine is considered available when Machine's Available condition is true. 
+ // +optional + AvailableReplicas int32 `json:"availableReplicas"` + + // The number of up-to-date replicas targeted by this MachinePool. + // +optional + UpToDateReplicas int32 `json:"upToDateReplicas"` + + // Initialization provides observations of the MachinePool initialization process. + // NOTE: Fields in this struct are part of the Cluster API contract and are used to orchestrate initial MachinePool provisioning. + // The value of those fields is never updated after provisioning is completed. + // Use conditions to monitor the operational state of the MachinePool. + // +optional + Initialization *MachinePoolInitializationStatus `json:"initialization,omitempty"` + + // Conditions represent the observations of a MachinePool's current state. + // +optional + // +listType=map + // +listMapKey=type + Conditions []metav1.Condition `json:"conditions,omitempty"` + + // Other fields... + // NOTE: `Phase`, `FailureReason`, `FailureMessage`, `BootstrapReady`, `InfrastructureReady` fields won't be there anymore +} + +// MachinePoolInitializationStatus provides observations of the MachinePool initialization process. +type MachinePoolInitializationStatus struct { + + // BootstrapDataSecretCreated is true when the bootstrap provider reports that the MachinePool's bootstrap data secret is created. + // NOTE: this field is part of the Cluster API contract, and it is used to orchestrate initial MachinePool provisioning. + // The value of this field is never updated after provisioning is completed. + // Use conditions to monitor the operational state of the MachinePool's BootstrapSecret. + // +optional + BootstrapDataSecretCreated bool `json:"bootstrapDataSecretCreated"` + + // InfrastructureProvisioned is true when the infrastructure provider reports that the MachinePool's infrastructure is fully provisioned. + // NOTE: this field is part of the Cluster API contract, and it is used to orchestrate initial MachinePool provisioning.
+ // The value of this field is never updated after provisioning is completed. + // Use conditions to monitor the operational state of the MachinePool's infrastructure. + // +optional + InfrastructureProvisioned bool `json:"infrastructureProvisioned"` +} +``` + +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|--------------------------------------|-------------------------------------------------------------|---------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| `UpdatedReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | +| `ExprimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `ExprimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... 
| + +Notes: +- The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). + Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. + +##### MachinePool (New)Conditions + +| Condition | Note | +|------------------------|-------------------------------------------------------------------------------------------------------------------| +| `Available` | True when `InfrastructureReady` and available replicas >= desired replicas (see notes below) | +| `BootstrapConfigReady` | Mirrors the corresponding condition from the MachinePool's BootstrapConfig resource | +| `InfrastructureReady` | Mirrors the corresponding condition from the MachinePool's Infrastructure resource | +| `ReplicaFailure` | This condition surfaces issues on creating Machine replicas in Kubernetes, if any, e.g. due to resource quotas. | +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this MachinePool are up to date (replicas = upToDate replicas) | +| `Remediating` | True if there is at least one machine controlled by this MachinePool that is not passing health checks | +| `Deleted` | True if MachinePool is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this MachinePool or the Cluster it belongs to are paused | + +> To better evaluate proposed changes, below you can find the list of current MachinePool's conditions: +> Ready, BootstrapReady, InfrastructureReady, ReplicasReady.
+ +Notes: +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. + e.g. if the scaling down operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in + the `ScalingDown` condition. +- As of today, MachinePool does not have a notion similar to MachineDeployment's MaxUnavailability. + +#### MachinePool Print columns + +| Current | To be | +|-------------------|------------------------| +| `NAME` | `NAME` | +| `CLUSTER` | `CLUSTER` | +| `DESIRED` (*) | `PAUSED` (new) (*) | +| `REPLICAS` | `DESIRED` | +| `PHASE` (deleted) | `CURRENT` (*) | +| `AGE` | `READY` | +| `VERSION` | `AVAILABLE` (new) | +| | `UP-TO-DATE` (renamed) | +| | `AGE` | +| | `VERSION` | + +(*) visible only when using `kubectl get -o wide` + +Notes: +- Print columns are not subject to any deprecation rule, so it will be possible to iteratively improve them without waiting for the next API version. +- During the implementation we are going to verify the resulting layout and, if needed, make final adjustments to the column list. ### Changes to Cluster API contract From 3e95a56a70da40f5ffdeb31290636493a96309a3 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Sat, 27 Jul 2024 15:51:05 +0200 Subject: [PATCH 06/22] Add Changes to Cluster API contract --- .../improve-status-in-CAPI-resources.md | 153 +++++++++++++++++- 1 file changed, 151 insertions(+), 2 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index ce489b2a06d3..45e789ca8743 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -84,10 +84,12 @@ Kubernetes, and ideally with the entire ecosystem.
- Ensure everything in status can be used as a signal informing monitoring tools/automation on top of Cluster API about lifecycle transitions/state of the Cluster and the underlying components as well. -### Non-Goals/Future Work +### Non-Goals - Resolving all the idiosyncrasies that exists in Cluster API, core Kubernetes, the rest of the ecosystem. (Let’s stay focused on Cluster API and keep improving incrementally). +- Changing how the Cluster API contract with infrastructure, bootstrap and control plane providers currently works + (by using status fields). ## Proposal @@ -1077,7 +1079,154 @@ Notes: ### Changes to Cluster API contract -TODO +The Cluster API contract defines a set of rules a provider is expected to comply with in order to interact with Cluster API. + +When the v1beta2 API is released (tentative Q1 2025), the Cluster API contract will also be bumped to v1beta2. + +As stated in the Non-Goals section of this document, this proposal is not going to change how the Cluster API contract +with infrastructure, bootstrap and control plane providers currently works (by using status fields). + +Similarly, this proposal is not going to change the fact that the Cluster API contract does not require providers to implement +conditions, even though implementing them is recommended because conditions greatly improve the user experience. + +However, this proposal introduces a few changes to the v1beta2 version of the Cluster API contract in order to: +- Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow +- Remove `failureReason` and `failureMessage`. + +It is worth noting that, for the first time in the history of the project, this proposal introduces +a mechanism that allows providers to adapt to the new contract incrementally; more specifically: + +- Providers won't be required to synchronize their changes to adapt to the Cluster API v1beta2 contract with the + Cluster API's v1beta2 release.
+ +- Each provider can implement changes described in the following paragraphs at its own pace, but the transition + _must be completed_ before v1beta1 removal (tentative Q1 2026). + +- Starting from the CAPI release when v1beta1 removal will happen (tentative Q1 2026), providers that still implement + the v1beta1 contract will stop working (they will work only with older versions of Cluster API). + +Additionally: + +- Providers implementing conditions won't be required to transition from the Cluster API custom Condition type + to the Kubernetes metav1.Condition type (but this transition is recommended because it improves the consistency of each provider + with Kubernetes, Cluster API and the ecosystem). + +- However, providers choosing to keep using Cluster API custom conditions should be aware that starting from the + CAPI release when v1beta1 removal will happen (tentative Q1 2026), the Cluster API project will remove the + Cluster API condition type, the `util/conditions` package, the code handling conditions in `util/patch.Helper`, + and everything related to the custom Cluster API condition type + (in other words, Cluster API custom conditions must be replaced by the provider's own custom conditions). + +#### Contract for infrastructure providers + +Note: given that the contract only defines expected names for fields in a resource at the YAML/JSON level, we are +using those in this section (instead of Go field names). + +##### InfrastructureCluster + +Following changes are planned for the contract for the InfrastructureCluster resource: + +- Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow + - Rename `status.ready` into `status.initialization.provisioned`. +- Remove `failureReason` and `failureMessage`.
+ +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|-----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------| +| `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.provisioned` required | (removed) | +| | `status.initialization.provisioned` (new), one of `status.ready` or `status.initialization.provisioned` required | `status.initialization.provisioned` | +| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.conditions[Ready]`, optional with fall back on `status.ready` or `status.initialization.provisioned` | `status.conditions[Ready]`, optional with fall back on `status.initialization.provisioned` | +| `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | +| `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | +| other fields/rules... | other fields/rules... | | + +Notes: +- InfrastructureCluster's `status.initialization.provisioned` will surface into Cluster's `status.initialization.infrastructureProvisioned` field. +- InfrastructureCluster's `status.initialization.provisioned` must signal the completion of the initial provisioning of the cluster infrastructure. + The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it. +- InfrastructureCluster's `status.conditions[Ready]` will surface into Cluster's `status.conditions[InfrastructureReady]` condition. 
+- InfrastructureCluster's `status.conditions[Ready]` must surface issues during the entire lifecycle of the InfrastructureCluster + (both during initial InfrastructureCluster provisioning and after the initial provisioning is completed). + +##### InfrastructureMachine + +Following changes are planned for the contract for the InfrastructureMachine resource: + +- Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow + - Rename `status.ready` into `status.initialization.provisioned`. +- Remove `failureReason` and `failureMessage`. + +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|-----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------| +| `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.provisioned` required | (removed) | +| | `status.initialization.provisioned` (new), one of `status.ready` or `status.initialization.provisioned` required | `status.initialization.provisioned` | +| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.conditions[Ready]`, optional with fall back on `status.ready` or `status.initialization.provisioned` | `status.conditions[Ready]`, optional with fall back on `status.initialization.provisioned` | +| `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | +| `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | +| other fields/rules... | other fields/rules... | | + +Notes: +- InfrastructureMachine's `status.initialization.provisioned` will surface into Machine's `status.initialization.infrastructureProvisioned` field. 
+- InfrastructureMachine's `status.initialization.provisioned` must signal the completion of the initial provisioning of the machine infrastructure.
+  The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it.
+- InfrastructureMachine's `status.conditions[Ready]` will surface into Machine's `status.conditions[InfrastructureReady]` condition.
+- InfrastructureMachine's `status.conditions[Ready]` must surface issues during the entire lifecycle of the Machine
+  (both during initial InfrastructureMachine provisioning and after the initial provisioning is completed).
+
+#### Contract for bootstrap providers
+
+The following changes are planned for the contract for the BootstrapConfig resource:
+
+- Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow
+  - Rename `status.ready` to `status.initialization.dataSecretCreated`.
+- Remove `failureReason` and `failureMessage`.
+
+| v1beta1 (current)                                                     | v1beta2 (tentative Q1 2025)                                                                                                   | v1beta2 after v1beta1 removal (tentative Q1 2026)                                                    |
+|-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
+| `status.ready`, required                                              | `status.ready` (deprecated), one of `status.ready` or `status.initialization.dataSecretCreated`, required                     | (removed)                                                                                            |
+|                                                                       | `status.initialization.dataSecretCreated` (new), one of `status.ready` or `status.initialization.dataSecretCreated`, required | `status.initialization.dataSecretCreated`, required                                                  |
+| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.conditions[Ready]`, optional with fall back on `status.ready` or `status.initialization.dataSecretCreated` set        | `status.conditions[Ready]`, optional with fall back on `status.initialization.dataSecretCreated` set |
+| `status.failureReason`, optional                                      | `status.failureReason` (deprecated), optional                                                                                 | (removed)                                                                                            |
+| `status.failureMessage`, optional                                     | `status.failureMessage` (deprecated), optional                                                                                | (removed)                                                                                            |
+| other fields/rules...                                                 | other fields/rules...                                                                                                         |                                                                                                      |
+
+Notes:
+- BootstrapConfig's `status.initialization.dataSecretCreated` will surface into Machine's `status.initialization.bootstrapDataSecretCreated` field.
+- BootstrapConfig's `status.initialization.dataSecretCreated` must signal the completion of the initial provisioning of the bootstrap data secret.
+  The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it.
+- BootstrapConfig's `status.conditions[Ready]` will surface into Machine's `status.conditions[BootstrapConfigReady]` condition.
+- BootstrapConfig's `status.conditions[Ready]` must surface issues during the entire lifecycle of the BootstrapConfig
+  (both during initial BootstrapConfig provisioning and after the initial provisioning is completed).
+
+#### Contract for control plane Providers
+
+The following changes are planned for the contract for the ControlPlane resource:
+
+- Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow
+  - Remove `status.ready` (`status.ready` is a redundant signal of the control plane being initialized).
+  - Rename `status.initialized` to `status.initialization.controlPlaneInitialized`.
+- Remove `failureReason` and `failureMessage`.
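During the transition window, a consumer of the ControlPlane contract (for example, the Cluster controller) must handle two kinds of providers. Some still implement the v1beta1 contract, and others have already moved to v1beta2. The sketch below shows one way to read the renamed field with a fallback. It assumes the ControlPlane object is handled as unstructured JSON data. `nestedBool` and `controlPlaneInitialized` are hypothetical helpers. Preferring `status.initialization.controlPlaneInitialized` over `status.initialized` is an assumption, not documented Cluster API behavior.

```go
package main

import "fmt"

// nestedBool walks an unstructured object (as decoded from JSON) along path and
// returns the boolean found there; found is false if a step is missing or the
// final value is not a bool. Hypothetical helper, similar in spirit to
// unstructured.NestedBool in k8s.io/apimachinery, reimplemented to stay self-contained.
func nestedBool(obj map[string]interface{}, path ...string) (value bool, found bool) {
	var cur interface{} = obj
	for _, field := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return false, false
		}
		if cur, ok = m[field]; !ok {
			return false, false
		}
	}
	b, ok := cur.(bool)
	return b, ok
}

// controlPlaneInitialized reads the v1beta2 field first and falls back on the
// v1beta1 status.initialized field for providers still on the old contract.
// Hypothetical: the fallback order is an assumption, not Cluster API code.
func controlPlaneInitialized(controlPlane map[string]interface{}) bool {
	if v, found := nestedBool(controlPlane, "status", "initialization", "controlPlaneInitialized"); found {
		return v
	}
	v, _ := nestedBool(controlPlane, "status", "initialized")
	return v
}

func main() {
	v1beta1Provider := map[string]interface{}{
		"status": map[string]interface{}{"initialized": true},
	}
	v1beta2Provider := map[string]interface{}{
		"status": map[string]interface{}{
			"initialization": map[string]interface{}{"controlPlaneInitialized": false},
		},
	}
	fmt.Println(controlPlaneInitialized(v1beta1Provider)) // true
	fmt.Println(controlPlaneInitialized(v1beta2Provider)) // false
}
```

Checking the new field first removes any ambiguity for a provider that sets both fields while migrating. Once v1beta1 support is removed, the fallback branch can simply be deleted.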
+
+| v1beta1 (current)                                                     | v1beta2 (tentative Q1 2025)                                                                                                                                          | v1beta2 after v1beta1 removal (tentative Q1 2026)                                                           |
+|-----------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
+| `status.ready`, required                                              | `status.ready` (deprecated), one of `status.ready` or `status.initialization.controlPlaneInitialized` required                                                       | (removed)                                                                                                   |
+| `status.initialized`, required                                        | `status.initialization.controlPlaneInitialized` (renamed), one of `status.ready` or `status.initialization.controlPlaneInitialized` required                         | `status.initialization.controlPlaneInitialized`, required                                                   |
+| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.backCompatibilty.conditions[Ready]` (renamed, deprecated), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set  | (removed)                                                                                                   |
+|                                                                       | `status.conditions[Available]` (new), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set                               | `status.conditions[Available]`, optional with fall back on `status.initialization.controlPlaneInitialized`  |
+| `status.failureReason`, optional                                      | `status.failureReason` (deprecated), optional                                                                                                                        | (removed)                                                                                                   |
+| `status.failureMessage`, optional                                     | `status.failureMessage` (deprecated), optional                                                                                                                       | (removed)                                                                                                   |
+| other fields/rules...                                                 | other fields/rules...                                                                                                                                                |                                                                                                             |
+
+Notes:
+- ControlPlane's `status.initialization.controlPlaneInitialized` will surface into Cluster's `status.initialization.controlPlaneInitialized` field; also,
+  the fact that the control plane is available to receive requests will be recorded in Cluster's `status.conditions[ControlPlaneInitialized]` condition.
+ The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it. +- The new ControlPlane's `status.conditions[Available]` condition must surface control plane availability, e.g. the ability to + accept and process API server call, having etcd quorum etc. +- It is up to each control plane provider to determine what could impact the overall availability in their own + specific control plane implementation. +- As a general guideline, control plane providers implementing solutions with redundant instances of Kubernetes control plane components, + should not consider the temporary unavailability of one of those instances as relevant for the overall control plane availability. + e.g. one kube-apiserver over three down, should not impact the overall control plane availability. ## [WIP] Example use cases NOTE: Let me know if you want to add more use cases. I will try to collect more too and add a brief explanation about how From 295e32e9ca2c80637ebba645d46dd11bd44368a0 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Sat, 27 Jul 2024 15:54:54 +0200 Subject: [PATCH 07/22] Add missing paragraphs from the proposal template --- .../improve-status-in-CAPI-resources.md | 100 ++++++++++++++++-- 1 file changed, 93 insertions(+), 7 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index 45e789ca8743..6198b4dd41a4 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -1,15 +1,23 @@ - --- title: Proposal Template authors: - "@fabriziopandini" reviewers: -- "add" +- "@neolit123" +- "@enxebre" +- "@JoelSpeed" +- "@vincepri" +- "@sbueringer" +- "@chrischdi" +- "@peterochodo" +- "@zjs" creation-date: 2024-07-17 -last-updated: 2024-07-17 -status: provisional +last-updated: 2024-07-27 +status: implementable see-also: -- ... 
+- [proposal about custom Cluster API conditions (superseed by this document)](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md#the-ready-condition) +- [Kubernetes API guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md) +- [Kubernetes API deprecation rules](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#fields-of-rest-resources) --- # Improving status in CAPI resources @@ -21,7 +29,7 @@ see-also: - [Summary](#summary) - [Motivation](#motivation) - [Goals](#goals) - - [Non-Goals/Future Work](#non-goalsfuture-work) + - [Non-Goals](#non-goals) - [Proposal](#proposal) - [Readiness and Availability](#readiness-and-availability) - [Transition to K8s API conventions aligned conditions](#transition-to-k8s-api-conventions-aligned-conditions) @@ -48,7 +56,21 @@ see-also: - [KubeadmControlPlane (New)Conditions](#kubeadmcontrolplane-newconditions) - [KubeadmControlPlane Print columns](#kubeadmcontrolplane-print-columns) - [Changes to MachinePool resource](#changes-to-machinepool-resource) + - [MachinePool Status](#machinepool-status) + - [MachinePool (New)Conditions](#machinepool-newconditions) + - [MachinePool Print columns](#machinepool-print-columns) - [Changes to Cluster API contract](#changes-to-cluster-api-contract) + - [Contract for infrastructure providers](#contract-for-infrastructure-providers) + - [InfrastructureCluster](#infrastructurecluster) + - [InfrastructureMachine](#infrastructuremachine) + - [Contract for bootstrap providers](#contract-for-bootstrap-providers) + - [Contract for control plane Providers](#contract-for-control-plane-providers) + - [\[WIP\] Example use cases](#wip-example-use-cases) + - [Security Model](#security-model) + - [Risks and Mitigations](#risks-and-mitigations) + - [Alternatives](#alternatives) + - [Upgrade Strategy](#upgrade-strategy) + - [Implementation History](#implementation-history) # Summary @@ 
-1228,7 +1250,7 @@ Notes: should not consider the temporary unavailability of one of those instances as relevant for the overall control plane availability. e.g. one kube-apiserver over three down, should not impact the overall control plane availability. -## [WIP] Example use cases +### [WIP] Example use cases NOTE: Let me know if you want to add more use cases. I will try to collect more too and add a brief explanation about how each use case can be addressed with the improved status in CAPI resources @@ -1236,3 +1258,67 @@ As a cluster admin with MachineDeployment ownership I'd like to understand if my As a cluster admin with MachineDeployment ownership I'd like to understand why my MD rollout is blocked and why by looking at the MD status/conditions As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are failing to be available by looking at the MD status/conditions As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are stuck on deletion looking at the MD status/conditions + +### Security Model + +This proposal does not impact Cluster API security model. + +### Risks and Mitigations + +_Like any API change, this proposal will have impact on Cluster API users_ + +Mitigations: + +This proposal abides to Kubernetes deprecation rules, and it also ensures isomorphic conversions to/from v1beta1 APIs +can be supported (until v1beta1 removal, tentative Q1 2026). + +On top of that, a few design decisions have been made with the specific intent to further minimize impact on +users and providers e.g. +- The decision to keep `BackCompatibility` fields in v1beta2 API (until v1beta1 removal, tentative Q1 2026). +- The decision to allow providers to adopt the Cluster API v1beta2 contract at their own pace (transition _must be completed_ + before v1beta1 removal, tentative Q1 2026). 
+
+All in all, those decisions are consistent with the fact that in Cluster API we are already treating our APIs
+(and the Cluster API contract) as fully graduated APIs, even though they are still beta.
+
+_This proposal requires a considerable amount of work, and it can be risky to implement this in a single release cycle_
+
+This proposal intentionally highlights changes that can be implemented before the actual work for the v1beta2 API version starts.
+
+Those changes not only allow users to benefit from this work as soon as possible, but also provide a way to split the work
+across more than one release cycle (tentatively two release cycles).
+
+## Alternatives
+
+_Keep Cluster API custom condition types, eventually improve them incrementally_
+
+This idea was considered, but ultimately discarded because the end state we are aiming for is alignment with Kubernetes.
+Therefore, the sooner, the better, and the opportunity materialized when discussing the scope for the v1beta2 API version.
+
+_Implement down conversion instead of maintaining `BackCompatibility` fields_
+
+This idea was considered, but discarded because the constraint of ensuring down conversion for every new field/condition
+would have prevented this proposal from designing the ideal target state we are aiming for.
+
+Additionally, the idea of dropping all the existing status fields/conditions in the new v1beta2 API (by supporting down conversion)
+was viewed unfavorably because it implies a sudden, big change for both users and providers.
+
+Instead, we would like to minimize the impact on users and providers by preserving old fields in `BackCompatibility` until v1beta1 removal,
+which is ultimately the same process suggested for removal of API fields from graduated APIs.
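The following sketch makes the `BackCompatibility` approach more concrete. Keeping deprecated fields in a dedicated struct lets a v1beta2 Machine status be down-converted to the v1beta1 shape without losing information. The sketch follows the field mapping in the Machine status table:

- `Initialization.BootstrapDataSecretCreated` maps back to `BootstrapReady`.
- `BackCompatibilty.Phase` maps back to `Phase`.
- `Conditions` maps back to `ExperimentalConditions`.

The types are simplified, hypothetical stand-ins that keep only a few fields, and a real implementation would cover every field. The struct name uses the `BackCompatibilty` spelling from the status tables.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type   string
	Status string
}

// MachineStatusV1Beta1 is a hypothetical, trimmed-down v1beta1 Machine status.
type MachineStatusV1Beta1 struct {
	BootstrapReady         bool
	InfrastructureReady    bool
	Phase                  string
	FailureReason          *string
	Conditions             []Condition // Cluster API custom conditions
	ExperimentalConditions []Condition // new Kubernetes-aligned conditions
}

// MachineInitializationStatus mirrors the new Initialization struct.
type MachineInitializationStatus struct {
	BootstrapDataSecretCreated bool
	InfrastructureProvisioned  bool
}

// MachineBackCompatibilty holds deprecated fields until v1beta1 removal
// (spelled as in the status tables).
type MachineBackCompatibilty struct {
	Phase         string
	FailureReason *string
	Conditions    []Condition
}

// MachineStatusV1Beta2 is a hypothetical, trimmed-down v1beta2 Machine status.
type MachineStatusV1Beta2 struct {
	Initialization   MachineInitializationStatus
	BackCompatibilty *MachineBackCompatibilty
	Conditions       []Condition
}

// convertToV1Beta1 down-converts a v1beta2 status: renamed fields map back to
// their old names, and deprecated fields are restored from BackCompatibilty.
func convertToV1Beta1(in MachineStatusV1Beta2) MachineStatusV1Beta1 {
	out := MachineStatusV1Beta1{
		BootstrapReady:         in.Initialization.BootstrapDataSecretCreated,
		InfrastructureReady:    in.Initialization.InfrastructureProvisioned,
		ExperimentalConditions: in.Conditions,
	}
	if bc := in.BackCompatibilty; bc != nil {
		out.Phase = bc.Phase
		out.FailureReason = bc.FailureReason
		out.Conditions = bc.Conditions
	}
	return out
}

func main() {
	in := MachineStatusV1Beta2{
		Initialization: MachineInitializationStatus{BootstrapDataSecretCreated: true, InfrastructureProvisioned: true},
		BackCompatibilty: &MachineBackCompatibilty{
			Phase:      "Running",
			Conditions: []Condition{{Type: "Ready", Status: "True"}},
		},
		Conditions: []Condition{{Type: "Available", Status: "True"}},
	}
	out := convertToV1Beta1(in)
	fmt.Println(out.Phase, out.BootstrapReady, len(out.Conditions), out.ExperimentalConditions[0].Type)
	// Running true 1 Available
}
```

Every deprecated value keeps a home in the v1beta2 type, so the conversion does not need to invent or drop data. This is what makes a lossless round trip between the two API versions possible.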
+ +Note: There will still be some impacts because `BackCompatibility` fields will be in a different location from where the +original fields was, but this should be easier to handle than being forced to immediately adapt the new status fields/conditions. + +## Upgrade Strategy + +Transition from v1beta1 API/contract to v1beta2 contract is detailed in previous paragraphs. Notably: +- Isomorphic conversions to/from v1beta1 APIs are supported until v1beta1 removal, as required by Kubernetes deprecation rules. +- Providers will be allowed to adopt the Cluster API v1beta2 contract at their own pace (transition _must be completed_ + before v1beta1 removal). + +## Implementation History + +- [x] 07/17/2024: Open proposal PR, still WIP +- [x] 07/17/2024: Present proposal at a [community meeting](https://www.youtube.com/watch?v=frCg522ZfRQ) + - [10000 feet overview](https://docs.google.com/presentation/d/1hhgCufOIuqHz6YR_RUPGo0uTjfm5YafjCb6JHY1_clY/edit?usp=sharing) +- [x] MM/DD/YYYY: Remove WIP from the proposal PR From 69906c2e4ec72aac0398b9dcaaf48f9800638e95 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Sat, 27 Jul 2024 16:08:35 +0200 Subject: [PATCH 08/22] More nits, feedbacks, small improvements --- .../improve-status-in-CAPI-resources.md | 385 ++++++++++-------- 1 file changed, 223 insertions(+), 162 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index 6198b4dd41a4..f03f932a3065 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -87,23 +87,25 @@ However, as the focus shifted away, most of the users don’t have time to becom This trend is blurring the lines between different Cluster API components; between Cluster API and Kubernetes, and tools like Helm, Flux, Argo, and so on. 
-This proposal focused on Cluster API's resource status which must become simpler to understand, more consistent with +This proposal focused on Cluster API's resource status which must become simpler to understand, more consistent with Kubernetes, and ideally with the entire ecosystem. ### Goals - Review and standardize the usage of the concept of readiness across Cluster API resources. - - Drop or amend improper usage of readiness - - Make the concept of Machine readiness extensible, thus allowing providers or external systems to inject their readiness checks. + - Drop or amend improper usage of readiness + - Make the concept of Machine readiness extensible, thus allowing providers or external systems to inject their readiness checks. - Review and standardize the usage of the concept of availability across Cluster API resources. - - Make the concept of Cluster availability extensible, thus allowing providers or external systems to inject their availability checks. + - Make the concept of Cluster availability extensible, thus allowing providers or external systems to inject their availability checks. - Bubble up more information about both control plane and worker Machines, ensuring consistency across Cluster API resources. - - Standardize replica counters on control plane, MachineDeployment, MachinePool, and bubble them up to the Cluster resource. - - Bubble up conditions about Machine readiness to control plane, MachineDeployment, MachinePool. + - Bubble up conditions about Machine readiness to control plane, MachineDeployment, MachinePool. + - Standardize replica counters on control plane, MachineDeployment, MachinePool. 
+  - Ensure the Cluster resource will have enough information about controlled objects, which is crucial for users
+    relying on managed topologies (where the Cluster resource is the single point of control for the entire hierarchy of objects).
 - Introduce missing signals about connectivity to workload clusters, thus enabling to mark all the conditions
   depending on such connectivity with status Unknown after a certain amount of time.
 - Introduce a cleaner signal about Cluster API resources lifecycle transitions, e.g. scaling up or updating.
-- Ensure everything in status can be used as a signal informing monitoring tools/automation on top of Cluster API 
+- Ensure everything in status can be used as a signal informing monitoring tools/automation on top of Cluster API
   about lifecycle transitions/state of the Cluster and the underlying components as well.
 
 ### Non-Goals
@@ -117,6 +119,12 @@ Kubernetes, and ideally with the entire ecosystem.
 
 This proposal groups a set of changes to status fields in Cluster API resources.
 
+Proposed changes are designed to introduce benefits for Cluster API users as soon as possible, but given the
+API deprecation rules, a multi-step transition is required to reach the desired shape of the API resources.
+Such a transition is detailed in the following paragraphs.
+
+At a high level, the proposed changes to status fields can be grouped in three sets of changes:
+
 Some of those changes could be considered straight forward, e.g.
 
 - K8s API conventions suggest to deprecate and remove `phase` fields from status, Cluster API is going to align to this recommendation
@@ -131,8 +139,8 @@ Some of those changes could be considered straight forward, e.g.
Some other changes require a little bit more context, which is provided in following paragraphs: -- Review and standardize the usage of the concept of readiness and availability to align to K8s API conventions / - conditions used in core K8s objects like `Pod`, `Node`, `Deployment`, `ReplicaSet` etc. +- Review and standardize the usage of the concept of readiness and availability to align to K8s API conventions / + conditions used in core K8s objects like `Pod`, `Node`, `Deployment`, `ReplicaSet` etc. - Transition to K8s API conventions fully aligned conditions types/condition management (and thus deprecation of the Cluster API "custom" guidelines for conditions). @@ -154,15 +162,15 @@ In order to keep making progress on this proposal, the first iteration will be f Other resources will be added as soon as there is agreement on the general direction. -Overall, the union of all those changes, is expected to greatly improve status fields, conditions, replica counters +Overall, the union of all those changes, is expected to greatly improve status fields, conditions, replica counters and print columns. -Those improvements are expected to provide benefit to users interacting with the system, using monitoring tools, and +Those improvements are expected to provide benefit to users interacting with the system, using monitoring tools, and building higher level systems or products on top of Cluster API. ### Readiness and Availability -The [condition CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md) in Cluster API introduced very strict requirements about `Ready` conditions, mandating it +The [condition CAEP](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md) in Cluster API introduced very strict requirements about `Ready` conditions, mandating it to exists on all resources and also mandating that `Ready` must be computed as the summary of all other existing conditions. 
@@ -187,11 +195,11 @@ As a consequence, we will continue to use the ready condition *only* where it ma semantic that conveys important information to the users (vs applying "blindly" the same formula everywhere). The most important effect of this change is the definition of a new semantic for the Machine's `Ready` condition, that -will now clearly represent the "machine can host workloads" (prior art Kubernetes nodes are ready when "node can host pods"). +will now clearly represent the "machine can host workloads" (prior art Kubernetes nodes are ready when "node can host pods"). To improve the benefit of this change: - This proposal is ensuring that whenever Machine ready is used, it always means the same thing (e.g. replica counters) -- This proposal is also changing contract fields where ready was used improperly to represent +- This proposal is also changing contract fields where ready was used improperly to represent initial provisioning (k8s API conventions suggest to use ready only for long-running process). All in all, Machine's Ready concept should be much more clear, consistent, intuitive after proposed changes. @@ -211,21 +219,25 @@ Last but not least: uniform meaning across all resource types - Additionally, we are enforcing the same consistency for replica counters and other status fields. -### Transition to K8s API conventions aligned conditions +### Transition to Kubernetes API conventions aligned conditions -K8s is undergoing an effort of standardizing usage of conditions across all resource types, and the transition to -the v1beta2 API version is a great opportunity for Cluster API to align to this effort. +Kubernetes is undergoing a long term effort of standardizing usage of conditions across all resource types, and the +transition to the v1beta2 API version is a great opportunity for Cluster API to align to this effort. 
The value of this transition is substantial, because the differences that exists today's are really confusing for users; those differences are also making it harder for ecosystem tools to build on top of Cluster API, and in some cases even confusing new (and old) contributors. -With this proposal Cluster API will close the gap with K8s API conventions in regard to: +With this proposal Cluster API will close the gap with Kubernetes API conventions in regard to: - Polarity: Condition type names should make sense for humans; neither positive nor negative polarity can be recommended as a general rule (already implemented by [#10550](https://github.com/kubernetes-sigs/cluster-api/pull/10550)) - Use of the `Reason` field is required (currently in Cluster API reasons is added only when condition are false) - Controllers should apply their conditions to a resource the first time they visit the resource, even if the status is `Unknown`. - (currently Cluster API controllers add conditions at different stages of the reconcile loops) + (currently Cluster API controllers add conditions at different stages of the reconcile loops). Please note that: + - If more than one controller add conditions to same resources, conditions managed by the different controllers will be + applied at different time. + - Kubernetes API conventions account for exceptions to this rule; for known conditions, the absence of a condition status should + be interpreted the same as `Unknown`, and typically indicates that reconciliation has not yet finished. - Cluster API is also dropping its own `Condition` type and will start using `metav1.Conditions` from the Kubernetes API. The last point also has another implication, which is the removal of the `Severity` field which is currently used @@ -249,7 +261,7 @@ for a condition, e.g. 
by looking at status, reason, time since the condition tra Following changes are implemented to Machine's status: -- Disambiguate usage of ready term by renaming fields used for the provisioning workflow +- Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow - Align to K8s API conventions by deprecating `Phase` and corresponding `LastUpdated` - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions @@ -296,23 +308,23 @@ type MachineInitializationStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | -|--------------------------------|----------------------------------------------------------|--------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `BootstrapReady` | `Initialization.BootstrapSecretCreated` (renamed) | `Initialization.BootstrapSecretCreated` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| | `BackCompatibilty` (new) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `LastUpdated` (deprecated) | `BackCompatibilty.LastUpdated` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|--------------------------------|----------------------------------------------------------|---------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| | `BackCompatibilty` (new) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `LastUpdated` (deprecated) | `BackCompatibilty.LastUpdated` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). - Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. + Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. ##### Machine (New)Conditions @@ -338,7 +350,7 @@ Notes: Notes: - This proposal introduces a mechanism for extending the meaning of Machine readiness, `ReadinessGates` (see [changes to Machine.Spec](#machine-spec)). 
-- While `Ready` is the main signal for machines operational state, higher level abstractions in Cluster API like e.g. +- While `Ready` is the main signal for machines operational state, higher level abstractions in Cluster API like e.g. MachineDeployment are relying on the concept of Machine's `Availability`, which can be seen as readiness + stability. In order to standardize this concept across different higher level abstractions, this proposal is surfacing `Availability` condition at Machine level as well as adding a new `MinReadySeconds` field (see [changes to Machine.Spec](#machine-spec)) @@ -349,13 +361,11 @@ Notes: from the new `RemoteConnectionProbe` condition at cluster level (see [Cluster (New)Conditions](#cluster-newconditions)); more specifically those condition should be set to `Unknown` after the cluster probe fails (or after whatever period is defined in the `--remote-conditions-grace-period` flag) -- `HealthCheckSucceeded` and `OwnerRemediated` (or `ExternalRemediationRequestAvailable`) conditions are set by the +- `HealthCheckSucceeded` and `OwnerRemediated` (or `ExternalRemediationRequestAvailable`) conditions are set by the MachineHealthCheck controller in case a MachineHealthCheck targets the machine. - KubeadmControlPlane also adds additional conditions to Machines, but those conditions are not included in the table above for sake of simplicity (however they are documented in the KubeadmControlPlane paragraph). -TODO: think carefully at remote conditions becoming unknown, this could block a few operations ... - #### Machine Spec Machine's spec is going to be improved to allow 3rd party components to extend the semantic of the new Machine's `Ready` condition @@ -375,6 +385,15 @@ type MachineSpec struct { // If specified, all readiness gates will be evaluated for Machine readiness. 
 	// A Machine is ready when `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are "True";
 	// if other conditions are defined in this field, those conditions should be "True" as well for the Machine to be ready.
+	//
+	// For example, this field can be used:
+	// - By Cluster API control plane providers willing to extend the semantics of the ready condition for the machines they
+	//   control, like the kubeadm control plane provider adding readinessGates for the APIServerPodHealthy, SchedulerPodHealthy conditions, etc.
+	// - By external controllers, e.g. controllers responsible for installing special software/hardware on the machines and willing
+	//   to include the status of those components into readinessGates (by surfacing new conditions on Machines and
+	//   adding them to ReadinessGates).
+	//
+	// External controllers can use this field in the same way even if they are not Cluster API controllers.
 	// +optional
 	// +listType=map
 	// +listMapKey=conditionType
@@ -391,14 +410,14 @@ type MachineReadinessGate struct {
 }
 ```
 
-| v1beta1 (current)      | v1Beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) |
-|------------------------|-----------------------------|-------------------------------------|
-| `ReadinessGates` (new) | `ReadinessGates`            | `ReadinessGates`                    |
-| other fields...        | other fields...             | other fields...                     |
+| v1beta1 (current)      | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) |
+|------------------------|-----------------------------|---------------------------------------------------|
+| `ReadinessGates` (new) | `ReadinessGates`            | `ReadinessGates`                                  |
+| other fields...        | other fields...             | other fields...                                   |
 
 Notes:
 - Both `MinReadySeconds` and `ReadinessGates` should be treated as other in-place propagated fields (changing this should not trigger rollouts).
-- Similarly to Pod's `ReadinessGates`, also Machine's `ReadinessGates` accept only conditions with positive polarity; 
+- Similarly to Pod's `ReadinessGates`, Machine's `ReadinessGates` also accept only conditions with positive polarity;
   The Cluster API project might revisit this in future to stay aligned with Kubernetes or if there are use cases justifying this change.
 
 #### Machine Print columns
@@ -418,12 +437,14 @@ Notes:
 |                       | `KERNEL-VERSION` (new) (*)    |
 |                       | `CONTAINER-RUNTIME` (new) (*) |
 
-TODO: figure out if can `INTERNAL-IP` (new) (*), `EXTERNAL-IP` after `VERSION` / before `OS-IMAGE`? (similar to Nodes...).
-might be something like `$.status.addresses[?(@.type == 'InternalIP')].address` works, but not sure what happens if there are 0 or more addresses...
-Stefan +1 if possible
-
 (*) visible only when using `kubectl get -o wide`
 
+Notes:
+- Print columns are not subject to API guarantees, so they can be improved iteratively at any time.
+- During the implementation we are going to verify the resulting layout and, if necessary, make final adjustments to the column list.
+- During the implementation we are going to explore whether it is possible to add `INTERNAL-IP` (new) (*) and `EXTERNAL-IP` after `VERSION` / before `OS-IMAGE`;
+  a JSONPath expression like `$.status.addresses[?(@.type == 'InternalIP')].address` might work.
+
 ### Changes to MachineSet resource
 
 #### MachineSet Status
@@ -444,9 +465,9 @@ type MachineSetStatus struct {
 	// +optional
 	ReadyReplicas int32 `json:"readyReplicas"`
 
-    // The number of available replicas for this MachineSet. A machine is considered available when Machine's Available condition is true.
-    // +optional
-    AvailableReplicas int32 `json:"availableReplicas"`
+	// The number of available replicas for this MachineSet. A machine is considered available when Machine's Available condition is true.
+	// +optional
+	AvailableReplicas int32 `json:"availableReplicas"`
 
 	// The number of up-to-date replicas for this MachineSet. A machine is considered up-to-date when Machine's UpToDate condition is true.
 	// +optional
@@ -463,30 +484,28 @@ type MachineSetStatus struct {
 }
 ```
 
-| v1beta1 (current)                    | v1beta2 (tentative Q1 2025)                                 | v1beta1 removal (tentative Q1 2026) |
-|--------------------------------------|-------------------------------------------------------------|-------------------------------------|
-| `ExprimentalReadyReplicas` (new)     | `ReadyReplicas` (renamed)                                   | `ReadyReplicas`                     |
-| `ExprimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed)                               | `AvailableReplicas`                 |
-|                                      | `BackCompatibilty` (new)                                    | (removed)                           |
-| `ReadyReplicas` (deprecated)         | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated)     | (removed)                           |
-| `AvailableReplicas` (deprecated)     | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed)                           |
-| `FailureReason` (deprecated)         | `BackCompatibilty.FailureReason` (renamed) (deprecated)     | (removed)                           |
-| `FailureMessage` (deprecated)        | `BackCompatibilty.FailureMessage` (renamed) (deprecated)    | (removed)                           |
-| `Conditions`                         | `BackCompatibilty.Conditions` (renamed) (deprecated)        | (removed)                           |
-| `ExperimentalConditions` (new)       | `Conditions` (renamed)                                      | `Conditions`                        |
-| `UpToDateReplicas` (new)             | `UpToDateReplicas`                                          | `UpToDateReplicas`                  |
-| other fields...                      | other fields...                                             | other fields...                     |
+| v1beta1 (current)                     | v1beta2 (tentative Q1 2025)                                 | v1beta2 after v1beta1 removal (tentative Q1 2026) |
+|---------------------------------------|-------------------------------------------------------------|---------------------------------------------------|
+| `ExperimentalReadyReplicas` (new)     | `ReadyReplicas` (renamed)                                   | `ReadyReplicas`                                   |
+| `ExperimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed)                               | `AvailableReplicas`                               |
+|                                       | `BackCompatibilty` (new)                                    | (removed)                                         |
+| `ReadyReplicas` (deprecated)          | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated)     | (removed)                                         |
+| `AvailableReplicas` (deprecated)      | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed)                                         |
+| `FailureReason` (deprecated)          | `BackCompatibilty.FailureReason` (renamed) (deprecated)     | (removed)                                         |
+| `FailureMessage` (deprecated)         | `BackCompatibilty.FailureMessage` (renamed) (deprecated)    | (removed)                                         |
+| `Conditions`                          | `BackCompatibilty.Conditions` (renamed) (deprecated)        | (removed)                                         |
+| `ExperimentalConditions` (new)        | `Conditions` (renamed)                                      | `Conditions`                                      |
+| `UpToDateReplicas` (new)              | `UpToDateReplicas`                                          | `UpToDateReplicas`                                |
+| other fields...                       | other fields...                                             | other fields...                                   |
 
 Notes:
 - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer).
   Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes.
-- This proposal is using `UpToDateReplicas` instead of `UpdatedReplicas`; This is a deliberated choice to avoid 
+- This proposal is using `UpToDateReplicas` instead of `UpdatedReplicas`; this is a deliberate choice to avoid
   confusion between update (any change) and upgrade (change of the Kubernetes versions).
- Also `AvailableReplicas` will determine Machine's availability by reading Machine.Available condition instead of computing availability as of today, however in this case the semantic of the field is not changed -TODO: check `FullyLabeledReplicas`, do we still need it? - #### MachineSet (New)Conditions | Condition | Note | @@ -504,12 +523,16 @@ TODO: check `FullyLabeledReplicas`, do we still need it? > Ready, MachinesCreated, Resized, MachinesReady. Notes: +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. +  e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in +  the `ScalingDown` condition. - MachineSet conditions are intentionally mostly consistent with MachineDeployment conditions to help users troubleshooting . - MachineSet is considered as a sort of implementation detail of MachineDeployments, so it doesn't have its own concept of availability. Similarly, this proposal is dropping the notion of MachineSet readiness because it is preferred to let users focusing on Machines readiness. -- `Remediating` for older MachineSet sets will report that remediation will happen as part of the regular rollout. -- `UpToDate` condition initially will be `false` for older MachineSet, `true` for the current MachineSet; however in -  the future the latter might evolve in case Cluster API will start supporting in-place upgrades. +- When implementing this proposal, the `UpToDate` condition will be `false` for older MachineSets and `true` for the current MachineSet; +  in the future this might change if Cluster API starts supporting in-place upgrades. +- `Remediating` for older MachineSets will report that remediation will happen as part of the regular rollout (Cluster API +  does not remediate machines on old MachineSets, because those machines are already scheduled for deletion).
#### MachineSet Print columns @@ -529,6 +552,9 @@ Notes: (*) visible only when using `kubectl get -o wide` Notes: +- Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. +- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. +- During the implementation we should consider if to add columns for bootstrapRef and infraRef resource (same could apply to other resources), - In k8s Deployment and ReplicaSet have different print columns for replica counters; this proposal enforces replicas counter columns consistent across all resources. @@ -572,20 +598,20 @@ type MachineDeploymentStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | -|--------------------------------------|-------------------------------------------------------------|-------------------------------------| -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| `ExprimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `ExprimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other 
fields... | other fields... | other fields... | +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|---------------------------------------|-------------------------------------------------------------|---------------------------------------------------| +| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `ExperimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). @@ -608,6 +634,11 @@ Notes: > To better evaluate proposed changes, below you can find the list of current MachineDeployment's conditions: > Ready, Available. +Notes: +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. + e.g. 
If the scaling down operation is being blocked by a machine having issues while deleting, this should surface as a reason/message in + the `ScalingDown` condition. + #### MachineDeployment Print columns | Current | To be | @@ -623,18 +654,19 @@ Notes: | `AGE` | `AGE` | | `VERSION` | `VERSION` | -TODO: consider if to add MachineDeployment `AVAILABLE`, but we should find a way to differentiate from `AVAILABLE` replicas - Stefan +1 to have AVAILABLE, not sure if we can have two columns with the same header - (*) visible only when using `kubectl get -o wide` +Notes: +- Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. +- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. + ### Changes to Cluster resource #### Cluster Status Following changes are implemented to Cluster's status: -- Disambiguate usage of ready term by renaming fields used for the provisioning workflow +- Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow - Align to K8s API conventions by deprecating `Phase` and corresponding `LastUpdated` - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions @@ -652,7 +684,7 @@ type ClusterStatus struct { // The value of those fields is never updated after provisioning is completed. // Use conditions to monitor the operational state of the Cluster's BootstrapSecret. // +optional - Initialization *MachineInitializationStatus `json:"initialization,omitempty"` + Initialization *ClusterInitializationStatus `json:"initialization,omitempty"` // Represents the observations of a Cluster's current state. 
// +optional @@ -666,7 +698,7 @@ type ClusterStatus struct { // Workers groups all the observations about Cluster's Workers current state. // +optional - Workers *ClusterControlPlaneStatus `json:"workers,omitempty"` + Workers *WorkersStatus `json:"workers,omitempty"` // other fields } @@ -713,8 +745,8 @@ type ClusterControlPlaneStatus struct { AvailableReplicas int32 `json:"availableReplicas"` } -// WorkersPlaneStatus groups all the observations about workers current state. -type WorkersPlaneStatus struct { +// WorkersStatus groups all the observations about workers current state. +type WorkersStatus struct { // Total number of desired worker machines in this cluster. // +optional DesiredReplicas int32 `json:"desiredReplicas"` @@ -737,32 +769,30 @@ type WorkersPlaneStatus struct { } ``` -// TODO: check about "non-terminated" for replicas fields. - -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | -|------------------------------------------|----------------------------------------------------------|--------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| `ControlPlaneReady` | `Initialization.ControlPlaneInitialized` (renamed) | `Initialization.ControlPlaneInitialized` | -| | `BackCompatibilty` (new) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| `ControlPlane` (new) | `ControlPlane` | `ControlPlane` | -| `ControlPlane.DesiredReplicas` 
(new) | `ControlPlane.DesiredReplicas` | `ControlPlane.DesiredReplicas` | -| `ControlPlane.Replicas` (new) | `ControlPlane.Replicas` | `ControlPlane.Replicas` | -| `ControlPlane.ReadyReplicas` (new) | `ControlPlane.ReadyReplicas` | `ControlPlane.ReadyReplicas` | -| `ControlPlane.UpToDateReplicas` (new) | `ControlPlane.UpToDateReplicas` | `ControlPlane.UpToDateReplicas` | -| `ControlPlane.AvailableReplicas` (new) | `ControlPlane.AvailableReplicas` | `ControlPlane.AvailableReplicas` | -| `Workers` (new) | `Workers` | `Workers` | -| `Workers.DesiredReplicas` (new) | `Workers.DesiredReplicas` | `Workers.DesiredReplicas` | -| `Workers.Replicas` (new) | `Workers.Replicas` | `Workers.Replicas` | -| `Workers.ReadyReplicas` (new) | `Workers.ReadyReplicas` | `Workers.ReadyReplicas` | -| `Workers.UpToDateReplicas` (new) | `Workers.UpToDateReplicas` | `Workers.UpToDateReplicas` | -| `Workers.AvailableReplicas` (new) | `Workers.AvailableReplicas` | `Workers.AvailableReplicas` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|----------------------------------------|----------------------------------------------------------|---------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| `ControlPlaneReady` | `Initialization.ControlPlaneInitialized` (renamed) | `Initialization.ControlPlaneInitialized` | +| | `BackCompatibilty` (new) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| `ControlPlane` (new) | `ControlPlane` | `ControlPlane` | +| `ControlPlane.DesiredReplicas` (new) | `ControlPlane.DesiredReplicas` | `ControlPlane.DesiredReplicas` | +| `ControlPlane.Replicas` (new) | `ControlPlane.Replicas` | `ControlPlane.Replicas` | +| `ControlPlane.ReadyReplicas` (new) | `ControlPlane.ReadyReplicas` | `ControlPlane.ReadyReplicas` | +| `ControlPlane.UpToDateReplicas` (new) | `ControlPlane.UpToDateReplicas` | `ControlPlane.UpToDateReplicas` | +| `ControlPlane.AvailableReplicas` (new) | `ControlPlane.AvailableReplicas` | `ControlPlane.AvailableReplicas` | +| `Workers` (new) | `Workers` | `Workers` | +| `Workers.DesiredReplicas` (new) | `Workers.DesiredReplicas` | `Workers.DesiredReplicas` | +| `Workers.Replicas` (new) | `Workers.Replicas` | `Workers.Replicas` | +| `Workers.ReadyReplicas` (new) | `Workers.ReadyReplicas` | `Workers.ReadyReplicas` | +| `Workers.UpToDateReplicas` (new) | 
`Workers.UpToDateReplicas` | `Workers.UpToDateReplicas` | +| `Workers.AvailableReplicas` (new) | `Workers.AvailableReplicas` | `Workers.AvailableReplicas` | +| other fields... | other fields... | other fields... | notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). @@ -777,7 +807,7 @@ notes: | `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) the cluster cannot be reached | | `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | | `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | -| `WorkersAvaiable` | Summary of MachineDeployment and MachinePool's `Available` condition | +| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` condition | | `TopologyReconciled` | | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | @@ -790,10 +820,13 @@ notes: > Ready, InfrastructureReady, ControlPlaneReady, ControlPlaneInitialized, TopologyReconciled Notes: +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. + e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface as a reason/message in + the `ScalingDown` condition. - `TopologyReconciled` exists only for classy clusters; this condition is managed by the topology reconciler. 
- Cluster API is going to maintain a `lastRemoteConnectionProbeTime` and use it in combination with the `--remote-connection-grace-period` flag to avoid flakes on `RemoteConnectionProbe`. -- Similarly to `lastHeartbeatTime` in Kubernetes conditions, also `lastRemoteConnectionProbeTime` will not surface on the +- Similarly to `lastHeartbeatTime` in Kubernetes conditions, also `lastRemoteConnectionProbeTime` will not surface on the API in order to avoid costly, continuous reconcile events. #### Cluster Spec @@ -803,10 +836,10 @@ Cluster's spec is going to be improved to allow 3rd party to extend the semantic Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); After golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. -| v1beta1 (current) | v1Beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | -|---------------------------|-----------------------------|-------------------------------------| -| `AvailabilityGates` (new) | `AvailabilityGates` | `AvailabilityGates` | -| other fields... | other fields... | other fields... | +| v1beta1 (current) | v1Beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|---------------------------|-----------------------------|---------------------------------------------------| +| `AvailabilityGates` (new) | `AvailabilityGates` | `AvailabilityGates` | +| other fields... | other fields... | other fields... 
| ```golang type ClusterSpec struct { @@ -833,7 +866,7 @@ type ClusterAvailabilityGate struct { ``` Notes: -- Similarly to Pod's `ReadinessGates`, also Machine's `AvailabilityGates` accept only conditions with positive polarity; +- Similarly to Pod's `ReadinessGates`, also Cluster's `AvailabilityGates` accept only conditions with positive polarity; The Cluster API project might revisit this in the future to stay aligned with Kubernetes or if there are use cases justifying this change. - In future the Cluster API project might consider ways to make `AvailabilityGates` configurable at ClusterClass level, but this can be implemented as a follow-up. @@ -861,13 +894,19 @@ Notes: (*) visible only when using `kubectl get -o wide` +Notes: +- Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. +- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. + ### Changes to KubeadmControlPlane (KCP) resource +KubeadmControlPlane (KCP) is considered a reference implementation for control plane providers, so it is included in this +proposal even if it is not a core Cluster API resource. + #### KubeadmControlPlane Status Following changes are implemented to KubeadmControlPlane's status: -- TODO: figure out what to do with contract fields + conditions - Update `ReadyReplicas` counter to use the same semantic Machine's `Ready` condition and add missing `UpToDateReplicas`. - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions @@ -899,52 +938,53 @@ type KubeadmControlPlaneStatus struct { Conditions []metav1.Condition `json:"conditions,omitempty"` // Other fields... 
- // NOTE: `Ready`, `FailureReason`, `FailureMessage` fields won't be there anymore + // NOTE: `Ready`, `FailureReason`, `FailureMessage` fields won't be there anymore } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta1 removal (tentative Q1 2026) | -|-----------------------------------|----------------------------------------------------------|-------------------------------------| -| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| `AvailableReplicas` (new) | `AvailableReplicas` | `AvailableReplicas` | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|-----------------------------------|----------------------------------------------------------|---------------------------------------------------| +| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `AvailableReplicas` (new) | `AvailableReplicas` | `AvailableReplicas` | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | TODO: double check usages of status.ready. #### KubeadmControlPlane (New)Conditions -| Condition | Note | -|---------------------------------|-------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if the control plane can be reached and there is etcd quorum, and `CertificatesAvailable` is true | -| `CertificatesAvailable` | True if all the cluster certificates exist. | -| `ReplicaFailure` | This condition surfaces issues on creating Machines controlled by this KubeadmControlPlane, if any. | -| `Initialized` | True if ControlPlaneComponentsHealthy. | -| `ControlPlaneComponentsHealthy` | This condition surfaces detail of issues on the controlled machines, if any. | -| `EtcdClusterHealthy` | This condition surfaces detail of issues on the controlled machines, if any. 
| -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this ControlPlane are up to date | -| `Remediating` | True if there is at least one machine controlled by this KubeadmControlPlane is not passing health checks | -| `Deleted` | True if KubeadmControlPlane is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this resource or the Cluster it belongs to are paused | +| Condition | Note | +|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the control plane can be reached, `EtcdClusterAvailable` is true, and `CertificatesAvailable` is true | +| `CertificatesAvailable` | True if all the cluster certificates exist. | +| `ReplicaFailure` | This condition surfaces issues on creating Machines controlled by this KubeadmControlPlane, if any. | +| `EtcdClusterAvailable` | This condition surfaces issues to the managed etcd cluster, if any. It is computed as aggregation of Machines's `EtcdMemberHealthy` (if not using an external etcd) conditions plus additional checks validating potential issues to etcd quorum | +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
Please note this will also include `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy` and, if not using an external etcd, also `EtcdPodHealthy` and `EtcdMemberHealthy` | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this ControlPlane are up to date | +| `Remediating` | True if there is at least one machine controlled by this KubeadmControlPlane that is not passing health checks | +| `Deleted` | True if KubeadmControlPlane is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this resource or the Cluster it belongs to is paused | > To better evaluate proposed changes, below you can find the list of current KubeadmControlPlane's conditions: > Ready, CertificatesAvailable, MachinesCreated, Available, MachinesSpecUpToDate, Resized, MachinesReady, > ControlPlaneComponentsHealthy, EtcdClusterHealthy. Notes: -- `ControlPlaneComponentsHealthy` and `EtcdClusterHealthy` have a very strict semantic: everything should be ok for the condition to be true; - This means it is expected those condition to flick while performing lifecycle operations; over time we might consider changes to make - those conditions to distinguish more accurately health issues vs "expected" temporary unavailability. +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. +  e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface as a reason/message in +  the `ScalingDown` condition.
+- The KubeadmControlPlane controller is going to add `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, +  `EtcdPodHealthy`, `EtcdMemberHealthy` conditions to the controlled machines; those conditions will also be added to the Machine's `readinessGates` +  so they are taken into account when computing the Machine's `Ready` condition. #### KubeadmControlPlane Print columns @@ -964,6 +1004,10 @@ Notes: (*) visible only when using `kubectl get -o wide` +Notes: +- Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. +- During the implementation we are going to verify the resulting layout and, if needed, make final adjustments to the column list. + ### Changes to MachinePool resource #### MachinePool Status @@ -1074,7 +1118,7 @@ Notes: Notes: - Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. -  e.g. If the scaling up operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in +  e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in   the `ScalingDown` condition. - As of today MachinePool does not have a notion similar to MachineDeployment's MaxUnavailability. @@ -1096,7 +1140,7 @@ Notes: (*) visible only when using `kubectl get -o wide` Notes: -- Print columns are not subject to any deprecation rule, so it will be possible to iteratively improve them without waiting for the next API version. +- Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. - During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list.
### Changes to Cluster API contract @@ -1250,15 +1294,32 @@ Notes: should not consider the temporary unavailability of one of those instances as relevant for the overall control plane availability. e.g. one kube-apiserver over three down, should not impact the overall control plane availability. -### [WIP] Example use cases -NOTE: Let me know if you want to add more use cases. I will try to collect more too and add a brief explanation about how -each use case can be addressed with the improved status in CAPI resources +### Example use cases + +This section collects use cases for an improved status in Cluster API resources, with notes about how this +proposal addresses those use cases. As a cluster admin with MachineDeployment ownership I'd like to understand if my MD is performing a rolling upgrade and why by looking at the MD status/conditions + +> The main signal that an MD is performing a rolling upgrade will be `MD.Status.Conditions[UpToDate]` being false. + +> At least in the first iteration there won't be a signal at MD level about why the rollout is happening, because controlled machines might +> have different reasons why they are not UpToDate (and the admin can check those conditions by looking at single machines). +> In future iterations of this proposal we might find ways to aggregate those reasons into the message for the `MD.Status.Conditions[UpToDate]` condition. + As a cluster admin with MachineDeployment ownership I'd like to understand why my MD rollout is blocked and why by looking at the MD status/conditions + +> `MD.Status.Conditions[ScalingUp]` and `MD.Status.Conditions[ScalingDown]` will give information about how the rollout is being performed, +> e.g. whether there are issues creating or deleting machines.
+ As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are failing to be available by looking at the MD status/conditions + +> The `MD.Status.Conditions[MachinesReady]` condition will aggregate errors from all the Machines controlled by an MD. + As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are stuck on deletion looking at the MD status/conditions +> `MD.Status.Conditions[ScalingDown]` will surface whether there are issues deleting machines. + ### Security Model This proposal does not impact Cluster API security model. From 65c997c66fb3c9e96fca3e60c56be88e2a63596d Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Mon, 29 Jul 2024 18:14:38 +0200 Subject: [PATCH 09/22] More nits and cleanup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Stefan Büringer buringerst@vmware.com Co-authored-by: Christian Schlotter christian.schlotter@broadcom.com --- .../improve-status-in-CAPI-resources.md | 305 +++++++++--------- 1 file changed, 153 insertions(+), 152 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index f03f932a3065..d2f323284f1d 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -1,5 +1,5 @@ --- -title: Proposal Template +title: Improving status in CAPI resources authors: - "@fabriziopandini" reviewers: @@ -12,10 +12,10 @@ reviewers: - "@peterochodo" - "@zjs" creation-date: 2024-07-17 -last-updated: 2024-07-27 +last-updated: 2024-07-29 status: implementable see-also: -- [proposal about custom Cluster API conditions (superseed by this document)](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md#the-ready-condition) +- [Proposal about custom Cluster API conditions (superseded by this
document)](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200506-conditions.md) - [Kubernetes API guidelines](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md) - [Kubernetes API deprecation rules](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#fields-of-rest-resources) --- @@ -123,7 +123,7 @@ Proposed changes are designed to introduce benefits for Cluster API users as soo API deprecations rules, it is required to go through a multi-step transition to reach the desired shape of the API resources. Such transition is detailed in the following paragraphs. -At high level, proposed changes to status fields to status fields can be grouped in three set of changes: +At high level, proposed changes to status fields can be grouped in three sets of changes: Some of those changes could be considered straight forward, e.g. @@ -141,13 +141,13 @@ Some other changes require a little bit more context, which is provided in follo - Review and standardize the usage of the concept of readiness and availability to align to K8s API conventions / conditions used in core K8s objects like `Pod`, `Node`, `Deployment`, `ReplicaSet` etc. -- Transition to K8s API conventions fully aligned conditions types/condition management (and thus deprecation of +- Transition to K8s API conventions fully aligned condition types/condition management (and thus deprecation of the Cluster API "custom" guidelines for conditions). The last set of changes is a consequence of the above changes, or small improvements to address feedback received over time; changes in this group will be detailed case by case in the following paragraphs, a few examples: -- Change the semantic of ReadyReplica counters to use Machine's Ready condition instead of Node's Ready condition. +- Change the semantic of ReadyReplicas counters to use Machine's Ready condition instead of Node's Ready condition. 
(so everywhere Ready is used for a Machine it always means the same thing) - Add a new condition monitoring the status of the connectivity to workload clusters (`RemoteConnectionProbe`). @@ -162,10 +162,10 @@ In order to keep making progress on this proposal, the first iteration will be f Other resources will be added as soon as there is agreement on the general direction. -Overall, the union of all those changes, is expected to greatly improve status fields, conditions, replica counters +Overall, the union of all those changes is expected to greatly improve status fields, conditions, replica counters and print columns. -Those improvements are expected to provide benefit to users interacting with the system, using monitoring tools, and +These improvements are expected to provide benefit to users interacting with the system, using monitoring tools, and building higher level systems or products on top of Cluster API. ### Readiness and Availability @@ -177,7 +177,7 @@ conditions. However, over time Cluster API maintainers recognized several limitations of the “one fits all”, strict approach. E.g., higher level abstractions in Cluster API are designed to remain operational during lifecycle operations, -for instance a MachineDeployment is operational even if is rolling out. +for instance a MachineDeployment is operational even if it is rolling out. But the use cases above were hard to combine with the strict requirement to have all the conditions true, and as a result today Cluster API resources barely have conditions surfacing that lifecycle operations are happening, or where @@ -198,7 +198,7 @@ The most important effect of this change is the definition of a new semantic for will now clearly represent the "machine can host workloads" (prior art Kubernetes nodes are ready when "node can host pods"). To improve the benefit of this change: -- This proposal is ensuring that whenever Machine ready is used, it always means the same thing (e.g. 
replica counters) +- This proposal is ensuring that whenever Machine ready is used, it always means the same thing (e.g. ready replica counters) - This proposal is also changing contract fields where ready was used improperly to represent initial provisioning (k8s API conventions suggest to use ready only for long-running process). @@ -209,7 +209,7 @@ This proposal is also dropping the `Ready` condition from higher level abstracti Instead, where not already present, this proposal is introducing a new `Available` condition that better represents the fact that those objects are operational even if there is a certain degree of not readiness / disruption in the system -or if lifecycle operations are happening (prior art `Available` condition in K8s Deployments). +or if lifecycle operations are happening (prior art: `Available` condition in K8s Deployments). Last but not least: @@ -224,8 +224,8 @@ Last but not least: Kubernetes is undergoing a long term effort of standardizing usage of conditions across all resource types, and the transition to the v1beta2 API version is a great opportunity for Cluster API to align to this effort. -The value of this transition is substantial, because the differences that exists today's are really confusing for users; -those differences are also making it harder for ecosystem tools to build on top of Cluster API, and in some cases +The value of this transition is substantial, because the differences that exist today are really confusing for users. +These differences are also making it harder for ecosystem tools to build on top of Cluster API, and in some cases even confusing new (and old) contributors. 
With this proposal Cluster API will close the gap with Kubernetes API conventions in regard to: @@ -234,14 +234,14 @@ With this proposal Cluster API will close the gap with Kubernetes API convention - Use of the `Reason` field is required (currently in Cluster API reasons is added only when condition are false) - Controllers should apply their conditions to a resource the first time they visit the resource, even if the status is `Unknown`. (currently Cluster API controllers add conditions at different stages of the reconcile loops). Please note that: - - If more than one controller add conditions to same resources, conditions managed by the different controllers will be - applied at different time. + - If more than one controller adds conditions to the same resource, conditions managed by different controllers will be + applied at different times. - Kubernetes API conventions account for exceptions to this rule; for known conditions, the absence of a condition status should be interpreted the same as `Unknown`, and typically indicates that reconciliation has not yet finished. - Cluster API is also dropping its own `Condition` type and will start using `metav1.Conditions` from the Kubernetes API. The last point also has another implication, which is the removal of the `Severity` field which is currently used -to determine priority when merging conditions into the ready summary. +to determine priority when merging conditions into the `Ready` summary condition. However, considering all the work to clean up and improve readiness and availability, now dropping the `Severity` field is not an issue anymore.
Let's clarify this with an example: @@ -266,7 +266,7 @@ Following changes are implemented to Machine's status: - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions -Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); +Below you can find the relevant fields in MachineStatus v1beta2, after v1beta1 removal (end state); Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang @@ -292,12 +292,12 @@ type MachineStatus struct { // MachineInitializationStatus provides observations of the Machine initialization process. type MachineInitializationStatus struct { - // BootstrapSecretCreated is true when the bootstrap provider reports that the Machine's boostrap secret is created. + // BootstrapDataSecretCreated is true when the bootstrap provider reports that the Machine's boostrap secret is created. // NOTE: this field is part of the Cluster API contract, and it is used to orchestrate initial Machine provisioning. // The value of this field is never updated after provisioning is completed. // Use conditions to monitor the operational state of the Machine's BootstrapSecret. // +optional - BootstrapSecretCreated bool `json:"bootstrapSecretCreated"` + BootstrapDataSecretCreated bool `json:"bootstrapDataSecretCreated"` // InfrastructureProvisioned is true when the infrastructure provider reports that the Machine's infrastructure is fully provisioned. // NOTE: this field is part of the Cluster API contract, and it is used to orchestrate initial Machine provisioning. 
@@ -328,19 +328,19 @@ Notes: ##### Machine (New)Conditions -| Condition | Note | -|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if at the machine is Ready for at least MinReady seconds, as defined by the Machine's minReadySeconds field | -| `Ready` | True if Machine's `BootstrapSecretReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are true; if other conditions are defined in `spec.readinessGates`, those conditions should be true as well for the Machine to be ready. | -| `UpToDate` | True if the Machine spec matches the spec of the Machine's owner resource, e.g KubeadmControlPlane or MachineDeployment | -| `BootstrapConfigReady` | Mirrors the corresponding condition from the Machine's BootstrapConfig resource | -| `InfrastructureReady` | Mirrors the corresponding condition from the Machine's Infrastructure resource | -| `NodeReady` | True if the Machine's Node is ready | -| `NodeHealthy` | True if the Machine's Node is ready and it does not report MemoryPressure, DiskPressure and PIDPressure | -| `HealthCheckSucceeded` | True if MHC instances targeting this machine report the Machine is healthy according to the definition of healthy present in the spec of the Machine Health Check object | -| `OwnerRemediated` | | -| `Deleted` | True if Machine is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if the Machine or the Cluster it belongs to are paused | +| Condition | Note | 
+|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the Machine is Ready for at least MinReadySeconds, as defined by the Machine's `minReadySeconds` field | +| `Ready` | True if the Machine is not deleted and the Machine's `BootstrapConfigReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) conditions are true; if other conditions are defined in `spec.readinessGates`, these conditions must be true as well. | +| `UpToDate` | True if the Machine spec matches the spec of the Machine's owner resource, e.g. KubeadmControlPlane or MachineDeployment | +| `BootstrapConfigReady` | Mirrors the corresponding `Ready` condition from the Machine's BootstrapConfig resource | +| `InfrastructureReady` | Mirrors the corresponding `Ready` condition from the Machine's Infrastructure resource | +| `NodeHealthy` | True if the Machine's Node is ready and it does not report MemoryPressure, DiskPressure or PIDPressure | +| `NodeReady` | True if the Machine's Node is ready | +| `HealthCheckSucceeded` | True if the MHC instances targeting this Machine report that the Machine is healthy according to the definition of healthy in the spec of the MachineHealthCheck object | +| `OwnerRemediated` | | +| `Deleted` | True if Machine is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if the Machine or the Cluster it belongs to is paused | > To better evaluate proposed changes, below you can find the list of current Machine's conditions: > Ready, InfrastructureReady, BootstrapReady, NodeHealthy, PreDrainDeleteHookSucceeded, VolumeDetachSucceeded, DrainingSucceeded.
@@ -371,29 +371,28 @@ Notes: Machine's spec is going to be improved to allow 3rd party components to extend the semantic of the new Machine's `Ready` condition as well as to standardize the concept of Machine's `Availability`. -Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); +Below you can find the relevant fields in MachineSpec v1beta2, after v1beta1 removal (end state); Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```go type MachineSpec struct { - // MinReadySeconds is the minimum number of seconds for which a Machine should be ready before considering the replica available. - // Defaults to 0 (machine will be considered available as soon as the Node is ready) + // MinReadySeconds is the minimum number of seconds for which a Machine should be ready before considering it available. + // Defaults to 0 (Machine will be considered available as soon as the Machine is ready) // +optional MinReadySeconds int32 `json:"minReadySeconds,omitempty"` - // If specified, all readiness gates will be evaluated for Machine readiness. - // A Machine is ready when `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are "True"; + // If specified, all conditions listed in ReadinessGates will be evaluated for Machine readiness. + // A Machine is ready when `BootstrapConfigReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are "True"; // if other conditions are defined in this field, those conditions should be "True" as well for the Machine to be ready. - // - // This field can be used e.g. - // - By cluster API control plane providers willing to extend the semantic of the ready condition for the machine they - // control, like the kubeadm control provider adding readinessGates for the APIServerPodHealthy, SchedulerPodHealthy conditions, etc. - // - By external controllers, e.g. 
responsible to install special software/hardware on the machines and willing - // to include the status of those components into readinessGates (by surfacing new conditions on Machines and - // adding them to ReadinessGates). - // - // responsible to install special software/hardware on the machines doing the same even if they are not actual CAPI controllers + // + // This field can be used e.g. + // - By Cluster API control plane providers to extend the semantic of the Ready condition for the Machine they + // control, like the kubeadm control plane provider adding ReadinessGates for the APIServerPodHealthy, SchedulerPodHealthy conditions, etc. + // - By external controllers, e.g. controllers responsible for installing special software/hardware on the Machines, + // to include the status of those components into ReadinessGates (by surfacing new conditions on Machines and + // adding them to ReadinessGates). + // // +optional // +listType=map // +listMapKey=conditionType @@ -402,23 +401,24 @@ type MachineSpec struct { // Other fields... } -// MachineReadinessGate contains the reference to a Machine condition to be used as readiness gates. +// MachineReadinessGate contains the type of a Machine condition to be used as a readiness gate. type MachineReadinessGate struct { // ConditionType refers to a condition in the Machine's condition list with matching type. - // Note: Both Cluster API conditions or conditions added by 3rd party controller can be used as readiness gates. + // Note: Both Cluster API conditions and conditions added by 3rd party controllers can be used as readiness gates. ConditionType string `json:"conditionType"` } ``` -| v1beta1 (current) | v1Beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|------------------------|-----------------------------|---------------------------------------------------| -| `ReadinessGates` (new) | `ReadinessGates` | `ReadinessGates` | -| other fields... | other fields... | other fields...
| +| v1beta1 (current) | v1Beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|-------------------------|-----------------------------|---------------------------------------------------| +| `MinReadySeconds` (new) | `MinReadySeconds` | `MinReadySeconds` | +| `ReadinessGates` (new) | `ReadinessGates` | `ReadinessGates` | +| other fields... | other fields... | other fields... | Notes: -- Both `MinReadySeconds` and `ReadinessGates` should be treated as other in-place propagated fields (changing this should not trigger rollouts). +- Both `MinReadySeconds` and `ReadinessGates` should be treated as other in-place propagated fields (changing them should not trigger rollouts). - Similarly to Pod's `ReadinessGates`, also Machine's `ReadinessGates` accept only conditions with positive polarity; - The Cluster API project might revisit this in future to stay aligned with Kubernetes or if there are use cases justifying this change. + The Cluster API project might revisit this in the future to stay aligned with Kubernetes or if there are use cases justifying this change. #### Machine Print columns @@ -431,8 +431,9 @@ Notes: | `PHASE` (deleted) | `PROVIDER ID` | | `AGE` | `READY` (new) | | `VERSION` | `AVAILABLE` (new) | -| | `UP TO DATE` (new) | +| | `UP-TO-DATE` (new) | | | `AGE` | +| | `VERSION` | | | `OS-IMAGE` (new) (*) | | | `KERNEL-VERSION` (new) (*) | | | `CONTAINER-RUNTIME` (new) (*) | @@ -440,10 +441,10 @@ Notes: (*) visible only when using `kubectl get -o wide` Notes: -- Note: print columns are not subject to API guarantee, so we are free to iteratively improve this anytime. -- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. -- During the implementation we are going to explore if it is possible to add `INTERNAL-IP` (new) (*), `EXTERNAL-IP` after `VERSION` / before `OS-IMAGE`? 
- might be something like `$.status.addresses[?(@.type == 'InternalIP')].address` works +- Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. +- During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list. +- During the implementation we are going to explore if it is possible to add `INTERNAL-IP` (new) (*), `EXTERNAL-IP` after `VERSION` / before `OS-IMAGE`. + Something like `$.status.addresses[?(@.type == 'InternalIP')].address` might work. ### Changes to MachineSet resource @@ -451,11 +452,11 @@ Notes: Following changes are implemented to MachineSet's status: -- Update `ReadyReplicas` counter to use the same semantic Machine's `Ready` (today it is computed a Machines with Node Ready) condition and add missing `UpToDateReplicas`. +- Update `ReadyReplicas` counter to use the same semantic as Machine's `Ready` condition (today it is computed based on the Node `Ready` condition) and add missing `UpToDateReplicas`. - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions -Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); +Below you can find the relevant fields in MachineSetStatus v1beta2, after v1beta1 removal (end state). Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang @@ -503,8 +504,8 @@ Notes: Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes.
- This proposal is using `UpToDateReplicas` instead of `UpdatedReplicas`; This is a deliberated choice to avoid confusion between update (any change) and upgrade (change of the Kubernetes versions). -- Also `AvailableReplicas` will determine Machine's availability by reading Machine.Available condition instead of - computing availability as of today, however in this case the semantic of the field is not changed +- Also `AvailableReplicas` will determine Machine's availability via Machine's `Available` condition instead of + computing availability as of today (based on the Node `Ready` condition) #### MachineSet (New)Conditions @@ -514,7 +515,7 @@ Notes: | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDate replicas) | +| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDateReplicas) | | `Remediating` | True if there is at least one Machine controlled by this MachineSet that is not passing health checks | | `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | | `Paused` | True if this MachineSet or the Cluster it belongs to are paused | @@ -526,13 +527,13 @@ Notes: - Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in the `ScalingDown` condition. -- MachineSet conditions are intentionally mostly consistent with MachineDeployment conditions to help users troubleshooting . 
+- MachineSet conditions are intentionally mostly consistent with MachineDeployment conditions to help users troubleshoot. - MachineSet is considered as a sort of implementation detail of MachineDeployments, so it doesn't have its own concept of availability. - Similarly, this proposal is dropping the notion of MachineSet readiness because it is preferred to let users focusing on Machines readiness. + Similarly, this proposal is dropping the notion of MachineSet readiness because it is preferred to let users focus on Machine readiness. - When implementing this proposal `UpToDate` condition will be `false` for older MachineSet, `true` for the current MachineSet; in the future this might change in case Cluster API will start supporting in-place upgrades. - `Remediating` for older MachineSets will report that remediation will happen as part of the regular rollout (Cluster API - do not remediate machines on old machine sets, because those machines are already scheduled for deletion). + does not remediate Machines on old MachineSets, because those Machines are already scheduled for deletion). #### MachineSet Print columns @@ -553,8 +554,8 @@ Notes: Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. -- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. -- During the implementation we should consider if to add columns for bootstrapRef and infraRef resource (same could apply to other resources), +- During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list.
+- During the implementation we should consider whether to add columns for the bootstrapRef and infraRef resources (the same could apply to other resources) - In k8s Deployment and ReplicaSet have different print columns for replica counters; this proposal enforces replicas counter columns consistent across all resources. @@ -564,13 +565,13 @@ Notes: Following changes are implemented to MachineDeployment's status: -- Align `UpdatedReplicas` to use Machine's `UpToDate` condition (and rename it accordingly) +- Align `UpdatedReplicas` to use Machine's `UpToDate` condition (and rename it accordingly to `UpToDateReplicas`) - Align to K8s API conventions by deprecating `Phase` - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions -Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); -After golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. +Below you can find the relevant fields in MachineDeploymentStatus v1beta2, after v1beta1 removal (end state); +Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang @@ -626,8 +627,8 @@ Notes: | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any.
| | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachineDeployment are up to date (replicas = upToDate replicas) | -| `Remediating` | True if there is at least one machine controlled by this MachineDeployment is not passing health checks | +| `UpToDate` | True if all the Machines controlled by this MachineDeployment are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this MachineDeployment that is not passing health checks | | `Deleted` | True if MachineDeployment is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | | `Paused` | True if this MachineDeployment or the Cluster it belongs to are paused | @@ -658,7 +659,7 @@ Notes: Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. -- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. +- During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list. ### Changes to Cluster resource @@ -673,7 +674,7 @@ Following changes are implemented to Cluster's status: - Add replica counters to surface status of Machines belonging to this Cluster - Surface information about ControlPlane connection heartbeat (see new conditions) -Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); +Below you can find the relevant fields in ClusterStatus v1beta2, after v1beta1 removal (end state); Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. 
```golang @@ -800,28 +801,28 @@ notes: ##### Cluster (New)Conditions -| Condition | Note | -|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` condition are true; if conditions are defined in `spec.availabilityGates`, those conditions should be true as well for the Cluster to be available. | -| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. This information is usually used as a signal for starting all the provisioning operations that depends on a functional API server, but do not require a full HA control plane to exists. 
| -| `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) the cluster cannot be reached | -| `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | -| `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | -| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` condition | -| `TopologyReconciled` | | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this Cluster are up to date (replicas = upToDate replicas) | -| `Remediating` | True if there is at least one machine controlled by this Cluster is not passing health checks | -| `Deleted` | True if Cluster is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if Cluster and all the resources being part of it are paused | +| Condition | Note | +|---------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` conditions are true; if conditions are defined in `spec.availabilityGates`, those conditions must be true as well. | +| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. 
This information is usually used as a signal for starting all the provisioning operations that depend on a functional API server, but do not require a full HA control plane to exist. | +| `RemoteConnectionProbe` | True when the control plane can be reached; in case of connection problems, the condition turns to false only if the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) | +| `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | +| `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | +| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` conditions | +| `TopologyReconciled` | | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this Cluster are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this Cluster that is not passing health checks | +| `Deleted` | True if Cluster is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if the Cluster and all the resources belonging to it are paused | > To better evaluate proposed changes, below you can find the list of current Cluster's conditions: > Ready, InfrastructureReady, ControlPlaneReady, ControlPlaneInitialized, TopologyReconciled Notes: - Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. - e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface as a reason/message in + e.g. If the scaling down operation is being blocked by a Machine having issues while deleting, this should surface as a reason/message in
- `TopologyReconciled` exists only for classy clusters; this condition is managed by the topology reconciler. - Cluster API is going to maintain a `lastRemoteConnectionProbeTime` and use it in combination with the @@ -831,15 +832,10 @@ Notes: #### Cluster Spec -Cluster's spec is going to be improved to allow 3rd party to extend the semantic of the new Cluster's `Available` condition. - -Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); -After golang types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. +Cluster's spec is going to be improved to allow 3rd parties to extend the semantic of the new Cluster's `Available` condition. -| v1beta1 (current) | v1Beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|---------------------------|-----------------------------|---------------------------------------------------| -| `AvailabilityGates` (new) | `AvailabilityGates` | `AvailabilityGates` | -| other fields... | other fields... | other fields... | +Below you can find the relevant fields in ClusterSpec v1beta2, after v1beta1 removal (end state); +Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang type ClusterSpec struct { @@ -857,16 +853,21 @@ type ClusterSpec struct { // Other fields... } -// ClusterAvailabilityGate contains the reference to a Cluster condition to be used as availability gates. +// ClusterAvailabilityGate contains the type of a Cluster condition to be used as availability gate. type ClusterAvailabilityGate struct { // ConditionType refers to a condition in the Cluster's condition list with matching type. - // Note: Both Cluster API conditions or conditions added by 3rd party controller can be used as availability gates. 
+ // Note: Both Cluster API conditions and conditions added by 3rd party controllers can be used as availability gates. ConditionType string `json:"conditionType"` } ``` +| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|---------------------------|-----------------------------|---------------------------------------------------| +| `AvailabilityGates` (new) | `AvailabilityGates` | `AvailabilityGates` | +| other fields... | other fields... | other fields... | + Notes: -- Similarly to Pod's `ReadinessGates`, also Cluster's `AvailabilityGates` accept only conditions with positive polarity; +- Similarly to Pod's `ReadinessGates`, Cluster's `AvailabilityGates` accepts only conditions with positive polarity; The Cluster API project might revisit this in the future to stay aligned with Kubernetes or if there are use cases justifying this change. - In future the Cluster API project might consider ways to make `AvailabilityGates` configurable at ClusterClass level, but this can be implemented as a follow-up. @@ -883,12 +884,12 @@ Notes: | | `CP_CURRENT`(new) (*) | | | `CP_READY` (new) (*) | | | `CP_AVAILABLE` (new) | -| | `CP_UP_TO_DATE` (new) | +| | `CP_UP-TO-DATE` (new) | | | `W_DESIRED` (new) | | | `W_CURRENT`(new) (*) | | | `W_READY` (new) (*) | | | `W_AVAILABLE` (new) | -| | `W_UP_TO_DATE` (new) | +| | `W_UP-TO-DATE` (new) | | | `AGE` | | | `VERSION` | @@ -896,7 +897,7 @@ Notes: Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. -- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. +- During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list.
### Changes to KubeadmControlPlane (KCP) resource @@ -911,7 +912,7 @@ Following changes are implemented to KubeadmControlPlane's status: - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions -Below you can find the relevant fields in Machine Status v1beta2, after v1beta1 removal (end state); +Below you can find the relevant fields in KubeadmControlPlaneStatus v1beta2, after v1beta1 removal (end state); Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang @@ -919,7 +920,7 @@ type KubeadmControlPlaneStatus struct { // The number of ready replicas for this ControlPlane. A machine is considered ready when Machine's Ready condition is true. // Note: In the v1beta1 API version a Machine was counted as ready when the node hosted on the Machine was ready, thus - // generating confusion for users looking at the Machine.Ready condition. + // generating confusion for users looking at the Machine Ready condition. 
// +optional ReadyReplicas int32 `json:"readyReplicas"` @@ -945,11 +946,11 @@ type KubeadmControlPlaneStatus struct { | v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | |-----------------------------------|----------------------------------------------------------|---------------------------------------------------| | `Ready` (deprecated) | `Ready` (deprecated) | (removed) | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | | `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | | `AvailableReplicas` (new) | `AvailableReplicas` | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | | `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | | `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | | `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | @@ -970,7 +971,7 @@ TODO: double check usages of status.ready. 
| `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | | `UpToDate` | True if all the Machines controlled by this ControlPlane are up to date | -| `Remediating` | True if there is at least one machine controlled by this KubeadmControlPlane is not passing health checks | +| `Remediating` | True if there is at least one Machine controlled by this KubeadmControlPlane that is not passing health checks | | `Deleted` | True if KubeadmControlPlane is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | | `Paused` | True if this resource or the Cluster it belongs to are paused | @@ -980,11 +981,11 @@ TODO: double check usages of status.ready. Notes: - Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. - e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface as a reason/message in + e.g. If the scaling down operation is being blocked by a Machine having issues while deleting, this should surface as a reason/message in the `ScalingDown` condition. - The KubeadmControlPlane controller is going to add `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, - `EtcdPodHealthy`, `EtcdMemberHealthy`conditions to the controller machines; those conditions will also be defined as `readinessGates` - for computing Machine's ready condition. + `EtcdPodHealthy`, `EtcdMemberHealthy` conditions to the controlled Machines. These conditions will also be defined as `readinessGates` + for computing Machine's `Ready` condition. #### KubeadmControlPlane Print columns @@ -1006,7 +1007,7 @@ Notes: Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version.
-- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. +- During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list. ### Changes to MachinePool resource @@ -1016,12 +1017,12 @@ Following changes are implemented to MachinePool's status: - Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow - Update `ReadyReplicas` counter to use the same semantic Machine's `Ready` condition and add missing `UpToDateReplicas`. -- Align Machine pools replica counters to other CAPI resources +- Align MachinePool replica counters with other CAPI resources - Align to K8s API conventions by deprecating `Phase` - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions -Below you can find the relevant fields in MachinePool Status v1beta2, after v1beta1 removal (end state); +Below you can find the relevant fields in MachinePoolStatus v1beta2, after v1beta1 removal (end state); Below the Go types, you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. ```golang @@ -1044,7 +1045,7 @@ type MachinePoolStatus struct { // The value of those fields is never updated after provisioning is completed. // Use conditions to monitor the operational state of the MachinePool. // +optional - Initialization *MachineInitializationStatus `json:"initialization,omitempty"` + Initialization *MachinePoolInitializationStatus `json:"initialization,omitempty"` // Conditions represent the observations of a MachinePool's current state. // +optional @@ -1108,8 +1109,8 @@ Notes: | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any.
| | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachinePool are up to date (replicas = upToDate replicas) | -| `Remediating` | True if there is at least one machine controlled by this MachinePool is not passing health checks | +| `UpToDate` | True if all the Machines controlled by this MachinePool are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this MachinePool that is not passing health checks | | `Deleted` | True if MachinePool is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | | `Paused` | True if this MachinePool or the Cluster it belongs to are paused | @@ -1118,7 +1119,7 @@ Notes: Notes: - Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. - e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in + e.g. If the scaling down operation is being blocked by a Machine having issues while deleting, this should surface with a reason/message in the `ScalingDown` condition. - As of today MachinePool does not have a notion similar to MachineDeployment's MaxUnavailability. @@ -1141,7 +1142,7 @@ Notes: Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. -- During the implementation we are going to verify if the resulting layout and eventually make final adjustments to the column list. +- During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list. 
### Changes to Cluster API contract @@ -1149,10 +1150,10 @@ The Cluster API contract defines a set of rules a provider is expected to comply When the v1beta2 API will be released (tentative Q1 2025), also the Cluster API contract will be bumped to v1beta2. -As defined at the beginning of this document, this proposal is not going to change how the Cluster API contract +As written at the beginning of this document, this proposal is not going to change how the Cluster API contract with infrastructure, bootstrap and control providers currently works (by using status fields). -Similarly, this proposal is not going to change the fact that the Cluster API contract do not require providers to implement +Similarly, this proposal is not going to change the fact that the Cluster API contract does not require providers to implement conditions, even if this is recommended because conditions greatly improve user's experience. However, this proposal is introducing a few changes into the v1beta2 version of the Cluster API contract in order to: @@ -1160,7 +1161,7 @@ However, this proposal is introducing a few changes into the v1beta2 version of - Remove `failureReason` and `failureMessage`. What is worth to notice is that for the first time in the history of the project, this proposal is introducing -a mechanism that allows providers to adapt to new contract incrementally, more specifically: +a mechanism that allows providers to adapt to a new contract incrementally, more specifically: - Providers won't be required to synchronize their changes to adapt to the Cluster API v1beta2 contract with the Cluster API's v1beta2 release. 
@@ -1173,20 +1174,20 @@ a mechanism that allows providers to adapt to new contract incrementally, more s Additionally: -- Providers implementing conditions won't be required to do the transition from custom Cluster API custom Condition type - to Kubernetes metav1.Conditions type (but this transition is recommended because it improves the consistency of each provider - with Kubernetes, Cluster API, the ecosystem). +- Providers implementing conditions won't be required to do the transition from custom Cluster API Condition type + to Kubernetes `metav1.Conditions` type (but this transition is recommended because it improves the consistency of each provider + with Kubernetes, Cluster API and the ecosystem). - However, providers choosing to keep using Cluster API custom conditions should be aware that starting from the CAPI release when v1beta1 removal will happen (tentative Q1 2026), the Cluster API project will remove the - cluster API condition type, the `util\conditions` package, the code handling conditions in `util\patch.Helper`, - everything related to custom cluster API condition type. + Cluster API condition type, the `util/conditions` package, the code handling conditions in `util/patch.Helper` and + everything related to the custom Cluster API `v1beta.Condition` type. (in other words, Cluster API custom condition must be replaced by provider's own custom conditions). #### Contract for infrastructure providers -Note: given that the contract only defines expected names for fields in a resources at yaml/json level, we are -using those in this paragraph (instead of golang field names). +Note: given that the contract only defines expected names for fields in a resources at YAML/JSON level, we are +using these in this paragraph (instead of golang field names). ##### InfrastructureCluster @@ -1236,7 +1237,7 @@ Notes: The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it. 
- InfrastructureMachine's `status.conditions[Ready]` will surface into Machine's `status.conditions[InfrastructureReady]` condition. - InfrastructureMachine's `status.conditions[Ready]` must surface issues during the entire lifecycle of the Machine - (both during initial InfrastructureCluster provisioning and after the initial provisioning is completed). + (both during initial InfrastructureMachine provisioning and after the initial provisioning is completed). #### Contract for bootstrap providers @@ -1250,20 +1251,20 @@ Following changes are planned for the contract for the BootstrapConfig resource: |-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------| | `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.dataSecretCreated`, required | (removed) | | | `status.initialization.dataSecretCreated` (new), one of `status.ready` or `status.initialization.dataSecretCreated`, required | `status.initialization.dataSecretCreated`, required | -| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.conditions[Ready]`, optional with fall back on `status.ready` or `status.initialization.dataSecretCreated` set | `status.conditions[Ready]`, optional with fall back on `status.initialization.DataSecretCreated` set | +| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.conditions[Ready]`, optional with fall back on `status.ready` or `status.initialization.dataSecretCreated` set | `status.conditions[Ready]`, optional with fall back on `status.initialization.dataSecretCreated` set | | `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | | `status.failureMessage`, optional | 
`status.failureMessage` (deprecated), optional | (removed) | | other fields/rules... | other fields/rules... | | Notes: -- BootstrapConfig's `status.initialization.dataSecretCreated` will surface into Machine's `status.initialization.BootstrapDataSecretCreated` field. +- BootstrapConfig's `status.initialization.dataSecretCreated` will surface into Machine's `status.initialization.bootstrapDataSecretCreated` field. - BootstrapConfig's `status.initialization.dataSecretCreated` must signal the completion of the initial provisioning of the bootstrap data secret. The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it. - BootstrapConfig's `status.conditions[Ready]` will surface into Machine's `status.conditions[BootstrapConfigReady]` condition. - BootstrapConfig's `status.conditions[Ready]` must surface issues during the entire lifecycle of the BootstrapConfig - (both during initial InfrastructureCluster provisioning and after the initial provisioning is completed). + (both during initial BootstrapConfig provisioning and after the initial provisioning is completed). -#### Contract for control plane Providers +#### Contract for control plane providers Following changes are planned for the contract for the ControlPlane resource: @@ -1272,15 +1273,15 @@ Following changes are planned for the contract for the ControlPlane resource: - Rename `status.initialized` into `status.initialization.controlPlaneInitialized`. - Remove `failureReason` and `failureMessage`. 
-| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|-----------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| -| `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | (removed) | -| `status.initialized`, required | `status.initialization.controlPlaneInitialized` (renamed), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | `status.initialization.controlPlaneInitialized`, required | -| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.backCompatibilty.conditions[Ready]` (renamed, deprecated), optional with fall back on `status.ready` or `status.Initializiation.ControlPlaneInitialized` set | (removed) | -| | `status.conditions[Available]` (new), optional with fall back optional with fall back on `status.ready` or `status.Initializiation.ControlPlaneInitialized` set | `status.conditions[Available]`, optional with fall back on `status.initializiation.controlPlaneInitialized` | -| `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | -| `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | -| other fields/rules... | other fields/rules... 
+| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +|-----------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| +| `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | (removed) | +| `status.initialized`, required | `status.initialization.controlPlaneInitialized` (renamed), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | `status.initialization.controlPlaneInitialized`, required | +| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.backCompatibilty.conditions[Ready]` (renamed, deprecated), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | (removed) | +| | `status.conditions[Available]` (new), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | `status.conditions[Available]`, optional with fall back on `status.initialization.controlPlaneInitialized` | +| `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | +| `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | +| other fields/rules... | other fields/rules...
| | Notes: - ControlPlane's `status.initialization.controlPlaneInitialized` will surface into Cluster's `staus.initialization.controlPlaneInitialized` field; also, @@ -1296,18 +1297,18 @@ Notes: ### Example use cases -This paragraph is a collection of use cases for an improved status in cluster API resources and notes about how this +This paragraph is a collection of use cases for an improved status in Cluster API resources and notes about how this proposal address those use cases. As a cluster admin with MachineDeployment ownership I'd like to understand if my MD is performing a rolling upgrade and why by looking at the MD status/conditions -> The main signal for MD is performing a rolling upgrade will be `MD.Status.Conditions[UpToDate]` false. +> The main signal that an MD is performing a rolling upgrade will be `MD.Status.Conditions[UpToDate]`. > At least in the first iteration there won't be a signal at MD level about why rollout is happening, because controlled machines might > have different reasons why they are not UpToDate (and the admin can check those conditions by looking at single machines). > In future iterations of this proposal we might find ways to aggregate those reasons into the message for the `MD.Status.Conditions[UpToDate]` condition. -As a cluster admin with MachineDeployment ownership I'd like to understand why my MD rollout is blocked and why by looking at the MD status/conditions +As a cluster admin with MachineDeployment ownership I'd like to understand why my MD rollout is blocked by looking at the MD status/conditions > `MD.Status.Conditions[ScalingUp]` and `MD.Status.Conditions[ScalingDown]` will give information about how the rollout is being performed, > if there are issues creating or deleting the machines, etc.
@@ -1346,7 +1347,7 @@ _This proposal requires a considerable amount of work, and it can be risky to im This proposal intentionally highlights changes that can be implemented before the actual work for the v1beta2 API version starts. -Those changes not only allow will users to take benefits from this work ASAP, but also provides a way to split the work +Those changes will not only allow users to benefit from this work ASAP, but also provide a way to split the work across more than one release cycle (tentatively two release cycles). ## Alternatives @@ -1364,7 +1365,7 @@ would have prevented this proposal from designing the ideal target state we are Additionally, the idea of dropping all the existing status fields/conditions in the new v1beta2 API (by supporting down conversion), was considered negatively because it implies a sudden, big change both for users and providers. -Instead, we would like to minimize impacts on users and providers by preserving old fields in `BackCompatibility` until v1beta1 removal, +Instead, we would like to minimize impact on users and providers by preserving old fields in `BackCompatibility` until v1beta1 removal, which is ultimately the same process suggested for removal of API fields from graduated APIs.
Note: There will still be some impacts because `BackCompatibility` fields will be in a different location from where the From 3b1704d0b2bf4a3d22507851aecc572af9dc828f Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Mon, 29 Jul 2024 18:16:30 +0200 Subject: [PATCH 10/22] Drop replica failure, we can surface this on scaling up --- docs/proposals/improve-status-in-CAPI-resources.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index d2f323284f1d..f674889d8bc8 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -511,7 +511,6 @@ Notes: | Condition | Note | |------------------|------------------------------------------------------------------------------------------------------------------| -| `ReplicaFailure` | This condition surfaces issues on creating a Machine replica in Kubernetes, if any. e.g. due to resource quotas. | | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | @@ -622,8 +621,7 @@ Notes: | Condition | Note | |------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. If using RollingUpgrade strategy, availableReplicas must be greater or equal than desired replicas - MaxUnavailable replicas | -| `ReplicaFailure` | This condition surfaces issues on creating a MachineSet replica in Kubernetes, if any. e.g. due to resource quotas. 
| +| `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. If using RollingUpgrade strategy, availableReplicas must be greater or equal than desired replicas - MaxUnavailable replicas | | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | @@ -965,7 +963,6 @@ TODO: double check usages of status.ready. |-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `Available` | True if the control plane can be reached, `EtcdClusterAvailable` is true, and `CertificatesAvailable` is true | | `CertificatesAvailable` | True if all the cluster certificates exist. | -| `ReplicaFailure` | This condition surfaces issues on creating Machines controlled by this KubeadmControlPlane, if any. | | `EtcdClusterAvailable` | This condition surfaces issues to the managed etcd cluster, if any. It is computed as aggregation of Machines's `EtcdMemberHealthy` (if not using an external etcd) conditions plus additional checks validating potential issues to etcd quorum | | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
Please note this will include also `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy` | | `ScalingUp` | True if available replicas < desired replicas | @@ -1105,7 +1102,6 @@ Notes: | `Available` | True when `InfrastructureReady` and available replicas >= desired replicas (see notes below) | | `BootstrapConfigReady` | Mirrors the corresponding condition from the MachinePool's BootstrapConfig resource | | `InfrastructureReady` | Mirrors the corresponding condition from the MachinePool's Infrastructure resource | -| `ReplicaFailure` | This condition surfaces issues on creating a Machines replica in Kubernetes, if any. e.g. due to resource quotas. | | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | From 108f568f619927cd306782bd873758057d14a12c Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Mon, 29 Jul 2024 18:19:11 +0200 Subject: [PATCH 11/22] Bubble up Machine Available instead of Machine Ready --- .../improve-status-in-CAPI-resources.md | 66 +++++++++---------- 1 file changed, 33 insertions(+), 33 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index f674889d8bc8..16d5a6ced5a3 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -509,15 +509,15 @@ Notes: #### MachineSet (New)Conditions -| Condition | Note | -|------------------|------------------------------------------------------------------------------------------------------------------| -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
| -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachineSet that is not passing health checks | -| `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | +| Condition | Note | +|---------------------|----------------------------------------------------------------------------------------------------------------| +| `MachinesAvailable` | This condition surfaces detail of issues on the controlled machines, if any. | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this MachineSet that is not passing health checks | +| `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current MachineSet's conditions: > Ready, MachinesCreated, Resized, MachinesReady. 
@@ -619,16 +619,16 @@ Notes: #### MachineDeployment (New)Conditions -| Condition | Note | -|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. If using RollingUpgrade strategy, availableReplicas must be greater or equal than desired replicas - MaxUnavailable replicas | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachineDeployment are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachineDeployment that is not passing health checks | -| `Deleted` | True if MachineDeployment is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this MachineDeployment or the Cluster it belongs to are paused | +| Condition | Note | +|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. If using RollingUpgrade strategy, availableReplicas must be greater or equal than desired replicas - MaxUnavailable replicas | +| `MachinesAvailable` | This condition surfaces detail of issues on the controlled machines, if any. 
| +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this MachineDeployment are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this MachineDeployment that is not passing health checks | +| `Deleted` | True if MachineDeployment is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this MachineDeployment or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current MachineDeployment's conditions: > Ready, Available. @@ -964,7 +964,7 @@ TODO: double check usages of status.ready. | `Available` | True if the control plane can be reached, `EtcdClusterAvailable` is true, and `CertificatesAvailable` is true | | `CertificatesAvailable` | True if all the cluster certificates exist. | | `EtcdClusterAvailable` | This condition surfaces issues to the managed etcd cluster, if any. It is computed as aggregation of Machines's `EtcdMemberHealthy` (if not using an external etcd) conditions plus additional checks validating potential issues to etcd quorum | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. Please note this will include also `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy` | +| `MachinesAvailable` | This condition surfaces detail of issues on the controlled machines, if any. 
Please note this will include also `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy` | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | | `UpToDate` | True if all the Machines controlled by this ControlPlane are up to date | @@ -1097,18 +1097,18 @@ Notes: ##### MachinePool (New)Conditions -| Condition | Note | -|------------------------|-------------------------------------------------------------------------------------------------------------------| -| `Available` | True when `InfrastructureReady` and available replicas >= desired replicas (see notes below) | -| `BootstrapConfigReady` | Mirrors the corresponding condition from the MachinePool's BootstrapConfig resource | -| `InfrastructureReady` | Mirrors the corresponding condition from the MachinePool's Infrastructure resource | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
| -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachinePool are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachinePool that is not passing health checks | -| `Deleted` | True if MachinePool is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this MachinePool or the Cluster it belongs to are paused | +| Condition | Note | +|------------------------|-----------------------------------------------------------------------------------------------------------------| +| `Available` | True when `InfrastructureReady` and available replicas >= desired replicas (see notes below) | +| `BootstrapConfigReady` | Mirrors the corresponding condition from the MachinePool's BootstrapConfig resource | +| `InfrastructureReady` | Mirrors the corresponding condition from the MachinePool's Infrastructure resource | +| `MachinesAvailable` | This condition surfaces detail of issues on the controlled machines, if any. | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this MachinePool are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this MachinePool that is not passing health checks | +| `Deleted` | True if MachinePool is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this MachinePool or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current MachinePool's conditions: > Ready, BootstrapReady, InfrastructureReady, ReplicasReady. 
@@ -1311,7 +1311,7 @@ As a cluster admin with MachineDeployment ownership I'd like to understand why m As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are failing to be available by looking at the MD status/conditions -> `MD.Status.Conditions[MachinesReady]` condition will aggregate errors from all the Machines controlled by a MD. +> `MD.Status.Conditions[MachinesAvailable]` condition will aggregate errors from all the Machines controlled by a MD. As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are stuck on deletion looking at the MD status/conditions From 8a0b2ef2c590e98a61b79cf62e456514046c6308 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Mon, 29 Jul 2024 20:12:10 +0200 Subject: [PATCH 12/22] Clarify timeline --- .../improve-status-in-CAPI-resources.md | 228 +++++++++--------- 1 file changed, 114 insertions(+), 114 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index 16d5a6ced5a3..c341db4c37b7 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -308,19 +308,19 @@ type MachineInitializationStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|--------------------------------|----------------------------------------------------------|---------------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| | `BackCompatibilty` (new) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `LastUpdated` (deprecated) | 
`BackCompatibilty.LastUpdated` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... | +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|--------------------------------|----------------------------------------------------------|----------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| | `BackCompatibilty` (new) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `LastUpdated` (deprecated) | `BackCompatibilty.LastUpdated` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). 
@@ -409,11 +409,11 @@ type MachineReadinessGate struct { } ``` -| v1beta1 (current) | v1Beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|-------------------------|-----------------------------|---------------------------------------------------| -| `MinReadySeconds` (new) | `MinReadySeconds` | `MinReadySeconds` | -| `ReadinessGates` (new) | `ReadinessGates` | `ReadinessGates` | -| other fields... | other fields... | other fields... | +| v1beta1 (tentative Dec 2024) | v1Beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------|------------------------------|----------------------------------------------------| +| `MinReadySeconds` (new) | `MinReadySeconds` | `MinReadySeconds` | +| `ReadinessGates` (new) | `ReadinessGates` | `ReadinessGates` | +| other fields... | other fields... | other fields... | Notes: - Both `MinReadySeconds` and `ReadinessGates` should be treated as other in-place propagated fields (changing them should not trigger rollouts). 
@@ -485,19 +485,19 @@ type MachineSetStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|---------------------------------------|-------------------------------------------------------------|---------------------------------------------------| -| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `ExperimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| `UpToDateReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|---------------------------------------|-------------------------------------------------------------|----------------------------------------------------| +| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `ExperimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| `UpToDateReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). 
@@ -598,20 +598,20 @@ type MachineDeploymentStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|---------------------------------------|-------------------------------------------------------------|---------------------------------------------------| -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `ExperimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|---------------------------------------|-------------------------------------------------------------|----------------------------------------------------| +| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `ExperimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). 
@@ -768,30 +768,30 @@ type WorkersStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|----------------------------------------|----------------------------------------------------------|---------------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| `ControlPlaneReady` | `Initialization.ControlPlaneInitialized` (renamed) | `Initialization.ControlPlaneInitialized` | -| | `BackCompatibilty` (new) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| `ControlPlane` (new) | `ControlPlane` | `ControlPlane` | -| `ControlPlane.DesiredReplicas` (new) | `ControlPlane.DesiredReplicas` | `ControlPlane.DesiredReplicas` | -| `ControlPlane.Replicas` (new) | `ControlPlane.Replicas` | `ControlPlane.Replicas` | -| `ControlPlane.ReadyReplicas` (new) | `ControlPlane.ReadyReplicas` | `ControlPlane.ReadyReplicas` | -| `ControlPlane.UpToDateReplicas` (new) | `ControlPlane.UpToDateReplicas` | `ControlPlane.UpToDateReplicas` | -| `ControlPlane.AvailableReplicas` (new) | `ControlPlane.AvailableReplicas` | `ControlPlane.AvailableReplicas` | -| `Workers` (new) | `Workers` | `Workers` | -| `Workers.DesiredReplicas` (new) | `Workers.DesiredReplicas` | `Workers.DesiredReplicas` | -| `Workers.Replicas` (new) | `Workers.Replicas` | `Workers.Replicas` | -| `Workers.ReadyReplicas` (new) | `Workers.ReadyReplicas` | `Workers.ReadyReplicas` | -| 
`Workers.UpToDateReplicas` (new) | `Workers.UpToDateReplicas` | `Workers.UpToDateReplicas` | -| `Workers.AvailableReplicas` (new) | `Workers.AvailableReplicas` | `Workers.AvailableReplicas` | -| other fields... | other fields... | other fields... | +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|----------------------------------------|----------------------------------------------------------|----------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| `ControlPlaneReady` | `Initialization.ControlPlaneInitialized` (renamed) | `Initialization.ControlPlaneInitialized` | +| | `BackCompatibilty` (new) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| `ControlPlane` (new) | `ControlPlane` | `ControlPlane` | +| `ControlPlane.DesiredReplicas` (new) | `ControlPlane.DesiredReplicas` | `ControlPlane.DesiredReplicas` | +| `ControlPlane.Replicas` (new) | `ControlPlane.Replicas` | `ControlPlane.Replicas` | +| `ControlPlane.ReadyReplicas` (new) | `ControlPlane.ReadyReplicas` | `ControlPlane.ReadyReplicas` | +| `ControlPlane.UpToDateReplicas` (new) | `ControlPlane.UpToDateReplicas` | `ControlPlane.UpToDateReplicas` | +| `ControlPlane.AvailableReplicas` (new) | `ControlPlane.AvailableReplicas` | `ControlPlane.AvailableReplicas` | +| `Workers` (new) | `Workers` | `Workers` | +| `Workers.DesiredReplicas` (new) | 
`Workers.DesiredReplicas` | `Workers.DesiredReplicas` | +| `Workers.Replicas` (new) | `Workers.Replicas` | `Workers.Replicas` | +| `Workers.ReadyReplicas` (new) | `Workers.ReadyReplicas` | `Workers.ReadyReplicas` | +| `Workers.UpToDateReplicas` (new) | `Workers.UpToDateReplicas` | `Workers.UpToDateReplicas` | +| `Workers.AvailableReplicas` (new) | `Workers.AvailableReplicas` | `Workers.AvailableReplicas` | +| other fields... | other fields... | other fields... | notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). @@ -859,10 +859,10 @@ type ClusterAvailabilityGate struct { } ``` -| v1beta1 (current) | v1Beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|---------------------------|-----------------------------|---------------------------------------------------| -| `AvailabilityGates` (new) | `AvailabilityGates` | `AvailabilityGates` | -| other fields... | other fields... | other fields... | +| v1beta1 (tentative Dec 2024) | v1Beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------|------------------------------|----------------------------------------------------| +| `AvailabilityGates` (new) | `AvailabilityGates` | `AvailabilityGates` | +| other fields... | other fields... | other fields... 
| Notes: - Similarly to Pod's `ReadinessGates`, also Cluster's `AvailabilityGates` accepts only conditions with positive polarity; @@ -941,19 +941,19 @@ type KubeadmControlPlaneStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|-----------------------------------|----------------------------------------------------------|---------------------------------------------------| -| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `AvailableReplicas` (new) | `AvailableReplicas` | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|-----------------------------------|----------------------------------------------------------|----------------------------------------------------| +| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | +| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `AvailableReplicas` (new) | `AvailableReplicas` | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | TODO: double check usages of status.ready. 
@@ -1073,23 +1073,23 @@ type MachinePoolInitializationStatus struct { } ``` -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | -|--------------------------------------|-------------------------------------------------------------|---------------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| `UpdatedReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | -| `ExprimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `ExprimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|--------------------------------------|-------------------------------------------------------------|----------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| `UpdatedReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | +| `ExprimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `ExprimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `BackCompatibilty` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | +| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | +| other fields... | other fields... | other fields... | Notes: - The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). @@ -1144,7 +1144,7 @@ Notes: The Cluster API contract defines a set of rules a provider is expected to comply with in order to interact with Cluster API. 
-When the v1beta2 API will be released (tentative Q1 2025), also the Cluster API contract will be bumped to v1beta2. +When the v1beta2 API is released (tentative Apr 2025), the Cluster API contract will also be bumped to v1beta2. As written at the beginning of this document, this proposal is not going to change how the Cluster API contract with infrastructure, bootstrap and control providers currently works (by using status fields). @@ -1163,9 +1163,9 @@ a mechanism that allows providers to adapt to a new contract incrementally, more Cluster API's v1beta2 release. - Each provider can implement changes described in the following paragraphs at its own pace, but the transition - _must be completed_ before v1beta1 removal (tentative Q1 2026). + _must be completed_ before v1beta1 removal (tentative Apr 2026). -- Starting from the CAPI release when v1beta1 removal will happen (tentative Q1 2026), providers which are implementing +- Starting from the CAPI release when v1beta1 removal will happen (tentative Apr 2026), providers which are implementing the v1beta1 contract will stop to work (they will work only with older versions of Cluster API). Additionally: @@ -1175,7 +1175,7 @@ Additionally: with Kubernetes, Cluster API and the ecosystem). - However, providers choosing to keep using Cluster API custom conditions should be aware that starting from the - CAPI release when v1beta1 removal will happen (tentative Q1 2026), the Cluster API project will remove the + CAPI release when v1beta1 removal will happen (tentative Apr 2026), the Cluster API project will remove the Cluster API condition type, the `util/conditions` package, the code handling conditions in `util/patch.Helper` and everything related to the custom Cluster API `v1beta.Condition` type. (in other words, Cluster API custom condition must be replaced by provider's own custom conditions).
@@ -1193,7 +1193,7 @@ Following changes are planned for the contract for the InfrastructureCluster res - Rename `status.ready` into `status.initialization.provisioned`. - Remove `failureReason` and `failureMessage`. -| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | |-----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------| | `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.provisioned` required | (removed) | | | `status.initialization.provisioned` (new), one of `status.ready` or `status.initialization.provisioned` required | `status.initialization.provisioned` | @@ -1218,7 +1218,7 @@ Following changes are planned for the contract for the InfrastructureMachine res - Rename `status.ready` into `status.initialization.provisioned`. - Remove `failureReason` and `failureMessage`. 
-| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | |-----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------| | `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.provisioned` required | (removed) | | | `status.initialization.provisioned` (new), one of `status.ready` or `status.initialization.provisioned` required | `status.initialization.provisioned` | @@ -1243,7 +1243,7 @@ Following changes are planned for the contract for the BootstrapConfig resource: - Rename `status.ready` into `status.initialization.dataSecretCreated`. - Remove `failureReason` and `failureMessage`. 
-| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | |-----------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------| | `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.dataSecretCreated`, required | (removed) | | | `status.initialization.dataSecretCreated` (new), one of `status.ready` or `status.initialization.dataSecretCreated`, required | `status.initialization.dataSecretCreated`, required | @@ -1269,7 +1269,7 @@ Following changes are planned for the contract for the ControlPlane resource: - Rename `status.initialized` into `status.initialization.controlPlaneInitialized`. - Remove `failureReason` and `failureMessage`. 
-| v1beta1 (current) | v1beta2 (tentative Q1 2025) | v1beta2 after v1beta1 removal (tentative Q1 2026) | +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | |-----------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| | `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | (removed) | | `status.initialized`, required | `status.initialization.controlPlaneInitialized` (renamed), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | `status.initialization.controlPlaneInitialized`, required | @@ -1328,13 +1328,13 @@ _Like any API change, this proposal will have impact on Cluster API users_ Mitigations: This proposal abides to Kubernetes deprecation rules, and it also ensures isomorphic conversions to/from v1beta1 APIs -can be supported (until v1beta1 removal, tentative Q1 2026). +can be supported (until v1beta1 removal, tentative Apr 2026). On top of that, a few design decisions have been made with the specific intent to further minimize impact on users and providers e.g. -- The decision to keep `BackCompatibility` fields in v1beta2 API (until v1beta1 removal, tentative Q1 2026). +- The decision to keep `BackCompatibility` fields in v1beta2 API (until v1beta1 removal, tentative Apr 2026). - The decision to allow providers to adopt the Cluster API v1beta2 contract at their own pace (transition _must be completed_ - before v1beta1 removal, tentative Q1 2026). + before v1beta1 removal, tentative Apr 2026). 
All in all, those decisions are consistent with the fact that in Cluster API we are already treating our APIs (and the Cluster API contract) as fully graduated APIs no matter if they are still beta. From 043e3e78ef58d5d0aaee8d95d2a46fad8315c34e Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Fri, 2 Aug 2024 13:01:48 +0200 Subject: [PATCH 13/22] Add topology reconciled condition to Cluster Availability (only for Classy Clusters) --- .../improve-status-in-CAPI-resources.md | 30 +++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index c341db4c37b7..0b5856cd419c 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -799,21 +799,21 @@ notes: ##### Cluster (New)Conditions -| Condition | Note | -|---------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` conditions are true; if conditions are defined in `spec.availabilityGates`, those conditions must be true as well. | -| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. This information is usually used as a signal for starting all the provisioning operations that depends on a functional API server, but do not require a full HA control plane to exists. 
| -| `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) | -| `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | -| `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | -| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` conditions | -| `TopologyReconciled` | | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this Cluster are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this Cluster that is not passing health checks | -| `Deleted` | True if Cluster is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if Cluster and all the resources being part of it are paused | +| Condition | Note | +|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` conditions are true, if `TopologyReconciled` is true (if present); if conditions are defined in `spec.availabilityGates`, those conditions must be true as well. | +| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. 
This information is usually used as a signal for starting all the provisioning operations that depend on a functional API server, but do not require a full HA control plane to exist. | +| `RemoteConnectionProbe` | True when the control plane can be reached; in case of connection problems, the condition turns to false only if the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) | +| `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | +| `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | +| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` conditions | +| `TopologyReconciled` | True if the topology controller is working properly | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this Cluster are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this Cluster that is not passing health checks | +| `Deleted` | True if Cluster is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if Cluster and all the resources being part of it are paused | > To better evaluate proposed changes, below you can find the list of current Cluster's conditions: > Ready, InfrastructureReady, ControlPlaneReady, ControlPlaneInitialized, TopologyReconciled From 7bd46a47eef8d5919dade5b20ce8575019facd6d Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Fri, 2 Aug 2024 13:25:58 +0200 Subject: [PATCH 14/22] Clarify the definition of the `--remote-conditions-grace-period` flag --- docs/proposals/improve-status-in-CAPI-resources.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md
b/docs/proposals/improve-status-in-CAPI-resources.md index 0b5856cd419c..2ce998599668 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -359,8 +359,8 @@ Notes: the Machine's owner controllers to set this condition. - Conditions like `NodeReady` and `NodeHealthy` which depend on the connection to the remote cluster will benefit from the new `RemoteConnectionProbe` condition at cluster level (see [Cluster (New)Conditions](#cluster-newconditions)); - more specifically those condition should be set to `Unknown` after the cluster probe fails - (or after whatever period is defined in the `--remote-conditions-grace-period` flag) + more specifically, those conditions should be set to `Unknown` when the time elapsed since `lastRemoteConnectionProbeTime` exceeds the value + defined in the `--remote-conditions-grace-period` flag. - `HealthCheckSucceeded` and `OwnerRemediated` (or `ExternalRemediationRequestAvailable`) conditions are set by the MachineHealthCheck controller in case a MachineHealthCheck targets the machine.
- KubeadmControlPlane also adds additional conditions to Machines, but those conditions are not included in the table above From f80a6b860482a230a73106f05294ef9c15079135 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Wed, 28 Aug 2024 12:30:57 +0200 Subject: [PATCH 15/22] Rollback on dropping status.phases --- .../improve-status-in-CAPI-resources.md | 158 +++++++++--------- 1 file changed, 81 insertions(+), 77 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index 2ce998599668..482e47c2b53e 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -32,7 +32,8 @@ see-also: - [Non-Goals](#non-goals) - [Proposal](#proposal) - [Readiness and Availability](#readiness-and-availability) - - [Transition to K8s API conventions aligned conditions](#transition-to-k8s-api-conventions-aligned-conditions) + - [Transition to Kubernetes API conventions aligned conditions](#transition-to-kubernetes-api-conventions-aligned-conditions) + - [Phases field/print column](#phases-fieldprint-column) - [Changes to Machine resource](#changes-to-machine-resource) - [Machine Status](#machine-status) - [Machine (New)Conditions](#machine-newconditions) @@ -64,8 +65,8 @@ see-also: - [InfrastructureCluster](#infrastructurecluster) - [InfrastructureMachine](#infrastructuremachine) - [Contract for bootstrap providers](#contract-for-bootstrap-providers) - - [Contract for control plane Providers](#contract-for-control-plane-providers) - - [\[WIP\] Example use cases](#wip-example-use-cases) + - [Contract for control plane providers](#contract-for-control-plane-providers) + - [Example use cases](#example-use-cases) - [Security Model](#security-model) - [Risks and Mitigations](#risks-and-mitigations) - [Alternatives](#alternatives) @@ -127,8 +128,6 @@ At high level, proposed changes to status fields can be grouped in three sets of Some of those changes 
could be considered straight forward, e.g. -- K8s API conventions suggest to deprecate and remove `phase` fields from status, Cluster API is going to align to this recommendation - (and improve Conditions to provide similar or even better info as a replacement). - K8s resources do not have a concept similar to "terminal failure" in Cluster API resources, and users approaching the project are struggling with this idea. In some cases also provider's implementers are struggling with it. Accordingly, Cluster API resources are dropping `FailureReason` and `FailureMessage` fields. @@ -255,6 +254,13 @@ In case someone wants a more sophisticated control over the process of merging c condition utils in Cluster API will allow developers to plug in custom functions to compute merge priority for a condition, e.g. by looking at status, reason, time since the condition transitioned, etc. +### Phases field/print column + +K8s API conventions suggest deprecating and removing `phase` fields from status. + +However, Cluster API maintainers decided not to align with this recommendation because there is consensus that +existing `phase` fields provide valuable information to users. + ### Changes to Machine resource #### Machine Status @@ -262,7 +268,6 @@ for a condition, e.g. by looking at status, reason, time since the condition tra Following changes are implemented to Machine's status: - Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow -- Align to K8s API conventions by deprecating `Phase` and corresponding `LastUpdated` - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions @@ -286,7 +291,7 @@ type MachineStatus struct { Conditions []metav1.Condition `json:"conditions,omitempty"` // Other fields...
- // NOTE: `Phase`, `LastUpdated`, `FailureReason`, `FailureMessage`, `BootstrapReady`, `InfrastructureReady` fields won't be there anymore + // NOTE: `FailureReason`, `FailureMessage`, `BootstrapReady`, `InfrastructureReady` fields won't be there anymore } // MachineInitializationStatus provides observations of the Machine initialization process. @@ -314,7 +319,6 @@ type MachineInitializationStatus struct { | `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | | `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | | | `BackCompatibilty` (new) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | | `LastUpdated` (deprecated) | `BackCompatibilty.LastUpdated` (renamed) (deprecated) | (removed) | | `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | | `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | @@ -422,21 +426,22 @@ Notes: #### Machine Print columns -| Current | To be | -|-------------------|-------------------------------| -| `NAME` | `NAME` | -| `CLUSTER` | `CLUSTER` | -| `NODE NAME` | `PAUSED` (new) (*) | -| `PROVIDER ID` | `NODE NAME` | -| `PHASE` (deleted) | `PROVIDER ID` | -| `AGE` | `READY` (new) | -| `VERSION` | `AVAILABLE` (new) | -| | `UP-TO-DATE` (new) | -| | `AGE` | -| | `VERSION` | -| | `OS-IMAGE` (new) (*) | -| | `KERNEL-VERSION` (new) (*) | -| | `CONTAINER-RUNTIME` (new) (*) | +| Current | To be | +|---------------|-------------------------------| +| `NAME` | `NAME` | +| `CLUSTER` | `CLUSTER` | +| `NODE NAME` | `PAUSED` (new) (*) | +| `PROVIDER ID` | `NODE NAME` | +| `PHASE` | `PROVIDER ID` | +| `AGE` | `READY` (new) | +| `VERSION` | `AVAILABLE` (new) | +| | `UP-TO-DATE` (new) | +| | `PHASE` | +| | `AGE` | +| | `VERSION` | +| | `OS-IMAGE` (new) (*) | +| | 
`KERNEL-VERSION` (new) (*) | +| | `CONTAINER-RUNTIME` (new) (*) | (*) visible only when using `kubectl get -o wide` @@ -565,7 +570,6 @@ Notes: Following changes are implemented to MachineDeployment's status: - Align `UpdatedReplicas` to use Machine's `UpToDate` condition (and rename it accordingly to `UpToDateReplicas`) -- Align to K8s API conventions by deprecating `Phase` - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions @@ -594,7 +598,7 @@ type MachineDeploymentStatus struct { Conditions []metav1.Condition `json:"conditions,omitempty"` // Other fields... - // NOTE: `Phase`, `FailureReason`, `FailureMessage` fields won't be there anymore + // NOTE: `FailureReason`, `FailureMessage` fields won't be there anymore } ``` @@ -606,7 +610,6 @@ type MachineDeploymentStatus struct { | | `BackCompatibilty` (new) | (removed) | | `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | | `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | | `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | | `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | | `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | @@ -649,9 +652,10 @@ Notes: | `READY` | `CURRENT` (*) | | `UPDATED` (renamed) | `READY` | | `UNAVAILABLE` (deleted) | `AVAILABLE` (new) | -| `PHASE` (deleted) | `UP-TO-DATE` (renamed) | -| `AGE` | `AGE` | -| `VERSION` | `VERSION` | +| `PHASE` | `UP-TO-DATE` (renamed) | +| `AGE` | `PHASE` | +| `VERSION` | `AGE` | +| | `VERSION` | (*) visible only when using `kubectl get -o wide` @@ -666,7 +670,6 @@ Notes: Following changes are implemented to 
Cluster's status: - Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow -- Align to K8s API conventions by deprecating `Phase` and corresponding `LastUpdated` - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions - Add replica counters to surface status of Machines belonging to this Cluster @@ -766,6 +769,8 @@ type WorkersStatus struct { // +optional AvailableReplicas int32 `json:"availableReplicas"` } + +// NOTE: `FailureReason`, `FailureMessage` fields won't be there anymore ``` | v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | @@ -774,7 +779,6 @@ type WorkersStatus struct { | `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | | `ControlPlaneReady` | `Initialization.ControlPlaneInitialized` (renamed) | `Initialization.ControlPlaneInitialized` | | | `BackCompatibilty` (new) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | | `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | | `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | | `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | @@ -872,24 +876,25 @@ Notes: #### Cluster Print columns -| Current | To be | -|-------------------|-----------------------| -| `NAME` | `NAME` | -| `CLUSTER CLASS` | `CLUSTER CLASS` | -| `PHASE` (deleted) | `PAUSED` (new) (*) | -| `AGE` | `AVAILABLE` (new) | -| `VERSION` | `CP_DESIRED` (new) | -| | `CP_CURRENT`(new) (*) | -| | `CP_READY` (new) (*) | -| | `CP_AVAILABLE` (new) | -| | `CP_UP-TO-DATE` (new) | -| | `W_DESIRED` (new) | -| | `W_CURRENT`(new) (*) | -| | `W_READY` (new) (*) | -| | `W_AVAILABLE` 
(new) | -| | `W_UP-TO-DATE` (new) | -| | `AGE` | -| | `VERSION` | +| Current | To be | +|-----------------|-----------------------| +| `NAME` | `NAME` | +| `CLUSTER CLASS` | `CLUSTER CLASS` | +| `PHASE` | `PAUSED` (new) (*) | +| `AGE` | `AVAILABLE` (new) | +| `VERSION` | `CP_DESIRED` (new) | +| | `CP_CURRENT`(new) (*) | +| | `CP_READY` (new) (*) | +| | `CP_AVAILABLE` (new) | +| | `CP_UP-TO-DATE` (new) | +| | `W_DESIRED` (new) | +| | `W_CURRENT`(new) (*) | +| | `W_READY` (new) (*) | +| | `W_AVAILABLE` (new) | +| | `W_UP-TO-DATE` (new) | +| | `PHASE` | +| | `AGE` | +| | `VERSION` | (*) visible only when using `kubectl get -o wide` @@ -986,19 +991,19 @@ Notes: #### KubeadmControlPlane Print columns -| Current | To be | -|--------------------------|------------------------| -| `NAME` | `NAME` | -| `CLUSTER` | `CLUSTER` | -| `DESIRED` (*) | `PAUSED` (new) (*) | -| `REPLICAS` | `INITIALIZED` (new) | -| `READY` | `DESIRED` | -| `UPDATED` (renamed) | `CURRENT` (*) | -| ``UNAVAILABLE` (deleted) | `READY` | -| `PHASE` (deleted) | `AVAILABLE` (new) | -| `AGE` | `UP-TO-DATE` (renamed) | -| `VERSION` | `AGE` | -| | `VERSION` | +| Current | To be | +|-------------------------|------------------------| +| `NAME` | `NAME` | +| `CLUSTER` | `CLUSTER` | +| `DESIRED` (*) | `PAUSED` (new) (*) | +| `REPLICAS` | `INITIALIZED` (new) | +| `READY` | `DESIRED` | +| `UPDATED` (renamed) | `CURRENT` (*) | +| `UNAVAILABLE` (deleted) | `READY` | +| `AGE` | `AVAILABLE` (new) | +| `VERSION` | `UP-TO-DATE` (renamed) | +| | `AGE` | +| | `VERSION` | (*) visible only when using `kubectl get -o wide` @@ -1015,7 +1020,6 @@ Following changes are implemented to MachinePool's status: - Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow - Update `ReadyReplicas` counter to use the same semantic Machine's `Ready` condition and add missing `UpToDateReplicas`. 
- Align MachinePools replica counters to other CAPI resources -- Align to K8s API conventions by deprecating `Phase` - Remove `FailureReason` and `FailureMessage` to get rid of the confusing concept of terminal failures - Transition to new, improved, K8s API conventions aligned conditions @@ -1051,7 +1055,7 @@ type MachinePoolStatus struct { Conditions []metav1.Condition `json:"conditions,omitempty"` // Other fields... - // NOTE: `Phase`, `FailureReason`, `FailureMessage`, `BootstrapReady`, `InfrastructureReady` fields won't be there anymore + // NOTE: `FailureReason`, `FailureMessage`, `BootstrapReady`, `InfrastructureReady` fields won't be there anymore } // MachinePoolInitializationStatus provides observations of the MachinePool initialization process. @@ -1084,7 +1088,6 @@ type MachinePoolInitializationStatus struct { | | `BackCompatibilty` (new) | (removed) | | `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | | `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `Phase` (deprecated) | `BackCompatibilty.Phase` (renamed) (deprecated) | (removed) | | `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | | `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | | `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | @@ -1121,18 +1124,19 @@ Notes: #### MachinePool Print columns -| Current | To be | -|-------------------|------------------------| -| `NAME` | `NAME` | -| `CLUSTER` | `CLUSTER` | -| `DESIRED` (*) | `PAUSED` (new) (*) | -| `REPLICAS` | `DESIRED` | -| `PHASE` (deleted) | `CURRENT` (*) | -| `AGE` | `READY` | -| `VERSION` | `AVAILABLE` (new) | -| | `UP-TO-DATE` (renamed) | -| | `AGE` | -| | `VERSION` | +| Current | To be | +|---------------|------------------------| +| `NAME` | `NAME` | +| `CLUSTER` | `CLUSTER` | +| `DESIRED`
(*) | `PAUSED` (new) (*) | +| `REPLICAS` | `DESIRED` | +| `PHASE` | `CURRENT` (*) | +| `AGE` | `READY` | +| `VERSION` | `AVAILABLE` (new) | +| | `UP-TO-DATE` (renamed) | +| | `PHASE` | +| | `AGE` | +| | `VERSION` | (*) visible only when using `kubectl get -o wide` From 9af5e3d45da626c4789f87e17d64db03b306c79c Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Wed, 28 Aug 2024 12:37:50 +0200 Subject: [PATCH 16/22] Rename MachinesAvailable to MachinesReady --- .../improve-status-in-CAPI-resources.md | 44 +++++++++---------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index 482e47c2b53e..fcfdbe48e010 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -514,15 +514,15 @@ Notes: #### MachineSet (New)Conditions -| Condition | Note | -|---------------------|----------------------------------------------------------------------------------------------------------------| -| `MachinesAvailable` | This condition surfaces detail of issues on the controlled machines, if any. 
| -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachineSet that is not passing health checks | -| `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | +| Condition | Note | +|-----------------|----------------------------------------------------------------------------------------------------------------| +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this MachineSet that is not passing health checks | +| `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current MachineSet's conditions: > Ready, MachinesCreated, Resized, MachinesReady. 
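The replica-based rows of the MachineSet conditions table above (`ScalingUp`, `ScalingDown`, `UpToDate`) reduce to plain comparisons between replica counters. The Go sketch below illustrates that logic under stated assumptions: `Condition` is a hypothetical, trimmed-down stand-in for `metav1.Condition`, and the `ReplicaCounters` field names are illustrative, so this is not the actual Cluster API implementation.

```go
package main

import "fmt"

// Condition is a hypothetical, trimmed-down stand-in for metav1.Condition;
// only the fields needed by this sketch are kept.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// ReplicaCounters holds the counters the conditions are derived from.
// Field names are illustrative assumptions, not the actual API fields.
type ReplicaCounters struct {
	Desired   int32 // desired replicas (spec)
	Current   int32 // replicas, i.e. Machines that currently exist
	Available int32 // availableReplicas
	UpToDate  int32 // upToDateReplicas
}

// statusFrom converts a boolean into a condition status string.
func statusFrom(b bool) string {
	if b {
		return "True"
	}
	return "False"
}

// replicaConditions derives ScalingUp, ScalingDown and UpToDate using the
// definitions from the MachineSet (New)Conditions table.
func replicaConditions(r ReplicaCounters) []Condition {
	return []Condition{
		// ScalingUp: available replicas < desired replicas.
		{Type: "ScalingUp", Status: statusFrom(r.Available < r.Desired)},
		// ScalingDown: replicas > desired replicas.
		{Type: "ScalingDown", Status: statusFrom(r.Current > r.Desired)},
		// UpToDate: replicas = upToDateReplicas.
		{Type: "UpToDate", Status: statusFrom(r.Current == r.UpToDate)},
	}
}

func main() {
	// A MachineSet scaled from 3 to 5 replicas; the 3 existing Machines are available and up to date.
	r := ReplicaCounters{Desired: 5, Current: 3, Available: 3, UpToDate: 3}
	for _, c := range replicaConditions(r) {
		fmt.Printf("%s=%s\n", c.Type, c.Status)
	}
}
```

With 5 desired replicas and 3 available, up-to-date Machines, the sketch prints `ScalingUp=True`, `ScalingDown=False` and `UpToDate=True`. A real controller would more likely build `metav1.Condition` values and store them with a helper such as `meta.SetStatusCondition` from `k8s.io/apimachinery/pkg/api/meta`.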
@@ -622,16 +622,16 @@ Notes: #### MachineDeployment (New)Conditions -| Condition | Note | -|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. If using RollingUpgrade strategy, availableReplicas must be greater or equal than desired replicas - MaxUnavailable replicas | -| `MachinesAvailable` | This condition surfaces detail of issues on the controlled machines, if any. | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachineDeployment are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachineDeployment that is not passing health checks | -| `Deleted` | True if MachineDeployment is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this MachineDeployment or the Cluster it belongs to are paused | +| Condition | Note | +|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. if using the RollingUpdate strategy, availableReplicas must be greater than or equal to desired replicas - MaxUnavailable replicas | +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any.
| +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `UpToDate` | True if all the Machines controlled by this MachineDeployment are up to date (replicas = upToDateReplicas) | +| `Remediating` | True if there is at least one Machine controlled by this MachineDeployment that is not passing health checks | +| `Deleted` | True if MachineDeployment is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | +| `Paused` | True if this MachineDeployment or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current MachineDeployment's conditions: > Ready, Available. @@ -969,7 +969,7 @@ TODO: double check usages of status.ready. | `Available` | True if the control plane can be reached, `EtcdClusterAvailable` is true, and `CertificatesAvailable` is true | | `CertificatesAvailable` | True if all the cluster certificates exist. | | `EtcdClusterAvailable` | This condition surfaces issues to the managed etcd cluster, if any. It is computed as aggregation of Machines's `EtcdMemberHealthy` (if not using an external etcd) conditions plus additional checks validating potential issues to etcd quorum | -| `MachinesAvailable` | This condition surfaces detail of issues on the controlled machines, if any. Please note this will include also `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy` | +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
Please note this will also include `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy` | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | | `UpToDate` | True if all the Machines controlled by this ControlPlane are up to date | @@ -1105,7 +1105,7 @@ Notes: | `Available` | True when `InfrastructureReady` and available replicas >= desired replicas (see notes below) | | `BootstrapConfigReady` | Mirrors the corresponding condition from the MachinePool's BootstrapConfig resource | | `InfrastructureReady` | Mirrors the corresponding condition from the MachinePool's Infrastructure resource | -| `MachinesAvailable` | This condition surfaces detail of issues on the controlled machines, if any. | +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | | `UpToDate` | True if all the Machines controlled by this MachinePool are up to date (replicas = upToDateReplicas) | @@ -1315,7 +1315,7 @@ As a cluster admin with MachineDeployment ownership I'd like to understand why m As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are failing to be available by looking at the MD status/conditions -> `MD.Status.Conditions[MachinesAvailable]` condition will aggregate errors from all the Machines controlled by a MD. +> `MD.Status.Conditions[MachinesReady]` condition will aggregate errors from all the Machines controlled by an MD.
As a cluster admin with MachineDeployment ownership I'd like to understand why Machines are stuck on deletion looking at the MD status/conditions From c4a3984525e7625f4da7e77d3253bb61e971a68a Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Fri, 30 Aug 2024 12:32:16 +0200 Subject: [PATCH 17/22] More refinements to `MachinesUpToDate`, `Remediating`, `Deleting` --- .../improve-status-in-CAPI-resources.md | 155 +++++++++--------- 1 file changed, 78 insertions(+), 77 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index fcfdbe48e010..66684d0c15d8 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -332,19 +332,19 @@ Notes: ##### Machine (New)Conditions -| Condition | Note | -|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if at the machine is Ready for at least MinReady seconds, as defined by the Machine's minReadySeconds field | -| `Ready` | True if the Machines is not deleted, Machine's `BootstrapConfigReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are true; if other conditions are defined in `spec.readinessGates`, these conditions must be true as well. 
| -| `UpToDate`             | True if the Machine spec matches the spec of the Machine's owner resource, e.g KubeadmControlPlane or MachineDeployment                                                                                                                                        | -| `BootstrapConfigReady` | Mirrors the corresponding `Ready` condition from the Machine's BootstrapConfig resource                                                                                                                                                                        | -| `InfrastructureReady`  | Mirrors the corresponding `Ready` condition from the Machine's Infrastructure resource                                                                                                                                                                         | -| `NodeHealthy`          | True if the Machine's Node is ready and it does not report MemoryPressure, DiskPressure and PIDPressure                                                                                                                                                        | -| `NodeReady`            | True if the Machine's Node is ready                                                                                                                                                                                                                            | -| `HealthCheckSucceeded` | True if MHC instances targeting this machine report the Machine is healthy according to the definition of healthy present in the spec of the MachineHealthCheck object                                                                                         | -| `OwnerRemediated`      |                                                                                                                                                                                                                                                                | -| `Deleted`              | True if Machine is deleted; Reason can be used to observe the cleanup progress when the resource is deleted                                                                                                                                                    | -| `Paused`               | True if the Machine or the Cluster it belongs to are paused                                                                                                                                                                                                    | +| Condition              | Note                                                                                                                                                                                                                                                          | +|------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available`            | True if the Machine is Ready for at least MinReady seconds, as defined by the Machine's minReadySeconds field                                                                                                                                                 | +| `Ready`                | True if the Machine is not deleted, Machine's `BootstrapConfigReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are true; if other conditions are defined in `spec.readinessGates`, these conditions must be true as well  | +| `UpToDate`             | True if the Machine spec matches the spec of the Machine's owner resource, e.g. KubeadmControlPlane or MachineDeployment                                                                                                                                      | +| `BootstrapConfigReady` | Mirrors the corresponding `Ready` condition from the Machine's BootstrapConfig resource                                                                                                                                                                       | +| `InfrastructureReady`  | Mirrors the corresponding
`Ready` condition from the Machine's Infrastructure resource | +| `NodeHealthy` | True if the Machine's Node is ready and it does not report MemoryPressure, DiskPressure and PIDPressure | +| `NodeReady` | True if the Machine's Node is ready | +| `HealthCheckSucceeded` | True if MHC instances targeting this machine report the Machine is healthy according to the definition of healthy present in the spec of the MachineHealthCheck object | +| `OwnerRemediated` | True if MHC instances targeting this machine determine that the controller owning this machine should perform remediation | +| `Deleting` | If Machine is deleted, this condition surfaces details about progress in the machine deletion workflow | +| `Paused` | True if the Machine or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current Machine's conditions: > Ready, InfrastructureReady, BootstrapReady, NodeHealthy, PreDrainDeleteHookSucceeded, VolumeDetachSucceeded, DrainingSucceeded. @@ -514,27 +514,27 @@ Notes: #### MachineSet (New)Conditions -| Condition | Note | -|-----------------|----------------------------------------------------------------------------------------------------------------| -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
| -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachineSet are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachineSet that is not passing health checks | -| `Deleted` | True if MachineSet is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | +| Condition | Note | +|--------------------|-------------------------------------------------------------------------------------------------------------| +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any | +| `MachinesUpToDate` | This condition surfaces details of controlled machines not up to date, if any | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | +| `Deleting` | If MachineSet is deleted, this condition surfaces details about ongoing deletion of the controlled machines | +| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current MachineSet's conditions: > Ready, MachinesCreated, Resized, MachinesReady. Notes: -- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` and `Deleting` are intended to provide visibility on the corresponding lifecycle operation. e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in the `ScalingDown` condition. 
- MachineSet conditions are intentionally mostly consistent with MachineDeployment conditions to help users troubleshooting. - MachineSet is considered as a sort of implementation detail of MachineDeployments, so it doesn't have its own concept of availability. Similarly, this proposal is dropping the notion of MachineSet readiness because it is preferred to let users focus on Machines readiness. -- When implementing this proposal `UpToDate` condition will be `false` for older MachineSet, `true` for the current MachineSet; +- When implementing this proposal `MachinesUpToDate` condition will be `false` for older MachineSet, `true` for the current MachineSet; in the future this might change in case Cluster API will start supporting in-place upgrades. - `Remediating` for older MachineSets will report that remediation will happen as part of the regular rollout (Cluster API does not remediate Machines on old MachineSets, because those Machines are already scheduled for deletion). @@ -622,22 +622,22 @@ Notes: #### MachineDeployment (New)Conditions -| Condition | Note | -|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. If using RollingUpgrade strategy, availableReplicas must be greater or equal than desired replicas - MaxUnavailable replicas | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
| -| `ScalingUp`     | True if available replicas < desired replicas                                                                                                                                                                                                          | -| `ScalingDown`   | True if replicas > desired replicas                                                                                                                                                                                                                    | -| `UpToDate`      | True if all the Machines controlled by this MachineDeployment are up to date (replicas = upToDateReplicas)                                                                                                                                             | -| `Remediating`   | True if there is at least one Machine controlled by this MachineDeployment that is not passing health checks                                                                                                                                           | -| `Deleted`       | True if MachineDeployment is deleted; Reason can be used to observe the cleanup progress when the resource is deleted                                                                                                                                  | -| `Paused`        | True if this MachineDeployment or the Cluster it belongs to are paused                                                                                                                                                                                 | +| Condition          | Note                                                                                                                                                                                                                                                   | +|--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available`        | True if the MachineDeployment has minimum availability according to parameters specified in the deployment strategy, e.g. if using RollingUpdate strategy, availableReplicas must be at least desired replicas - MaxUnavailable replicas               | +| `MachinesReady`    | This condition surfaces detail of issues on the controlled machines, if any                                                                                                                                                                            | +| `MachinesUpToDate` | This condition surfaces details of controlled machines not up to date, if any                                                                                                                                                                          | +| `ScalingUp`        | True if available replicas < desired replicas                                                                                                                                                                                                          | +| `ScalingDown`      | True if replicas > desired replicas                                                                                                                                                                                                                    | +| `Remediating`      | This condition surfaces details about ongoing remediation of the controlled machines, if any                                                                                                                                                           | +| `Deleting`         | If MachineDeployment is deleted, this condition surfaces details about ongoing deletion of the controlled machines                                                                                                                                     | +| `Paused`           | True if this MachineDeployment or the Cluster it belongs to are paused                                                                                                                                                                                 | > To better evaluate proposed changes, below you can find the list of current MachineDeployment's conditions: > Ready, Available.
Notes: -- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` and `Deleting` are intended to provide visibility on the corresponding lifecycle operation. e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface as a reason/message in the `ScalingDown` condition. @@ -803,27 +803,28 @@ notes: ##### Cluster (New)Conditions -| Condition | Note | -|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` conditions are true, if `TopologyReconciled` is true (if present); if conditions are defined in `spec.availabilityGates`, those conditions must be true as well. | -| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. This information is usually used as a signal for starting all the provisioning operations that depends on a functional API server, but do not require a full HA control plane to exists. 
| -| `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) | -| `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | -| `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | -| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` conditions | -| `TopologyReconciled` | True if the topoology controller is working properly | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this Cluster are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this Cluster that is not passing health checks | -| `Deleted` | True if Cluster is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if Cluster and all the resources being part of it are paused | +| Condition | Note | +|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` conditions are true, if `TopologyReconciled` is true (if present); if conditions are defined in `spec.availabilityGates`, those conditions must be true as well | +| `TopologyReconciled` | True if the topology controller is 
working properly                                                                                                                                                                                                                                                                               | +| `InfrastructureReady`     | Mirror of Cluster's infrastructure `Ready` condition                                                                                                                                                                                                                                                                              | +| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. This information is usually used as a signal for starting all the provisioning operations that depend on a functional API server, but do not require a full HA control plane to exist                                                              | +| `ControlPlaneAvailable`   | Mirror of Cluster's control plane `Available` condition                                                                                                                                                                                                                                                                           | +| `WorkersAvailable`        | Summary of MachineDeployment and MachinePool's `Available` conditions                                                                                                                                                                                                                                                             | +| `MachinesReady`           | This condition surfaces detail of issues on the controlled machines, if any                                                                                                                                                                                                                                                       | +| `MachinesUpToDate`        | This condition surfaces details of Cluster's machines not up to date, if any                                                                                                                                                                                                                                                      | +| `RemoteConnectionProbe`   | True when control plane can be reached; in case of connection problems, the condition turns to false only if the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag)                                               | +| `ScalingUp`               | True if available replicas < desired replicas                                                                                                                                                                                                                                                                                     | +| `ScalingDown`             | True if replicas > desired replicas                                                                                                                                                                                                                                                                                               | +| `Remediating`             | This condition surfaces details about ongoing remediation of the controlled machines, if any                                                                                                                                                                                                                                      | +| `Deleting`                | If Cluster is deleted, this condition surfaces details about ongoing deletion of the controlled machines                                                                                                                                                                                                                          | +| `Paused`                  | True if Cluster and all the resources being part of it are paused                                                                                                                                                                                                                                                                 | > To better evaluate proposed changes, below you can find the list of current Cluster's conditions: > Ready, InfrastructureReady, ControlPlaneReady, ControlPlaneInitialized, TopologyReconciled Notes: -- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation.
+- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` and `Deleting` are intended to provide visibility on the corresponding lifecycle operation. e.g. If the scaling down operation is being blocked by a Machine having issues while deleting, this should surface as a reason/message in the `ScalingDown` condition. - `TopologyReconciled` exists only for classy clusters; this condition is managed by the topology reconciler. @@ -964,25 +965,25 @@ TODO: double check usages of status.ready. #### KubeadmControlPlane (New)Conditions -| Condition | Note | -|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if the control plane can be reached, `EtcdClusterAvailable` is true, and `CertificatesAvailable` is true | -| `CertificatesAvailable` | True if all the cluster certificates exist. | -| `EtcdClusterAvailable` | This condition surfaces issues to the managed etcd cluster, if any. It is computed as aggregation of Machines's `EtcdMemberHealthy` (if not using an external etcd) conditions plus additional checks validating potential issues to etcd quorum | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. 
Please note this will include also `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy` | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this ControlPlane are up to date | -| `Remediating` | True if there is at least one Machine controlled by this KubeadmControlPlane that is not passing health checks | -| `Deleted` | True if KubeadmControlPlane is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this resource or the Cluster it belongs to are paused | +| Condition | Note | +|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the control plane can be reached, `EtcdClusterAvailable` is true, and `CertificatesAvailable` is true | +| `CertificatesAvailable` | True if all the cluster certificates exist. 
| +| `EtcdClusterAvailable`  | This condition surfaces issues to the managed etcd cluster, if any. It is computed as aggregation of Machines' `EtcdMemberHealthy` (if not using an external etcd) conditions plus additional checks validating potential issues to etcd quorum                           | +| `MachinesReady`         | This condition surfaces detail of issues on the controlled machines, if any. This will also include `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy`            | +| `MachinesUpToDate`      | This condition surfaces details of controlled machines not up to date, if any                                                                                                                                                                                            | +| `ScalingUp`             | True if available replicas < desired replicas                                                                                                                                                                                                                            | +| `ScalingDown`           | True if replicas > desired replicas                                                                                                                                                                                                                                      | +| `Remediating`           | This condition surfaces details about ongoing remediation of the controlled machines, if any                                                                                                                                                                             | +| `Deleting`              | If KubeadmControlPlane is deleted, this condition surfaces details about ongoing deletion of the controlled machines                                                                                                                                                     | +| `Paused`                | True if this resource or the Cluster it belongs to are paused                                                                                                                                                                                                            | > To better evaluate proposed changes, below you can find the list of current KubeadmControlPlane's conditions: > Ready, CertificatesAvailable, MachinesCreated, Available, MachinesSpecUpToDate, Resized, MachinesReady, > ControlPlaneComponentsHealthy, EtcdClusterHealthy. Notes: -- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` and `Deleting` are intended to provide visibility on the corresponding lifecycle operation. e.g. If the scaling down operation is being blocked by a Machine having issues while deleting, this should surface as a reason/message in the `ScalingDown` condition.
- The KubeadmControlPlane controller is going to add `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, @@ -1100,24 +1101,24 @@ Notes: ##### MachinePool (New)Conditions -| Condition | Note | -|------------------------|-----------------------------------------------------------------------------------------------------------------| -| `Available` | True when `InfrastructureReady` and available replicas >= desired replicas (see notes below) | -| `BootstrapConfigReady` | Mirrors the corresponding condition from the MachinePool's BootstrapConfig resource | -| `InfrastructureReady` | Mirrors the corresponding condition from the MachinePool's Infrastructure resource | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any. | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `UpToDate` | True if all the Machines controlled by this MachinePool are up to date (replicas = upToDateReplicas) | -| `Remediating` | True if there is at least one Machine controlled by this MachinePool that is not passing health checks | -| `Deleted` | True if MachinePool is deleted; Reason can be used to observe the cleanup progress when the resource is deleted | -| `Paused` | True if this MachinePool or the Cluster it belongs to are paused | +| Condition | Note | +|------------------------|--------------------------------------------------------------------------------------------------------------| +| `Available` | True when `InfrastructureReady` and available replicas >= desired replicas (see notes below) | +| `BootstrapConfigReady` | Mirrors the corresponding condition from the MachinePool's BootstrapConfig resource | +| `InfrastructureReady` | Mirrors the corresponding condition from the MachinePool's Infrastructure resource | +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any | +| `MachinesUpToDate` | This 
condition surfaces details of controlled machines not up to date, if any | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | +| `Deleting` | If MachinePool is deleted, this condition surfaces details about ongoing deletion of the controlled machines | +| `Paused` | True if this MachinePool or the Cluster it belongs to are paused | > To better evaluate proposed changes, below you can find the list of current MachinePool's conditions: > Ready, BootstrapReady, InfrastructureReady, ReplicasReady. Notes: -- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` are intended to provide visibility on the corresponding lifecycle operation. +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` and `Deleting` are intended to provide visibility on the corresponding lifecycle operation. e.g. If the scaling down operation is being blocked by a Machine having issues while deleting, this should surface with a reason/message in the `ScalingDown` condition. - As of today MachinePool does not have a notion similar to MachineDeployment's MaxUnavailability. 
From 969d103e48fc190ae01bac94a8b8aa7ace078988 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Wed, 4 Sep 2024 17:27:25 +0200 Subject: [PATCH 18/22] Introduce nested struct for a cleaner API surface in phase 1 and 2 --- .../improve-status-in-CAPI-resources.md | 245 ++++++++++-------- 1 file changed, 132 insertions(+), 113 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index 66684d0c15d8..f5700369c337 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -313,21 +313,23 @@ type MachineInitializationStatus struct { } ``` -| v1beta1 (tentative Dec 2024)   | v1beta2 (tentative Apr 2025)                             | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|--------------------------------|----------------------------------------------------------|----------------------------------------------------| -|                                | `Initialization` (new)                                   | `Initialization`                                   | -| `BootstrapReady`               | `Initialization.BootstrapDataSecretCreated` (renamed)    | `Initialization.BootstrapDataSecretCreated`        | -| `InfrastructureReady`          | `Initialization.InfrastructureProvisioned` (renamed)     | `Initialization.InfrastructureProvisioned`         | -|                                | `BackCompatibilty` (new)                                 | (removed)                                          | -| `LastUpdated` (deprecated)     | `BackCompatibilty.LastUpdated` (renamed) (deprecated)    | (removed)                                          | -| `FailureReason` (deprecated)   | `BackCompatibilty.FailureReason` (renamed) (deprecated)  | (removed)                                          | -| `FailureMessage` (deprecated)  | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed)                                          | -| `Conditions`                   | `BackCompatibilty.Conditions` (renamed) (deprecated)     | (removed)                                          | -| `ExperimentalConditions` (new) | `Conditions` (renamed)                                   | `Conditions`                                       | -| other fields...                | other fields...                                          | other fields...
| +| v1beta1 (tentative Dec 2024)  | v1beta2 (tentative Apr 2025)                               | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|-------------------------------|------------------------------------------------------------|----------------------------------------------------| +|                               | `Initialization` (new)                                     | `Initialization`                                   | +| `BootstrapReady`              | `Initialization.BootstrapDataSecretCreated` (renamed)      | `Initialization.BootstrapDataSecretCreated`        | +| `InfrastructureReady`         | `Initialization.InfrastructureProvisioned` (renamed)       | `Initialization.InfrastructureProvisioned`         | +| `V1Beta2` (new)               | (removed)                                                  | (removed)                                          | +| `V1Beta2.Conditions` (new)    | `Conditions` (renamed)                                     | `Conditions`                                       | +|                               | `Deprecated.V1Beta1` (new)                                 | (removed)                                          | +| `FailureReason` (deprecated)  | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated)  | (removed)                                          | +| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed)                                          | +| `Conditions` (deprecated)     | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated)     | (removed)                                          | +| other fields...               | other fields...                                            | other fields...                                    | Notes: -- The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). +- The `V1Beta2` struct is going to be added to the v1beta1 types in order to provide a preview of the changes coming with the v1beta2 types, without impacting the semantics of existing fields. + Fields in the `V1Beta2` struct will be promoted to top-level status fields in the v1beta2 types. +- The `Deprecated` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer).
##### Machine (New)Conditions @@ -490,22 +492,25 @@ type MachineSetStatus struct { } ``` -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|---------------------------------------|-------------------------------------------------------------|----------------------------------------------------| -| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `ExperimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| `UpToDateReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (tentative Dec 2024)      | v1beta2 (tentative Apr 2025)                                  | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|-----------------------------------|---------------------------------------------------------------|----------------------------------------------------| +| `V1Beta2` (new)                   | (removed)                                                     | (removed)                                          | +| `V1Beta2.Conditions` (new)        | `Conditions` (renamed)                                        | `Conditions`                                       | +| `V1Beta2.ReadyReplicas` (new)     | `ReadyReplicas` (renamed)                                     | `ReadyReplicas`                                    | +| `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed)                                 | `AvailableReplicas`                                | +| `V1Beta2.UpToDateReplicas` (new)  | `UpToDateReplicas` (renamed)                                  | `UpToDateReplicas`                                 | +|                                   | `Deprecated.V1Beta1` (new)                                    | (removed)                                          | +| `ReadyReplicas`                   | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated)     | (removed)                                          | +| `AvailableReplicas`               | `Deprecated.V1Beta1.AvailableReplicas` (renamed) (deprecated) | (removed)                                          | +| `FailureReason`                   | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated)     | (removed)                                          | +| `FailureMessage`                  | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated)    | (removed)                                          | +| `Conditions`                      | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated)        | (removed)                                          | +| other fields...                   | other fields...                                               | other fields...                                    | Notes: -- The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). +- The `V1Beta2` struct is going to be added to the v1beta1 types in order to provide a preview of the changes coming with the v1beta2 types, without impacting the semantics of existing fields. + Fields in the `V1Beta2` struct will be promoted to top-level status fields in the v1beta2 types. +- The `Deprecated` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer).
Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. - This proposal is using `UpToDateReplicas` instead of `UpdatedReplicas`; This is a deliberated choice to avoid confusion between update (any change) and upgrade (change of the Kubernetes versions). @@ -602,22 +607,25 @@ type MachineDeploymentStatus struct { } ``` -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|---------------------------------------|-------------------------------------------------------------|----------------------------------------------------| -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `ExperimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (tentative Dec 2024)      | v1beta2 (tentative Apr 2025)                                  | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|-----------------------------------|---------------------------------------------------------------|----------------------------------------------------| +| `V1Beta2` (new)                   | (removed)                                                     | (removed)                                          | +| `V1Beta2.Conditions` (new)        | `Conditions` (renamed)                                        | `Conditions`                                       | +| `V1Beta2.ReadyReplicas` (new)     | `ReadyReplicas` (renamed)                                     | `ReadyReplicas`                                    | +| `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed)                                 | `AvailableReplicas`                                | +|                                   | `Deprecated.V1Beta1` (new)                                    | (removed)                                          | +| `ReadyReplicas` (deprecated)      | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated)     | (removed)                                          | +| `AvailableReplicas` (deprecated)  | `Deprecated.V1Beta1.AvailableReplicas` (renamed) (deprecated) | (removed)                                          | +| `FailureReason` (deprecated)      | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated)     | (removed)                                          | +| `FailureMessage` (deprecated)     | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated)    | (removed)                                          | +| `Conditions` (deprecated)         | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated)        | (removed)                                          | +| `UpdatedReplicas`                 | `UpToDateReplicas` (renamed)                                  | `UpToDateReplicas`                                 | +| other fields...                   | other fields...                                               | other fields...                                    | Notes: -- The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). +- The `V1Beta2` struct is going to be added to the v1beta1 types in order to provide a preview of the changes coming with the v1beta2 types, without impacting the semantics of existing fields. + Fields in the `V1Beta2` struct will be promoted to top-level status fields in the v1beta2 types. +- The `Deprecated` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer).
Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. #### MachineDeployment (New)Conditions @@ -773,32 +781,35 @@ type WorkersStatus struct { // NOTE: `FailureReason`, `FailureMessage` fields won't be there anymore ``` -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|----------------------------------------|----------------------------------------------------------|----------------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| `ControlPlaneReady` | `Initialization.ControlPlaneInitialized` (renamed) | `Initialization.ControlPlaneInitialized` | -| | `BackCompatibilty` (new) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| `ControlPlane` (new) | `ControlPlane` | `ControlPlane` | -| `ControlPlane.DesiredReplicas` (new) | `ControlPlane.DesiredReplicas` | `ControlPlane.DesiredReplicas` | -| `ControlPlane.Replicas` (new) | `ControlPlane.Replicas` | `ControlPlane.Replicas` | -| `ControlPlane.ReadyReplicas` (new) | `ControlPlane.ReadyReplicas` | `ControlPlane.ReadyReplicas` | -| `ControlPlane.UpToDateReplicas` (new) | `ControlPlane.UpToDateReplicas` | `ControlPlane.UpToDateReplicas` | -| `ControlPlane.AvailableReplicas` (new) | `ControlPlane.AvailableReplicas` | `ControlPlane.AvailableReplicas` | -| `Workers` (new) | `Workers` | `Workers` | -| `Workers.DesiredReplicas` (new) | 
`Workers.DesiredReplicas` | `Workers.DesiredReplicas` | -| `Workers.Replicas` (new) | `Workers.Replicas` | `Workers.Replicas` | -| `Workers.ReadyReplicas` (new) | `Workers.ReadyReplicas` | `Workers.ReadyReplicas` | -| `Workers.UpToDateReplicas` (new) | `Workers.UpToDateReplicas` | `Workers.UpToDateReplicas` | -| `Workers.AvailableReplicas` (new) | `Workers.AvailableReplicas` | `Workers.AvailableReplicas` | -| other fields... | other fields... | other fields... | - -notes: -- The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------------------------|------------------------------------------------------------|----------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| `ControlPlaneReady` | `Initialization.ControlPlaneInitialized` (renamed) | `Initialization.ControlPlaneInitialized` | +| `V1Beta2` (new) | (removed) | (removed) | +| `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | +| `V1Beta2.ControlPlane` (new) | `ControlPlane` (renamed) | `ControlPlane` | +| `V1Beta2.ControlPlane.DesiredReplicas` (new) | `ControlPlane.DesiredReplicas` (renamed) | `ControlPlane.DesiredReplicas` | +| `V1Beta2.ControlPlane.Replicas` (new) | `ControlPlane.Replicas` (renamed) | `ControlPlane.Replicas` | +| `V1Beta2.ControlPlane.ReadyReplicas` (new) | `ControlPlane.ReadyReplicas` (renamed) | `ControlPlane.ReadyReplicas` | +| `V1Beta2.ControlPlane.UpToDateReplicas` (new) | `ControlPlane.UpToDateReplicas` (renamed) | `ControlPlane.UpToDateReplicas` | +| `V1Beta2.ControlPlane.AvailableReplicas` (new) | 
`ControlPlane.AvailableReplicas` (renamed) | `ControlPlane.AvailableReplicas` | +| `V1Beta2.Workers` (new) | `Workers` (renamed) | `Workers` | +| `V1Beta2.Workers.DesiredReplicas` (new) | `Workers.DesiredReplicas` (renamed) | `Workers.DesiredReplicas` | +| `V1Beta2.Workers.Replicas` (new) | `Workers.Replicas` (renamed) | `Workers.Replicas` | +| `V1Beta2.Workers.ReadyReplicas` (new) | `Workers.ReadyReplicas` (renamed) | `Workers.ReadyReplicas` | +| `V1Beta2.Workers.UpToDateReplicas` (new) | `Workers.UpToDateReplicas` (renamed) | `Workers.UpToDateReplicas` | +| `V1Beta2.Workers.AvailableReplicas` (new) | `Workers.AvailableReplicas` (renamed) | `Workers.AvailableReplicas` | +| | `Deprecated.V1Beta1` (new) | (removed) | +| `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | +| other fields... | other fields... | other fields... | + +Notes: +- The `V1Beta2` struct is going to be added to v1beta1 types in order to provide a preview of changes coming with the v1beta2 types, without impacting the semantics of existing fields. + Fields in the `V1Beta2` struct will be promoted to top-level status fields in the v1beta2 types. +- The `Deprecated` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes.
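The preview-then-promote pattern in the notes above can be made concrete with a minimal Go sketch. It assumes heavily trimmed, hypothetical types (`ClusterStatusV1Beta1`, `ClusterStatusV1Beta2`, a simplified `Condition`) and hypothetical helpers (`ConvertToV1Beta2`, `ConvertToV1Beta1`). These are not the actual Cluster API types or generated conversion functions. The sketch only shows where fields move between the two API versions.

```go
package main

// Hypothetical, trimmed-down types illustrating the V1Beta2 (preview) and
// Deprecated.V1Beta1 (back-compatibility) pattern; not the real Cluster API types.

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
}

// ClusterV1Beta2Status previews, inside v1beta1 types, the fields that v1beta2 promotes to the top level.
type ClusterV1Beta2Status struct {
	Conditions []Condition
}

// ClusterStatusV1Beta1 keeps the existing fields with their current semantics.
type ClusterStatusV1Beta1 struct {
	FailureReason  *string
	FailureMessage *string
	Conditions     []Condition           // conditions with v1beta1 semantics
	V1Beta2        *ClusterV1Beta2Status // preview of the v1beta2 fields
}

// ClusterV1Beta1DeprecatedStatus holds the old v1beta1 fields inside v1beta2 types until v1beta1 removal.
type ClusterV1Beta1DeprecatedStatus struct {
	FailureReason  *string
	FailureMessage *string
	Conditions     []Condition
}

// ClusterDeprecatedStatus groups deprecated fields by the API version they come from.
type ClusterDeprecatedStatus struct {
	V1Beta1 *ClusterV1Beta1DeprecatedStatus
}

// ClusterStatusV1Beta2 has the new fields at the top level and the old ones under Deprecated.V1Beta1.
type ClusterStatusV1Beta2 struct {
	Conditions []Condition // promoted from V1Beta2.Conditions
	Deprecated *ClusterDeprecatedStatus
}

// ConvertToV1Beta2 promotes the V1Beta2 preview fields to the top level and moves the old
// v1beta1 fields under Deprecated.V1Beta1 (only when at least one of them is set).
func ConvertToV1Beta2(in ClusterStatusV1Beta1) ClusterStatusV1Beta2 {
	var out ClusterStatusV1Beta2
	if in.V1Beta2 != nil {
		out.Conditions = in.V1Beta2.Conditions
	}
	if in.FailureReason != nil || in.FailureMessage != nil || in.Conditions != nil {
		out.Deprecated = &ClusterDeprecatedStatus{V1Beta1: &ClusterV1Beta1DeprecatedStatus{
			FailureReason:  in.FailureReason,
			FailureMessage: in.FailureMessage,
			Conditions:     in.Conditions,
		}}
	}
	return out
}

// ConvertToV1Beta1 is the down conversion: top-level fields go back under V1Beta2,
// and the old fields are restored from Deprecated.V1Beta1.
func ConvertToV1Beta1(in ClusterStatusV1Beta2) ClusterStatusV1Beta1 {
	var out ClusterStatusV1Beta1
	if in.Conditions != nil {
		out.V1Beta2 = &ClusterV1Beta2Status{Conditions: in.Conditions}
	}
	if in.Deprecated != nil && in.Deprecated.V1Beta1 != nil {
		out.FailureReason = in.Deprecated.V1Beta1.FailureReason
		out.FailureMessage = in.Deprecated.V1Beta1.FailureMessage
		out.Conditions = in.Deprecated.V1Beta1.Conditions
	}
	return out
}
```

Consider a status where both the old conditions and `V1Beta2.Conditions` are set. Under these assumptions, converting it up and then back down yields the same field values. This is what keeps `FailureReason`, `FailureMessage` and the old conditions readable for v1beta1 clients until v1beta1 is removed. A real conversion would also have to cover the remaining fields, such as `Initialization`, `ControlPlane` and `Workers`.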
##### Cluster (New)Conditions @@ -813,7 +824,7 @@ notes: | `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` conditions | | `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any | | `MachinesUpToDate` | This condition surfaces details of Cluster's machines not up to date, if any | -| `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 40s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) | +| `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 50s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) | | `ScalingUp` | True if available replicas < desired replicas | | `ScalingDown` | True if replicas > desired replicas | | `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | @@ -947,21 +958,26 @@ type KubeadmControlPlaneStatus struct { } ``` -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|-----------------------------------|----------------------------------------------------------|----------------------------------------------------| -| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| `ExperimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `AvailableReplicas` (new) | `AvailableReplicas` | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| 
`FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... | - -TODO: double check usages of status.ready. +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|-----------------------------------|------------------------------------------------------------|----------------------------------------------------| +| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | +| `V1Beta2` (new) | (removed) | (removed) | +| `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | +| `V1Beta2.ReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `Deprecated.V1Beta1` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | +| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| other fields... | other fields... | other fields... | + +Notes: +- The `V1Beta2` struct is going to be added to v1beta1 types in order to provide a preview of changes coming with the v1beta2 types, without impacting the semantics of existing fields. + Fields in the `V1Beta2` struct will be promoted to top-level status fields in the v1beta2 types.
+- The `Deprecated` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). + Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. #### KubeadmControlPlane (New)Conditions @@ -1078,25 +1094,28 @@ type MachinePoolInitializationStatus struct { } ``` -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|--------------------------------------|-------------------------------------------------------------|----------------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| `UpdatedReplicas` (new) | `UpToDateReplicas` | `UpToDateReplicas` | -| `ExprimentalReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `ExprimentalAvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | -| | `BackCompatibilty` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `BackCompatibilty.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` (deprecated) | `BackCompatibilty.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `BackCompatibilty.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `BackCompatibilty.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `BackCompatibilty.Conditions` (renamed) (deprecated) | (removed) | -| `ExperimentalConditions` (new) | `Conditions` (renamed) | `Conditions` | -| other fields... | other fields... | other fields... 
| +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|-----------------------------------|---------------------------------------------------------------|----------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| `V1Beta2` (new) | (removed) | (removed) | +| `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | +| `V1Beta2.UpdatedReplicas` (new) | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `V1Beta2.ReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `Deprecated.V1Beta1` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `Deprecated.V1Beta1.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | +| other fields... | other fields... | other fields... | Notes: -- The `BackCompatibilty` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). +- The `V1Beta2` struct is going to be added to v1beta1 types in order to provide a preview of changes coming with the v1beta2 types, without impacting the semantics of existing fields.
+ Fields in the `V1Beta2` struct will be promoted to top-level status fields in the v1beta2 types. +- The `Deprecated` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. ##### MachinePool (New)Conditions @@ -1274,15 +1293,15 @@ Following changes are planned for the contract for the ControlPlane resource: - Rename `status.initialized` to `status.initialization.controlPlaneInitialized`. - Remove `failureReason` and `failureMessage`. -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|-----------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| -| `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | (removed) | -| `status.initialized`, required | `status.initialization.controlPlaneInitialized` (renamed), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | `status.initialization.controlPlaneInitialized`, required | -| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.backCompatibilty.conditions[Ready]` (renamed, deprecated), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | (removed) | -| | `status.conditions[Available]` (new), optional with fall back optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | 
`status.conditions[Available]`, optional with fall back on `status.initializiation.controlPlaneInitialized` | -| `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | -| `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | -| other fields/rules... | other fields/rules... | | +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| +| `status.ready`, required | `status.ready` (deprecated), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | (removed) | +| `status.initialized`, required | `status.initialization.controlPlaneInitialized` (renamed), one of `status.ready` or `status.initialization.controlPlaneInitialized` required | `status.initialization.controlPlaneInitialized`, required | +| `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.deprecated.v1beta1.conditions[Ready]` (renamed, deprecated), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | (removed) | +| | `status.conditions[Available]` (new), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | `status.conditions[Available]`, optional with fall back on `status.initialization.controlPlaneInitialized` | +| `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | +| `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | +| other fields/rules... 
| other fields/rules... | | Notes: - ControlPlane's `status.initialization.controlPlaneInitialized` will surface into Cluster's `status.initialization.controlPlaneInitialized` field; also, @@ -1337,7 +1356,7 @@ can be supported (until v1beta1 removal, tentative Apr 2026). On top of that, a few design decisions have been made with the specific intent to further minimize the impact on users and providers, e.g.: -- The decision to keep `BackCompatibility` fields in v1beta2 API (until v1beta1 removal, tentative Apr 2026). +- The decision to keep `Deprecated` fields in the v1beta2 API (until v1beta1 removal, tentative Apr 2026). - The decision to allow providers to adopt the Cluster API v1beta2 contract at their own pace (transition _must be completed_ before v1beta1 removal, tentative Apr 2026). @@ -1358,7 +1377,7 @@ can be supported (until v1beta1 removal, tentative Apr 2026). _Keep Cluster API custom condition types, eventually improve them incrementally_ This idea was considered, but ultimately discarded because the end state we are aiming for is to align with Kubernetes. Therefore, the sooner, the better, and the opportunity materialized when discussing the scope for the v1beta2 API version. -_Implement down conversion instead of maintaining `BackCompatibility` fields_ +_Implement down conversion instead of maintaining `Deprecated` fields_ This idea was considered, but discarded because the constraint of ensuring down conversion for every new field/condition would have prevented this proposal from designing the ideal target state we are aiming for. @@ -1366,10 +1385,10 @@ would have prevented this proposal from designing the ideal target state we are Additionally, the idea of dropping all the existing status fields/conditions in the new v1beta2 API (by supporting down conversion) was considered undesirable because it implies a sudden, big change for both users and providers.
-Instead, we would like to minimize impact on users and providers by preserving old fields in `BackCompatibility` until v1beta1 removal, +Instead, we would like to minimize impact on users and providers by preserving old fields in `Deprecated` until v1beta1 removal, which is ultimately the same process suggested for removal of API fields from graduated APIs. -Note: There will still be some impacts because `BackCompatibility` fields will be in a different location from where the +Note: There will still be some impacts because `Deprecated` fields will be in a different location from where the original fields were, but this should be easier to handle than being forced to immediately adapt to the new status fields/conditions. ## Upgrade Strategy From a2506a7f4c26b72be38250a53c0cf0b2ac1e8861 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Fri, 6 Sep 2024 16:59:48 +0200 Subject: [PATCH 19/22] nits --- .../improve-status-in-CAPI-resources.md | 154 +++++++++--------- 1 file changed, 79 insertions(+), 75 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index f5700369c337..c1dc29f18807 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -113,7 +113,7 @@ Kubernetes, and ideally with the entire ecosystem. - Resolving all the idiosyncrasies that exist in Cluster API, core Kubernetes, and the rest of the ecosystem. (Let’s stay focused on Cluster API and keep improving incrementally). -- To change how the Cluster API contract with infrastructure, bootstrap and control providers currently works +- To change the fundamental way the Cluster API contract with infrastructure, bootstrap and control plane providers currently works (by using status fields).
## Proposal @@ -198,8 +198,8 @@ will now clearly represent the "machine can host workloads" (prior art Kubernete To improve the benefit of this change: - This proposal is ensuring that whenever Machine ready is used, it always means the same thing (e.g. ready replica counters) -- This proposal is also changing contract fields where ready was used improperly to represent - initial provisioning (k8s API conventions suggest to use ready only for long-running process). +- This proposal is also changing contract fields where ready was used to represent initial provisioning of infrastructure + or bootstrap secrets (so ready had different meanings). All in all, Machine's Ready concept should be much clearer, more consistent, and more intuitive after the proposed changes. But there is more. @@ -344,7 +344,7 @@ Notes: | `NodeHealthy` | True if the Machine's Node is ready and it does not report MemoryPressure, DiskPressure and PIDPressure | | `NodeReady` | True if the Machine's Node is ready | | `HealthCheckSucceeded` | True if MHC instances targeting this machine report the Machine is healthy according to the definition of healthy present in the spec of the MachineHealthCheck object | -| `OwnerRemediated` | True if MHC instances targeting this machine determine that the controller owning this machine should perform remediation | +| `OwnerRemediated` | Only present if MHC instances targeting this machine determine that the controller owning this machine should perform remediation | | `Deleting` | If Machine is deleted, this condition surfaces details about progress in the machine deletion workflow | | `Paused` | True if the Machine or the Cluster it belongs to is paused | @@ -500,11 +500,11 @@ type MachineSetStatus struct { | `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | | `V1Beta2.UpToDateReplicas` (new) | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | | | `Deprecated.V1Beta1` (new) | (removed) | -| `ReadyReplicas` | 
`Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` | `Deprecated.V1Beta1.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `FailureReason` | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | +| `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `Deprecated.V1Beta1.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | | other fields... | other fields... | other fields... | Notes: @@ -592,7 +592,7 @@ type MachineDeploymentStatus struct { // +optional AvailableReplicas int32 `json:"availableReplicas"` - // The number of up-to-date replicas targeted by this deployment. + // The number of up-to-date replicas targeted by this deployment. A machine is considered up-to-date when Machine's UpToDate condition is true. 
// +optional UpToDateReplicas int32 `json:"upToDateReplicas"` @@ -612,14 +612,15 @@ type MachineDeploymentStatus struct { | `V1Beta2` (new) | (removed) | (removed) | | `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | | `V1Beta2.ReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `V1Beta2.vailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| `V1Beta2.UpToDateReplicas` (new) | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | | | `Deprecated.V1Beta1` (new) | (removed) | | `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | | `AvailableReplicas` (deprecated) | `Deprecated.V1Beta1.AvailableReplicas` (renamed) (deprecated) | (removed) | | `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | | `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | | `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `UpdatedReplicas` (deprecated) | `Deprecated.V1Beta1.UpdatedReplicas` (renamed) (deprecated) | (removed) | | other fields... | other fields... | other fields... | Notes: @@ -738,19 +739,19 @@ type ClusterControlPlaneStatus struct { // +optional DesiredReplicas int32 `json:"desiredReplicas"` - // Total number of non-terminated control plane machines in this cluster. + // Total number of control plane machines in this cluster. // +optional Replicas int32 `json:"replicas"` - // The number of up-to-date control plane machines in this cluster. + // The number of up-to-date control plane machines in this cluster. A machine is considered up-to-date when Machine's UpToDate condition is true.
// +optional UpToDateReplicas int32 `json:"upToDateReplicas"` - // Total number of ready control plane machines in this cluster. + // Total number of ready control plane machines in this cluster. A machine is considered ready when Machine's Ready condition is true. // +optional ReadyReplicas int32 `json:"readyReplicas"` - // Total number of available control plane machines in this cluster. + // Total number of available control plane machines in this cluster. A machine is considered available when Machine's Available condition is true. // +optional AvailableReplicas int32 `json:"availableReplicas"` } @@ -761,19 +762,19 @@ type WorkersStatus struct { // +optional DesiredReplicas int32 `json:"desiredReplicas"` - // Total number of non-terminated worker machines in this cluster. + // Total number of worker machines in this cluster. // +optional Replicas int32 `json:"replicas"` - // The number of up-to-date worker machines in this cluster. + // The number of up-to-date worker machines in this cluster. A machine is considered up-to-date when Machine's UpToDate condition is true. // +optional UpToDateReplicas int32 `json:"upToDateReplicas"` - // Total number of ready worker machines in this cluster. + // Total number of ready worker machines in this cluster. A machine is considered ready when Machine's Ready condition is true. // +optional ReadyReplicas int32 `json:"readyReplicas"` - // Total number of available worker machines in this cluster. + // Total number of available worker machines in this cluster. A machine is considered available when Machine's Available condition is true.
// +optional AvailableReplicas int32 `json:"availableReplicas"` } @@ -814,22 +815,22 @@ Notes: ##### Cluster (New)Conditions -| Condition | Note | -|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if Cluster `RemoteConnectionProbe` is true, if Cluster's control plane `Available` condition is true, if all MachineDeployment and MachinePool's `Available` conditions are true, if `TopologyReconciled` is true (if present); if conditions are defined in `spec.availabilityGates`, those conditions must be true as well | -| `TopologyReconciled` | True if the topology controller is working properly | -| `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | -| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. 
This information is usually used as a signal for starting all the provisioning operations that depends on a functional API server, but do not require a full HA control plane to exists | -| `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | -| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` conditions | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any | -| `MachinesUpToDate` | This condition surfaces details of Cluster's machines not up to date, if any | -| `RemoteConnectionProbe` | True when control plane can be reached; in case of connection problems, the condition turns to false only if the the cluster cannot be reached for 50s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | -| `Deleting` | If Cluster is deleted, this condition surfaces details about ongoing deletion of the controlled machines | -| `Paused` | True if Cluster and all the resources being part of it are paused | +| Condition | Note | +|---------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the Cluster's `RemoteConnectionProbe`, `InfrastructureReady`, `ControlPlaneAvailable`, `WorkersAvailable`, `TopologyReconciled` (if present) conditions are true. 
If conditions are defined in `spec.availabilityGates`, those conditions must be true as well | +| `TopologyReconciled` | True if the topology controller is working properly | +| `InfrastructureReady` | Mirror of Cluster's infrastructure `Ready` condition | +| `ControlPlaneInitialized` | True when the Cluster's control plane is functional enough to accept requests. This information is usually used as a signal for starting all the provisioning operations that depend on a functional API server but do not require a full HA control plane to exist | +| `ControlPlaneAvailable` | Mirror of Cluster's control plane `Available` condition | +| `WorkersAvailable` | Summary of MachineDeployment and MachinePool's `Available` conditions | +| `MachinesReady` | This condition surfaces details of issues on the controlled machines, if any | +| `MachinesUpToDate` | This condition surfaces details of Cluster's machines not up to date, if any | +| `RemoteConnectionProbe` | True when the control plane can be reached; in case of connection problems, the condition turns to false only if the cluster cannot be reached for 50s after the first connection problem is detected (or whatever period is defined in the `--remote-connection-grace-period` flag) | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | +| `Deleting` | If Cluster is deleted, this condition surfaces details about ongoing deletion of the cluster | +| `Paused` | True if Cluster and all the resources being part of it are paused | > To better evaluate proposed changes, below you can find the list of current Cluster's conditions: > Ready, InfrastructureReady, ControlPlaneReady, ControlPlaneInitialized, TopologyReconciled @@ -939,15 +940,15 @@ type KubeadmControlPlaneStatus struct { // +optional ReadyReplicas int32 `json:"readyReplicas"` - // The
number of available replicas targeted by this ControlPlane. + // The number of available replicas targeted by this ControlPlane. A machine is considered available when Machine's Available condition is true. // +optional AvailableReplicas int32 `json:"availableReplicas"` - // The number of up-to-date replicas targeted by this ControlPlane. + // The number of up-to-date replicas targeted by this ControlPlane. A machine is considered up-to-date when Machine's UpToDate condition is true. // +optional UpToDateReplicas int32 `json:"upToDateReplicas"` - // Represents the observations of a ControlPlane's current state. + // Represents the observations of a ControlPlane's current state. // +optional // +listType=map // +listMapKey=type @@ -965,12 +966,13 @@ type KubeadmControlPlaneStatus struct { | `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | | `V1Beta2.ReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | | `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| `V1Beta2.UpToDateReplicas` (new) | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | | | `Deprecated.V1Beta1` (new) | (removed) | | `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | | `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | | `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | | `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | -| `UpdatedReplicas` | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `UpdatedReplicas` (deprecated) | `Deprecated.V1Beta1.UpToDateReplicas` (UpdatedReplicas) | (removed) | | other fields... | other fields... | other fields... 
| Notes: @@ -981,18 +983,18 @@ Notes: #### KubeadmControlPlane (New)Conditions -| Condition | Note | -|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if the control plane can be reached, `EtcdClusterAvailable` is true, and `CertificatesAvailable` is true | -| `CertificatesAvailable` | True if all the cluster certificates exist. | -| `EtcdClusterAvailable` | This condition surfaces issues to the managed etcd cluster, if any It is computed as aggregation of Machines's `EtcdMemberHealthy` (if not using an external etcd) conditions plus additional checks validating potential issues to etcd quorum | -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any Please note this will include also `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy` | -| `MachinesUpToDate` | This condition surfaces details of controlled machines not up to date, if any | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | -| `Deleting` | If KubeadmControlPlane is deleted, this condition surfaces details about ongoing deletion of the controlled machines | -| `Paused` | True if this resource or the Cluster it belongs to are paused | +| Condition | Note | 
+|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `Available` | True if the control plane can be reached, `EtcdClusterAvailable` is true, and `CertificatesAvailable` is true | +| `CertificatesAvailable` | True if all the cluster certificates exist. | +| `EtcdClusterAvailable` | This condition surfaces issues to the managed etcd cluster, if any. It is computed as an aggregation of Machines' `EtcdMemberHealthy` conditions (if not using an external etcd) plus additional checks validating potential issues to etcd quorum | +| `MachinesReady` | This condition surfaces details of issues on the controlled machines, if any. Please note this will also include `APIServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, and if not using an external etcd also `EtcdPodHealthy`, `EtcdMemberHealthy` | +| `MachinesUpToDate` | This condition surfaces details of controlled machines not up to date, if any | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | +| `Deleting` | If KubeadmControlPlane is deleted, this condition surfaces details about ongoing deletion of the controlled machines | +| `Paused` | True if this resource or the Cluster it belongs to is paused | > To better evaluate proposed changes, below you can find the list of current KubeadmControlPlane's conditions: > Ready, CertificatesAvailable, MachinesCreated, Available, MachinesSpecUpToDate, Resized, MachinesReady, @@ -1002,7 +1004,7 @@ Notes: - Conditions like `ScalingUp`, `ScalingDown`, `Remediating` and `Deleting` are intended to provide visibility on the corresponding
lifecycle operation. e.g. If the scaling down operation is being blocked by a Machine having issues while deleting, this should surface as a reason/message in the `ScalingDown` condition. -- The KubeadmControlPlane controller is going to add `ApiServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, +- The KubeadmControlPlane controller is going to add `APIServerPodHealthy`, `ControllerManagerPodHealthy`, `SchedulerPodHealthy`, `EtcdPodHealthy`, `EtcdMemberHealthy`conditions to the controller machines. These conditions will also be defined as `readinessGates` for computing Machine's `Ready` condition. @@ -1054,7 +1056,7 @@ type MachinePoolStatus struct { // +optional AvailableReplicas int32 `json:"availableReplicas"` - // The number of up-to-date replicas targeted by this MachinePool. + // The number of up-to-date replicas targeted by this MachinePool. A machine is considered up-to-date when Machine's UpToDate condition is true. // +optional UpToDateReplicas int32 `json:"upToDateReplicas"` @@ -1094,23 +1096,23 @@ type MachinePoolInitializationStatus struct { } ``` -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|-----------------------------------|---------------------------------------------------------------|----------------------------------------------------| -| | `Initialization` (new) | `Initialization` | -| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | -| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | -| `V1Beta2` (new) | (removed) | (removed) | -| `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | -| `V1Beta2.UpdatedReplicas` (new) | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| `V1Beta2.ReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `V1Beta2.AvailableReplicas` (new) |
`AvailableReplicas` (renamed) | `AvailableReplicas` | -| | `Deprecated.V1Beta1` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `AvailableReplicas` (deprecated) | `Deprecated.V1Beta1.AvailableReplicas` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | -| other fields... | other fields... | other fields... | +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------------|---------------------------------------------------------------|----------------------------------------------------| +| | `Initialization` (new) | `Initialization` | +| `BootstrapReady` | `Initialization.BootstrapDataSecretCreated` (renamed) | `Initialization.BootstrapDataSecretCreated` | +| `InfrastructureReady` | `Initialization.InfrastructureProvisioned` (renamed) | `Initialization.InfrastructureProvisioned` | +| `V1Beta2` (new) | (removed) | (removed) | +| `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | +| `V1Beta2.UpToDateReplicas` (new) | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| `V1Beta2.ReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| | `Deprecated.V1Beta1` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `AvailableReplicas` (deprecated) | `Deprecated.V1Beta1.AvailableReplicas` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) 
(deprecated) | (removed) | +| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | +| other fields... | other fields... | other fields... | Notes: - The `V1Beta2` struct is going to be added to in v1beta1 types in order to provide a preview of changes coming with the v1beta2 types, but without impacting the semantic of existing fields. @@ -1170,8 +1172,9 @@ The Cluster API contract defines a set of rules a provider is expected to comply When the v1beta2 API will be released (tentative Apr 2025), also the Cluster API contract will be bumped to v1beta2. -As written at the beginning of this document, this proposal is not going to change how the Cluster API contract -with infrastructure, bootstrap and control providers currently works (by using status fields). +As written at the beginning of this document, this proposal is not going to change the fundamental way the Cluster API contract +with infrastructure, bootstrap and control plane providers currently works (by using status fields), even if a few fields +are going to be renamed as detailed below. Similarly, this proposal is not going to change the fact that the Cluster API contract does not require providers to implement conditions, even if this is recommended because conditions greatly improve user's experience.
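Because conditions stay optional in the contract, whoever reads a provider resource needs a fallback when a condition is not set, e.g. `status.conditions[Ready]` with fall back on `status.ready` as listed in the contract tables. A minimal Go sketch of that rule; `Condition`, `ProviderStatus` and `isReady` are hypothetical simplifications for illustration, not Cluster API types or helpers:

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition;
// real code would use metav1.Condition or the Cluster API condition types.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// ProviderStatus is a hypothetical subset of a provider resource's status.
type ProviderStatus struct {
	Ready      bool        // boolean field, e.g. status.ready
	Conditions []Condition // optional, e.g. status.conditions
}

// isReady prefers the Ready condition when the provider sets it and
// falls back on the boolean status field otherwise.
func isReady(s ProviderStatus) bool {
	for _, c := range s.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return s.Ready
}

func main() {
	withCondition := ProviderStatus{Ready: true, Conditions: []Condition{{Type: "Ready", Status: "False"}}}
	withoutCondition := ProviderStatus{Ready: true}
	fmt.Println(isReady(withCondition), isReady(withoutCondition)) // false true
}
```

Letting the condition win when present keeps providers that implement conditions authoritative, while providers that only set status fields keep working unchanged.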
@@ -1300,6 +1303,7 @@ Following changes are planned for the contract for the ControlPlane resource: | `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.deprecated.v1beta1.conditions[Ready]` (renamed, deprecated), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | (removed) | | | `status.conditions[Available]` (new), optional with fall back optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | `status.conditions[Available]`, optional with fall back on `status.initializiation.controlPlaneInitialized` | | `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | +| `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | | `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | | other fields/rules... | other fields/rules... | | @@ -1322,11 +1326,11 @@ proposal address those use cases. As a cluster admin with MachineDeployment ownership I'd like to understand if my MD is performing a rolling upgrade and why by looking at the MD status/conditions -> The main signal for MD is performing a rolling upgrade will be `MD.Status.Conditions[UpToDate]`. +> The main signal that an MD is performing a rolling upgrade will be `MD.Status.Conditions[MachinesUpToDate]`. > At least in the first iteration there won't be a signal at MD level about why rollout is happening, because controlled machines might > have different reasons why they are not UpToDate (and the admin can check those conditions by looking at single machines). -> In future iterations of this proposal we might find ways to aggregate those reasons into the message for the `MD.Status.Conditions[UpToDate]` condition. +> In future iterations of this proposal we might find ways to aggregate those reasons into the message for the `MD.Status.Conditions[MachinesUpToDate]` condition.
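One possible shape for the MD-level signal described in this user story, as a Go sketch: the `Machine` type, the `machinesUpToDate` helper and the message format are hypothetical. In line with the first iteration described above, it only names the Machines that are not up to date and leaves the per-Machine reasons on the Machines themselves.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Machine is a hypothetical, minimal view of a Machine: its name and
// whether its UpToDate condition is true.
type Machine struct {
	Name     string
	UpToDate bool
}

// machinesUpToDate derives the status and message of an MD-level
// MachinesUpToDate condition from the controlled Machines' UpToDate
// conditions. It does not aggregate reasons, only Machine names.
func machinesUpToDate(machines []Machine) (bool, string) {
	var stale []string
	for _, m := range machines {
		if !m.UpToDate {
			stale = append(stale, m.Name)
		}
	}
	if len(stale) == 0 {
		return true, ""
	}
	sort.Strings(stale) // stable message, independent of list order
	return false, fmt.Sprintf("Machines %s are not up to date", strings.Join(stale, ", "))
}

func main() {
	ok, msg := machinesUpToDate([]Machine{{"md-b", false}, {"md-a", false}, {"md-c", true}})
	fmt.Println(ok, msg) // false Machines md-a, md-b are not up to date
}
```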
As a cluster admin with MachineDeployment ownership I'd like to understand why my MD rollout is blocked by looking at the MD status/conditions From 75c4f63e3ff338fd85f90c3178a9e2ffc1e2b60f Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Fri, 6 Sep 2024 17:22:33 +0200 Subject: [PATCH 20/22] Align replica counters in control plane contract + add a note about in place updated fields --- docs/proposals/improve-status-in-CAPI-resources.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index c1dc29f18807..b35959794e36 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -1295,6 +1295,7 @@ Following changes are planned for the contract for the ControlPlane resource: - Remove `status.ready` (`status.ready` is a redundant signal of the control plane being initialized). - Rename `status.initialized` into `status.initialization.controlPlaneInitialized`. - Remove `failureReason` and `failureMessage`. 
+- Align replica counters with CAPI core objects. | v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | |-----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------| @@ -1305,8 +1306,16 @@ Following changes are planned for the contract for the ControlPlane resource: | `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | | `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | | `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | +| `status.unavailableReplicas`, optional | `status.unavailableReplicas` (deprecated), optional | (removed) | +| | `status.availableReplicas` (new), optional with fall back on replicas - `status.unavailableReplicas` | `status.availableReplicas`, optional | +| | `status.readyReplicas` (new), optional | `status.readyReplicas`, optional | +| `status.updatedReplicas`, optional | `status.upToDateReplicas` (renamed), optional with fall back on `status.updatedReplicas` | `status.upToDateReplicas`, optional | | other fields/rules... | other fields/rules... | | +Additionally, control plane providers will be expected to continuously set Machine's `status.conditions[UpToDate]` condition +and `spec.minReadySeconds`. Those fields should be treated like other fields propagated/updated in place, without triggering +machine rollouts (`nodeDrainTimeout`, `nodeVolumeDetachTimeout`, `nodeDeletionTimeout`, labels and annotations).
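The replica counter fallbacks listed in the ControlPlane contract table above could be applied by a consumer roughly as follows. This is a sketch under stated assumptions: `controlPlaneStatus`, `availableReplicas`, `upToDateReplicas` and `ptr` are hypothetical names, optional fields are modelled as pointers (nil means "not set"), and `Replicas` is assumed to be the replica count reported by the provider.

```go
package main

import "fmt"

// controlPlaneStatus is a hypothetical view of a control plane provider's
// status; pointers model optional fields (nil means "not set").
type controlPlaneStatus struct {
	Replicas            int32  // assumed: replica count reported by the provider
	UnavailableReplicas *int32 // v1beta1 field, deprecated in v1beta2
	AvailableReplicas   *int32 // new in v1beta2
	UpdatedReplicas     *int32 // v1beta1 name
	UpToDateReplicas    *int32 // v1beta2 name
}

// availableReplicas prefers status.availableReplicas and otherwise falls
// back on replicas - status.unavailableReplicas. The bool reports whether
// any value could be determined.
func availableReplicas(s controlPlaneStatus) (int32, bool) {
	if s.AvailableReplicas != nil {
		return *s.AvailableReplicas, true
	}
	if s.UnavailableReplicas != nil {
		return s.Replicas - *s.UnavailableReplicas, true
	}
	return 0, false
}

// upToDateReplicas prefers status.upToDateReplicas and otherwise falls
// back on status.updatedReplicas.
func upToDateReplicas(s controlPlaneStatus) (int32, bool) {
	if s.UpToDateReplicas != nil {
		return *s.UpToDateReplicas, true
	}
	if s.UpdatedReplicas != nil {
		return *s.UpdatedReplicas, true
	}
	return 0, false
}

func ptr(v int32) *int32 { return &v }

func main() {
	legacy := controlPlaneStatus{Replicas: 3, UnavailableReplicas: ptr(1), UpdatedReplicas: ptr(2)}
	a, _ := availableReplicas(legacy)
	u, _ := upToDateReplicas(legacy)
	fmt.Println(a, u) // 2 2
}
```

Returning a bool alongside the counter keeps "not reported" distinct from "zero replicas", which matters while both v1beta1 and v1beta2 providers coexist.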
+ Notes: - ControlPlane's `status.initialization.controlPlaneInitialized` will surface into Cluster's `staus.initialization.controlPlaneInitialized` field; also, the fact that the control plane is available to receive requests will be recorded in Cluster's `status.conditions[ControlPlaneInitialized]` condition. From ab42d7dd544c7a8f620aff5871b288d76ce37f08 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Fri, 6 Sep 2024 17:42:19 +0200 Subject: [PATCH 21/22] Clarify MinReadySeconds transformation for MS, MD, MP --- .../improve-status-in-CAPI-resources.md | 74 ++++++++++++++++++- 1 file changed, 70 insertions(+), 4 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index b35959794e36..6bc00c2c7702 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -336,7 +336,7 @@ Notes: | Condition | Note | |------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `Available` | True if at the machine is Ready for at least MinReady seconds, as defined by the Machine's minReadySeconds field | +| `Available` | True if the machine is Ready for at least MinReadySeconds, as defined by the Machine's MinReadySeconds field | | `Ready` | True if the Machines is not deleted, Machine's `BootstrapConfigReady`, `InfrastructureReady`, `NodeHealthy` and `HealthCheckSucceeded` (if present) are true; if other conditions are defined in `spec.readinessGates`, these conditions must be true as well | | `UpToDate` | True if the Machine spec matches the spec of the Machine's owner resource, e.g KubeadmControlPlane or MachineDeployment | | `BootstrapConfigReady` | Mirrors the corresponding `Ready` condition from the Machine's
BootstrapConfig resource | @@ -517,7 +517,7 @@ Notes: - Also `AvailableReplicas` will determine Machine's availability via Machine's `Available` condition instead of computing availability as of today (based on the Node `Ready` condition) -#### MachineSet (New)Conditions +##### MachineSet (New)Conditions | Condition | Note | |--------------------|-------------------------------------------------------------------------------------------------------------| @@ -568,6 +568,46 @@ Notes: - In k8s Deployment and ReplicaSet have different print columns for replica counters; this proposal enforces replicas counter columns consistent across all resources. +#### MachineSet Spec + +Following changes are implemented to MachineSet's spec: + +- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachineSet as `Spec.Template.Spec.MinReadySeconds`). + +Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. + +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------|------------------------------------------------|----------------------------------------------------| +| `MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` (removed) | +| other fields... | other fields... | other fields... 
| + +##### MachineSet (New)Conditions + +| Condition | Note | +|--------------------|-------------------------------------------------------------------------------------------------------------| +| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any | +| `MachinesUpToDate` | This condition surfaces details of controlled machines not up to date, if any | +| `ScalingUp` | True if available replicas < desired replicas | +| `ScalingDown` | True if replicas > desired replicas | +| `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | +| `Deleting` | If MachineSet is deleted, this condition surfaces details about ongoing deletion of the controlled machines | +| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | + +> To better evaluate proposed changes, below you can find the list of current MachineSet's conditions: +> Ready, MachinesCreated, Resized, MachinesReady. + +Notes: +- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` and `Deleting` are intended to provide visibility on the corresponding lifecycle operation. + e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in + the `ScalingDown` condition. +- MachineSet conditions are intentionally mostly consistent with MachineDeployment conditions to help users troubleshooting. +- MachineSet is considered as a sort of implementation detail of MachineDeployments, so it doesn't have its own concept of availability. + Similarly, this proposal is dropping the notion of MachineSet readiness because it is preferred to let users focus on Machines readiness. +- When implementing this proposal `MachinesUpToDate` condition will be `false` for older MachineSet, `true` for the current MachineSet; + in the future this might change in case Cluster API will start supporting in-place upgrades. 
+- `Remediating` for older MachineSets will report that remediation will happen as part of the regular rollout (Cluster API + does not remediate Machines on old MachineSets, because those Machines are already scheduled for deletion). + ### Changes to MachineDeployment resource #### MachineDeployment Status @@ -629,7 +669,7 @@ Notes: - The `Deprecated` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. -#### MachineDeployment (New)Conditions +##### MachineDeployment (New)Conditions | Condition | Note | |--------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -672,6 +712,19 @@ Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. - During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list. +#### MachineDeployment Spec + +Following changes are implemented to MachineDeployment's spec: + +- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachineDeployment as `Spec.Template.Spec.MinReadySeconds`). + +Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. 
+ +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------|------------------------------------------------|----------------------------------------------------| +| `MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` (removed) | +| other fields... | other fields... | other fields... | + ### Changes to Cluster resource #### Cluster Status @@ -981,7 +1034,7 @@ Notes: - The `Deprecated` struct is going to exist in v1beta2 types only until v1beta1 removal (9 months or 3 minor releases after v1beta2 is released/v1beta1 is deprecated, whichever is longer). Fields in this struct are used for supporting down conversions, thus providing users relying on v1beta1 APIs additional buffer time to pick up the new changes. -#### KubeadmControlPlane (New)Conditions +##### KubeadmControlPlane (New)Conditions | Condition | Note | |-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -1166,6 +1219,19 @@ Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. - During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list. +#### MachinePool Spec + +Following changes are implemented to MachinePool's spec: + +- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachinePool as `Spec.Template.Spec.MinReadySeconds`). + +Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. 
+ +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------|------------------------------------------------|----------------------------------------------------| +| `MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` (removed) | +| other fields... | other fields... | other fields... | + ### Changes to Cluster API contract The Cluster API contract defines a set of rules a provider is expected to comply with in order to interact with Cluster API. From b2e30953792cc30e0d755d9ad9f81dfcf50f1bc5 Mon Sep 17 00:00:00 2001 From: fabriziopandini Date: Mon, 9 Sep 2024 11:53:04 +0200 Subject: [PATCH 22/22] fix review findings --- .../improve-status-in-CAPI-resources.md | 159 ++++++++---------- 1 file changed, 68 insertions(+), 91 deletions(-) diff --git a/docs/proposals/improve-status-in-CAPI-resources.md b/docs/proposals/improve-status-in-CAPI-resources.md index 6bc00c2c7702..80a0bfa97af4 100644 --- a/docs/proposals/improve-status-in-CAPI-resources.md +++ b/docs/proposals/improve-status-in-CAPI-resources.md @@ -41,11 +41,13 @@ see-also: - [Machine Print columns](#machine-print-columns) - [Changes to MachineSet resource](#changes-to-machineset-resource) - [MachineSet Status](#machineset-status) - - [MachineSet (New)Conditions](#machineset-newconditions) + - [MachineSet (New)Conditions](#machineset-newconditions) + - [MachineSet Spec](#machineset-spec) - [MachineSet Print columns](#machineset-print-columns) - [Changes to MachineDeployment resource](#changes-to-machinedeployment-resource) - [MachineDeployment Status](#machinedeployment-status) - - [MachineDeployment (New)Conditions](#machinedeployment-newconditions) + - [MachineDeployment (New)Conditions](#machinedeployment-newconditions) + - [MachineDeployment Spec](#machinedeployment-spec) - [MachineDeployment Print columns](#machinedeployment-print-columns) - [Changes to Cluster 
resource](#changes-to-cluster-resource) - [Cluster Status](#cluster-status) @@ -54,11 +56,12 @@ see-also: - [Cluster Print columns](#cluster-print-columns) - [Changes to KubeadmControlPlane (KCP) resource](#changes-to-kubeadmcontrolplane-kcp-resource) - [KubeadmControlPlane Status](#kubeadmcontrolplane-status) - - [KubeadmControlPlane (New)Conditions](#kubeadmcontrolplane-newconditions) + - [KubeadmControlPlane (New)Conditions](#kubeadmcontrolplane-newconditions) - [KubeadmControlPlane Print columns](#kubeadmcontrolplane-print-columns) - [Changes to MachinePool resource](#changes-to-machinepool-resource) - [MachinePool Status](#machinepool-status) - [MachinePool (New)Conditions](#machinepool-newconditions) + - [MachinePool Spec](#machinepool-spec) - [MachinePool Print columns](#machinepool-print-columns) - [Changes to Cluster API contract](#changes-to-cluster-api-contract) - [Contract for infrastructure providers](#contract-for-infrastructure-providers) @@ -544,6 +547,19 @@ Notes: - `Remediating` for older MachineSets will report that remediation will happen as part of the regular rollout (Cluster API does not remediate Machines on old MachineSets, because those Machines are already scheduled for deletion). +#### MachineSet Spec + +Following changes are implemented to MachineSet's spec: + +- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachineSet as `Spec.Template.Spec.MinReadySeconds`). + +Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. + +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------|------------------------------------------------|----------------------------------------------------| +| `Spec.MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` | +| other fields... | other fields... 
| other fields... | + #### MachineSet Print columns | Current | To be | @@ -568,46 +584,6 @@ Notes: - In k8s Deployment and ReplicaSet have different print columns for replica counters; this proposal enforces replicas counter columns consistent across all resources. -#### MachineSet Spec - -Following changes are implemented to MachineSet's spec: - -- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachineSet as `Spec.Template.Spec.MinReadySeconds`). - -Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. - -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|------------------------------|------------------------------------------------|----------------------------------------------------| -| `MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` (removed) | -| other fields... | other fields... | other fields... 
| - -##### MachineSet (New)Conditions - -| Condition | Note | -|--------------------|-------------------------------------------------------------------------------------------------------------| -| `MachinesReady` | This condition surfaces detail of issues on the controlled machines, if any | -| `MachinesUpToDate` | This condition surfaces details of controlled machines not up to date, if any | -| `ScalingUp` | True if available replicas < desired replicas | -| `ScalingDown` | True if replicas > desired replicas | -| `Remediating` | This condition surfaces details about ongoing remediation of the controlled machines, if any | -| `Deleting` | If MachineSet is deleted, this condition surfaces details about ongoing deletion of the controlled machines | -| `Paused` | True if this MachineSet or the Cluster it belongs to are paused | - -> To better evaluate proposed changes, below you can find the list of current MachineSet's conditions: -> Ready, MachinesCreated, Resized, MachinesReady. - -Notes: -- Conditions like `ScalingUp`, `ScalingDown`, `Remediating` and `Deleting` are intended to provide visibility on the corresponding lifecycle operation. - e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface with a reason/message in - the `ScalingDown` condition. -- MachineSet conditions are intentionally mostly consistent with MachineDeployment conditions to help users troubleshooting. -- MachineSet is considered as a sort of implementation detail of MachineDeployments, so it doesn't have its own concept of availability. - Similarly, this proposal is dropping the notion of MachineSet readiness because it is preferred to let users focus on Machines readiness. -- When implementing this proposal `MachinesUpToDate` condition will be `false` for older MachineSet, `true` for the current MachineSet; - in the future this might change in case Cluster API will start supporting in-place upgrades. 
-- `Remediating` for older MachineSets will report that remediation will happen as part of the regular rollout (Cluster API - does not remediate Machines on old MachineSets, because those Machines are already scheduled for deletion). - ### Changes to MachineDeployment resource #### MachineDeployment Status @@ -660,7 +636,7 @@ type MachineDeploymentStatus struct { | `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | | `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | | `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | -| `UpdatedReplicas` (deprecated) | `Deprecated.V1Beta1.UpToDateReplicas` (UpdatedReplicas) | (removed) | +| `UpdatedReplicas` (deprecated) | `Deprecated.V1Beta1.UpdatedReplicas` (renamed) (deprecated) | (removed) | | other fields... | other fields... | other fields... | Notes: @@ -690,6 +666,19 @@ Notes: e.g. If the scaling down operation is being blocked by a machine having issues while deleting, this should surface as a reason/message in the `ScalingDown` condition. +#### MachineDeployment Spec + +Following changes are implemented to MachineDeployment's spec: + +- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachineDeployment as `Spec.Template.Spec.MinReadySeconds`). + +Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. + +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------|------------------------------------------------|----------------------------------------------------| +| `Spec.MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` | +| other fields... | other fields... | other fields... 
| + #### MachineDeployment Print columns | Current | To be | @@ -712,19 +701,6 @@ Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. - During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list. -#### MachineDeployment Spec - -Following changes are implemented to MachineDeployment's spec: - -- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachineDeployment as `Spec.Template.Spec.MinReadySeconds`). - -Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. - -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|------------------------------|------------------------------------------------|----------------------------------------------------| -| `MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` (removed) | -| other fields... | other fields... | other fields... 
| - ### Changes to Cluster resource #### Cluster Status @@ -1012,21 +988,21 @@ type KubeadmControlPlaneStatus struct { } ``` -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|-----------------------------------|------------------------------------------------------------|----------------------------------------------------| -| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | -| `V1Beta2` (new) | (removed) | (removed) | -| `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | -| `V1Beta2.ReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | -| `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | -| `V1Beta2.UpToDateReplicas` (new) | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | -| | `Deprecated.V1Beta1` (new) | (removed) | -| `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | -| `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | -| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | -| `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | -| `UpdatedReplicas` (deprecated) | `Deprecated.V1Beta1.UpToDateReplicas` (UpdatedReplicas) | (removed) | -| other fields... | other fields... | other fields... 
| +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|-----------------------------------|-------------------------------------------------------------|----------------------------------------------------| +| `Ready` (deprecated) | `Ready` (deprecated) | (removed) | +| `V1Beta2` (new) | (removed) | (removed) | +| `V1Beta2.Conditions` (new) | `Conditions` (renamed) | `Conditions` | +| `V1Beta2.ReadyReplicas` (new) | `ReadyReplicas` (renamed) | `ReadyReplicas` | +| `V1Beta2.AvailableReplicas` (new) | `AvailableReplicas` (renamed) | `AvailableReplicas` | +| `V1Beta2.UpToDateReplicas` (new) | `UpToDateReplicas` (renamed) | `UpToDateReplicas` | +| | `Deprecated.V1Beta1` (new) | (removed) | +| `ReadyReplicas` (deprecated) | `Deprecated.V1Beta1.ReadyReplicas` (renamed) (deprecated) | (removed) | +| `FailureReason` (deprecated) | `Deprecated.V1Beta1.FailureReason` (renamed) (deprecated) | (removed) | +| `FailureMessage` (deprecated) | `Deprecated.V1Beta1.FailureMessage` (renamed) (deprecated) | (removed) | +| `Conditions` (deprecated) | `Deprecated.V1Beta1.Conditions` (renamed) (deprecated) | (removed) | +| `UpdatedReplicas` (deprecated) | `Deprecated.V1Beta1.UpdatedReplicas` (renamed) (deprecated) | (removed) | +| other fields... | other fields... | other fields... | Notes: - The `V1Beta2` struct is going to be added to in v1beta1 types in order to provide a preview of changes coming with the v1beta2 types, but without impacting the semantic of existing fields. @@ -1197,6 +1173,19 @@ Notes: the `ScalingDown` condition. - As of today MachinePool does not have a notion similar to MachineDeployment's MaxUnavailability. +#### MachinePool Spec + +Following changes are implemented to MachinePool's spec: + +- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachinePool as `Spec.Template.Spec.MinReadySeconds`). 
+ +Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. + +| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | +|------------------------------|------------------------------------------------|----------------------------------------------------| +| `Spec.MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` | +| other fields... | other fields... | other fields... | + #### MachinePool Print columns | Current | To be | @@ -1219,19 +1208,6 @@ Notes: - Print columns are not subject to any deprecation rule, so it is possible to iteratively improve print columns without waiting for the next API version. - During the implementation we are going to verify the resulting layout and eventually make final adjustments to the column list. -#### MachinePool Spec - -Following changes are implemented to MachinePool's spec: - -- Remove `Spec.MinReadySeconds`, which is now part of Machine's spec (and thus exists in MachinePool as `Spec.Template.Spec.MinReadySeconds`). - -Below you can find a summary table that also shows how changes will be rolled out according to K8s deprecation rules. - -| v1beta1 (tentative Dec 2024) | v1beta2 (tentative Apr 2025) | v1beta2 after v1beta1 removal (tentative Apr 2026) | -|------------------------------|------------------------------------------------|----------------------------------------------------| -| `MinReadySeconds` | `Spec.Template.Spec.MinReadySeconds` (renamed) | `Spec.Template.Spec.MinReadySeconds` (removed) | -| other fields... | other fields... | other fields... | - ### Changes to Cluster API contract The Cluster API contract defines a set of rules a provider is expected to comply with in order to interact with Cluster API. 
@@ -1370,16 +1346,17 @@ Following changes are planned for the contract for the ControlPlane resource: | `status.conditions[Ready]`, optional with fall back on `status.ready` | `status.deprecated.v1beta1.conditions[Ready]` (renamed, deprecated), optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | (removed) | | | `status.conditions[Available]` (new), optional with fall back optional with fall back on `status.ready` or `status.initialization.controlPlaneInitialized` set | `status.conditions[Available]`, optional with fall back on `status.initializiation.controlPlaneInitialized` | | `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | -| `status.failureReason`, optional | `status.failureReason` (deprecated), optional | (removed) | | `status.failureMessage`, optional | `status.failureMessage` (deprecated), optional | (removed) | -| `status.unavailableReplicas`, optional | `status.unavailableReplicas` (deprecated), optional | (removed) | -| | `status.availableReplicas` (new), optional with fallback on replicas - `status.unavailableReplicas` | `status.availableReplicas`, optional | -| | `status.readyReplicas` (new), optional | `status.readyReplicas`, optional | -| `status.updatedReplicas`, optional | `status.uptoDateReplicas` (renamed), optional will fall back on `status.updatedReplicas` | `status.uptoDateReplicas`, optional | +| | `status.availableReplicas` (new), required (1) with fallback on `status.readyReplicas` (CP did not have a concept of availability before) | `status.availableReplicas`, required (1) | +| `status.updatedReplicas`, required (1) | `status.upToDateReplicas` (renamed), required (1) with fallback on `status.updatedReplicas` | `status.upToDateReplicas`, required (1) | | other fields/rules... | other fields/rules... | | +required (1): required only if using replicas.
+ Additionally, control plane providers will be expected to continuously set Machine's `status.conditions[UpToDate]` condition -and `spec.minReadySeconds`. Those fields should be treated like other fields propagated /updated in place, without triggering +and `spec.minReadySeconds`; please note that a control plane provider implementation can decide to enforce `spec.minReadySeconds` to be 0 and +introduce a difference between readiness and availability at a later stage (e.g. KCP will do this). +Those fields should be treated like other fields propagated/updated in place, without triggering machine rollouts (`nodeDrainTimeout`, `nodeVolumeDetachTimeout`, `nodeDeletionTimeout`, labels and annotations). Notes: