
Add more detail to alternatives section
pwschuurman committed Oct 6, 2022
1 parent b7ee691 commit 7ba997e
Showing 1 changed file with 55 additions and 11 deletions.
keps/sig-multicluster/3335-statefulset-slice/README.md
@@ -128,14 +128,14 @@ checklist items _must_ be updated for the enhancement to be released.

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [X] (R) Design details are appropriately documented
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] e2e Tests for all Beta API Operations (endpoints)
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [X] (R) Graduation criteria is in place
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
@@ -236,11 +236,12 @@ What is out of scope for this KEP? Listing non-goals helps to focus discussion
and make progress.
-->

* Updating a PDB to safeguard more than one StatefulSet slice
  * As StatefulSet slices are scaled up or down, corresponding PDBs can also be adjusted. For example, a PDB covering a slice of `k` replicas could be adjusted to `minAvailable: k-1` on scale-up or scale-down events (see the sketch after this list). Providing guidance and functionality for adjusting these PDBs is outside the scope of this KEP.
* Orchestrating pod movement from one StatefulSet slice to another
* Managing network connectivity between pods in different StatefulSet slices
* Orchestrating storage lifecycle of PVCs and PVs across different StatefulSet slices
  * Referenced PVs/PVCs will need to be migrated in order for a new StatefulSet to reference data that was used by an existing StatefulSet. Orchestration complexity will depend on how volumes are used (RWO volumes claimed via `.spec.volumeClaimTemplates` on a StatefulSet, RWX volumes referenced via pod `.spec.volumes`).
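
As an illustration of the PDB adjustment described above (out of scope for this KEP), a minimal sketch, assuming a hypothetical slice of `k=3` replicas labeled `app: web`:

```yaml
# Illustrative sketch only: a PDB guarding a single StatefulSet slice of
# k=3 replicas, tolerating at most one voluntary disruption.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-slice-pdb        # hypothetical name
spec:
  minAvailable: 2            # k-1, for a slice of k=3 replicas
  selector:
    matchLabels:
      app: web               # assumed to match only this slice's pods
```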

## Proposal

@@ -940,9 +941,52 @@ not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->

### Alternative API changes

**ReverseOrderedReady**: A new `PodManagementPolicy` value called
`ReverseOrderedReady` could be added. This would allow a StatefulSet to be
started and actuated from the highest ordinal (the current default starts from
the lowest ordinal). For the cross-cluster migration use case, this would allow
a source StatefulSet to be scaled down while a target StatefulSet is brought up
starting from its highest ordinal. The downside of this API is that the pod
management policy is not a mutable field, so if an orchestrator uses this
behavior to populate a StatefulSet in a destination cluster and then wants to
revert the policy back to the default, the StatefulSet would need to be
deleted and re-created.
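
A sketch of what this alternative might have looked like; `ReverseOrderedReady` is a hypothetical value that does not exist in the StatefulSet API:

```yaml
# Hypothetical sketch: `ReverseOrderedReady` is not a real
# PodManagementPolicy value; it illustrates this rejected alternative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web                                  # hypothetical name
spec:
  podManagementPolicy: ReverseOrderedReady   # hypothetical value
  replicas: 6                                # pods would start web-5, web-4, ..., web-0
  serviceName: web
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.k8s.io/nginx-slim:0.8
```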

**KEP-3521**: [KEP-3521](https://github.com/kubernetes/enhancements/issues/3521)
proposes a Pod `.spec`-level API that enables a pod to be paused at the initial
scheduling phase of its lifecycle. This provides granular control over which
pods should be started and running (active) and which pods shouldn't be
scheduled (standby). An orchestrator can leverage this control over the
scheduling of specific pods without any changes to the StatefulSet controller,
since the StatefulSet controller remains in control of creating pods.
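
A minimal sketch of how an orchestrator might hold a standby pod under the scheduling-gates field proposed by KEP-3521 (the pod and gate names are hypothetical):

```yaml
# Sketch: the pod remains unscheduled until the orchestrator removes
# the (hypothetical) gate, at which point it proceeds to scheduling.
apiVersion: v1
kind: Pod
metadata:
  name: web-3                                # hypothetical name
spec:
  schedulingGates:
    - name: example.com/migration-standby    # hypothetical gate name
  containers:
    - name: web
      image: registry.k8s.io/nginx-slim:0.8
```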

If the StatefulSet controller is using `OrderedReady` pod management, pausing
scheduling can result in a pod being marked as not Ready. This will prevent
the StatefulSet controller from actuating updates to higher-ordinal pods
(e.g., pod `m` will not be created if pod `n` is unhealthy, where `m` > `n`).
This may increase orchestrator complexity by requiring a migration
orchestrator to use `Parallel` pod management during the migration, and then
re-create the StatefulSet (using `--cascade=orphan`) to revert back to
`OrderedReady` if desired.
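
A sketch of that revert step, assuming a hypothetical StatefulSet named `web` that ran with `Parallel` pod management during the migration:

```yaml
# Sketch of the revert flow (names are hypothetical):
#   1. Delete the StatefulSet without deleting its pods:
#        kubectl delete statefulset web --cascade=orphan
#   2. Re-apply the manifest below; the re-created StatefulSet adopts the
#      orphaned pods through its label selector.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  podManagementPolicy: OrderedReady   # reverted from Parallel
  replicas: 6
  serviceName: web
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.k8s.io/nginx-slim:0.8
```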

Additionally, if modifying the StatefulSet template is undesired, a webhook
must be introduced to mark Pods as paused when they are created. This adds a
layer of complexity to an orchestration operator, since it needs both an
operator component that is capable of making changes through the API server,
and a webhook that reads from a consistent view of the migration state.
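
For illustration, a sketch of the webhook registration this would entail; the webhook, service, and namespace names are all hypothetical:

```yaml
# Sketch: register a mutating webhook that patches a pause marker (e.g.
# a scheduling gate) onto newly created pods selected for migration.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: migration-pause-pods              # hypothetical name
webhooks:
  - name: pause.migration.example.com     # hypothetical name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail                   # pods must not slip through unpaused
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        namespace: migration-system       # hypothetical service backing the webhook
        name: migration-webhook
        path: /mutate
```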

### Alternatives without any API changes

**Orphan Pods**: Users can orphan pods from a StatefulSet, migrate the pods
across a namespace or cluster, and create a new StatefulSet to manage the pods
after migration. While the pods are orphaned, no controller recreates them on
eviction or failure, so recovery requires manual intervention and constant
monitoring.

**Backup/Restore**: Users can back up and restore a StatefulSet (and its
underlying storage) in a new namespace or cluster. Doing so requires the
existing StatefulSet to be deleted and the underlying storage to be backed up
and restored, resulting in downtime for the stateful application.

## Infrastructure Needed (Optional)

