Support disable DR when cluster is unavailable #1147

Merged

merged 6 commits into RamenDR:main from drcluster-unavailable on Dec 13, 2023

Conversation

@nirs (Member) commented Nov 22, 2023

When a cluster is not available, disabling DR gets stuck waiting for the vrgs.

On the remaining cluster, the vrg is stuck trying to delete data from the s3 store on the unavailable cluster.

On the hub, the drpc is stuck waiting for the stuck vrg, and for the stale vrg reported by the managedclusterview of the unavailable cluster.

To fix the issue we require the admin to delete the failed drcluster after failing over to another cluster. The system then avoids accessing s3 profiles on the deleted drcluster and does not wait for VRGs on deleted clusters.

Disable DR flow when a cluster is not available

  1. For each application on the failed cluster, fail over to the secondary cluster

  2. Mark the failed drcluster for deletion on the hub

    kubectl delete drcluster clustername --wait=false --context hub
    

    Using --wait=false since the drcluster will not be deleted until the drpolicy referencing it is deleted, and the drpolicy will not be deleted before the drpc referencing it is deleted.

  3. For each application disable DR:

    1. Ensure Placement is pointing to failover cluster

    2. Wait until the vrg for the application has only one s3 profile:

      kubectl get vrg -n appnamespace --context cluster2 \
          -o jsonpath='{.items[0].spec.s3Profiles}' | jq
      

      This can take a few minutes after the drcluster was deleted (see the polling sketch after these steps).

    3. Delete the drpc resource for the OCM application on the hub

      kubectl delete drpc drpcname -n drpcnamespace --context hub
      

      If the drpc delete does not finish, check that the vrg does not have an s3 profile for the failed cluster. If it does, you need to edit the vrg and remove it manually (see the example under Known issues).

    4. Change the Placement to be reconciled by OCM by removing the scheduling-disable annotation (the trailing "-" in the command removes the annotation)

      kubectl annotate placement placementname -n drpcnamespace \
         cluster.open-cluster-management.io/experimental-scheduling-disable-
      
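The wait in step 3.2 can be scripted. A minimal sketch, using the placeholder names from the steps above (appnamespace, cluster2) and assuming jq is available:

    # Poll until the vrg on the failover cluster references a single s3 profile.
    while true; do
        count=$(kubectl get vrg -n appnamespace --context cluster2 \
            -o jsonpath='{.items[0].spec.s3Profiles}' | jq 'length')
        if [ "$count" = "1" ]; then
            echo "vrg references a single s3 profile, safe to delete the drpc"
            break
        fi
        echo "vrg references ${count:-unknown} s3 profiles, waiting..."
        sleep 10
    done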

Known issues

  • Docs on disabling DR needed

  • It can take a few minutes after deleting the drcluster until the vrg on the failover cluster is updated to exclude the s3 profile. If the drpc is deleted after the vrg is updated, the delete will complete quickly.

  • If disable DR is started before the vrg is updated on the failover cluster, disabling DR will not complete, since deleting a drpc deletes the manifestwork for the vrg, and after that no change in the manifestwork is propagated to the managed cluster.

    • You need to edit the vrg on the managed cluster and delete the s3 profile for the deleted drcluster (see the example below)
  • With volsync (cephfs), disabling DR also deletes the PVC: [VolSync] Ensure Application PVC is not Automatically Garbage Collected upon VRG Deletion #1150
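If the stale s3 profile has to be removed manually (see the known issue above about the drpc delete not completing), one option is a JSON patch against the vrg on the failover cluster. This is a sketch with placeholder names (vrgname, appnamespace, cluster2); check the index of the stale entry first, and note that if the vrg manifestwork still exists it may revert the change:

    # Show the s3 profiles to find the index of the stale entry.
    kubectl get vrg vrgname -n appnamespace --context cluster2 \
        -o jsonpath='{.spec.s3Profiles}' | jq

    # Remove the stale entry ("0" here is only an example index).
    kubectl patch vrg vrgname -n appnamespace --context cluster2 \
        --type json -p '[{"op": "remove", "path": "/spec/s3Profiles/0"}]'

Editing the vrg with kubectl edit on the managed cluster works as well.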

Testing

  1. Update the ramen-operator image in the CSV on the hub and drclusters:

    image: quay.io/nirsof/ramen-operator:drcluster-unavailable-v6
    
  2. If upgrading from an older version (e.g. 4.14), apply the CRDs from main:

    for cluster in hub cluster1 cluster2; do
        oc apply -k 'https://github.com/RamenDR/ramen.git/config/crd' --context $cluster
    done
    

Status

Tested using RamenDR/ocm-ramen-samples#41 on top of #1153

basic-test

drenv:

  • busybox-regional-rbd-deploy
  • busybox-regional-rbd-sts
  • busybox-regional-rbd-ds

ocp:

  • busybox-regional-rbd-deploy
  • busybox-regional-cephfs-deploy

Disable DR flow

Tested without the commit updating the vrg.

drenv:

  • busybox-regional-rbd-deploy

ocp:

  • rbd sub
  • rbd appset
  • cephfs sub
  • cephfs appset

Todo:

  • Test that VRG is regenerated when drcluster is deleted
    • took 8 minutes because we don't watch drcluster delete events
  • Test watching delete events - needs more work on filtering drcluster events

Fixes #1139

@nirs nirs force-pushed the drcluster-unavailable branch 3 times, most recently from 22dff67 to f404c45 Compare November 30, 2023 22:26
@nirs (Member, Author) commented Dec 5, 2023

Notes from a discussion with @ShyamsundarR:

  • Using an annotation (or label) allows going back and making the cluster available again, which can lead to stale data in the s3 store (not deleted during disable DR). We want a one-way change that cannot be undone.
  • Currently this works only when the annotation is applied before failover, since existing VRGs are not updated, so the user needs to add the annotation before failover. But failover is likely to be done quickly (to restore the service for users), before understanding that the cluster will never be available again. Since we don't want to be able to make a cluster available again after it was marked as unavailable, we want to do this before disable DR, or maybe only if disable DR got stuck.
  • We can delete the drcluster instead of adding an annotation - this cannot be undone, and does not require teaching users about the new annotation.
  • The code should behave correctly when a drcluster is deleted, but not sure this is tested
  • Ideally, when deleting the drcluster the drpolicy would become invalid, but it is not clear that this will be easy to do now
  • To update the VRG when a drcluster changes, we can call the code creating or updating the vrg on every reconcile (it should read state from the cache, so this should not be an issue)

Wanted behavior when drcluster is deleted:

  • Creating new drpc must fail
  • Relocate must fail
  • Failover must succeed
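A minimal sketch of the deletion check this behavior hinges on (simplified stand-in types, not the actual ramen code; the real check presumably looks at the drcluster deletion timestamp, since a drcluster with finalizers stays around after kubectl delete):

    package controllers

    import (
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    // drCluster is a hypothetical stand-in for the DRCluster API type.
    type drCluster struct {
        metav1.TypeMeta
        metav1.ObjectMeta
    }

    // drClusterIsDeleted reports whether the drcluster was marked for
    // deletion. Unlike an annotation or label, deleting the drcluster is a
    // one-way signal that cannot be undone.
    func drClusterIsDeleted(c *drCluster) bool {
        return !c.GetDeletionTimestamp().IsZero()
    }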

@nirs nirs force-pushed the drcluster-unavailable branch 3 times, most recently from 8b29b1f to 355263f Compare December 11, 2023 21:39
@nirs nirs marked this pull request as ready for review December 11, 2023 21:39
@nirs (Member, Author) commented Dec 11, 2023

@ShyamsundarR @BenamarMk please review.

In the cluster replacement flow, we may need to disable DR when a
drcluster is not available, which breaks the disable DR flow in many places.

To allow cleanup to proceed on the hub and on the remaining managed
cluster, the user will have to delete the drcluster for the failed
cluster.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When getting or updating vrgs from managed clusters, we passed the
drpolicy and used its drcluster names for fetching the vrg. To handle
an unavailable drcluster, we need the DRCluster type instead of its
name, but we don't have the required context and client to get it in
getVRGsFromManagedClusters().

Fixed by getting the drclusters in the caller when we have the context
and client, and pass down the drcluster list instead of the drpolicy.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When a drcluster is unreachable, the ManagedClusterView may continue to
report the last status for the VRG and no error. This breaks disable DR
when we wait until all VRGs are deleted.

Fixed by ignoring VRGs on deleted drclusters. Once the VRG on the
available drcluster is deleted, the VRG count reaches zero and we can
complete drpc deletion.

Since this change is in getVRGsFromManagedClusters(), it also affects
creatingDRPCInstance and maintenance mode. I'm not sure about these
changes yet, but it seems a good idea not to handle a VRG on a deleted
drcluster.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
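Roughly, the shape of these two changes looks like the sketch below. The types and signatures are simplified placeholders (reusing the drCluster stand-in from the earlier sketch), not the real ramen code:

    // vrgView and getVRGViewFromCluster are hypothetical stand-ins for the
    // ManagedClusterView based VRG lookup.
    type vrgView struct{ Name string }

    func getVRGViewFromCluster(cluster, namespace, name string) *vrgView { return nil }

    // The caller, which has the context and client, fetches the DRCluster
    // objects and passes them down instead of the drpolicy, so the deletion
    // state can be checked here. Stale VRG views reported for a deleted
    // cluster are skipped; once the VRG on the surviving cluster is gone,
    // the count reaches zero and drpc deletion can complete.
    func getVRGsFromManagedClusters(drClusters []drCluster, namespace, name string) map[string]*vrgView {
        vrgs := map[string]*vrgView{}

        for i := range drClusters {
            c := &drClusters[i]
            if drClusterIsDeleted(c) {
                continue // ignore the VRG on a deleted drcluster
            }

            if v := getVRGViewFromCluster(c.Name, namespace, name); v != nil {
                vrgs[c.Name] = v
            }
        }

        return vrgs
    }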
@ShyamsundarR (Member) left a comment:

Reviewed the changes. Still reviewing the impact of the changes on other parts of the code, and other use-cases that may need to handle a DRCluster in a deleted state.

controllers/drplacementcontrol.go
@@ -1453,16 +1453,8 @@ func (d *DRPCInstance) createVRGManifestWork(homeCluster string, repState rmn.Re
return nil
}

// ensureVRGManifestWork ensures that the VRG manifestwork exists and matches the current VRG state.
A reviewer (Member) commented:

We now update MW for VRG from the cached MCV VRG replicationState when this function is invoked.

This is not a problem when called from ensureActionCompleted, where the callers ensure that the cached VRG is primary before we invoke ensureVRGManifestWork.

There are some gaps in the invocation from RunInitialDeployment that need closer inspection.

For example, the code that decides a VRG exists on one of the clusters does not check whether that VRG exists as primary; if that check used the replication state from the MCV it would be fine, as it would always be primary when this function is called.

This function ensureVRGManifestWork is currently only safe when VRG MCV is Primary and desired VRG MW is also Primary.

@nirs (Member, Author) replied:

I'll update the documentation and look at callers.

@nirs (Member, Author) replied:

Is this issue new? This change only adds calls when the vrg exists - this could have been called for the wrong vrg state before this change, if the vrg manifestwork was deleted, no?

for i := range d.drClusters {
	c := &d.drClusters[i]
	if !drClusterIsDeleted(c) {
		r = append(r, c.Spec.S3ProfileName)
@nirs (Member, Author) commented on Dec 12, 2023:

This works, but it took 8 minutes before the drpc controller detected that the cluster was deleted when updating the vrg. Looks like we have a caching issue.

  1. The drcluster controller finds that the drcluster was deleted:
2023-12-12T16:15:34.356Z	INFO	controllers.DRCluster	controllers/drcluster_controller.go:272	reconcile enter	{"name": "dr1", "rid": "bf896718-4c69-41a0-bb95-9182e04a42ac"}
2023-12-12T16:15:34.356Z	INFO	controllers.DRCluster	controllers/drcluster_controller.go:453	delete	{"name": "dr1", "rid": "bf896718-4c69-41a0-bb95-9182e04a42ac"}
2023-12-12T16:15:34.357Z	INFO	controllers.DRCluster	controllers/drcluster_controller.go:298	reconcile exit	{"name": "dr1", "rid": "bf896718-4c69-41a0-bb95-9182e04a42ac"}
2023-12-12T16:15:34.357Z	ERROR	controller/controller.go:329	Reconciler error	{"controller": "drcluster", "controllerGroup": "ramendr.openshift.io", "controllerKind": "DRCluster", "DRCluster": {"name":"dr1"}, "namespace": "", "name": "dr1", "reconcileID": "33b6081b-3d31-48a0-866c-88ab6a038e7b", "error": "drclusters undeploy: drcluster 'dr1' referenced in one or more existing drPolicy resources"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/internal/controller/controller.go:235
  2. 7 minutes later - drplacement controller detects that the cluster was deleted:
2023-12-12T16:23:29.231Z	INFO	controllers.DRPlacementControl	controllers/drplacementcontrol_controller.go:594	Entering reconcile loop	{"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "2e059f82-a46a-48bd-b975-a0619a608aa7"}
...
2023-12-12T16:23:29.234Z	INFO	controllers.DRPlacementControl	controllers/drplacementcontrol_controller.go:1486	Skipping VRG on deleted drcluster	{"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "2e059f82-a46a-48bd-b975-a0619a608aa7", "drcluster": "dr1", "vrg": "busybox-regional-rbd-deploy-drpc"}
  3. In the same reconcile loop - it updates the vrg manifestwork - including s3 store on the deleted cluster:
2023-12-12T16:23:29.236Z	INFO	controllers.DRPlacementControl	controllers/drplacementcontrol.go:1606	ensureNamespaceExistsOnManagedCluster: namespace 'busybox-regional-rbd-deploy' exists on cluster dr2: true	{"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "2e059f82-a46a-48bd-b975-a0619a608aa7"}
2023-12-12T16:23:29.236Z	INFO	controllers.DRPlacementControl	controllers/drplacementcontrol.go:1434	Creating VRG ManifestWork	{"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "2e059f82-a46a-48bd-b975-a0619a608aa7", "Last State:": "FailedOver", "cluster": "dr2"}
2023-12-12T16:23:29.236Z	INFO	controllers.DRPlacementControl	util/mw_util.go:121	Create or Update manifestwork busybox-regional-rbd-deploy-drpc:busybox-regional-rbd-deploy:dr2:{TypeMeta:{Kind:VolumeReplicationGroup APIVersion:ramendr.openshift.io/v1alpha1} ObjectMeta:{Name:busybox-regional-rbd-deploy-drpc GenerateName: Namespace:busybox-regional-rbd-deploy SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ManagedFields:[]} Spec:{PVCSelector:{MatchLabels:map[appname:busybox] MatchExpressions:[]} ReplicationState:primary S3Profiles:[minio-on-dr2] Async:0xc000fe3a90 Sync:<nil> VolSync:{RDSpec:[] Disabled:false} PrepareForFinalSync:false RunFinalSync:false Action:Failover KubeObjectProtection:<nil>} Status:{State: ProtectedPVCs:[] Conditions:[] ObservedGeneration:0 LastUpdateTime:0001-01-01 00:00:00 +0000 UTC KubeObjectProtection:{CaptureToRecoverFrom:<nil>} PrepareForFinalSyncComplete:false FinalSyncComplete:false LastGroupSyncTime:<nil> LastGroupSyncDuration:nil LastGroupSyncBytes:<nil>}}	{"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "2e059f82-a46a-48bd-b975-a0619a608aa7"}
2023-12-12T16:23:29.236Z	INFO	controllers.DRPlacementControl	util/mw_util.go:499	Updating Manifestwork	{"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "2e059f82-a46a-48bd-b975-a0619a608aa7", "name": "busybox-regional-rbd-deploy-drpc-busybox-regional-rbd-deploy-vrg-mw", "namespace": "dr2"}
  4. A few milliseconds later - we update the VRG without the s3 profile of the failed cluster:
2023-12-12T16:23:29.236Z	INFO	controllers.DRPlacementControl	util/mw_util.go:121	Create or Update manifestwork busybox-regional-rbd-deploy-drpc:busybox-regional-rbd-deploy:dr2:{TypeMeta:{Kind:VolumeReplicationGroup APIVersion:ramendr.openshift.io/v1alpha1} ObjectMeta:{Name:busybox-regional-rbd-deploy-drpc GenerateName: Namespace:busybox-regional-rbd-deploy SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ManagedFields:[]} Spec:{PVCSelector:{MatchLabels:map[appname:busybox] MatchExpressions:[]} ReplicationState:primary S3Profiles:[minio-on-dr2] Async:0xc000fe3a90 Sync:<nil> VolSync:{RDSpec:[] Disabled:false} PrepareForFinalSync:false RunFinalSync:false Action:Failover KubeObjectProtection:<nil>} Status:{State: ProtectedPVCs:[] Conditions:[] ObservedGeneration:0 LastUpdateTime:0001-01-01 00:00:00 +0000 UTC KubeObjectProtection:{CaptureToRecoverFrom:<nil>} PrepareForFinalSyncComplete:false FinalSyncComplete:false LastGroupSyncTime:<nil> LastGroupSyncDuration:nil LastGroupSyncBytes:<nil>}}	{"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "2e059f82-a46a-48bd-b975-a0619a608aa7"}

return false
log.Info("Delete event")

return true
@nirs (Member, Author) replied:

This does not work; it looks like FilterDRCluster is filtering only drclusters that are the failover clusters for some drpcs, including skipping for metro dr. This code is complex and we don't have enough time to modify it now.

I'll work on this later.
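For reference, watching drcluster delete events with a controller-runtime predicate might look roughly like this (a sketch, not the PR code):

    import (
        "sigs.k8s.io/controller-runtime/pkg/event"
        "sigs.k8s.io/controller-runtime/pkg/predicate"
    )

    // Let delete events through so a drcluster deletion triggers prompt
    // reconciliation instead of waiting for a periodic resync. The real
    // filtering also has to map the deleted drcluster to the affected drpcs.
    var drClusterPredicate = predicate.Funcs{
        DeleteFunc: func(e event.DeleteEvent) bool {
            return true
        },
    }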

controllers/util/mw_util.go
@ShyamsundarR (Member) left a comment:

From the use-case perspective of deleting a lost DRCluster, and prior to that failing over workloads from the lost cluster to the surviving cluster, the code changes are fine.

There is a set of things we need to take care of in the future:

  • (existing issue?) DRCluster delete should delete volsync addon created at the hub
  • We should fail to process newer DRPC instances created referencing a DRPolicy with a deleted DRCluster
    • e.g.
      // TODO: What if the DRCluster is deleted? If new DRPC fail reconciliation
      • We cannot error above, as existing DRPC would also detect the same and error out without processing newer actions (say failover from deleted cluster)
    • Need to do this at a point before we apply the finalizer to the DRPC instance, such that we refuse to process it at all
  • If the current DRPC is Primary on the deleted cluster and the DRPC is deleted in this state, then secondary cluster resources may not get garbage collected
    • RBD secondary images may get left behind
    • MetroDR case may have PVs left behind on the surviving primary cluster?
    • Volsync resources on Secondary should be cleaned up for such situations
    • Should we insist on failover before DRPC deletion? But how? The DRPC is already deleted when we detect this, so further spec changes are not allowed. (Unsure how to handle this at the moment.)
  • (good to have) On DRPC deletion we can also garbage collect the ManifestWork for the namespace on the deleted DRCluster, to leave a cleaner setup
  • (tidier status) cleanupSecondaries may keep waiting for MCV to report VRG is deleted on the deleted DRCluster (which will never happen)
    • This leads to a FailedOver state, but the DRPC is stuck in the cleaning-up state, which can be avoided by handling a deleted DRCluster in this scenario
    • Similarly cleanupForVolSync may also be stuck attempting to cleanup
    • When we handle the above we would need to also ensure we do not attempt to setup volsync again EnsureVolSyncReplicationSetup
  • Failover to deleted DRCluster should be prevented
  • Relocate to/from deleted cluster is currently allowed and should be prevented in both cases
  • createVRGManifestWorkAsPrimary MAY update the VRG as primary from secondary (volsync case), which is later updated with the correct S3 stores in ensureVRGManifestWork
    • Can be corrected, or marked as known for future code reading to avoid the question

controllers/drplacementcontrol.go
When creating s3 profile names, include only the s3 profiles from
available drclusters.

This affects generating a VRG for a manifestwork. When validating
failover prerequisites we already use only the failover cluster, and
this cluster cannot be deleted.

Another case is the drcluster deploy/undeploy flow, which still uses
all drclusters and may require more work.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
- More consistent logging - "Creating ManifestWork" or "Updating
  ManifestWork".
- Extract key variable to clean up the Client.Get() calls
- Eliminate unneeded temporary err and retryErr variables
- Remove commented code

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
When a drcluster is deleted, the VRG s3 profiles list changes, so we
need to update the manifestwork.

In ensureVRGManifestWork() we skipped updating the manifestwork if the
VRG exists; now we always update the manifestwork. This requires a
little more work for every reconcile, but should be efficient enough
since the vrg and manifestwork are cached.

Issues:

- It is not clear if ensureVRGManifestWork() is safe only for a primary
  vrg, and whether its callers are using it correctly. But if we had
  wrong callers before, this change increases the impact of a wrong
  call, updating an existing VRG which was not updated before. Add a
  TODO to look at this later, so we can test the replace cluster flow now.

- The VRG is updated a few minutes after detecting a deleted drcluster,
  since we don't watch drcluster delete events, and we filter drcluster
  events only for drpcs failing over to updated clusters.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
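In sketch form, the behavior change described above (hypothetical helper names, not the real ramen code):

    // Hypothetical stand-ins for building the desired VRG and applying it
    // through a ManifestWork.
    type vrgSpec struct{ S3Profiles []string }

    func buildVRG(cluster string) vrgSpec                               { return vrgSpec{} }
    func createOrUpdateVRGManifestWork(cluster string, v vrgSpec) error { return nil }

    // Before: return early when the VRG manifestwork already exists, so later
    // spec changes were never pushed to the managed cluster.
    // After: rebuild the desired VRG and create or update the manifestwork on
    // every reconcile, so a shrinking s3Profiles list (after a drcluster
    // deletion) reaches the managed cluster.
    func ensureVRGManifestWork(cluster string) error {
        vrg := buildVRG(cluster) // includes only s3 profiles of non-deleted drclusters
        return createOrUpdateVRGManifestWork(cluster, vrg)
    }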
@ShyamsundarR ShyamsundarR merged commit b8bedb9 into RamenDR:main Dec 13, 2023
13 checks passed