Reconcile related drpcs when drcluster is deleted #1168

nirs · 2024-01-08T16:43:27Z

Watch also DRCluster delete events
When filtering drpcs consider all drpcs referencing a deleted drcluster.

With this change, when a drcluster is deleted, the drpc controller should update the VRG ManifestWork on the remaining cluster. Previously this happened minutes after a drcluster was deleted.

Testing:

image: quay.io/nirsof/ramen-operator:update-vrg-v1

Status:

Test disable dr flow with single drpc with drenv and busybox rbd deployment on top of Test demo #1153
Test disable dr flow with multiple apps (deploy, sts, ds)
Run basic test with 3 subscription rbd apps (deploy, sts, ds) to check for regression
Test on OCP/volsync
- This small internal change should work on any replication/storage/platform. I'll do quick test on OCP later.

- When filtering update events, consider also the update when the new object is marked for deletion. - When filtering drpcs consider all drpcs referencing a deleted drcluster. With this change, when a drcluster is deleted, the drpc controller updates the VRG ManifestWork on the remaining cluster. Previously this happened minutes after a drcluster was deleted. Notes: - We don't get a delete event when the drcluster is deleted, but an update event. I don't know if this is a bug in controller runtime or expected behavior. Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs · 2024-01-08T19:33:23Z

Testing with drenv

Deploy busybox rbd deployment (app running on dr1)
Enable DR
Suspend vm dr1
```
virsh -c qemu:///system suspend dr1
```
Failover to dr2
- waiting for peer ready times out - expected

Dump vrg on dr2 before deleting the cluster

kubectl get vrg busybox-regional-rbd-deploy-drpc -n busybox-regional-rbd-deploy \
    -o yaml --context dr2 > vrg-before-delete-drcluster.yaml

Delete drcluter dr1

kubectl delete drcluster dr1 --wait=false --context hub

Dump vrg on dr2 after deleting the cluster

kubectl get vrg busybox-regional-rbd-deploy-drpc -n busybox-regional-rbd-deploy \
    -o yaml --context dr2 > vrg-after-delete-drcluster.yaml

Dump ramen hub logs

kubectl logs deploy/ramen-hub-operator -n ramen-system --context hub > ramen-hub.log

VRG diff show that minio-on-dr1 was removed

...
       appname: busybox
   replicationState: primary
   s3Profiles:
-  - minio-on-dr1
   - minio-on-dr2
   volSync: {}
 status:
   conditions:
...

Interesting events from ramen log

Gettting update event when drclsuter was deleted

2024-01-08T18:46:06.190Z        INFO    DRPCPredicate.DRCluster controllers/drplacementcontrol_controller.go:275        Update event
2024-01-08T18:46:06.190Z        INFO    controllers/drplacementcontrol_controller.go:623        DRPC Map: Filtering DRCluster (dr1)

New logs when filtering drpcs using this cluster

2024-01-08T18:46:06.190Z        INFO    DRPCFilter.DRCluster    controllers/drplacementcontrol_controller.go:405        Found DRPolicy referencing DRCluster    {"cluster": {"metadata":{"name":"dr1","uid":"215969da-fde4-4cbf-8c56-775523a90090","resourceVersion":"5709","generation":2,"creationTimestamp":"2024-01-08T18:38:30Z","deletionTimestamp":"2024-01-08T18:46:06Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Clean","message":"Cluster Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "drpolicy": "busybox-regional-rbd-deploy"}
2024-01-08T18:46:06.190Z        INFO    DRPCFilter.DRCluster    controllers/drplacementcontrol_controller.go:445        Found DRPC referencing drpolicy {"cluster": {"metadata":{"name":"dr1","uid":"215969da-fde4-4cbf-8c56-775523a90090","resourceVersion":"5709","generation":2,"creationTimestamp":"2024-01-08T18:38:30Z","deletionTimestamp":"2024-01-08T18:46:06Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-08T18:38:30Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Clean","message":"Cluster Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-08T18:38:30Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "name": "busybox-regional-rbd-deploy-drpc", "namespace": "busybox-regional-rbd-deploy", "drpolicy": "busybox-regional-rbd-deploy"}
2024-01-08T18:46:06.191Z        INFO    controllers.DRPlacementControl  controllers/drplacementcontrol_controller.go:677        Entering reconcile loop {"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "236ae8c1-8353-487b-a03a-c970fb608f01"}

Updating VRG manifest work without minio-on-dr1 s3 profile

2024-01-08T18:46:06.206Z        INFO    controllers.DRPlacementControl  util/mw_util.go:121     Create or Update manifestwork busybox-regional-rbd-deploy-drpc:busybox-regional-rbd-deploy:dr2:{TypeMe
ta:{Kind:VolumeReplicationGroup APIVersion:ramendr.openshift.io/v1alpha1} ObjectMeta:{Name:busybox-regional-rbd-deploy-drpc GenerateName: Namespace:busybox-regional-rbd-deploy SelfLink: UID: Resourc
eVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[drplacementcontrol.ramendr.openshift.io/d
estination-cluster:dr2] OwnerReferences:[] Finalizers:[] ManagedFields:[]} Spec:{PVCSelector:{MatchLabels:map[appname:busybox] MatchExpressions:[]} ReplicationState:primary S3Profiles:[minio-on-dr2]
 Async:0xc000cb3b80 Sync:<nil> VolSync:{RDSpec:[] Disabled:false} PrepareForFinalSync:false RunFinalSync:false Action:Failover KubeObjectProtection:<nil>} Status:{State: ProtectedPVCs:[] Conditions:
[] ObservedGeneration:0 LastUpdateTime:0001-01-01 00:00:00 +0000 UTC KubeObjectProtection:{CaptureToRecoverFrom:<nil>} PrepareForFinalSyncComplete:false FinalSyncComplete:false LastGroupSyncTime:<ni
l> LastGroupSyncDuration:nil LastGroupSyncBytes:<nil>}} {"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "236ae8c1-8353-487b-a03a-c970fb608f01"}
2024-01-08T18:46:06.206Z        INFO    controllers.DRPolicy    util/secrets_util.go:541        Add Secret      {"DRPolicy": "busybox-regional-rbd-deploy", "rid": "4f9c75f5-588e-426e-93f1-54a8420ae3
81", "cluster": "dr2", "secret": "ramen-s3-secret-dr1"}
2024-01-08T18:46:06.206Z        INFO    controllers.DRPlacementControl  util/mw_util.go:499     Updating ManifestWork   {"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "236ae8c1-8353-487b-a03a-c970fb608f01", "name": "busybox-regional-rbd-deploy-drpc-busybox-regional-rbd-deploy-vrg-mw", "namespace": "dr2"}
2024-01-08T18:46:06.209Z        INFO    controllers.DRPolicy    util/secrets_util.go:541        Add Secret      {"DRPolicy": "busybox-regional-rbd-deploy", "rid": "4f9c75f5-588e-426e-93f1-54a8420ae381", "cluster": "dr2", "secret": "ramen-s3-secret-dr2"}

Timeline

2024-01-08T18:46:06.190Z - detecting the deletion
2024-01-08T18:46:06.206Z - updating manifest work

Logs and resources

update-vrg.tar.gz

nirs · 2024-01-09T13:33:50Z

Testing multiple apps

Deploy and enable dr for 3 apps

Deploy 3 apps and relocate if need so all run on clsuter dr1.

basic-test/deploy -c configs/k8s/busybox-regional-rbd-deploy.yaml envs/regional-dr.yaml
basic-test/enable-dr -c configs/k8s/busybox-regional-rbd-deploy.yaml envs/regional-dr.yaml

basic-test/deploy -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml
basic-test/enable-dr -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml
asic-test/relocate -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml

basic-test/deploy -c configs/k8s/busybox-regional-rbd-ds.yaml envs/regional-dr.yaml
basic-test/enable-dr -c configs/k8s/busybox-regional-rbd-ds.yaml envs/regional-dr.yaml

State before simulating disaster:

$ kubectl get drpc -A --context hub
NAMESPACE                     NAME                               AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-regional-rbd-deploy   busybox-regional-rbd-deploy-drpc   7m46s   dr1                                                 Deployed
busybox-regional-rbd-ds       busybox-regional-rbd-ds-drpc       7m28s   dr1                                                 Deployed
busybox-regional-rbd-sts      busybox-regional-rbd-sts-drpc      7m36s   dr1                                  Relocate       Relocated

Simulate disaster

$ virsh -c qemu:///system list
 Id   Name   State
----------------------
 10   dr1    running
 11   dr2    running
 12   hub    running

$ virsh -c qemu:///system suspend dr1
Domain 'dr1' suspended

Failover all apps

basic-test/failover -c configs/k8s/busybox-regional-rbd-deploy.yaml envs/regional-dr.yaml

asic-test/failover -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml

basic-test/failover -c configs/k8s/busybox-regional-rbd-ds.yaml envs/regional-dr.yaml

The failover will timeout waiting for PeerReady condition - expected
since dr1 is not available.

State after all apps failed over (Available condition met):

$ kubectl get drpc -A --context hub
NAMESPACE                     NAME                               AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-regional-rbd-deploy   busybox-regional-rbd-deploy-drpc   25m   dr1                dr2               Failover       FailedOver
busybox-regional-rbd-ds       busybox-regional-rbd-ds-drpc       25m   dr1                dr2               Failover       FailedOver
busybox-regional-rbd-sts      busybox-regional-rbd-sts-drpc      25m   dr1                dr2               Failover       FailedOver

$ kubectl get vrg,vr -A --context dr2
NAMESPACE                     NAME                                                                           DESIREDSTATE   CURRENTSTATE
busybox-regional-rbd-deploy   volumereplicationgroup.ramendr.openshift.io/busybox-regional-rbd-deploy-drpc   primary        Primary
busybox-regional-rbd-ds       volumereplicationgroup.ramendr.openshift.io/busybox-regional-rbd-ds-drpc       primary        Primary
busybox-regional-rbd-sts      volumereplicationgroup.ramendr.openshift.io/busybox-regional-rbd-sts-drpc      primary        Primary

NAMESPACE                     NAME                                                                  AGE     VOLUMEREPLICATIONCLASS   PVCNAME            DESIREDSTATE   CURRENTSTATE
busybox-regional-rbd-deploy   volumereplication.replication.storage.openshift.io/busybox-pvc        2m42s   vrc-sample               busybox-pvc        primary        Primary
busybox-regional-rbd-ds       volumereplication.replication.storage.openshift.io/busybox-pvc        2m21s   vrc-sample               busybox-pvc        primary        Primary
busybox-regional-rbd-sts      volumereplication.replication.storage.openshift.io/varlog-busybox-0   2m33s   vrc-sample               varlog-busybox-0   primary        Primary
busybox-regional-rbd-sts      volumereplication.replication.storage.openshift.io/varlog-busybox-1   2m18s   vrc-sample               varlog-busybox-1   primary        Primary

Dump VRG before deleting the drcluster

kubectl get vrg -A -o yaml > vrgs-before-delete-cluster.yaml

Delete drcluster

kubectl delete drcluster dr1 --wait=false --context hub

Dump VRG after deleting the drcluster

kubectl get vrg -A -o yaml > vrgs-before-after-cluster.yaml

VRGs diff:

$ diff -u vrgs-before-delete-cluster.yaml vrgs-after-delete-cluster.yaml | grep -- '- minio-on-dr1'
-    - minio-on-dr1
-    - minio-on-dr1
-    - minio-on-dr1

Ramen hub logs:

Filtering DRPCs during update event with deleted drcluster:

$ kubectl logs deploy/ramen-hub-operator -n ramen-system --context hub | grep 'Found DRPC referencing drpolicy'
2024-01-09T13:09:50.988Z	INFO	DRPCFilter.DRCluster	controllers/drplacementcontrol_controller.go:445	Found DRPC referencing drpolicy	{"cluster": {"metadata":{"name":"dr1","uid":"60daba7c-f828-43fa-9b72-b2cb5dc5f49a","resourceVersion":"28069","generation":2,"creationTimestamp":"2024-01-09T12:39:04Z","deletionTimestamp":"2024-01-09T13:09:50Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:05Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:05Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "name": "busybox-regional-rbd-deploy-drpc", "namespace": "busybox-regional-rbd-deploy", "drpolicy": "busybox-regional-rbd-deploy"}
2024-01-09T13:09:50.988Z	INFO	DRPCFilter.DRCluster	controllers/drplacementcontrol_controller.go:445	Found DRPC referencing drpolicy	{"cluster": {"metadata":{"name":"dr1","uid":"60daba7c-f828-43fa-9b72-b2cb5dc5f49a","resourceVersion":"28069","generation":2,"creationTimestamp":"2024-01-09T12:39:04Z","deletionTimestamp":"2024-01-09T13:09:50Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:05Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:05Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "name": "busybox-regional-rbd-sts-drpc", "namespace": "busybox-regional-rbd-sts", "drpolicy": "busybox-regional-rbd-sts"}
2024-01-09T13:09:50.988Z	INFO	DRPCFilter.DRCluster	controllers/drplacementcontrol_controller.go:445	Found DRPC referencing drpolicy	{"cluster": {"metadata":{"name":"dr1","uid":"60daba7c-f828-43fa-9b72-b2cb5dc5f49a","resourceVersion":"28069","generation":2,"creationTimestamp":"2024-01-09T12:39:04Z","deletionTimestamp":"2024-01-09T13:09:50Z","deletionGracePeriodSeconds":0,"labels":{"cluster.open-cluster-management.io/backup":"ramen"},"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"ramendr.openshift.io/v1alpha1\",\"kind\":\"DRCluster\",\"metadata\":{\"annotations\":{},\"name\":\"dr1\"},\"spec\":{\"region\":\"west\",\"s3ProfileName\":\"minio-on-dr1\"}}\n"},"finalizers":["drclusters.ramendr.openshift.io/ramen"],"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{".":{},"f:region":{},"f:s3ProfileName":{}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:04Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:finalizers":{".":{},"v:\"drclusters.ramendr.openshift.io/ramen\"":{}},"f:labels":{".":{},"f:cluster.open-cluster-management.io/backup":{}}}}},{"manager":"manager","operation":"Update","apiVersion":"ramendr.openshift.io/v1alpha1","time":"2024-01-09T12:39:05Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{".":{},"f:conditions":{},"f:phase":{}}},"subresource":"status"}]},"spec":{"region":"west","s3ProfileName":"minio-on-dr1"},"status":{"phase":"Available","conditions":[{"type":"Fenced","status":"False","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Clean","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:04Z","reason":"Clean","message":"Cluster Clean"},{"type":"Validated","status":"True","observedGeneration":1,"lastTransitionTime":"2024-01-09T12:39:05Z","reason":"Succeeded","message":"Validated the cluster"}]}}, "name": "busybox-regional-rbd-ds-drpc", "namespace": "busybox-regional-rbd-ds", "drpolicy": "busybox-regional-rbd-ds"}

Updating the VRG manifest work

2024-01-09T13:09:51.003Z        INFO    controllers.DRPlacementControl  util/mw_util.go:499     Updating ManifestWork   {"DRPC": "busybox-regional-rbd-ds/busybox-regional-rbd-ds-drpc", "rid": "1f75b7d4-4471-4fd9-b5af-8849c8268752", "name": "busybox-regional-rbd-ds-drpc-busybox-regional-rbd-ds-vrg-mw", "namespace": "dr2"}
...
2024-01-09T13:09:51.004Z        INFO    controllers.DRPlacementControl  util/mw_util.go:499     Updating ManifestWork   {"DRPC": "busybox-regional-rbd-sts/busybox-regional-rbd-sts-drpc", "rid": "f762233f-78ce-42cd-a9b3-b6f2ba874a93", "name": "busybox-regional-rbd-sts-drpc-busybox-regional-rbd-sts-vrg-mw", "namespace": "dr2"}
...
2024-01-09T13:09:51.004Z        INFO    controllers.DRPlacementControl  util/mw_util.go:499     Updating ManifestWork   {"DRPC": "busybox-regional-rbd-deploy/busybox-regional-rbd-deploy-drpc", "rid": "18b281c3-1d3d-4ac4-97d1-926ae3915037", "name": "busybox-regional-rbd-deploy-drpc-busybox-regional-rbd-deploy-vrg-mw", "namespace": "dr2"}

Disabling DR for all apps

$ basic-test/disable-dr -c configs/k8s/busybox-regional-rbd-deploy.yaml envs/regional-dr.yaml
2024-01-09 15:23:07,035 INFO    [disable-dr] Disable DR
2024-01-09 15:23:07,082 INFO    [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/busybox-regional-rbd-deploy-drpc'
2024-01-09 15:23:26,325 INFO    [disable-dr] Enabling OCM scheduling for 'placement.cluster.open-cluster-management.io/busybox-placement'
2024-01-09 15:23:26,388 INFO    [disable-dr] DR was disabled

$ basic-test/disable-dr -c configs/k8s/busybox-regional-rbd-sts.yaml envs/regional-dr.yaml
2024-01-09 15:23:18,388 INFO    [disable-dr] Disable DR
2024-01-09 15:23:18,436 INFO    [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/busybox-regional-rbd-sts-drpc'
2024-01-09 15:23:36,512 INFO    [disable-dr] Enabling OCM scheduling for 'placement.cluster.open-cluster-management.io/busybox-placement'
2024-01-09 15:23:36,573 INFO    [disable-dr] DR was disabled

$ basic-test/disable-dr -c configs/k8s/busybox-regional-rbd-ds.yaml envs/regional-dr.yaml
2024-01-09 15:23:39,791 INFO    [disable-dr] Disable DR
2024-01-09 15:23:39,839 INFO    [disable-dr] Deleting 'drplacementcontrol.ramendr.openshift.io/busybox-regional-rbd-ds-drpc'
2024-01-09 15:23:44,357 INFO    [disable-dr] Enabling OCM scheduling for 'placement.cluster.open-cluster-management.io/busybox-placement'
2024-01-09 15:23:44,419 INFO    [disable-dr] DR was disabled

State after disabling dr:

$ kubectl get deploy,pod,pvc -n busybox-regional-rbd-deploy --context dr2
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/busybox   1/1     1            1           24m

NAME                           READY   STATUS    RESTARTS   AGE
pod/busybox-7c4d67bf49-sc89x   1/1     Running   0          24m

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-da2d3812-ce22-4421-a58e-75d9fb79b4ca   1Gi        RWO            rook-ceph-block   24m

$ kubectl get sts,pod,pvc -n busybox-regional-rbd-sts --context dr2
NAME                       READY   AGE
statefulset.apps/busybox   2/2     24m

NAME            READY   STATUS    RESTARTS   AGE
pod/busybox-0   1/1     Running   0          24m
pod/busybox-1   1/1     Running   0          23m

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/varlog-busybox-0   Bound    pvc-eb2764c0-8228-45bb-a9ad-18cb227ee276   1Gi        RWO            rook-ceph-block   25m
persistentvolumeclaim/varlog-busybox-1   Bound    pvc-60de6fee-ae33-4f27-ba02-6538acbdcba8   1Gi        RWO            rook-ceph-block   25m

$ kubectl get ds,pod,pvc -n busybox-regional-rbd-ds --context dr2
NAME                     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/busybox   1         1         1       1            1           <none>          24m

NAME                READY   STATUS    RESTARTS   AGE
pod/busybox-w8d8g   1/1     Running   0          24m

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-9937cd1b-dd65-4c6d-a78d-849f0318bae8   1Gi        RWO            rook-ceph-block   25m

Logs and resources

update-vrg-multi-apps.tar.gz

netzzer · 2024-01-12T01:36:24Z

Tested Nir's ramen image (quay.io/nirsof/ramen-operator:update-vrg-v1) today with fix described in this PR using downstream ODF 4.15 build 112, OCP 4.14.6, ACM 2.9.1. Log collected for ramen hub pod ~10-15 mins after drcluster perf2 deleted.

1) failed all nodes for perf2
2) Failover 4 apps to perf3 (busybox with cephrbd appset and subscription; busybox with cephfs appset and subscription)
3) deleted drcluster perf2
4) checked apps and perf2 s3Profile deleted in seconds from VRGs

Check on perf3
$ oc get vrg  -o yaml -A | grep -A 2 s3P
    s3Profiles:
    - s3profile-perf3-ocs-storagecluster
    volSync: {}
--
    s3Profiles:
    - s3profile-perf3-ocs-storagecluster
    volSync: {}
--
    s3Profiles:
    - s3profile-perf3-ocs-storagecluster
    volSync: {}
--
    s3Profiles:
    - s3profile-perf3-ocs-storagecluster
    volSync: {}

Logs: ramen-hub-operator-log.txt.gz

ShyamsundarR

LGTM! Just one observation on the delete event that may need to be resolved.

ShyamsundarR · 2024-01-12T12:57:29Z

controllers/drplacementcontrol_controller.go

-	// Exhausted all failover activation checks, this update is NOT of interest
-	return false
+	// Exhausted all failover activation checks, the only interesting update is deleting a drcluster.
+	return drClusterIsDeleted(newDRCluster)


Unsure if a delete of a resource is causing an Update event to trigger as well. I would have just returned true from line 269 above, which is where the predicate function called when the watched resource is deleted.

I agree with Shyam. You are better off returning true on line 269 and allowing all drpcs to reconcile. Deleting a drcluster should NOT be a common use case.

I started this change by returning true in line 269, but unfortunately this does not work.

Deleting creates an update event, when the new object has a deletionTimeStamp. When the object is actually removed from the system, we get a delete event.

In our case this event is not relevant. It will happen when the drpolicy is deleted the the ramen removes the finalizers from the the drcluster.

I see, so adding the deletionTimeStamp ends up firing for Updated. Got it.
So how about if in line 307, you return true every time the objectNew (newDRCluster) has a non-zero deletionTimeStamp?
I think avoiding adding the new code (DRPCsUsingDRCluster and DRPCsUsingDRPolicy) is desirable.

I did not try the change without modifying FilterDRCluster - but if we don't modify it we will trigger a reconcile only in the DRPCs which are failing over to the deleted cluster, which is always no DRPC.

We an simplify by returning all DPPCS but if we have to add code it makes sense to add the right code which is only few more lines.

nirs requested review from ShyamsundarR and BenamarMk January 8, 2024 16:43

nirs force-pushed the update-vrg branch from afa5112 to 60eee81 Compare January 8, 2024 19:07

nirs force-pushed the update-vrg branch from 60eee81 to 1561a28 Compare January 8, 2024 19:10

nirs marked this pull request as ready for review January 8, 2024 19:13

nirs requested a review from raghavendra-talur January 8, 2024 19:37

ShyamsundarR reviewed Jan 12, 2024

View reviewed changes

nirs requested a review from ShyamsundarR January 12, 2024 17:26

ShyamsundarR approved these changes Jan 12, 2024

View reviewed changes

ShyamsundarR merged commit 2b66f98 into RamenDR:main Jan 12, 2024
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reconcile related drpcs when drcluster is deleted #1168

Reconcile related drpcs when drcluster is deleted #1168

nirs commented Jan 8, 2024 •

edited

Loading

nirs commented Jan 8, 2024 •

edited

Loading

nirs commented Jan 9, 2024

netzzer commented Jan 12, 2024 •

edited by nirs

Loading

ShyamsundarR left a comment

ShyamsundarR Jan 12, 2024

BenamarMk Jan 12, 2024 •

edited

Loading

nirs Jan 12, 2024

BenamarMk Jan 12, 2024 •

edited

Loading

nirs Jan 12, 2024

Reconcile related drpcs when drcluster is deleted #1168

Reconcile related drpcs when drcluster is deleted #1168

Conversation

nirs commented Jan 8, 2024 • edited Loading

nirs commented Jan 8, 2024 • edited Loading

Testing with drenv

VRG diff show that minio-on-dr1 was removed

Interesting events from ramen log

Timeline

Logs and resources

nirs commented Jan 9, 2024

Testing multiple apps

Deploy and enable dr for 3 apps

Simulate disaster

Failover all apps

Dump VRG before deleting the drcluster

Delete drcluster

Dump VRG after deleting the drcluster

Disabling DR for all apps

Logs and resources

netzzer commented Jan 12, 2024 • edited by nirs Loading

ShyamsundarR left a comment

Choose a reason for hiding this comment

ShyamsundarR Jan 12, 2024

Choose a reason for hiding this comment

BenamarMk Jan 12, 2024 • edited Loading

Choose a reason for hiding this comment

nirs Jan 12, 2024

Choose a reason for hiding this comment

BenamarMk Jan 12, 2024 • edited Loading

Choose a reason for hiding this comment

nirs Jan 12, 2024

Choose a reason for hiding this comment

nirs commented Jan 8, 2024 •

edited

Loading

nirs commented Jan 8, 2024 •

edited

Loading

netzzer commented Jan 12, 2024 •

edited by nirs

Loading

BenamarMk Jan 12, 2024 •

edited

Loading

BenamarMk Jan 12, 2024 •

edited

Loading