Add missing pvc namespace to the vrg rdspec #1152

Merged
merged 1 commit into RamenDR:main on Nov 29, 2023

Conversation

@nirs (Member) commented Nov 29, 2023

It seems that the recent change for supporting multiple namespaces has broken failover with volsync when using multiple volsync apps. Add the pvc namespace to the protected pvc, similar to the volrep code.
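For illustration only, here is a minimal sketch of the kind of change described above; the `protectedPVC` struct and `newProtectedPVC` helper are assumptions rather than Ramen's actual types, and only show that the source PVC's namespace must be copied into the RDSpec entry:

```go
// Illustrative sketch only, not the actual Ramen code. It assumes a
// protectedPVC struct with Name/Namespace fields similar to the VRG's
// ProtectedPVC and shows the kind of change this PR describes: copying
// the source PVC's namespace when building the RDSpec entry.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// protectedPVC is a stand-in for the VRG's ProtectedPVC type.
type protectedPVC struct {
	Name      string
	Namespace string
	Protected bool
}

// newProtectedPVC builds the RDSpec entry for an application PVC.
// Without the Namespace field the restore cannot target the right
// namespace when several volsync apps are protected.
func newProtectedPVC(pvc *corev1.PersistentVolumeClaim) protectedPVC {
	return protectedPVC{
		Name:      pvc.Name,
		Namespace: pvc.Namespace, // the missing field this PR adds
		Protected: true,
	}
}

func main() {
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "busybox-pvc", Namespace: "test-cephfs"},
	}
	fmt.Printf("%+v\n", newProtectedPVC(pvc))
}
```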

I did not modify the test to validate this change since I don't fully understand the test. It seems that there is a missing `It` section for testing the vrg.

Tested on ocp 4.14.

Issues:

  • During upgrade on a system with a stuck workload, the vrg is not updated automatically yet.

It seems that the recent change for supporting multiple namespaces has
broken failover with volsync when using multiple volsync apps. Add the
pvc namespace to the protected pvc, similar to the volrep code.

I did not modify the test to validate this change since I don't fully
understand the test. It seems that there is a missing `It` section for
testing the vrg.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs (Member, Author) commented Nov 29, 2023

Example failure before this change.

After failover, the application is stuck in WaitingForResourceRestore:

$ oc get drpc -A -o wide
NAMESPACE     NAME                           AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION                 START TIME             DURATION          PEER READY
test-cephfs   test-cephfs-placement-1-drpc   11m   cluster1           cluster2          Failover       FailingOver    WaitingForResourceRestore   2023-11-29T19:50:24Z                     False
test-rbd      test-rbd-placement-1-drpc      15m   cluster1           cluster2          Failover       FailedOver     Completed                   2023-11-29T19:42:43Z   3m11.406944081s   True

State of the failover cluster:

$ oc get deploy,pod,pvc,vrg,vr,replicationsource,replicationdestination -n test-cephfs -o wide --context perf2
NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE   VOLUMEMODE
persistentvolumeclaim/busybox-pvc   Bound    pvc-2456bbbf-c89c-4b64-8efa-65838dff33c8   1Gi        RWX            ocs-storagecluster-cephfs   11m   Filesystem

NAME                                                                       DESIREDSTATE   CURRENTSTATE
volumereplicationgroup.ramendr.openshift.io/test-cephfs-placement-1-drpc   primary        Secondary

NAME                                                 LAST SYNC              DURATION         NEXT SYNC
replicationdestination.volsync.backube/busybox-pvc   2023-11-29T19:50:21Z   1m2.651263526s   

In the ramen-dr-cluster-operator log we see:

2023-11-29T19:54:26.257Z        INFO    controllers.VolumeReplicationGroup.vrginstance  controllers/volumereplicationgroup_controller.go:1012   Failed to restore PVs   {"VolumeReplicationGroup": "test-cephfs/test-cephfs-placement-1-drpc", "rid": "0e93bd0b-1851-4263-937b-11754deb2cfc", "State": "primary", "error": "failed to restore PVs for VolSync (failed to restore all PVCs using RDSpec ([{{ busybox-pvc true { { []} { []}} 0xc004482540 map[] map[app:test-cephfs app.kubernetes.io/part-of:test-cephfs appname:busybox apps.open-cluster-management.io/reconcile-rate:medium] [ReadWriteMany] {map[] map[storage:{{1073741824 0} {<nil>} 1Gi BinarySI}] []} [] <nil> nil <nil>}}]))"}
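Note how the RDSpec dump starts with `{{ busybox-pvc true ...}}`: the empty slot before the PVC name appears to be the missing namespace. As a minimal sketch (not Ramen's restore path; `restoreProtectedPVC` and its validation are illustrative assumptions), this shows why a protected PVC with an empty namespace cannot be restored on the failover cluster:

```go
// Minimal sketch, assuming a controller-runtime client. This is not
// Ramen's restore code, only an illustration of why an empty namespace
// in the RDSpec entry makes PVC restore fail and leaves the DRPC in
// WaitingForResourceRestore.
package restore

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// restoreProtectedPVC (hypothetical helper) recreates a protected PVC on
// the failover cluster. Without a namespace there is nowhere to create
// the object, so the restore loop keeps retrying.
func restoreProtectedPVC(ctx context.Context, c client.Client, name, namespace string) error {
	if namespace == "" {
		return fmt.Errorf("protected PVC %q has no namespace, cannot restore", name)
	}

	// Spec fields omitted for brevity; a real restore copies the captured spec.
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
	}

	if err := c.Create(ctx, pvc); err != nil && !apierrors.IsAlreadyExists(err) {
		return err
	}

	return nil
}
```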

@BenamarMk merged commit 5cdc7eb into RamenDR:main on Nov 29, 2023
13 checks passed