Integrate volsync in test environment #927

Merged
merged 8 commits into from
Jul 5, 2023

@nirs nirs commented Jun 14, 2023

Add volsync addon and test environment using minikube volumesnapshots and
csi-hostpath-driver storage.

The addon includes a self test deploying a busybox application on one cluster and
replicating it to the second cluster.

With this change we can add a basic test using volsync instead of rbd mirroring.

We will add CephFS later so we can also test volsync with CephFS.

Fixes: #661

@nirs nirs force-pushed the submariner-volsync branch 3 times, most recently from 1cee64c to 8faaa62 Compare June 15, 2023 17:59

nirs commented Jun 15, 2023

Example deploy:

$ test/basic-test/deploy dr1
2023-06-16 00:56:40,257 INFO    [deploy] Creating temporary directory /tmp/ramen-test/basic-test
2023-06-16 00:56:40,257 INFO    [deploy] Cloning ocm-ramen-samples
2023-06-16 00:56:40,257 INFO    [deploy] Creating kustomization for using cluster 'dr1'
2023-06-16 00:56:40,257 INFO    [deploy] Deploying busybox example application
2023-06-16 00:56:40,639 INFO    [deploy] waiting for namespace busybox-sample
2023-06-16 00:56:40,697 INFO    [deploy] Waiting until busybox drpc reports phase
2023-06-16 00:56:41,816 INFO    [deploy] Waiting until busybox drpc is deployed
2023-06-16 00:56:41,970 INFO    [deploy] Waiting until application is replicated
2023-06-16 00:58:40,927 INFO    [deploy] Application was deployed successfully

The app is replicated successfully with volsync and submariner.

Failover: the app moves to the second cluster, but gets stuck in cleanup.

$ test/basic-test/failover dr2
2023-06-16 01:00:39,448 INFO    [failover] Waiting until application is replicated
2023-06-16 01:00:39,501 INFO    [failover] Starting failover
2023-06-16 01:00:39,569 INFO    [failover] Waiting until application is failed over...
2023-06-16 01:01:10,224 INFO    [failover] Waiting until application is replicated
2023-06-16 01:06:10,450 ERROR   [failover] test failed
Traceback (most recent call last):
  File "/home/nsoffer/src/ramen/test/basic-test/failover", line 50, in <module>
    drenv.wait_for(
  File "/home/nsoffer/src/ramen/test/drenv/__init__.py", line 50, in wait_for
    raise RuntimeError(f"Timeout waiting for {resource}")
RuntimeError: Timeout waiting for drpc/busybox-drpc
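
The traceback above comes from a polling wait that gave up five minutes after the wait started (01:01:10 to 01:06:10 in the log). A minimal sketch of that kind of helper follows. The signature, the `probe` callback, and the 300 second default are assumptions for illustration; the real `drenv.wait_for` is not shown in this conversation.

```python
import time


def wait_for(resource, probe, timeout=300, interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns a truthy value or the timeout expires.

    Hypothetical helper: only the error message is taken from the traceback.
    """
    deadline = clock() + timeout
    while True:
        result = probe()
        if result:
            return result
        if clock() >= deadline:
            # Same message format as in the traceback above.
            raise RuntimeError(f"Timeout waiting for {resource}")
        sleep(interval)
```

Passing the probe as a callable keeps the sketch independent of how the resource state is fetched, for example by running `kubectl get` with a jsonpath output.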

drpc changes:

$ kubectl get drpc -A --context hub -o wide -w
NAMESPACE        NAME           AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME   DURATION   PEER READY
busybox-sample   busybox-drpc   0s    dr1                                                                                                      
busybox-sample   busybox-drpc   0s    dr1                                                                                                      
busybox-sample   busybox-drpc   0s    dr1                                                 Deployed       UpdatingPlRule   2023-06-15T21:56:40Z              True
busybox-sample   busybox-drpc   1s    dr1                                                 Deployed       UpdatingPlRule   2023-06-15T21:56:41Z              True
busybox-sample   busybox-drpc   3s    dr1                                                 Deployed       UpdatingPlRule   2023-06-15T21:56:43Z              True
busybox-sample   busybox-drpc   5s    dr1                                                 Deployed       UpdatingPlRule   2023-06-15T21:56:45Z              True
busybox-sample   busybox-drpc   16s   dr1                                                 Deployed       UpdatingPlRule   2023-06-15T21:56:56Z              True
busybox-sample   busybox-drpc   30s   dr1                                                 Deployed       EnsuringVolSyncSetup   2023-06-15T21:56:56Z              True
busybox-sample   busybox-drpc   60s   dr1                                                 Deployed       Completed              2023-06-15T21:56:56Z   44.090243322s   True
busybox-sample   busybox-drpc   60s   dr1                                                 Deployed       EnsuringVolSyncSetup   2023-06-15T21:56:56Z   44.090243322s   True
busybox-sample   busybox-drpc   60s   dr1                                                 Deployed       Completed              2023-06-15T21:56:56Z   44.090243322s   True
busybox-sample   busybox-drpc   90s   dr1                                                 Deployed       Completed              2023-06-15T21:56:56Z   44.090243322s   True
busybox-sample   busybox-drpc   2m    dr1                                                 Deployed       Completed              2023-06-15T21:56:56Z   44.090243322s   True
busybox-sample   busybox-drpc   3m    dr1                                                 Deployed       Completed              2023-06-15T21:56:56Z   44.090243322s   True
busybox-sample   busybox-drpc   3m59s   dr1                dr2               Failover       Deployed       Completed              2023-06-15T21:56:56Z   44.090243322s   True
busybox-sample   busybox-drpc   3m59s   dr1                dr2               Failover       FailingOver    WaitingForResourceRestore   2023-06-15T22:00:39Z                   False
busybox-sample   busybox-drpc   4m      dr1                dr2               Failover       FailingOver    WaitingForResourceRestore   2023-06-15T22:00:39Z                   False
busybox-sample   busybox-drpc   4m30s   dr1                dr2               Failover       FailedOver     UpdatedPlacement            2023-06-15T22:00:39Z                   True
busybox-sample   busybox-drpc   4m30s   dr1                dr2               Failover       FailedOver     Cleaning Up                 2023-06-15T22:00:39Z                   True

@nirs nirs mentioned this pull request Jun 15, 2023
10 tasks
@nirs nirs changed the title Testing submariner and volsync Integrate volsync in test environment Jun 18, 2023
@nirs nirs linked an issue Jun 18, 2023 that may be closed by this pull request
4 tasks
@nirs nirs force-pushed the submariner-volsync branch 2 times, most recently from 34bb80a to fe67947 Compare June 23, 2023 00:33
@nirs nirs mentioned this pull request Jun 26, 2023
10 tasks
@nirs nirs force-pushed the submariner-volsync branch 5 times, most recently from 61fc3ff to 4717d41 Compare June 29, 2023 21:22
@nirs nirs marked this pull request as ready for review June 29, 2023 21:27

nirs commented Jul 4, 2023

Rebased on main; one commit was already merged via another PR.

@nirs nirs requested a review from ELENAGER July 4, 2023 15:35
nirs added 8 commits July 5, 2023 17:40

When other VMs are running, I see random timeouts in the submariner test.
Let's make the test more robust on overloaded machines.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Like other tools, developers have to install helm manually, but it is
available in package managers so this is trivial.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Install volsync using helm, removing the manual CRD install.
We use `helm upgrade --install` to make the installation idempotent.
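
As an illustration of the idempotent install, here is a sketch that builds the helm command for one cluster. The release name, chart name, and namespace in the usage note are assumptions based on the upstream VolSync chart, not values taken from this change; the helm flags themselves are standard.

```python
def helm_install_command(release, chart, namespace, context):
    """Build a helm command that installs the release or upgrades it in place.

    Running it twice is safe: the second run is a no-op upgrade.
    """
    return [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--create-namespace",       # create the namespace on first install
        "--kube-context", context,  # select the target cluster
        "--wait",                   # block until the release is ready
    ]
```

A caller could run it with `subprocess.run(helm_install_command("volsync", "backube/volsync", "volsync-system", "dr1"), check=True)`, assuming the chart repository was added beforehand.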

Thanks: Tesshu Flower <tflower@redhat.com>
Signed-off-by: Nir Soffer <nsoffer@redhat.com>

To test volsync we need to create a replication destination on one cluster
and a replication source on the other. This can be done only after
volsync is installed on both clusters, and since we also depend on
submariner, it must be done after submariner is deployed and tested on
both clusters.

The best way to handle these dependencies is to run the addon on the
global workers after all clusters are deployed, similar to the way we
run rbd-mirror. This change converts volsync to accept both clusters and
moves it to the global workers.
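
The ordering described here can be sketched as two phases: start every cluster concurrently, then run the global addons once all clusters are up. The function and parameter names are hypothetical and do not reflect the actual drenv internals.

```python
from concurrent.futures import ThreadPoolExecutor


def start_environment(clusters, start_cluster, global_addons, run_addon):
    """Start all clusters, then run addons that need every cluster.

    start_cluster(name) starts one cluster and its own addons.
    run_addon(addon, clusters) runs a global addon such as volsync.
    """
    # Phase 1: clusters start in parallel. list() waits for all of them
    # and re-raises the first failure.
    with ThreadPoolExecutor() as executor:
        list(executor.map(start_cluster, clusters))

    # Phase 2: global workers run only after every cluster is ready,
    # and can run concurrently with each other.
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(run_addon, addon, clusters)
            for addon in global_addons
        ]
        for future in futures:
            future.result()
```

Keeping the dependency in the runner rather than in the addon means the addon script never has to poll for the other cluster.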

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This environment can be used to debug the volsync addon.

We add the minikube `volumesnapshots` and `csi-hostpath-driver`
addons[1], so we can create snapshots with minikube's built-in `hostpath`
storage.  With both addons we can create a busybox app with
`csi-hostpath-sc` storage class, and replicate it using `volsync`.
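
For illustration, a sketch of the two resources these addons make possible: a PVC on the `csi-hostpath-sc` storage class and a snapshot of it. The resource names, namespace, and size are hypothetical, and the snapshot class name `csi-hostpath-snapclass` is assumed from the minikube tutorial in [1].

```python
import json


def busybox_pvc(name="busybox-pvc", namespace="busybox", size="1Gi"):
    """PVC backed by the csi-hostpath-driver addon."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "csi-hostpath-sc",
            "resources": {"requests": {"storage": size}},
        },
    }


def volume_snapshot(pvc_name, namespace="busybox",
                    snapshot_class="csi-hostpath-snapclass"):
    """Snapshot of the PVC, served by the volumesnapshots addon."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": f"{pvc_name}-snap", "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def to_manifest(resource):
    """Serialize as JSON, which `kubectl apply -f -` accepts on stdin."""
    return json.dumps(resource, indent=2)
```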

We want to add CephFS later; for now we start with the simplest solution
for testing volsync.

Example run:

    $ drenv start envs/volsync.yaml
    2023-06-28 23:26:24,534 INFO    [volsync] Starting environment
    2023-06-28 23:26:24,592 INFO    [dr1] Starting minikube cluster
    2023-06-28 23:26:25,610 INFO    [dr2] Starting minikube cluster
    2023-06-28 23:28:27,544 INFO    [dr1] Cluster started in 122.95 seconds
    2023-06-28 23:28:46,160 INFO    [dr2] Cluster started in 140.55 seconds
    2023-06-28 23:28:46,161 INFO    [volsync/0] Running addons/volsync/start
    2023-06-28 23:29:08,356 INFO    [volsync/0] addons/volsync/start completed in 22.20 seconds
    2023-06-28 23:29:08,356 INFO    [volsync] Environment started in 163.82 seconds

[1] https://minikube.sigs.k8s.io/docs/tutorials/volume_snapshots_and_csi/

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

With these addons we can deploy busybox using the `csi-hostpath-sc` storage
class, and ramen will set up replication using volsync and submariner.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Volsync creates a service on the destination cluster using ClusterIP or
LoadBalancer. The recommended way to make the service accessible to the
source cluster is using submariner export. Minikube has a tunnel command
that can expose services using LoadBalancer, but it is painful to use.

Since our submariner addon requires the broker on a separate cluster,
add a tiny hub cluster.
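
A sketch of the submariner export mentioned above, assuming the Multi-Cluster Services API that submariner implements. The export must have the same name and namespace as the service it exports, and the source cluster then reaches it through the clusterset DNS name. The function names are hypothetical.

```python
def service_export(service, namespace):
    """ServiceExport for an existing service on the destination cluster."""
    return {
        "apiVersion": "multicluster.x-k8s.io/v1alpha1",
        "kind": "ServiceExport",
        "metadata": {"name": service, "namespace": namespace},
    }


def clusterset_address(service, namespace):
    """DNS name of the exported service as seen from the other clusters."""
    return f"{service}.{namespace}.svc.clusterset.local"
```

With this approach the destination service can stay ClusterIP, so no LoadBalancer or `minikube tunnel` is needed.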

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Testing volsync without ramen is the best way to understand how it works
and how we need to use it when using minikube (or vanilla k8s?).

We use a security context to run the busybox app as an unprivileged user and
set the ownership of the pvc. Without this the busybox app runs as root
on minikube. It is possible to allow volsync privileged movers using an
annotation on the namespace, but this is a great pain to use and conflicts
with ocm, and there is no reason to run the busybox app as root.
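
A minimal sketch of such a pod spec, using standard Kubernetes security context fields. The uid and gid values, image, command, and mount path are hypothetical, and whether `fsGroup` changes the volume ownership depends on the CSI driver's fsGroup policy.

```python
def busybox_pod_spec(pvc_name, uid=1000, gid=1000):
    """Pod spec running busybox as an unprivileged user with a PVC."""
    return {
        "securityContext": {
            "runAsNonRoot": True,
            "runAsUser": uid,
            "runAsGroup": gid,
            # Ask the kubelet to make the volume group-owned by gid.
            "fsGroup": gid,
        },
        "containers": [
            {
                "name": "busybox",
                "image": "busybox",
                "command": ["sh", "-c", "while true; do date >> /mnt/data/log; sleep 10; done"],
                "securityContext": {"allowPrivilegeEscalation": False},
                "volumeMounts": [{"name": "data", "mountPath": "/mnt/data"}],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": pvc_name}}
        ],
    }
```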

The test deploys a busybox application on the source cluster and sets up
replication to the destination cluster via submariner service export.
When the replication is ready we start one replication and wait until it
completes.
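
The "start one replication and wait" step can be sketched with VolSync's manual trigger: the source sets `spec.trigger.manual` to a tag, and the sync is done when `status.lastManualSync` reports the same tag. This sketch assumes the rsync-tls mover and the `csi-hostpath-snapclass` snapshot class; the mover actually used by the test, and all names passed in, are not confirmed by this conversation.

```python
def replication_source(name, namespace, pvc, address, key_secret, trigger):
    """ReplicationSource that runs one sync per manual trigger tag."""
    return {
        "apiVersion": "volsync.backube/v1alpha1",
        "kind": "ReplicationSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "sourcePVC": pvc,
            "trigger": {"manual": trigger},
            "rsyncTLS": {
                "address": address,       # exported destination service
                "keySecret": key_secret,  # shared with the destination
                "copyMethod": "Snapshot",
                "volumeSnapshotClassName": "csi-hostpath-snapclass",
            },
        },
    }


def manual_sync_completed(source):
    """True when the last completed manual sync matches the requested tag."""
    wanted = source["spec"]["trigger"]["manual"]
    return source.get("status", {}).get("lastManualSync") == wanted
```

A test would apply the resource, then poll the live object with `manual_sync_completed` until it returns true or a timeout expires.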

We don't validate the replicated data, since the test is already complicated
and slow, and validation may make it too slow for a self test.

Example run with volsync test environment:

    $ drenv start envs/volsync.yaml
    2023-06-29 23:40:32,314 INFO    [volsync] Starting environment
    2023-06-29 23:40:32,371 INFO    [hub] Starting minikube cluster
    2023-06-29 23:40:33,391 INFO    [dr1] Starting minikube cluster
    2023-06-29 23:40:34,383 INFO    [dr2] Starting minikube cluster
    2023-06-29 23:41:14,260 INFO    [hub] Cluster started in 41.89 seconds
    2023-06-29 23:41:14,261 INFO    [hub/0] Running addons/submariner/start
    2023-06-29 23:43:35,144 INFO    [hub/0] addons/submariner/start completed in 140.88 seconds
    2023-06-29 23:43:35,145 INFO    [hub/0] Running addons/submariner/test
    2023-06-29 23:43:55,502 INFO    [dr1] Cluster started in 202.11 seconds
    2023-06-29 23:44:10,336 INFO    [dr2] Cluster started in 215.95 seconds
    2023-06-29 23:44:16,476 INFO    [hub/0] addons/submariner/test completed in 41.33 seconds
    2023-06-29 23:44:16,477 INFO    [volsync/0] Running addons/volsync/start
    2023-06-29 23:44:48,775 INFO    [volsync/0] addons/volsync/start completed in 32.30 seconds
    2023-06-29 23:44:48,775 INFO    [volsync/0] Running addons/volsync/test
    2023-06-29 23:46:08,017 INFO    [volsync/0] addons/volsync/test completed in 79.24 seconds
    2023-06-29 23:46:08,018 INFO    [volsync] Environment started in 335.70 seconds

Example run with regional-dr environment:

    2023-06-30 00:06:13,967 INFO    [rdr/0] Running addons/rbd-mirror/start
    2023-06-30 00:06:13,968 INFO    [rdr/1] Running addons/volsync/start
    2023-06-30 00:06:48,121 INFO    [rdr/1] addons/volsync/start completed in 34.15 seconds
    2023-06-30 00:06:48,121 INFO    [rdr/1] Running addons/volsync/test
    2023-06-30 00:07:11,942 INFO    [rdr/0] addons/rbd-mirror/start completed in 57.98 seconds
    2023-06-30 00:07:11,942 INFO    [rdr/0] Running addons/rbd-mirror/test
    2023-06-30 00:07:24,297 INFO    [rdr/1] addons/volsync/test completed in 36.18 seconds
    2023-06-30 00:07:28,676 INFO    [rdr/0] addons/rbd-mirror/test completed in 16.73 seconds

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs commented Jul 5, 2023

Rebased on main, fixing conflicts with #903.

@nirs nirs merged commit 4eb9054 into RamenDR:main Jul 5, 2023
9 checks passed
@nirs nirs deleted the submariner-volsync branch September 7, 2023 23:28
Successfully merging this pull request may close these issues.

VolSync integration in test environment
3 participants