Cleanups and minor improvements in lima provider #1568

nirs · 2024-09-22T22:04:30Z

Remove unneeded sudo usage
Group kubeconfig setup in same provision step
Simplify port forwarding rules
Clean up probing for completion
Don't wait for coredns deployment
Scale down coredns to 1 replica
Use lima also on darwin/x86_64
Disable rosetta for vm environment
Make the vm environment smaller
Add drenv start --timeout option
Quote addresses to avoid yaml parsing surprises

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This is useful when you want to avoid drenv start getting stuck, and the environment does not provide a way to time out.

Example run:

    % drenv start --timeout 60 envs/vm.yaml
    2024-09-21 04:27:43,555 INFO [vm] Starting environment
    2024-09-21 04:27:43,581 INFO [cluster] Starting lima cluster
    2024-09-21 04:28:43,785 ERROR [cluster] did not receive an event with the "running" status
    2024-09-21 04:28:43,790 ERROR Command failed
    Traceback (most recent call last):
      ...
    drenv.commands.Error: Command failed:
       command: ('limactl', '--log-format=json', 'start', '--timeout=60s', 'cluster')
       exitcode: 1
       error:

For lima provider, we pass the timeout to limactl. For minikube we use commands.watch() timeout. External provider does not use the timeout since we don't start the cluster.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
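As a rough illustration of how a timeout in seconds could be turned into the limactl command line shown in the log above, here is a minimal sketch. The helper name `limactl_start_command` is hypothetical and is not taken from drenv; only the resulting argv matches the log.

```python
def limactl_start_command(name, timeout=None):
    """Build the argv for starting a lima instance (hypothetical helper)."""
    cmd = ["limactl", "--log-format=json", "start"]
    if timeout is not None:
        # The log above shows the timeout passed as a duration string ("60s").
        cmd.append(f"--timeout={timeout}s")
    cmd.append(name)
    return cmd


print(limactl_start_command("cluster", timeout=60))
```

When no timeout is given, the sketch omits the flag entirely and leaves the default to limactl.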

On github actions we cannot start a vm with more than one cpu, and our test cluster is very small so it should work with 1g of ram. This makes the test cluster consume fewer resources if a developer leaves it running for a long time.

Using 1 cpu conflicts with kubeadm preflight checks:

    [init] Using Kubernetes version: v1.31.0
    ...
    error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

Since we know what we are doing, we suppress this preflight error.

Running top shows that the cluster consumes only 9% cpu and 33.5% of the available memory.

    top - 04:51:02 up 10 min, 1 user, load average: 0.60, 0.29, 0.16
    Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 6.2/3.1 9[|||||| ]
    MiB Mem : 33.5/1959.9 [||||||||||||||||||||| ]
    MiB Swap: 0.0/0.0 [ ]

     PID USER  PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
    4124 root  20  0 1449480 244352 68224 S  2.3 12.2 0:18.47 kube-apiserver
    4120 root  20  0 1309260 102456 60800 S  0.7  5.1 0:06.17 kube-controller
    4273 root  20  0 1902692  92040 59904 S  2.0  4.6 0:12.47 kubelet
    4159 root  20  0 1288880  60156 44288 S  0.3  3.0 0:02.20 kube-scheduler
    3718 root  20  0 1267660  57880 31020 S  0.3  2.9 0:09.28 containerd
    4453 root  20  0 1289120  53540 42880 S  0.0  2.7 0:00.15 kube-proxy
    5100 65532 20  0 1284608  49656 37504 S  0.0  2.5 0:01.14 coredns
    5000 65532 20  0 1284608  49528 37376 S  0.0  2.5 0:00.86 coredns
    4166 root  20  0   11.2g  47360 21504 S  1.3  2.4 0:08.69 etcd

We could make the cluster even smaller, but the kubeadm preflight check requires 1700 MiB, and we don't have memory issues on github or on developers' machines.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
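The sizing reasoning above can be sketched as a small check. The function name is hypothetical, the thresholds are the ones quoted in this commit message (2 CPUs, 1700 MiB), `NumCPU` is the check name from the kubeadm error above, and `Mem` is an assumed name for the memory check.

```python
# Minimums quoted in the commit message above.
MIN_CPUS = 2
MIN_MEMORY_MIB = 1700


def preflight_errors_to_ignore(cpus, memory_mib):
    """Return the preflight checks a VM of this size would fail (sketch)."""
    errors = []
    if cpus < MIN_CPUS:
        errors.append("NumCPU")
    if memory_mib < MIN_MEMORY_MIB:
        errors.append("Mem")  # assumed check name, not shown in the log above
    return errors


# 1 cpu and the 1959.9 MiB reported by top: only the cpu check fails.
print(preflight_errors_to_ignore(cpus=1, memory_mib=1959.9))
```

With these numbers only the cpu check needs to be suppressed, which is consistent with keeping the memory above 1700 MiB.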

We need rosetta only for complex setups where a component does not provide arm64 images. For testing drenv we have an empty cluster with busybox, and our busybox image is multi-arch. Disabling rosetta may speed up the cluster, and this may be significant on github actions since the runners are about 6.5 times slower than an M1 mac.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This is useful for running the tests on github runners, and it also lets us support one option for macOS.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This is the same optimization used in minikube. Minikube uses 2 replicas only when using an HA configuration.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

It takes about 30 seconds until coredns is ready, but we don't depend on it in the current code. By removing this wait we can start deploying 30 seconds earlier. This reduces the time for starting a cluster from 155 seconds to 125 seconds, and a regional-dr environment from 450 to 420 seconds.

Example run with vm environment:

    % drenv start envs/vm.yaml
    2024-09-22 18:39:55,726 INFO [vm] Starting environment
    2024-09-22 18:39:55,743 INFO [cluster] Starting lima cluster
    2024-09-22 18:41:57,818 INFO [cluster] Cluster started in 122.07 seconds
    2024-09-22 18:41:57,819 INFO [cluster/0] Running addons/example/start
    2024-09-22 18:42:18,966 INFO [cluster/0] addons/example/start completed in 21.15 seconds
    2024-09-22 18:42:18,966 INFO [cluster/0] Running addons/example/test
    2024-09-22 18:42:19,120 INFO [cluster/0] addons/example/test completed in 0.15 seconds
    2024-09-22 18:42:19,121 INFO [vm] Environment started in 143.40 seconds

    % drenv stop envs/vm.yaml
    2024-09-22 18:42:44,244 INFO [vm] Stopping environment
    2024-09-22 18:42:44,317 INFO [cluster] Stopping lima cluster
    2024-09-22 18:42:44,578 WARNING [cluster] [hostagent] dhcp: unhandled message type: RELEASE
    2024-09-22 18:42:49,441 INFO [cluster] Cluster stopped in 5.13 seconds
    2024-09-22 18:42:49,441 INFO [vm] Environment stopped in 5.20 seconds

    % drenv start envs/vm.yaml
    2024-09-22 18:42:53,132 INFO [vm] Starting environment
    2024-09-22 18:42:53,156 INFO [cluster] Starting lima cluster
    2024-09-22 18:43:34,436 INFO [cluster] Cluster started in 41.28 seconds
    2024-09-22 18:43:34,437 INFO [cluster] Looking up failed deployments
    2024-09-22 18:43:34,842 INFO [cluster/0] Running addons/example/start
    2024-09-22 18:43:35,208 INFO [cluster/0] addons/example/start completed in 0.37 seconds
    2024-09-22 18:43:35,208 INFO [cluster/0] Running addons/example/test
    2024-09-22 18:43:35,371 INFO [cluster/0] addons/example/test completed in 0.16 seconds
    2024-09-22 18:43:35,372 INFO [vm] Environment started in 42.24 seconds

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
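To put the 30 second saving in proportion, the quoted start times can be compared directly. This is only arithmetic on the numbers in the commit message above; the helper name `saving` is made up for this sketch.

```python
def saving(before, after):
    """Return (seconds saved, percent of the original time saved)."""
    saved = before - after
    return saved, 100 * saved / before


for name, before, after in [("cluster", 155, 125), ("regional-dr", 450, 420)]:
    saved, percent = saving(before, after)
    print(f"{name}: {saved}s saved ({percent:.1f}%)")

# cluster: 30s saved (19.4%)
# regional-dr: 30s saved (6.7%)
```

The absolute saving is the same in both cases, so the relative gain is largest for a single small cluster.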

- Use `/readyz` endpoint for waiting until kubernetes cluster is ready instead of `kubectl version`. This matches the way we wait for the cluster in drenv.
- When waiting for coredns deployment, use `kubectl rollout status`, matching other code in drenv and addons.
- Nicer probe description, visible in drenv when using --verbose.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
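The `/readyz` wait described in the list above could look roughly like the following in Python. This is a sketch under assumptions, not the probe script from this change: the function names, the default timeout, and the polling delay are all hypothetical, and it assumes `kubectl get --raw=/readyz` prints `ok` once the API server is ready.

```python
import subprocess
import time


def api_server_ready():
    """Return True if the API server answers "ok" on /readyz."""
    result = subprocess.run(
        ["kubectl", "get", "--raw=/readyz"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "ok"


def wait_until(probe, timeout=60.0, delay=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(delay)
```

A caller would use it as `wait_until(api_server_ready, timeout=120)`. Passing the probe as a callable keeps the retry loop testable without a running cluster.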

We can ignore all ports for all protocols for all addresses using a simpler rule. With this change we see this log in --verbose mode:

    2024-09-23 01:12:18,130 DEBUG [cluster] [hostagent] TCP (except for SSH) and UDP port forwarding is disabled

And no "Not forwarding port ..." message. Previously we had a lot of these messages[1].

This requires lima commit[2]. Pull current master if you run an older version.

[1] lima-vm/lima#2577
[2] lima-vm/lima@9a09350

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

And remove the unneeded export - it was used to run kubectl commands before we set up root/.kube/config, but this step does not run any kubectl commands.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Provision scripts using `mode: system` run as root and do not need to use sudo. It looks like the scripts were copied from code not running as root.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

This is fixed now in lima[1] so we don't need to keep the fix in drenv. Developers should pull the latest lima from git to use this fix.

[1] lima-vm/lima#2632

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs requested review from raghavendra-talur and rakeshgm September 22, 2024 22:04

nirs force-pushed the lima-cleanups branch 2 times, most recently from b62ec4d to 0049f65 Compare September 22, 2024 22:18

rakeshgm approved these changes Sep 23, 2024

nirs force-pushed the lima-cleanups branch from 0049f65 to 04d25c0 Compare September 23, 2024 10:44

nirs mentioned this pull request Sep 23, 2024

Fix disable dr when VR failed validation #1570

Merged

3 tasks

nirs requested review from ShyamsundarR and BenamarMk September 24, 2024 23:04

nirs added 11 commits September 30, 2024 16:50

Quote addresses to avoid yaml parsing surprises

16c5198

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Use lima also on darwin/x86_64

1d4caf9

This is useful for running the tests on github runners, and it also lets us support one option for macOS.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Scale down coredns to 1 replica

8ee2cb8

This is the same optimization used in minikube. Minikube uses 2 replicas only when using an HA configuration.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Group kubeconfig setup in same provision step

e5edc18

And remove the unneeded export - it was used to run kubectl commands before we set up root/.kube/config, but this step does not run any kubectl commands.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

Remove unneeded sudo usage

63a6623

Provision scripts using `mode: system` run as root and do not need to use sudo. It looks like the scripts were copied from code not running as root.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

nirs force-pushed the lima-cleanups branch from 04d25c0 to 63a6623 Compare September 30, 2024 13:50

Remove unneeded route fix

569e2f8

This is fixed now in lima[1] so we don't need to keep the fix in drenv. Developers should pull the latest lima from git to use this fix.

[1] lima-vm/lima#2632

Signed-off-by: Nir Soffer <nsoffer@redhat.com>

raghavendra-talur approved these changes Oct 7, 2024

raghavendra-talur merged commit b9a39d2 into RamenDR:main Oct 7, 2024
19 of 20 checks passed

nirs deleted the lima-cleanups branch October 7, 2024 16:55
