Cleanups and minor improvements in lima provider #1568

Merged
merged 12 commits into from
Oct 7, 2024

nirs
Member

@nirs nirs commented Sep 22, 2024

  • Remove unneeded sudo usage
  • Group kubeconfig setup in same provision step
  • Simplify port forwarding rules
  • Clean up probing for completion
  • Don't wait for coredns deployment
  • Scale down coredns to 1 replica
  • Use lima also on darwin/x86_64
  • Disable rosetta for vm environment
  • Make the vm environment smaller
  • Add drenv start --timeout option
  • Quote addresses to avoid yaml parsing surprises

@nirs nirs force-pushed the lima-cleanups branch 2 times, most recently from b62ec4d to 0049f65 Compare September 22, 2024 22:18
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This is useful when you want to avoid drenv start getting stuck, and the
environment does not provide a way to time out.

Example run:

    % drenv start --timeout 60 envs/vm.yaml
    2024-09-21 04:27:43,555 INFO    [vm] Starting environment
    2024-09-21 04:27:43,581 INFO    [cluster] Starting lima cluster
    2024-09-21 04:28:43,785 ERROR   [cluster] did not receive an event with the "running" status
    2024-09-21 04:28:43,790 ERROR   Command failed
    Traceback (most recent call last):
      ...
    drenv.commands.Error: Command failed:
       command: ('limactl', '--log-format=json', 'start', '--timeout=60s', 'cluster')
       exitcode: 1
       error:

For the lima provider, we pass the timeout to limactl. For minikube we
use the commands.watch() timeout. The external provider does not use the
timeout since we don't start the cluster.
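
A minimal sketch of how the timeout could be turned into the limactl
command seen in the traceback above. The helper name
`limactl_start_command` is hypothetical and is not taken from drenv; only
the command line itself comes from the example run.

```python
def limactl_start_command(name, timeout=None):
    """Build the limactl command for starting the named instance.

    Hypothetical helper for illustration, not the drenv implementation.
    """
    cmd = ["limactl", "--log-format=json", "start"]
    if timeout is not None:
        # The example run shows the timeout passed in seconds, e.g. 60 -> "60s".
        cmd.append(f"--timeout={timeout}s")
    cmd.append(name)
    return cmd


if __name__ == "__main__":
    print(limactl_start_command("cluster", timeout=60))
```

When no timeout is given the option is simply omitted, so limactl falls
back to its own default.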

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
On github actions we cannot start a vm with more than one cpu, and our
test cluster is very small so it should work with 1g of ram. This makes
the test cluster consume fewer resources if a developer leaves it
running for a long time.

Using 1 cpu conflicts with kubeadm preflight checks:

    [init] Using Kubernetes version: v1.31.0
    ...
    error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

Since we know what we are doing, we suppress this preflight error.
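
As a rough illustration, the check can be made non-fatal with the
`--ignore-preflight-errors` flag named in the kubeadm output above. The
helper `kubeadm_init_command` is hypothetical; how the provider actually
passes this setting to kubeadm (command line or config file) is an
assumption here.

```python
def kubeadm_init_command(ignored_checks=()):
    """Build a kubeadm init command that ignores the given preflight checks.

    Hypothetical helper for illustration only.
    """
    cmd = ["kubeadm", "init"]
    if ignored_checks:
        # kubeadm accepts a comma-separated list of check names.
        cmd.append("--ignore-preflight-errors=" + ",".join(ignored_checks))
    return cmd


if __name__ == "__main__":
    # NumCPU is the check name reported in the preflight error.
    print(kubeadm_init_command(["NumCPU"]))
```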

Running top shows that the cluster consumes only 9% cpu and 33.5% of the
available memory.

    top - 04:51:02 up 10 min,  1 user,  load average: 0.60, 0.29, 0.16
    Tasks: 123 total,   1 running, 122 sleeping,   0 stopped,   0 zombie
    %Cpu(s):   6.2/3.1     9[||||||                                                        ]
    MiB Mem : 33.5/1959.9   [|||||||||||||||||||||                                         ]
    MiB Swap:  0.0/0.0      [                                                              ]

        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
       4124 root      20   0 1449480 244352  68224 S   2.3  12.2   0:18.47 kube-apiserver
       4120 root      20   0 1309260 102456  60800 S   0.7   5.1   0:06.17 kube-controller
       4273 root      20   0 1902692  92040  59904 S   2.0   4.6   0:12.47 kubelet
       4159 root      20   0 1288880  60156  44288 S   0.3   3.0   0:02.20 kube-scheduler
       3718 root      20   0 1267660  57880  31020 S   0.3   2.9   0:09.28 containerd
       4453 root      20   0 1289120  53540  42880 S   0.0   2.7   0:00.15 kube-proxy
       5100 65532     20   0 1284608  49656  37504 S   0.0   2.5   0:01.14 coredns
       5000 65532     20   0 1284608  49528  37376 S   0.0   2.5   0:00.86 coredns
       4166 root      20   0   11.2g  47360  21504 S   1.3   2.4   0:08.69 etcd

We could make the cluster even smaller, but the kubeadm preflight check
requires 1700 MiB, and we don't have memory issues on github or on
developer machines.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We need rosetta only for complex setups where a component does not
provide arm64 images. For testing drenv we have an empty cluster with
busybox, and our busybox image is multi-arch.

Disabling rosetta may speed up the cluster, and this may be
significant on github actions since the runners are about 6.5 times
slower compared to an M1 mac.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This is useful for running the tests on a github runner, and it also
lets us support a single option on macOS.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This is the same optimization used in minikube. Minikube uses 2 replicas
only when using an HA configuration.
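
A minimal sketch of the scale-down, assuming the deployment is named
`coredns` in the `kube-system` namespace (the kubeadm default). The
function `scale_coredns` and its injectable `run` parameter are
hypothetical; this is not necessarily how the provider performs the step.

```python
import subprocess


def scale_coredns(replicas=1, run=subprocess.run):
    """Scale the coredns deployment to the requested number of replicas.

    `run` is injectable so the command can be inspected without a cluster.
    """
    cmd = [
        "kubectl", "scale", "deployment", "coredns",
        "--namespace", "kube-system",
        f"--replicas={replicas}",
    ]
    # check=True raises CalledProcessError if kubectl fails.
    run(cmd, check=True)
    return cmd
```

Calling `scale_coredns()` with the defaults requires a working kubectl
context pointing at the cluster.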

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
It takes about 30 seconds until coredns is ready but we don't depend on
it in the current code. By removing this wait we can start deploying 30
seconds earlier. This reduces the time for starting a cluster from 155
seconds to 125 seconds, and the regional-dr environment from 450 to 420
seconds.

Example run with vm environment:

    % drenv start envs/vm.yaml
    2024-09-22 18:39:55,726 INFO    [vm] Starting environment
    2024-09-22 18:39:55,743 INFO    [cluster] Starting lima cluster
    2024-09-22 18:41:57,818 INFO    [cluster] Cluster started in 122.07 seconds
    2024-09-22 18:41:57,819 INFO    [cluster/0] Running addons/example/start
    2024-09-22 18:42:18,966 INFO    [cluster/0] addons/example/start completed in 21.15 seconds
    2024-09-22 18:42:18,966 INFO    [cluster/0] Running addons/example/test
    2024-09-22 18:42:19,120 INFO    [cluster/0] addons/example/test completed in 0.15 seconds
    2024-09-22 18:42:19,121 INFO    [vm] Environment started in 143.40 seconds

    % drenv stop envs/vm.yaml
    2024-09-22 18:42:44,244 INFO    [vm] Stopping environment
    2024-09-22 18:42:44,317 INFO    [cluster] Stopping lima cluster
    2024-09-22 18:42:44,578 WARNING [cluster] [hostagent] dhcp: unhandled message type: RELEASE
    2024-09-22 18:42:49,441 INFO    [cluster] Cluster stopped in 5.13 seconds
    2024-09-22 18:42:49,441 INFO    [vm] Environment stopped in 5.20 seconds

    % drenv start envs/vm.yaml
    2024-09-22 18:42:53,132 INFO    [vm] Starting environment
    2024-09-22 18:42:53,156 INFO    [cluster] Starting lima cluster
    2024-09-22 18:43:34,436 INFO    [cluster] Cluster started in 41.28 seconds
    2024-09-22 18:43:34,437 INFO    [cluster] Looking up failed deployments
    2024-09-22 18:43:34,842 INFO    [cluster/0] Running addons/example/start
    2024-09-22 18:43:35,208 INFO    [cluster/0] addons/example/start completed in 0.37 seconds
    2024-09-22 18:43:35,208 INFO    [cluster/0] Running addons/example/test
    2024-09-22 18:43:35,371 INFO    [cluster/0] addons/example/test completed in 0.16 seconds
    2024-09-22 18:43:35,372 INFO    [vm] Environment started in 42.24 seconds

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
- Use `/readyz` endpoint for waiting until kubernetes cluster is ready
  instead of `kubectl version`. This matches the way we wait for the
  cluster in drenv.

- When waiting for coredns deployment, use `kubectl rollout status`,
  matching other code in drenv and addons.

- Nicer probe description, visible in drenv when using --verbose.
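
The two waits described above can be sketched in Python as follows. This
is only an illustration of the logic: the function names, the polling
loop, and the default timeouts are assumptions, and the real probes are
presumably shell scripts in the lima template rather than Python code.

```python
import subprocess
import time


def is_ready(run=subprocess.run):
    """Return True if the API server answers "ok" on /readyz."""
    result = run(
        ["kubectl", "get", "--raw", "/readyz"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "ok"


def wait_until_ready(timeout=300, delay=1, probe=is_ready,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll the probe until it succeeds or the timeout (seconds) expires."""
    deadline = clock() + timeout
    while not probe():
        if clock() >= deadline:
            raise TimeoutError(f"cluster not ready after {timeout} seconds")
        sleep(delay)


def wait_for_coredns(timeout=300, run=subprocess.run):
    """Block until the coredns deployment has rolled out."""
    run(
        [
            "kubectl", "rollout", "status", "deployment/coredns",
            "--namespace=kube-system",
            f"--timeout={timeout}s",
        ],
        check=True,
    )
```

Polling `/readyz` asks the API server for its own readiness, while
`kubectl version` only shows that the server responds.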

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We can ignore all ports for all protocols for all addresses using a
simpler rule.

With this change we see this log in --verbose mode:

    2024-09-23 01:12:18,130 DEBUG   [cluster] [hostagent] TCP (except for SSH) and UDP port forwarding is disabled

And no "Not forwarding port ..." message. Previously we had a lot of
these messages[1]. This requires lima commit[2]. Pull current master if
you run an older version.

[1] lima-vm/lima#2577
[2] lima-vm/lima@9a09350

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
And remove the unneeded export. It was used to run kubectl commands
before we set up root/.kube/config, but this step does not run any
kubectl commands.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Provision scripts using `mode: system` run as root and do not need to
use sudo. It looks like the scripts were copied from code not running as
root.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This is fixed now in lima[1] so we don't need to keep the fix in drenv.
Developers should pull the latest lima from git to use this fix.

[1] lima-vm/lima#2632

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@raghavendra-talur raghavendra-talur merged commit b9a39d2 into RamenDR:main Oct 7, 2024
19 of 20 checks passed
@nirs nirs deleted the lima-cleanups branch October 7, 2024 16:55