Kubevirt ramen log #1105

Merged
merged 9 commits into from
Dec 11, 2023

@nirs nirs commented Oct 23, 2023

Add infrastructure for testing kubevirt DR:

  • service producing logs - like ramen busybox pod
  • cirros vm container image including the ramen logger service
  • update kubevirt self test to dump the ramen log
  • start ramen logger first to minimize downtime
  • minimize boot time by silencing kernel logs during boot
  • simpler ssh public key management

Tested with https://github.com/aglitke/ocm-kubevirt-samples

@nirs nirs force-pushed the kubevirt-ramen-log branch 2 times, most recently from 62b0265 to 76ad4d3 Compare October 25, 2023 18:03
nirs added a commit to nirs/ocm-kubevirt-samples that referenced this pull request Oct 25, 2023
This VM can be used to test DR flows in ramen minikube based test
environment running on a laptop.

Changes:
- Use latest cirros (0.6.2) from my repo that will be customized for
  testing ramen[2]
- Use executable userData since cirros does not support #cloud-config
- Inject ssh public key from secret
- Storage class used by ramen test environment
- Use bridge interface (see issue[1])
- Separate pvc and source resources
- Minimize memory allocation and disk size to make it easier to run with
  limited resources
- Use common labels to apply appname= label to all resources

[1] kubevirt/kubevirt#9059
[2] RamenDR/ramen#1105
@nirs nirs force-pushed the kubevirt-ramen-log branch 5 times, most recently from 7fa4e81 to f3ca78e Compare October 30, 2023 13:35
@nirs nirs marked this pull request as ready for review October 30, 2023 13:51

@aglitke aglitke left a comment

lgtm

nirs added a commit to nirs/ocm-kubevirt-samples that referenced this pull request Nov 1, 2023
This VM can be used to test DR flows in ramen minikube based test
environment running on a laptop.

Changes:
- Use latest cirros (0.6.2) from my repo, customized for testing
  ramen[2]
- Use executable userData since cirros does not support #cloud-config
- Inject ssh public key from secret
- Storage class used by ramen test environment
- Use bridge interface (see issue[1])
- Separate pvc and source resources
- Minimize memory allocation and disk size to make it easier to run with
  limited resources
- Use common labels to apply appname= label to all resources

[1] kubevirt/kubevirt#9059
[2] RamenDR/ramen#1105
@nirs nirs force-pushed the kubevirt-ramen-log branch 2 times, most recently from 5fd45b5 to ebd7b04 Compare November 2, 2023 00:28
@nirs nirs marked this pull request as draft November 6, 2023 23:28
@nirs nirs force-pushed the kubevirt-ramen-log branch 2 times, most recently from 7a6d8df to f2be7b2 Compare November 7, 2023 13:57
@nirs nirs marked this pull request as ready for review November 7, 2023 13:58
nirs added a commit to nirs/ocm-ramen-samples that referenced this pull request Nov 13, 2023
Log START, UPDATE, and STOP events to make it easier to verify failover
and relocate.

Based on ramen logger for the cirros image:
RamenDR/ramen#1105

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@nirs nirs mentioned this pull request Nov 19, 2023
So we can use "bin" directory in the source tree. For consistency ignore
also "testbin" at the root of the project.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Like the busybox pod, but runs as an init.d service that can be installed
in a cirros VM.

The service includes:
- /usr/bin/ramen - the logging "daemon" script
- /etc/init.d/ramen - the service script
- install - install script running inside the vm

The service can be installed by copying the `ramen` directory into the
VM and running the install script.
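As a rough illustration of what such a logging daemon might look like, here is a minimal sketch in POSIX shell. It is not the script from this PR: the function names and the `RAMEN_LOG`, `RAMEN_INTERVAL`, and `RAMEN_MAX_UPDATES` variables are hypothetical, and the 10 second interval is only inferred from the example logs in this commit message.

```shell
#!/bin/sh
# Hypothetical sketch of a START/UPDATE/STOP logger; not the script from the PR.
# RAMEN_LOG, RAMEN_INTERVAL and RAMEN_MAX_UPDATES are illustrative names.

log_event() {
    # Append "<date> <EVENT>" to the log file.
    echo "$(date) $1" >> "${RAMEN_LOG:-/var/log/ramen.log}"
}

run_logger() {
    # Log STOP when asked to terminate, so a graceful shutdown is visible.
    trap 'log_event STOP; exit 0' TERM INT
    log_event START
    count=0
    # Loop forever unless RAMEN_MAX_UPDATES limits the number of updates.
    while [ -z "${RAMEN_MAX_UPDATES:-}" ] || [ "$count" -lt "$RAMEN_MAX_UPDATES" ]; do
        # Sleep in the background and wait, so the trap runs promptly.
        sleep "${RAMEN_INTERVAL:-10}" &
        wait $!
        log_event UPDATE
        count=$((count + 1))
    done
}
```

Calling `run_logger` at the end of the script would start the loop. If the VM is killed without delivering SIGTERM, the trap never runs, which is why a missing STOP line indicates an unclean shutdown.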

The log can be inspected by running:

    $ virtctl ssh cirros@sample-vm -n kubevirt-sample --context dr1 --command "tail -f /var/log/ramen.log"
    Wed Oct 25 23:43:12 UTC 2023 UPDATE
    Wed Oct 25 23:43:22 UTC 2023 UPDATE
    Wed Oct 25 23:43:32 UTC 2023 UPDATE

Example logs after failover:

    Sun Oct 29 20:23:40 UTC 2023 UPDATE
    Sun Oct 29 20:23:50 UTC 2023 UPDATE
    Sun Oct 29 20:26:24 UTC 2023 START
    Sun Oct 29 20:26:34 UTC 2023 UPDATE
    Sun Oct 29 20:26:44 UTC 2023 UPDATE

In failover we lost the data written between the last snapshot
(20:23:50) and the VM start on the failover cluster (20:26:24).
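The size of that gap can be computed from the two timestamps. The helper below is a small sketch, not part of the PR; `gap_seconds` is a hypothetical name, and it assumes GNU `date`, whose `-d` option can parse the log's timestamp format (busybox `date` inside the cirros VM may not).

```shell
#!/bin/sh
# Hypothetical helper: seconds between two log timestamps.
# Assumes GNU date, which parses its own default output format with -d.
gap_seconds() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

# Last UPDATE before failover and the START on the failover cluster.
gap_seconds "Sun Oct 29 20:23:50 UTC 2023" "Sun Oct 29 20:26:24 UTC 2023"  # prints 154
```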

Example log after relocate:

    Sun Oct 29 20:28:14 UTC 2023 UPDATE
    Sun Oct 29 20:28:24 UTC 2023 UPDATE
    Sun Oct 29 20:28:33 UTC 2023 STOP
    Sun Oct 29 20:31:20 UTC 2023 START
    Sun Oct 29 20:31:31 UTC 2023 UPDATE
    Sun Oct 29 20:31:41 UTC 2023 UPDATE

During relocate the application was terminated gracefully at 20:28:33
and started on the other cluster at 20:31:20 without losing any data.
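The difference between the two flows shows up as the event that precedes each START. A sketch of checking this automatically, assuming the log format shown above (six date fields followed by the event name); `check_restarts` is a hypothetical helper, not something added by this PR:

```shell
#!/bin/sh
# Hypothetical helper: read a ramen log on stdin and report, for every START
# after the first line, whether the previous event was a STOP.
check_restarts() {
    awk '
        # Field 7 is the event name; field 4 is the time of day.
        $7 == "START" && seen {
            if (prev == "STOP")
                print "graceful restart at " $4
            else
                print "unclean restart at " $4
        }
        { prev = $7; seen = 1 }
    '
}
```

Piping the relocate log above into `check_restarts` reports a graceful restart, while the failover log reports an unclean one because the START follows an UPDATE.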

The STOP message depends on the cirros VM shutting down all processes
gracefully, so it may not be emitted in some cases. To ensure graceful
shutdown the user must stop the application gracefully.

Example graceful shutdown:

    virtctl ssh cirros@sample-vm -n kubevirt-sample --context dr1 --command 'sudo /etc/init.d/ramen stop'

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
The cirros image is tiny (~30 MB) and boots to a login prompt in 4
seconds. It is basically a bare-bones VM with busybox and some cloud-init
support.

The cirros image is not compatible with virt-customize (used to inject
stuff into the image) since it packs the root file system inside the
initrd file system. We fix this by booting the image once with qemu-kvm.
During the first boot, the cirros init system unpacks the root file system
into the root partition (/dev/sda1).  From this point, the VM image is
compatible with virt-customize.

To start the VM quickly, avoiding a delay of a few minutes while looking
up the cloud-init metadata server, we provide a temporary instance-id via
a cloud-init nocloud iso image.
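The commit message does not show how the iso is built. One possible way, sketched below, writes a `meta-data` file with a fixed instance-id and packs it into an iso with the `cidata` volume label used by the cloud-init NoCloud convention. The function name, the `cirros-build` default instance-id, the empty `user-data` script, and the choice of `genisoimage` are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: create a NoCloud seed iso providing only an instance-id.
# Usage: make_nocloud_seed DIR [INSTANCE_ID]
make_nocloud_seed() {
    dir=$1
    mkdir -p "$dir"
    # meta-data carries the instance-id, so the guest does not need to
    # look up a metadata server.
    printf 'instance-id: %s\n' "${2:-cirros-build}" > "$dir/meta-data"
    # Minimal executable user-data that does nothing.
    printf '#!/bin/sh\n' > "$dir/user-data"
    if command -v genisoimage >/dev/null 2>&1; then
        genisoimage -quiet -output "$dir/seed.iso" -volid cidata \
            -joliet -rock "$dir/user-data" "$dir/meta-data"
    else
        echo "genisoimage not found; skipping iso creation" >&2
    fi
}
```

The resulting `seed.iso` would be attached as a second drive when booting the image with qemu-kvm.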

We customize the VM with the ramen logger, to make the VM ready for DR
testing out of the box.

The VM image is packed inside a container image (called a container disk
in kubevirt). Kubevirt can start a VM from, or import a disk image from,
such a container image.

The image is stored in my private quay.io repo for now; it will move later
to the ramendr repo.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This verifies both ssh access and the ramen service inside the vm, and is
a good example of accessing the ramen log.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We can have many releases using the same upstream version. Let's make
this easier to consume by adding a release number.

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
This helps to understand downtime during failover and relocate. It can
also help to diagnose boot time.

Example log:

    Tue Oct 31 20:29:15 UTC 2023 START uptime=3.98

Boot time with different nested vm levels:

machine    nested     time
--------------------------
laptop          0     3.99s
laptop          1     6.11s
server          2    32.21s
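One plausible way to produce the `uptime=` field is to read the first value of `/proc/uptime`, which is the number of seconds since boot. This is a sketch under that assumption, not the code from the PR; `start_line` and its optional file argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: format a START line that includes the system uptime.
# The optional argument overrides the uptime file, which helps with testing.
start_line() {
    # /proc/uptime holds "<seconds since boot> <idle seconds>".
    read -r up rest < "${1:-/proc/uptime}"
    echo "$(date) START uptime=$up"
}
```

Because the value is measured from kernel start, it includes the time spent in the init scripts that run before the logger, which is what makes it useful for diagnosing boot time.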

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Start the ramen logger first to minimize downtime when inspecting
failover or relocate flows.

This change decreases the ramen service boot time by about 2.4 seconds in
a local vm and 1.4 seconds in a nested vm, but barely improves a double
nested vm.

Boot time with different nested vm levels:

machine    nested     before    after
-------------------------------------
laptop          0     3.99s     1.55s
laptop          1     6.11s     4.73s
server          2    33.21s    32.79s

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Adding the quiet[1] option disables most kernel log messages during boot.
These messages are not needed normally and are available in dmesg if
needed later.

With this change the ramen service boot time is up to 1.47 times faster.

machine    nested     before    after
-------------------------------------
laptop          0     1.55s     1.39s
laptop          1     4.73s     3.28s
server          2    32.79s    22.33s
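The "1.47 times faster" figure is the before/after ratio of the slowest row. A quick check of all three rows, using a hypothetical `speedup` helper:

```shell
#!/bin/sh
# Hypothetical helper: ratio of boot time before and after a change.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

speedup 1.55 1.39    # laptop, not nested: prints 1.12
speedup 4.73 3.28    # laptop, nested:     prints 1.44
speedup 32.79 22.33  # server, 2 levels:   prints 1.47
```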

[1] https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html

Thanks: Peter Lauterbach <pelauter@redhat.com>
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
We can generate the secret during kustomization, but it is not possible
to refer to `~/.ssh/id_rsa.pub`. We cannot use a static symlink, and if
we create a symlink we need to use

    kustomize build --load-restrictor=LoadRestrictionsNone

which makes it harder to apply the kustomization manually for testing.

To keep things simple, we copy the user's default public key to
the vm kustomization directory during the test. From this point the
kustomization can be applied manually.
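The copy step could be as small as the sketch below. The function name and the `id_rsa.pub` destination file name are assumptions; the actual test code in the PR may differ.

```shell
#!/bin/sh
# Hypothetical sketch: copy the user's public key into a kustomization
# directory so that kustomize can load it from inside its root.
# Usage: copy_public_key KUSTOMIZATION_DIR [PUBLIC_KEY]
copy_public_key() {
    src=${2:-$HOME/.ssh/id_rsa.pub}
    cp "$src" "$1/id_rsa.pub"
}
```

Once the key sits inside the kustomization directory, a plain `kustomize build` on that directory can read it, with no need for `--load-restrictor=LoadRestrictionsNone`.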

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
@ShyamsundarR ShyamsundarR left a comment

The entire vm building section may be better in a different repo, as long as we have a VM container image to use for tests. Reduces maintaining VM creation code and artifacts in this repository.

@ShyamsundarR ShyamsundarR merged commit c95a5f0 into RamenDR:main Dec 11, 2023
13 checks passed
nirs added a commit to nirs/ocm-ramen-samples that referenced this pull request Mar 7, 2024
ShyamsundarR pushed a commit to RamenDR/ocm-ramen-samples that referenced this pull request Mar 13, 2024
Log START, UPDATE, and STOP events to make it easier to verify failover
and relocate.

Based on ramen logger for the cirros image:
RamenDR/ramen#1105

Signed-off-by: Nir Soffer <nsoffer@redhat.com>