Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate netobserv to reliability tests #798

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

memodi
Copy link

@memodi memodi commented Oct 29, 2024

NETOBSERV-1874 integrate netobserv to reliability tests
dependent on openshift-qe/ocp-qe-perfscale-ci#659

Additionally, it adds new improvements and fixes some bugs:

  • add trap/cleanup code.
  • convert dittybopper code into a function and fix a bug to check for performance-dashboard directory.
  • import dashboards.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 29, 2024
Copy link

openshift-ci bot commented Oct 29, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: memodi
Once this PR has been reviewed and has the lgtm label, please assign mffiedler for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@memodi
Copy link
Author

memodi commented Oct 29, 2024

/cc @qiliRedHat

reliability-v2/config/reliability-netobserv.yaml Outdated Show resolved Hide resolved
reliability-v2/tasks/Tasks.py Outdated Show resolved Hide resolved
reliability-v2/tasks/Tasks.py Outdated Show resolved Hide resolved
memodi and others added 4 commits October 30, 2024 11:25
Co-authored-by: Qiujie Li <81630527+qiliRedHat@users.noreply.github.com>
@qiliRedHat
Copy link
Contributor

I can add lgtm after openshift-qe/ocp-qe-perfscale-ci#659 is merged and integration test is done

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiliRedHat do you want this to be separate config? I was thinking to have these tasks under standard reliability tasks and we'd have these checks run always.
Of course that also means we'd have to change netobserv operator would be installed by default for all reliability
runs and optionally exclude it, current implementation in start.sh is inverse, wdyt? this would be the initial idea had discuss so that netobserv could piggyback off your standard reliability runs.

If we have netobserv install by default, we'd need to up standard instance types or somehow have configuration that would accommodate Loki resource requirements, couple of questions here:

  • Do you typically have infra nodes set up for reliability runs? if so, what are instance types of infra nodes? I can look if you can bring up that environment and I can see if Loki is able to fit on infra nodes.
  • would you be willing to up the reliability run instance types to m5.2xlarge or m5.4xlarge?

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@memodi Reliability test has many test profiles, consider of the cost, I don't want to have noo on all profiles by default. Usually there is no infra nodes in reliability test. But reliability has the option to configure it with -i in start.sh, default size should be same as the worker, size can be configured.
If test it once (7 days) a release, I think we can do m5.2xlarge on one of the profiles.

@memodi
Copy link
Author

memodi commented Nov 8, 2024

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 8, 2024
rc_return = 1
elif rc == 1 and result == "":
self.logger.info(f"Flowcollector is Ready.")
rc_return = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@memodi You can add an 'else' to cover the rest of the cases: rc == 1 and result != "", or rc !=1. And log the result to see what you can get.

rc_return = 1
elif rc == 1 and result == "":
self.logger.info(f"Pods in ns ${ns} are healthy.")
rc_return = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants