
Deployability testing tier 1 #4495

Closed
9 of 11 tasks
davidjiglesias opened this issue Sep 5, 2023 · 18 comments · Fixed by #4841, #4892, #4891, #5190 or #5299

Comments

@davidjiglesias
Member

davidjiglesias commented Sep 5, 2023

Description

The objective of this issue is to thoroughly test the deployment of Wazuh packages on tier 1 operating systems and architectures. This includes fully automated tests ingrained in Wazuh's CI processes.

This testing should focus on being reliable, lightweight, and fast. From now on, Deployability testing tier 1 is referred to as DTT1.

Functional requirements

DTT1 includes the following combination of operating systems, versions, and architectures:
| Operating System | Version | Component | Architectures |
|---|---|---|---|
| RedHat | 7 | agents, central components | x86_64, aarch64 |
| RedHat | 8 | agents, central components | x86_64, aarch64 |
| RedHat | 9 | agents, central components | x86_64, aarch64 |
| CentOS | 7 | agents, central components | x86_64, aarch64 |
| CentOS | 8 | agents, central components | x86_64, aarch64 |
| Debian | 10 | agents, central components | x86_64, aarch64 |
| Debian | 11 | agents, central components | x86_64, aarch64 |
| Debian | 12 | agents, central components | x86_64, aarch64 |
| Ubuntu | 18 | agents | x86_64, aarch64 |
| Ubuntu | 20 | agents, central components | x86_64, aarch64 |
| Ubuntu | 22 | agents, central components | x86_64, aarch64 |
| Oracle Linux | 9 | agents, central components | x86_64, aarch64 |
| Amazon Linux | 2 | agents, central components | x86_64, aarch64 |
| Amazon Linux | 2023 | agents, central components | x86_64, aarch64 |
| openSUSE | 15 | agents, central components | x86_64, aarch64 |
| SUSE | 15 | agents, central components | x86_64, aarch64 |
| Fedora | 38 | agents | x86_64, aarch64 |
| Windows | 10 | agents | x86_64, aarch64 |
| Windows | 11 | agents | x86_64, aarch64 |
| Windows Server | 2012 | agents | x86_64, aarch64 |
| Windows Server | 2012 R2 | agents | x86_64, aarch64 |
| Windows Server | 2016 | agents | x86_64, aarch64 |
| Windows Server | 2019 | agents | x86_64, aarch64 |
| Windows Server | 2022 | agents | x86_64, aarch64 |
| macOS | Ventura | agents | x86_64, aarch64 |
| macOS | Sonoma | agents | x86_64, aarch64 |

The OSes from Fedora onwards have been moved to tier 2 because their development in the Allocation module has not been completed.

Agents

High-level phases Agents
  • DTT1 includes the following high-level phases:
    • Install
    • Registration
    • Connection
    • Basic info (OS, arch, version)
    • Uninstall
    • Restart
| Phase | Requirement |
|---|---|
| Install | Install using Wazuh dashboard's Deploy new agent wizard section |
| Install | Ensure files have appropriate permissions (Checkfiles close-world) |
| Install | Start using wazuh-control binary |
| Registration | Enroll using ossec.conf targeting a specific manager |
| Connection | Establish a connection with a single manager via TCP |
| Basic info | Ensure the OS is accurately reported |
| Basic info | Ensure the architecture (arch) is accurately reported |
| Basic info | Ensure the version is accurately reported |
| Upgrade | Ensure file permissions are maintained post-upgrade (Checkfiles close-world) |
| Upgrade | Ensure configuration is maintained post-upgrade (ossec.conf, agent.conf, local_internal_options.conf) |
| Restart | Restart using wazuh-control binary |
| Restart | Ensure successful reconnection post-restart |
| Stop | Confirm no remnants post-stop (e.g., processes, services, ports) |
| Stop | Ensure agent properly disconnects |
| Uninstall | Confirm no remnants post-uninstallation (e.g., processes, services, ports) |
| Uninstall | Ensure configuration is maintained post-uninstall (ossec.conf, local_internal_options.conf) |
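The Checkfiles close-world requirements boil down to recording file metadata and comparing it after each phase. The following is a minimal Python sketch, assuming that permission bits, owner and group per path are what needs to be compared; `snapshot` and `diff_snapshots` are hypothetical helpers, not part of the Wazuh QA framework.

```python
import os
import stat
from typing import Dict, Tuple

# (permission bits, uid, gid) recorded for a single path
FileMeta = Tuple[int, int, int]


def snapshot(root: str) -> Dict[str, FileMeta]:
    """Record permission bits and ownership for every file and directory under root."""
    result: Dict[str, FileMeta] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect symlinks themselves, do not follow them
            result[os.path.relpath(path, root)] = (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)
    return result


def diff_snapshots(before: Dict[str, FileMeta], after: Dict[str, FileMeta]) -> Dict[str, str]:
    """Describe every path that was added, removed, or whose mode/ownership changed."""
    changes: Dict[str, str] = {}
    for path in before.keys() | after.keys():
        if path not in after:
            changes[path] = "removed"
        elif path not in before:
            # Closed-world: any file not present before is reported too
            changes[path] = "added"
        elif before[path] != after[path]:
            old_mode, new_mode = before[path][0], after[path][0]
            changes[path] = (
                f"changed: mode {old_mode:o} -> {new_mode:o}, "
                f"owner {before[path][1:]} -> {after[path][1:]}"
            )
    return changes
```

Taking a snapshot of the installation directory (for example `/var/ossec`) before an upgrade and diffing it afterwards flags both changed permissions and unexpected new or missing files.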

Central components

High-level phases Central components
  • DTT1 includes the following high-level phases:
    • Install
    • Connection
    • Uninstall
    • Restart
| Phase | Requirement |
|---|---|
| Install | Install via Quickstart |
| Install | Ensure files have appropriate permissions (Checkfiles close-world) |
| Install | Start using service |
| Connection | Ensure the component under test successfully connects with the other central components |
| Upgrade | Confirm the new version is accurately reported |
| Restart | Restart using service |
| Restart | Ensure successful reconnection post-restart with the other central components |
| Stop | Confirm no remnants post-stop (e.g., processes, services, ports) |
| Stop | Ensure agent properly disconnects |
| Uninstall | Confirm no remnants post-uninstallation (e.g., processes, services, ports, files) |
| Uninstall | Ensure configuration is maintained post-uninstall (ossec.conf, local_internal_options.conf) |
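For the Stop and Uninstall rows, checking for remnant ports can be sketched with the standard `socket` module. This covers ports only (processes and services would need separate checks), and the default port list is an assumption based on a standard installation (1514 agent connection, 1515 enrollment, 55000 Wazuh API).

```python
import socket
from typing import Iterable, List

# Assumption: default ports of a standard Wazuh installation; adjust per deployment.
DEFAULT_WAZUH_PORTS = (1514, 1515, 55000)


def open_ports(ports: Iterable[int], host: str = "127.0.0.1", timeout: float = 1.0) -> List[int]:
    """Return the subset of TCP ports that still accept connections on host."""
    still_open: List[int] = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise
            if sock.connect_ex((host, port)) == 0:
                still_open.append(port)
    return still_open


def assert_no_remnant_ports(ports: Iterable[int] = DEFAULT_WAZUH_PORTS) -> None:
    """Fail if any of the given ports is still listening after stop/uninstall."""
    leftovers = open_ports(ports)
    if leftovers:
        raise AssertionError(f"ports still listening after stop/uninstall: {leftovers}")
```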

Non-functional requirements

  • All DTT1 test phases must comply with the following requirements:
    • Ensure the maximum time defined for the specific phase is not exceeded
    • Ensure no errors are found in logs for the specific phase
  • DTT1 tests must be deployed, provisioned, executed, and collected with a modular design
  • DTT1 CI test executions must be monitored and reviewed by the QA team daily/weekly
  • DTT1 test evaluation criteria must be defined and accessible to all QA team members
  • DTT1 test escalation process must be defined and accessible to all QA team members
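The two per-phase requirements (time limit and error-free logs) could be enforced by a small wrapper. A sketch, assuming a hypothetical `read_log` callable that returns the log lines written during the phase and the `ossec.log` line layout shown in the comment; note that it checks the duration after the phase ends rather than aborting it.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Assumption: lines look like "YYYY/MM/DD HH:MM:SS component: LEVEL: message",
# as in ossec.log; other components may need a different pattern.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL)\b:")


@dataclass
class PhaseResult:
    name: str
    elapsed: float
    errors: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.errors


def run_phase(name: str, action: Callable[[], None], max_seconds: float,
              read_log: Callable[[], Iterable[str]]) -> PhaseResult:
    """Run one phase, then check its duration and the log lines it produced."""
    start = time.monotonic()
    action()
    elapsed = time.monotonic() - start
    result = PhaseResult(name, elapsed)
    if elapsed > max_seconds:
        result.errors.append(f"{name} took {elapsed:.1f}s (limit {max_seconds}s)")
    for line in read_log():
        if ERROR_PATTERN.search(line):
            result.errors.append(line.strip())
    return result
```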

Hardware

Agent
  • Hardware:
    • CPU: 1
    • RAM: 500 MB
  • Upgrade:
    • From the previous patch
    • From the previous minor
Central components
  • Hardware:
    • CPU: 4
    • RAM: 8 GB
  • Upgrade:
    • From the previous patch
    • From the previous minor
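Choosing the "previous patch" and "previous minor" versions to upgrade from is simple version arithmetic. A sketch, assuming plain `major.minor.patch` version strings and a hypothetical list of released versions; it does not handle upgrades across major versions.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Turn '4.8.1' (or 'v4.8.1') into (4, 8, 1)."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def upgrade_sources(target: str, released: List[str]) -> Dict[str, Optional[str]]:
    """Pick the 'previous patch' and 'previous minor' releases to upgrade from."""
    t = parse(target)
    versions = sorted(parse(v) for v in released)
    # Previous patch: same major.minor with a lower patch number
    prev_patch = [v for v in versions if v[:2] == t[:2] and v[2] < t[2]]
    # Previous minor: latest release of major.(minor - 1)
    prev_minor = [v for v in versions if v[0] == t[0] and v[1] == t[1] - 1]

    def fmt(v: Optional[Version]) -> Optional[str]:
        return ".".join(map(str, v)) if v else None

    return {
        "previous_patch": fmt(prev_patch[-1] if prev_patch else None),
        "previous_minor": fmt(prev_minor[-1] if prev_minor else None),
    }
```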

Implementation restrictions

  • The DTT1 CI architecture and infrastructure must be designed and developed in Jenkins.
  • The DTT1 tests must be programmed in Python.
  • DTT1 must use OSes deployed on virtual machines.

Plan


First iteration

Objective:

The objective of this iteration is to generate the skeleton of the modules and begin to detect problems that may arise from the new architecture. For this, a PoC described in the issues will be carried out.

Results:

The PoC was carried out.
The modules were generated.
During the development the following problems were encountered:

  • The Collector module is not necessary; it was absorbed by the Observability module.
  • All modules require improvements so that they:
    • Perform schema validation with pydantic to validate the inputs they receive.
    • Are self-sufficient and independent, so they can be called from anywhere without needing too many parameters.
    • Each have a diagram detailed enough to understand how they work.
    • Have their inputs and outputs redefined, since these were not finalized in the PoC.
  • Investigate the need for a flow orchestrator, so that use cases can easily be defined at a high level and the orchestrator then executes the relevant modules for each case.
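The issue specifies pydantic for input validation. To keep this example dependency-free, the sketch below swaps in a standard-library `dataclass` with a `__post_init__` check; the `AllocationInput` fields and the `composite_name` format are hypothetical, not the module's real schema.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical allowed values for an Allocation module input
Provider = Literal["vagrant", "aws"]
Action = Literal["create", "delete"]
Size = Literal["micro", "small", "medium", "large"]


@dataclass(frozen=True)
class AllocationInput:
    provider: Provider
    action: Action
    size: Size
    composite_name: str  # hypothetical format: <os_type>-<os>-<version>-<arch>

    def __post_init__(self) -> None:
        # Validate eagerly so a bad input fails before any VM is touched
        checks = (("provider", Provider), ("action", Action), ("size", Size))
        for name, allowed in checks:
            value = getattr(self, name)
            if value not in get_args(allowed):
                raise ValueError(f"{name}={value!r} is not one of {get_args(allowed)}")
        if len(self.composite_name.split("-")) != 4:
            raise ValueError("composite_name must look like <os_type>-<os>-<version>-<arch>")
```

With pydantic, the same constraints would typically be expressed as `Literal` field types on a `BaseModel`, which also produces structured error messages.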

Second iteration:

Objective:

For this iteration, it is necessary to resolve the problems found in the previous one.
After the weekly meeting (#4495 (comment)), it was decided to investigate tools that use the DAG methodology, to use one as an orchestrator.
Refine the modules, according to what was proposed.

Results:

All the problems and topics found in iteration 1 were resolved. In addition, some areas for improvement were identified as the new functionalities were developed and implemented:

General

  1. Document the usage of each module (TaskFlow, Allocation, Provision, Test and Observability)
  2. Generate class or flow diagrams for each module
  3. Improve validations and error handling, since when a module fails, the reason for the failure is not clear.
    3.1 TaskFlow
    3.2 Allocation
    3.3 Provision
    3.4 Test
  4. Define and implement a Logger
    4.1. Define centralized log
    4.2. Format
    4.3. Levels
    4.4. Output file for module (level debug) + Jenkins log (level info)
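Item 4.4 (a per-module file at DEBUG level plus the Jenkins console at INFO level) maps directly onto two handlers in Python's `logging` module. A sketch; the `dtt1.<module>` logger naming and the format string are assumptions.

```python
import logging
import sys


def get_module_logger(module: str, log_file: str) -> logging.Logger:
    """DEBUG and above go to a per-module file; INFO and above go to stdout (Jenkins console)."""
    logger = logging.getLogger(f"dtt1.{module}")
    logger.setLevel(logging.DEBUG)
    if logger.handlers:  # already configured: avoid duplicate handlers
        return logger
    fmt = logging.Formatter("%(asctime)s [%(levelname)s] [%(name)s] %(message)s")

    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(fmt)

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)
    console.setFormatter(fmt)

    logger.addHandler(file_handler)
    logger.addHandler(console)
    logger.propagate = False  # keep records out of the root logger
    return logger
```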

TaskFlow

  1. Delete the schema validator parameter and use it internally

Allocation

  1. Move the Inventory model to module generics so every module uses the same Inventory model
  2. Add more sizes and OS for Vagrant providers
  3. Validate the working OS in Vagrant
  4. Add more sizes and OS for AWS
  5. Validate the working OS in AWS
  6. Special VMs
  7. Enable custom VM configuration for both the Vagrant and AWS providers
  8. Improve or remove the function to load an existing credential for a VM (currently not working; Vagrant only)
  9. Add name and type labels to AWS instances to perform cost calculations and keep costs under control
  10. Unify size types for Vagrant and AWS
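Once AWS instances carry name and type labels, cost per label is a grouping and a multiplication. A sketch with hypothetical usage records and illustrative hourly rates (not real AWS prices):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstanceUsage:
    tags: Dict[str, str]  # hypothetical labels, e.g. {"Name": "dtt1-agent", "type": "agent"}
    hours: float
    hourly_rate: float    # illustrative USD per hour, not an AWS price


def cost_by_tag(usages: List[InstanceUsage], tag: str) -> Dict[str, float]:
    """Sum hours * rate per value of the given tag; untagged instances go to 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for usage in usages:
        key = usage.tags.get(tag, "untagged")
        totals[key] += usage.hours * usage.hourly_rate
    return dict(totals)
```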

Provision

  1. Add the uninstaller action by parameter to uninstall the desired component
  2. Allow installing any version of wazuh with Package (Currently only allowed with AIO)
  3. Get ansible_os_family to render templates with jinja2. This makes it easier to reuse templates
  4. Validate dependency tree
    4.1. Validate the working OS in Vagrant
    4.2. Validate the working OS in AWS
    4.3. Adapt the dependencies installed for the tests so that they work on other systems such as CentOS 8
  5. Special VMs
  6. Improve or remove the function to load an existing credential for a VM (currently not working; Vagrant only)
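Rendering templates from `ansible_os_family` lets a single template serve every distribution of a family (for example, CentOS and Oracle Linux both report `RedHat`). The issue mentions jinja2; to stay dependency-free this sketch uses `string.Template` instead, and the install commands are illustrative rather than the exact Wazuh package syntax.

```python
from string import Template

# Illustrative per-family install commands keyed by the ansible_os_family fact
INSTALL_TEMPLATES = {
    "RedHat": Template("yum install -y $package-$version"),
    "Debian": Template("apt-get install -y $package=$version"),
    "Suse": Template("zypper --non-interactive install $package-$version"),
}


def render_install(os_family: str, package: str, version: str) -> str:
    """Select the template by OS family so one template covers every distro in it."""
    try:
        template = INSTALL_TEMPLATES[os_family]
    except KeyError:
        raise ValueError(f"unsupported ansible_os_family: {os_family}") from None
    return template.substitute(package=package, version=version)
```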

Testing

  1. Add Utils to test using the Wazuh API
  2. Add Utils to check all file permissions and ownership
  3. Add test for manager
  4. Test uninstall
  5. Remove the use of the Playbook class and use Ansible directly
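A Wazuh API utility for the Basic info requirements could compare what the manager reports for an agent against the expected values. A sketch that works on an already-fetched response; the response shape (`data.affected_items`, `os.name`, `os.arch`, `version`) is an assumption about `GET /agents` and should be checked against the API reference.

```python
from typing import Any, Dict, List


def check_basic_info(response: Dict[str, Any], agent_name: str,
                     expected: Dict[str, str]) -> List[str]:
    """Compare the OS, architecture and version reported for one agent against expectations."""
    # Assumed response shape: {"data": {"affected_items": [ {...agent...} ]}}
    items = response.get("data", {}).get("affected_items", [])
    agent = next((a for a in items if a.get("name") == agent_name), None)
    if agent is None:
        return [f"agent {agent_name!r} not found"]
    reported = {
        "os": agent.get("os", {}).get("name"),
        "arch": agent.get("os", {}).get("arch"),
        "version": agent.get("version"),
    }
    # One message per mismatching field; an empty list means the check passed
    return [f"{key}: expected {expected[key]!r}, got {reported.get(key)!r}"
            for key in expected if reported.get(key) != expected[key]]
```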

Observability

  1. Define the usage of pytest-influxdb plugin for the test
    1.1 If we decide to use it, carry out the implementation
  2. Define the new dashboards to be implemented according to the new definitions of the modules. Requires analysis and definition of the dashboards
  3. Obtain new logs from the modules to view them on a dashboard. Depends on General 4
  4. Investigate generating a dashboard that shows the DAG generated by TaskFlow

Jenkins

  1. Adapt the Jenkins pipeline to execute TaskFlow in dry-run mode to generate the DAG
  2. Adapt the Jenkins pipeline so that it can stop a running TaskFlow process

Iteration 3:

Objective:

After iteration 2, the following points emerged that will be the goal of the last iteration of the project.

Tasks:

General

Workflow engine

Provision

Allocation

Tests

Add Copyright

Release

Results:


Issue to include in DTT Tier 2

Develop automated unit tests

Objective:

The objective of this stage is to generate automated unit tests for each module.
It is expected to continue in DTT2. It is incorporated into DTT1 with the objective of starting to define the test cases and how they are implemented automatically.

Implement best practices

Jenkins implementation

Observability


Post-development:

  • Set and define a calendar and on-call schedules for the CI reviewers
  • Document the evaluation criteria
  • Document the escalation process

Branch

Approved by

DRI name: @davidjiglesias
CTO: @havidarou
Objective: Bulletproof deployability tier1

@rauldpm
Member

rauldpm commented Sep 14, 2023

The order of execution of the tests must be modified. An upgrade requires the previous version to be installed first, which is not possible with the current proposal because it starts by installing the version under test. The current proposal is:

  1. Install
  2. Registration
  3. Connection
  4. Basic info
  5. Restart
  6. Stop
  7. Uninstall
  8. Upgrade

@jnasselle
Member

jnasselle commented Oct 11, 2023

Requirements review

OS and architecture unavailability

  • Windows Servers are not available on arm64 arch
  • Windows 11 and Windows 10 on arm64 are only available as Insider Preview

Agent's hardware requirements do not meet OS minimum requirements

Next OSes have higher hardware requirements

Known problems

  • Central components on ARM64, due to Filebeat on ARM
  • Windows arm64 native package does not exist yet

Managers on OSes that only support agents

  • Define a fixed/dynamic manager selection policy to improve testing coverage

Test order

  • From production to the current version
    • Install the last production version (based on criteria, last patch, or version)
  • From the current version to the future version (dummy, same but to test upgradeability)
  • From production to future version

Wazuh Manager and Wazuh Agent test interleaving

Wazuh Agent tests need some validation from the manager side (registration, connection), but at the same time, the Wazuh Manager has its own tests. The idea is to define the optimal, decoupled test flow that meets the requirements in the least time.

@davidjiglesias davidjiglesias self-assigned this Nov 23, 2023
@davidjiglesias
Member Author

Requirements review

  • The latest Amazon Linux is the OS chosen for the manager used in agent testing
  • Agents and central components tests will be executed independently

@QU3B1M
Member

QU3B1M commented Dec 4, 2023

Draw a high-level diagram of the modules workflow

@fcaffieri
Member

Weekly Minutes DTT1

Participants:
Kevin, Victor, Raul, Nico, Fede and David.

Conclusions:
After the weekly DTT meeting, the need to incorporate a DAG methodology was identified, in order to have a simple, user-friendly execution orchestrator that defines the test cases to be carried out. It must allow the flexibility to execute any use case in parallel, and its output must be the YAML used by the already-defined modules (Allocation, Provision and Test).
An analysis of the proposed tools, with the advantages and disadvantages of each, is required so that the team can jointly choose the tool that natively fits our needs. It must be simple, intuitive and scalable to use.
To handle this, issue #4766 was created.
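A DAG orchestrator's core job is to derive an execution order in which independent tasks can run in parallel. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) is enough to sketch a dry run that lists these parallel waves; the task names below are hypothetical, and this is not the tool evaluated in #4766.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical DTT1 task graph: each task maps to the set of tasks it depends on.
TASKS: Dict[str, Set[str]] = {
    "allocate-manager": set(),
    "allocate-agent": set(),
    "provision-manager": {"allocate-manager"},
    "provision-agent": {"allocate-agent"},
    "test-agent": {"provision-manager", "provision-agent"},
    "delete-agent": {"test-agent"},
    "delete-manager": {"test-agent"},
}


def dry_run(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Return the execution plan as waves of tasks that could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # dry run: mark the whole wave as finished at once
    return waves
```

In a real run, each task would be dispatched as soon as it appears in `get_ready()`, and `done()` called when it finishes, so independent allocations and provisions proceed concurrently.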

@rauldpm
Member

rauldpm commented Apr 11, 2024

Moved ETA to 16/04/2024 due to #5198 (based on issue ETA)

A new issue has been opened as we need to adapt the test module to use a single manager: #5202 (Same ETA)
Desirable, but not a blocker: #5203

@fcaffieri
Member

The automation section is removed because it will be addressed in DTT2

Automation

  • DTT1 tests must run in Nightly CI
  • DTT1 tests must run in Weekly CI
  • DTT1 tests must run in pre-release testing
  • DTT1 test costs must be measurable

@rauldpm
Member

rauldpm commented Apr 16, 2024

Moved ETA to 29/04/2024 as we have to work on the following issues

We need the following issue from the DevOps team

As 4.9.0 is targeted for 2/05/2024, we plan to use 30/04 and 2/05 to test and retrieve metrics

@rauldpm
Member

rauldpm commented Apr 25, 2024

Moved the ETA to 3/5/2024 as 1/5/2024 is a holiday and we need some time to test the changes in the main branch (#5191). This has been discussed and approved with @davidjiglesias

@rauldpm
Member

rauldpm commented May 2, 2024

Based on all DTT1 pending issues by each team and ETAs:

| Team | Issue | Actual ETA |
|---|---|---|
| @wazuh/devel-devops | #5295 | 7/5/2024 |
| @wazuh/devel-devops | #5311 | 10/5/2024 |
| @wazuh/devel-qa-div1 | #5240 | 2/5/2024 |
| @wazuh/devel-qa-div1 | #5230 | 3/5/2024 |
| @wazuh/devel-qa-div1 | #5218 | 3/5/2024 |
| @wazuh/devel-qa-div1 | #5219 | 6/5/2024 |
| @wazuh/devel-qa-div1 | #5191 | 15/5/2024 |
| @wazuh/devel-qa-div1 | #5323 | 3/5/2024 |

This issue's ETA will change to 15/5/2024 so we can test all changes (issue #5191)

@rauldpm
Member

rauldpm commented May 8, 2024

Removed Windows ARM from OS list as there is no Windows ARM available yet

@rauldpm rauldpm mentioned this issue May 15, 2024
@rauldpm rauldpm changed the title Deployability testing tier 1 Detasployability testing tier 1 May 15, 2024
@rauldpm rauldpm changed the title Detasployability testing tier 1 Deployability testing tier 1 May 16, 2024
This was referenced Jun 5, 2024
@rauldpm
Member

rauldpm commented Jun 16, 2024

ETA moved to 18 June #5191 (comment)

@wazuhci wazuhci moved this from In progress to In review in Release 4.9.0 Jun 18, 2024
@wazuhci wazuhci moved this from In review to In progress in Release 4.9.0 Jun 18, 2024
@wazuhci wazuhci moved this from In progress to In review in Release 4.9.0 Jun 18, 2024
@wazuhci wazuhci moved this from In review to Pending final review in Release 4.9.0 Jun 18, 2024
@rauldpm
Member

rauldpm commented Jun 19, 2024

LGTM

The branch must be kept alive until the Agent team changes the GHA workflow references

@rauldpm rauldpm closed this as completed Jun 19, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in Roadmap Jun 19, 2024
@wazuhci wazuhci moved this from Pending final review to Done in Release 4.9.0 Jun 19, 2024