Deployability testing tier 1 #4495

davidjiglesias · 2023-09-05T14:54:55Z

Description

The objective of this issue is to thoroughly test Wazuh packages' deployment on tier 1 operating systems and architectures. This includes fully automated tests engrained in Wazuh's CI processes.

This testing should focus on reliability, light weight, and speed. From now on, Deployability testing tier 1 is referred to as DTT1.

Functional requirements

DTT1 includes the following combination of operating systems, versions, and architectures:

Operating System	Version	Component	Architectures
RedHat	7	agents, central components	x86_64, aarch64
RedHat	8	agents, central components	x86_64, aarch64
RedHat	9	agents, central components	x86_64, aarch64
CentOS	7	agents, central components	x86_64, aarch64
CentOS	8	agents, central components	x86_64, aarch64
Debian	10	agents, central components	x86_64, aarch64
Debian	11	agents, central components	x86_64, aarch64
Debian	12	agents, central components	x86_64, aarch64
Ubuntu	18	agents	x86_64, aarch64
Ubuntu	20	agents, central components	x86_64, aarch64
Ubuntu	22	agents, central components	x86_64, aarch64
Oracle Linux	9	agents, central components	x86_64, aarch64
Amazon Linux	2	agents, central components	x86_64, aarch64
Amazon Linux	2023	agents, central components	x86_64, aarch64
openSUSE	15	agents, ~~central components~~	x86_64, aarch64
~~SUSE~~	15	~~agents, central components~~	~~x86_64, aarch64~~
~~Fedora~~	38	~~agents~~	~~x86_64, aarch64~~
Windows	10	agents	x86_64 ~~, aarch64~~
Windows	11	~~agents~~	~~x86_64, aarch64~~
Windows	Server 2012	agents	x86_64 ~~, aarch64~~
Windows	Server 2012 R2	agents	x86_64 ~~, aarch64~~
Windows	Server 2016	agents	x86_64 ~~, aarch64~~
Windows	Server 2019	agents	x86_64 ~~, aarch64~~
Windows	Server 2022	agents	x86_64 ~~, aarch64~~
macOS	Ventura	agents	x86_64, aarch64
macOS	Sonoma	agents	x86_64, aarch64

The operating systems listed from Fedora onwards are included in tier 2, because their development on the allocation side has not been completed.

Agents

High-level phases Agents

DTT1 includes the following high-level phases:
- Install
- Registration
- Connection
- Basic info (OS, arch, version)
- Uninstall
- Restart

Phase	Requirement
Install	Install using Wazuh dashboard's `Deploy new agent` wizard section
Install	Ensure files have appropriate permissions (Checkfiles close-world)
Install	Start using `wazuh-control` binary
Registration	Enroll using `ossec.conf` targeting a specific manager
Connection	Establish a connection with a single manager via TCP
Basic info	Ensure the OS is accurately reported
Basic info	Ensure the architecture (arch) is accurately reported
Basic info	Ensure the version is accurately reported
~~Upgrade~~	~~Ensure file permissions are maintained post-upgrade (Checkfiles close-world)~~
~~Upgrade~~	~~Ensure configuration is maintained post-upgrade (ossec.conf, agent.conf, local_internal_options.conf)~~
Restart	Restart using `wazuh-control` binary
Restart	Ensure successful reconnection post-restart
Stop	Confirm no remnants post-stop (e.g., processes, services, ports)
Stop	Ensure agent properly disconnects
Uninstall	Confirm no remnants post-uninstallation (e.g., processes, services, ports)
Uninstall	Ensure configuration is maintained post-uninstall (ossec.conf, local_internal_options.conf)
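The file permission requirement above could be checked along these lines. This is only a sketch: the `check_files` helper and the manifest format are hypothetical, and it assumes that "close-world" means any file outside the expected manifest is also reported. Ownership checks are left out.

```python
import stat
from pathlib import Path
from typing import Dict, List


def check_files(root: Path, expected: Dict[str, int]) -> List[str]:
    """Compare every file under root against a {relative_path: mode} manifest.

    Hypothetical helper: reports wrong modes, files that are not in the
    manifest, and manifest entries that are missing on disk.
    """
    problems: List[str] = []
    seen = set()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        seen.add(rel)
        mode = stat.S_IMODE(path.stat().st_mode)  # permission bits only
        if rel not in expected:
            problems.append(f"unexpected file: {rel}")
        elif mode != expected[rel]:
            problems.append(f"{rel}: mode {oct(mode)}, expected {oct(expected[rel])}")
    # Anything listed in the manifest but never seen on disk is also a failure.
    problems.extend(f"missing file: {rel}" for rel in sorted(set(expected) - seen))
    return problems
```

An empty list means the installation matches the manifest; the manifest itself would have to be generated per package and OS.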

Central components

High-level phases Central components

DTT1 includes the following high-level phases:
- Install
- Connection
- Uninstall
- Restart

Phase	Requirement
Install	Install via Quickstart
Install	Ensure files have appropriate permissions (Checkfiles close-world)
Install	Start using service
Connection	Ensure the component under test successfully connects with the other central components
~~Upgrade~~	~~Confirm the new version is accurately reported~~
Restart	Restart using service
Restart	Ensure successful reconnection post-restart with the other central components
Stop	Confirm no remnants post-stop (e.g., processes, services, ports)
Stop	Ensure agent properly disconnects
Uninstall	Confirm no remnants post-uninstallation (e.g., processes, services, ports, files)
Uninstall	Ensure configuration is maintained post-uninstall (ossec.conf, local_internal_options.conf)

Non-functional requirements

All DTT1 test phases must comply with the following requirements:
- Ensure the maximum time defined for the specific phase is not reached
- Ensure no errors are found in logs for the specific phase
DTT1 tests must be deployed, provisioned, executed, and collected with a modular design
DTT1 tests CI executions must be monitored and reviewed by the QA team daily/weekly
DTT1 tests evaluation criteria must be defined and accessible by all QA team members
DTT1 tests escalation process must be defined and accessible by all QA team members
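The two per-phase requirements (maximum time not reached, no errors in logs) could be expressed roughly as follows. This is a minimal sketch: `run_phase`, `PhaseResult` and the error pattern are hypothetical names, and the real log format is not defined in this issue. The phase is timed after it finishes; it is not interrupted when the limit is reached.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical pattern; the actual error markers depend on the component logs.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL)\b")


@dataclass
class PhaseResult:
    name: str
    elapsed: float
    timed_out: bool
    log_errors: List[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # A phase passes only if it met its time limit and logged no errors.
        return not self.timed_out and not self.log_errors


def run_phase(name: str, action: Callable[[], List[str]], max_seconds: float) -> PhaseResult:
    """Run one phase, then check its duration and the log lines it returned."""
    start = time.monotonic()
    log_lines = action()  # the action returns the log lines produced by the phase
    elapsed = time.monotonic() - start
    errors = [line for line in log_lines if ERROR_PATTERN.search(line)]
    return PhaseResult(name, elapsed, elapsed > max_seconds, errors)
```

For example, `run_phase("install", lambda: ["INFO: done"], max_seconds=5.0).passed` evaluates to `True` as long as the action finishes within five seconds.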

Hardware

Agent

Hardware:
- CPU: 1
- RAM: 500 MB
Upgrade:
- From the previous patch
- From the previous minor

Central components

Hardware:
- CPU: 4
- RAM: 8 GB
Upgrade:
- From the previous patch
- From the previous minor

Implementation restrictions

The DTT1 CI architecture and infrastructure must be designed and developed in Jenkins.
The DTT1 tests must be programmed in Python.
The DTT1 must use OSs deployed using virtual machines.

Plan

First iteration

Objective:

The objective of this iteration is to generate the skeleton of the modules and begin to detect problems that may arise from the new architecture. For this, a PoC described in the issues will be carried out.

Results:

The PoC was carried out.
The modules were generated.
During the development the following problems were encountered:

The Collector module is not necessary; it was absorbed by the Observability module.
All modules require improvements, so that they:
- Perform schema validation with pydantic to validate the inputs they receive.
- Are self-sufficient and independent, so they can be called from any point without needing too many parameters.
- Have diagrams with enough detail to make each module understandable.
- Have their inputs and outputs redefined, since these were not finalized in the PoC.
Investigate the need for a flow orchestrator that makes it easy to define the use cases at a high level and then executes the appropriate modules for each case.

Second iteration:

Objective:

For this iteration, it is necessary to resolve the problems found in the previous one.
After the weekly #4495 (comment), it was decided to investigate tools that use the DAG methodology, to use it as an orchestrator.
Refine the modules, according to what was proposed.

Results:

All the problems or topics found in iteration 1 were completed. On the other hand, some points of improvement were found as the new functionalities were developed and implemented:

General

Document the usage of each module (TaskFlow, Allocation, Provision, Test and Observability)
Generate class or flow diagrams for each module
Improve validations and error handling, since when a module fails the reason for the failure is not clear.
3.1 TaskFlow
3.2 Allocation
3.3 Provision
3.4 Test
Define and implement a Logger
4.1. Define centralized log
4.2. Format
4.3. Levels
4.4. Output file for module (level debug) + Jenkins log (level info)
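The logger item above (a debug-level output file per module plus info-level output for the Jenkins log) could look roughly like this with the standard `logging` module. This is a sketch under assumptions: `build_module_logger`, the `dtt1.` logger prefix and the format string are hypothetical, and it assumes the Jenkins log simply captures stdout.

```python
import logging
import sys
from pathlib import Path


def build_module_logger(module: str, log_dir: Path) -> logging.Logger:
    """Return a logger that writes DEBUG and above to a per-module file
    and INFO and above to stdout (assumed to be captured by Jenkins)."""
    logger = logging.getLogger(f"dtt1.{module}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate lines through the root logger

    # Drop handlers from a previous call so repeated setup does not duplicate output.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

    file_handler = logging.FileHandler(log_dir / f"{module}.log", encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

Each module would call the helper once at start-up, for example `build_module_logger("allocation", Path("logs"))`, assuming the `logs` directory already exists.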

TaskFlow

Delete the schema validator parameter and use it internally

Allocation

Move the Inventory model to module generics so every module uses the same Inventory model
Add more sizes and OS for Vagrant providers
Validate the working OS in Vagrant
Add more sizes and OS for AWS
Validate the working OS in AWS
Special VMs
Enable custom VM config for both Vagrant and AWS providers
Improve or remove the function that loads an existing Credential for a VM (currently not working). Only for Vagrant
Add name and type labels to AWS instances to allow cost calculations and keep them under control
Unify size types for Vagrant and AWS

Provision

Add the uninstaller action by parameter to uninstall the desired component
Allow installing any version of Wazuh with Package (currently only allowed with AIO)
Get ansible_os_family to render templates with jinja2. This makes it easier to reuse templates
Validate dependency tree
4.1. Validate the working OS in Vagrant
4.2. Validate the working OS in AWS
4.3. Adapt the dependencies installed for the tests so that they work on other systems such as CentOS 8
Special VMs
Improve or remove the function that loads an existing Credential for a VM (currently not working). Only for Vagrant

Testing

Add Utils to test using the Wazuh API
Add Utils to check all file permissions and ownership
Add test for manager
Test uninstall
Remove the usage of the Playbook class to use just Ansible

Observability

Define the usage of pytest-influxdb plugin for the test
1.1 If we decide to use it, carry out the implementation
Define the new dashboards to be implemented according to the new definitions of the modules. Requires analysis and definition of the dashboards
Obtain new logs from the modules to view them on a dashboard. Depends on General 4
Investigate generating a dashboard that shows the DAG generated by TaskFlow

Jenkins

Adapt the Jenkins pipeline to execute the TaskFlow with dry-run to generate the DAG
Adapt the Jenkins pipeline to execute the TaskFlow to stop the running process

Iteration 3:

Objective:

After iteration 2, the following points emerged that will be the goal of the last iteration of the project.

Tasks:

Add Copyright

DTT1 - Add Copyright Headers. #5141

Release

DTT1 - Iteration 3 - Test module - Battery test #5125

Results:

Issue to include in DTT Tier 2

Develop automated unit tests

Objective:

The objective of this stage is to generate automated unit tests for each module.
It is expected to continue in DTT2. It is incorporated into DTT1 in order to begin defining the test cases and the way they are implemented automatically.

DTT2 - Iteration 3 - Unit test - Unit tests for each module #4993

Implement best practices

DTT2 - General - Python best practice and clean code #5116

Jenkins implementation

DTT2 - Jenkins - Pipeline #4849

Observability

DTT2 - Observability module - Improvements and adaptations #4837

Post-development:

Set and define a calendar and on-call schedules for the CI reviewers
Document the evaluation criteria
Document the escalation process

Branch

https://github.com/wazuh/wazuh-qa/tree/enhancement/4495-DTT1

Approved by

DRI name: @davidjiglesias
CTO: @havidarou
Objective: Bulletproof deployability tier1


rauldpm · 2023-09-14T19:45:24Z

The order of execution of the tests must be modified. An upgrade requires the previous version to be installed first, which is not possible with the current proposal because the version under test is installed first:

Install
Registration
Connection
Basic info
Restart
Stop
Uninstall
Upgrade

jnasselle · 2023-10-11T08:52:46Z

Requirements review

OS and architecture unavailability

Windows Servers are not available on arm64 arch
Windows 11 and Windows 10 only on Insider Preview

Agent's hardware requirements do not meet OS minimum requirements

The following OSes have higher hardware requirements:

Windows Server 2019: 1.5 GB
Windows Server 2022: 2 GB
macOS Sonoma https://support.apple.com/en-us/HT213772 (not direct, should check those recommended systems)
macOS Ventura https://support.apple.com/en-us/HT213264 (not direct, should check those recommended systems)

Known problems

Central components on ARM64 due to filebeat on arm
Windows arm64 native package does not exist yet

Managers on those OSes that only support Agents

Define a fixed/dynamic policy of manager selection to improve testing coverage

Test order

From production to the current version
- Install the last production version (based on criteria, last patch, or version)
From the current version to the future version (dummy, same but to test upgradeability)
From production to future version

Wazuh Manager and Wazuh Agent test interleaving

Wazuh Agent tests need some validation from the manager side (registration, connection), but at the same time the Wazuh Manager has its own tests. The idea is to define the optimal, decoupled test flow that meets the requirements in the least time possible.

davidjiglesias · 2023-11-28T13:01:05Z

Requirements review

Amazon Linux latest is the OS chosen for the manager used in the agents testing
Agents and central components tests will be executed independently

QU3B1M · 2023-12-04T21:30:14Z

Draw a high-level diagram of the modules workflow

fcaffieri · 2023-12-12T23:47:23Z

Weekly Minutes DTT1

Participants:
Kevin, Victor, Raul, Nico, Fede and David.

Conclusions:
After the weekly on DTT, the need to incorporate the DAG methodology was established, in order to have an execution orchestrator that defines the test cases to be carried out in a simple, user-friendly way. It must allow the flexibility to execute any use case in parallel, and its output must be the YAML that will be used by the already defined modules (Allocation, Provision and Test).
An analysis of the proposed tools, with the advantages and disadvantages of each, is required in order to choose, together with the team, the tool that natively fits our needs. Its use must be simple, intuitive and scalable.
To process this, the following issue was created #4766
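To illustrate the DAG idea discussed in the weekly, the sketch below orders a set of tasks by their dependencies and groups the ones that could run in parallel. It uses the standard library `graphlib` module (Python 3.9+) purely as an illustration and is not the tool chosen by the team; the task names and the `execution_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical task graph: each key depends on the tasks in its set.
TASKS: Dict[str, Set[str]] = {
    "allocate-manager": set(),
    "allocate-agent": set(),
    "provision-manager": {"allocate-manager"},
    "provision-agent": {"allocate-agent", "provision-manager"},
    "test-agent": {"provision-agent"},
}


def execution_batches(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches; all tasks in a batch have their dependencies
    satisfied by earlier batches, so they could be executed in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

With the graph above, both allocation tasks land in the first batch, followed by `provision-manager`, `provision-agent` and `test-agent` in separate batches.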

rauldpm · 2024-04-11T11:29:27Z

Moved ETA to 16/04/2024 due to #5198 (based on issue ETA)

A new issue has been opened as we need to adapt the test module to use a single manager: #5202 (Same ETA)
Desirable, but not stopper: #5203

fcaffieri · 2024-04-16T12:48:35Z

The automation section is removed because it will be worked on in DTT2

Automation

DTT1 tests must run in Nightly CI
DTT1 tests must run in Weekly CI
DTT1 tests must run in pre-release testing
DTT1 tests costs must be measurable

rauldpm · 2024-04-16T13:00:43Z

Moved ETA to 29/04/2024 as we have to work on the following issues

DTT1 - Test Module - macOS and Windows tests #5218 (EPIC)
- Will have two issues: macOS and Windows
DTT1 - Test Module - Central component tests #5219
DTT1 - Workflow module - Bug with threads #5220
DTT1 - Test Module - Improve test module execution time #5221

We need the following issue from the DevOps team

DTT1 - Allocation bug - Instance connection error #5198

As 4.9.0 is targeted to 2/05/2024, we plan to use the 30/04 and 2/05 to test and retrieve metrics

rauldpm · 2024-04-25T13:08:36Z

Moved the ETA to 3/5/2024 as 1/5/2024 is a holiday and we need some time to test the changes in the main branch (#5191). This has been discussed and approved with @davidjiglesias

rauldpm · 2024-05-02T19:12:06Z

Based on all DTT1 pending issues by each team and ETAs:

Team	Issue	Actual ETA
@wazuh/devel-devops	#5295	7/5/2024
@wazuh/devel-devops	#5311	10/5/2024
@wazuh/devel-qa-div1	#5240	2/5/2024
@wazuh/devel-qa-div1	#5230	3/5/2024
@wazuh/devel-qa-div1	#5218	3/5/2024
@wazuh/devel-qa-div1	#5219	6/5/2024
@wazuh/devel-qa-div1	#5191	15/5/2024
@wazuh/devel-qa-div1	#5323	3/5/2024

This issue will change the ETA to Monday 15/5/2024 so we can test all changes (issue #5191)

rauldpm · 2024-05-08T19:46:56Z

Removed Windows ARM from OS list as there is no Windows ARM available yet

rauldpm · 2024-06-16T14:43:29Z

ETA moved to 18 June #5191 (comment)

rauldpm · 2024-06-19T17:15:32Z

LGTM

The branch must be kept alive until the Agent team changes the GHA workflow references

davidjiglesias added level/epic type/enhancement labels Sep 5, 2023

rauldpm self-assigned this Sep 12, 2023

jnasselle self-assigned this Sep 13, 2023

rauldpm mentioned this issue Sep 13, 2023

DTT1 - Iteration 1 - Design solution #4519

Closed

rauldpm mentioned this issue Sep 15, 2023

DTT1 - Design and develop PoC #4524

Closed

rauldpm assigned fcaffieri Oct 6, 2023

davidjiglesias self-assigned this Nov 23, 2023

davidcr01 mentioned this issue Nov 29, 2023

[Feature Request] - Add Alma Linux as a supported OS. wazuh/wazuh#20491

Closed

QU3B1M self-assigned this Nov 29, 2023

This was referenced Dec 6, 2023

DTT1 - Test module - pytest-reporter plugin #4736

Closed

DTT1 - Iteration 2 - Allocator module #4746

Closed

DTT1 - Allocator module - Code refactor #4747

Closed

This was referenced Dec 6, 2023

DTT1 - Iteration 2 - Provision module #4749

Closed

DTT1 - Iteration 2 - Test module #4750

Closed

DTT1 - Iteration 2 - PoC with task flow #4751

Closed

Design DTT1 - Define (DAG Directed Acyclic Graph) #4766

Closed

jnasselle mentioned this issue Dec 26, 2023

DTT1 - Iteration 2 - QA Workflow Engine #4796

Closed

20 tasks

fcaffieri mentioned this issue Dec 29, 2023

Upload allocator improvement to PoC #4800

Merged

fcaffieri mentioned this issue Jan 12, 2024

DTT2 - Observability module - Improvements and adaptations #4837

Closed

8 tasks

fcaffieri linked a pull request Jan 16, 2024 that will close this issue

Enhancement/4751 dtt1 iteration 2 poc #4841

Merged

wazuhci moved this from In review to On hold in Release 4.9.0 Apr 11, 2024

rauldpm mentioned this issue Apr 11, 2024

DTT1 - Testing - Allow single Wazuh manager test #5202

Closed

mhamra mentioned this issue Apr 12, 2024

DTT1. Add vagrant resource utilization examples to the documentation #5206

Closed

wazuhci moved this from On hold to In progress in Release 4.9.0 Apr 12, 2024

rauldpm linked a pull request Apr 26, 2024 that will close this issue

Removing python3-pip from remote_requirements.txt in deps #5299

Merged

rauldpm mentioned this issue May 2, 2024

DTT1 - Identify Allocator module resources uniquely #5311

Closed

4 tasks

fcaffieri mentioned this issue May 13, 2024

DTT1 - Validate that DTT allows using the live and pre-release repository #5366

Closed

3 tasks

rauldpm mentioned this issue May 15, 2024

DTT1 - 4.8.0 adaptation #5391

Closed

2 tasks

rauldpm changed the title ~~Deployability testing tier 1~~ Detasployability testing tier 1 May 15, 2024

rauldpm changed the title ~~Detasployability testing tier 1~~ Deployability testing tier 1 May 16, 2024

fcaffieri mentioned this issue May 21, 2024

DTT1 - Test module if Ansible execution fails not report to Workflow #5411

Closed

2 tasks

rauldpm mentioned this issue Jun 4, 2024

Add DTT1 workflow tool name #5458

Closed

2 tasks

This was referenced Jun 5, 2024

Fix CentOS 8 AMI #5463

Closed

Allocation - Fix disk allocation #5464

Closed

wazuhci moved this from In progress to In review in Release 4.9.0 Jun 18, 2024

wazuhci moved this from In review to In progress in Release 4.9.0 Jun 18, 2024

wazuhci moved this from In progress to In review in Release 4.9.0 Jun 18, 2024

wazuhci moved this from In review to Pending final review in Release 4.9.0 Jun 18, 2024

rauldpm closed this as completed Jun 19, 2024

github-project-automation bot moved this from In progress to Done in Roadmap Jun 19, 2024

wazuhci moved this from Pending final review to Done in Release 4.9.0 Jun 19, 2024

QU3B1M mentioned this issue Jul 10, 2024

Migrate new Jenkins and QA tools to the new QA repository #5557

Closed

6 tasks
