Sustained High CPU when scaling loadtest #22367

Open
mostlikelee opened this issue Sep 25, 2024 · 1 comment
Assignees
Labels
  • ~eng-priority: Engineering-initiated story that was prioritized.
  • ~engineering-initiated: Engineering-initiated story, such as a bug, refactor, or contributor experience improvement.
  • :product: Product Design department (shows up on 🦢 Drafting board)
  • story: A user story defining an entire feature

Comments

@mostlikelee
Contributor

Goal

User story
As an engineer running Fleet loadtests
I want to create a stable large loadtest environment in a short amount of time
so that I can receive loadtest feedback quickly and at lower cost.

Context

  • Requestor(s): @mostlikelee
  • Product designer: _________________________

Using the existing script to scale up 100K osquery-perf agents over ~30 minutes resulted in sustained high Fleet CPU and DB reader CPU. As a workaround, I updated the script to scale up over 2 hours, which succeeded. The original cadence worked in the past, so there may be a performance regression when adding too many agents too quickly to Fleet. I suspect this is not affecting large customers because they are not adding agents this quickly, but it has a high impact on test velocity.
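For illustration, here is a minimal sketch of the staggered ramp-up idea in Go. This is not the actual loadtest script: the `startAgent` helper, batch size, and the 2-hour ramp window are assumptions made for the sketch.

```go
// Sketch: enroll simulated agents in batches spread over a ramp window,
// rather than all at once, to avoid sustained high CPU on Fleet and the
// DB reader. Values and the startAgent helper are hypothetical.
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		totalAgents  = 100_000
		batchSize    = 1_000
		rampDuration = 2 * time.Hour // workaround cadence; ~30 min triggered the issue
	)

	batches := totalAgents / batchSize
	pause := rampDuration / time.Duration(batches)

	for i := 0; i < batches; i++ {
		for j := 0; j < batchSize; j++ {
			go startAgent(i*batchSize + j) // launch one simulated agent
		}
		fmt.Printf("started batch %d/%d (%d agents total)\n", i+1, batches, (i+1)*batchSize)
		time.Sleep(pause) // spread enrollment load across the ramp window
	}
	select {} // keep the simulated agents running
}

// startAgent is a placeholder for whatever the real script does to launch
// an osquery-perf instance pointed at the Fleet server under test.
func startAgent(id int) {
	_ = id
}
```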

Changes

Product

  • UI changes: TODO
  • CLI (fleetctl) usage changes: TODO
  • YAML changes: TODO
  • REST API changes: TODO
  • Fleet's agent (fleetd) changes: TODO
  • Activity changes: TODO
  • Permissions changes: TODO
  • Changes to paid features or tiers: TODO
  • Other reference documentation changes: TODO
  • Once shipped, requester has been notified

Engineering

  • Feature guide changes: TODO
  • Database schema migrations: TODO
  • Load testing: TODO

ℹ️  Please read this issue carefully and understand it. Pay special attention to UI wireframes, especially "dev notes".

QA

Risk assessment

  • Requires load testing: TODO
  • Risk level: Low / High TODO
  • Risk description: TODO

Manual testing steps

  1. Step 1
  2. Step 2
  3. Step 3

Testing notes

Confirmation

  1. Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. QA (@____): Added comment to user story confirming successful completion of QA.
@mostlikelee added the story and ~engineering-initiated labels on Sep 25, 2024
@lukeheath added the :product and ~eng-priority labels on Sep 25, 2024
@lukeheath assigned sharon-fdm and unassigned lukeheath on Sep 25, 2024
@lukeheath
Member

@mostlikelee Thanks for filing this. I am prioritizing to the drafting board and assigning to @sharon-fdm for estimation.

so there may be a performance regression when adding too many agents too quickly to Fleet

I'm seeing a series of new performance issue reports come through since 4.56.0 (#22291, #22122) that seem related. It's important we dig in and figure out why this is happening before it results in a production issue.
