koordlet: add resctrl qos collector #2005

Open
wants to merge 6 commits into
base: main

@Rouzip Rouzip commented Apr 15, 2024

Ⅰ. Describe what this PR does

Add resctrl qos collector.

Ⅱ. Does this pull request fix one issue?

fixes #1832

Ⅲ. Describe how to verify it

After enabling the resctrl flag in the config, run:

curl http://localhost:9316/metrics | grep resctrl

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

codecov bot commented Apr 15, 2024

Codecov Report

Attention: Patch coverage is 57.64192% with 97 lines in your changes missing coverage. Please review.

Project coverage is 67.14%. Comparing base (83267af) to head (1dca501).

Files with missing lines                                 Patch %   Lines
...icsadvisor/collectors/resctrl/resctrl_collector.go    19.40%    54 Missing ⚠️
pkg/koordlet/resourceexecutor/resctrl.go                 69.44%    12 Missing and 10 partials ⚠️
pkg/koordlet/metrics/resctrl.go                          0.00%     9 Missing ⚠️
pkg/koordlet/metriccache/metric_types.go                 0.00%     7 Missing ⚠️
pkg/koordlet/util/system/resctrl.go                      90.56%    4 Missing and 1 partial ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2005      +/-   ##
==========================================
- Coverage   67.19%   67.14%   -0.06%     
==========================================
  Files         451      454       +3     
  Lines       43468    43686     +218     
==========================================
+ Hits        29208    29331     +123     
- Misses      11714    11798      +84     
- Partials     2546     2557      +11     
Flag        Coverage Δ
unittests   67.14% <57.64%> (-0.06%) ⬇️

Signed-off-by: Rouzip <1226015390@qq.com>
@koordinator-bot koordinator-bot bot added size/XXL and removed size/XL labels May 9, 2024
@Rouzip Rouzip changed the title from "koordlet: add resctrl qos collector(WIP)" to "koordlet: add resctrl qos collector" May 9, 2024
@Rouzip
Author

Rouzip commented May 9, 2024

Sorry for the late PR; any comments are welcome.

Signed-off-by: Rouzip <1226015390@qq.com>
pkg/koordlet/metrics/resctrl.go (outdated, resolved)
pkg/koordlet/util/system/resctrl.go (outdated, resolved)
pkg/koordlet/resourceexecutor/resctrl.go (outdated, resolved)
pkg/koordlet/resourceexecutor/resctrl.go (resolved)
@koordinator-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign fillzpp, zwzhang0107 after the PR has been reviewed.
You can assign the PR to them by writing /assign @fillzpp @zwzhang0107 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Rouzip Rouzip force-pushed the collect branch 2 times, most recently from 2d1efd6 to c28c849 Compare May 29, 2024 03:29
@Rouzip
Author

Rouzip commented Jun 25, 2024

hello, PTAL 😊 @saintube

Signed-off-by: Rouzip <1226015390@qq.com>
@saintube
Member

@Rouzip Thanks for your great contributions. Since it is a large patch, we need to make some tests before it is merged.

@Rouzip
Author

Rouzip commented Jul 15, 2024

> @Rouzip Thanks for your great contributions. Since it is a large patch, we need to make some tests before it is merged.

😊 Anything I can do?

@saintube
Member

saintube commented Jul 16, 2024

> > @Rouzip Thanks for your great contributions. Since it is a large patch, we need to make some tests before it is merged.
>
> 😊 Anything I can do?

We will verify this patch on some test environments later. It would also be appreciated if you could add more UTs to increase the patch coverage to no less than the target of 70%.

Signed-off-by: Rouzip <1226015390@qq.com>
Signed-off-by: Frame <saintube@foxmail.com>
Signed-off-by: Frame <saintube@foxmail.com>
Member

@saintube saintube left a comment

/lgtm
PTAL /cc @zwzhang0107 @hormes

@saintube saintube added the lgtm label Sep 30, 2024
"github.com/koordinator-sh/koordinator/pkg/koordlet/metriccache"
"github.com/koordinator-sh/koordinator/pkg/koordlet/metrics"
"github.com/koordinator-sh/koordinator/pkg/koordlet/metricsadvisor/framework"
"github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/resctrl"
Contributor

It's better not to import the qos plugin here;
these constants can be moved to utils.

return &ResctrlAMDReader{}
}

func (rr *ResctrlBaseReader) ReadResctrlL3Stat(parent string) (map[CacheId]uint64, error) {
Contributor

More comments on ReadResctrlL3Stat/ReadResctrlMBStat would make them easier for readers to understand.

@@ -48,7 +50,9 @@ func NewDefaultConfig() *Config {
PSICollectorInterval: 10 * time.Second,
CPICollectorTimeWindow: 10 * time.Second,
ColdPageCollectorInterval: 5 * time.Second,
ResctrlCollectorInterval: 1 * time.Second,
Contributor

1 second seems too short and is not necessary for now; how about 10 seconds by default?

QosResctrl = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Subsystem: KoordletSubsystem,
Name: "qos_resctrl",
Help: "qos resctrl collected by koordlet",
Contributor

A more detailed help message would be useful, since resctrl is a fairly advanced metric.

var (
QosResctrl = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Subsystem: KoordletSubsystem,
Name: "qos_resctrl",
Contributor

"qos" is already included in the labels; maybe define two metrics, such as "llc_occupancy" and "mbm_occupancy", so that we don't need to set MetricPropertyResctrlMbType="" when recording metrics.

The same applies to the MetricPropertiesFunc.
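
To illustrate the split, a small sketch using plain structs rather than the Prometheus client. The `sample` type, `splitResctrlSamples`, and the label names `qos` and `mb_type` are assumptions for illustration; only the two metric names come from the suggestion above.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a simplified stand-in for one recorded metric point.
type sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// splitResctrlSamples emits one "llc_occupancy" sample and one
// "mbm_occupancy" sample per bandwidth type, so the LLC sample never has to
// carry an empty mb_type label.
func splitResctrlSamples(qos string, llcBytes uint64, mbmBytes map[string]uint64) []sample {
	out := []sample{{
		Name:   "llc_occupancy",
		Labels: map[string]string{"qos": qos},
		Value:  float64(llcBytes),
	}}
	// Sort the bandwidth types so the output order is deterministic.
	types := make([]string, 0, len(mbmBytes))
	for t := range mbmBytes {
		types = append(types, t)
	}
	sort.Strings(types)
	for _, t := range types {
		out = append(out, sample{
			Name:   "mbm_occupancy",
			Labels: map[string]string{"qos": qos, "mb_type": t},
			Value:  float64(mbmBytes[t]),
		})
	}
	return out
}

func main() {
	// Made-up values for one QoS class.
	samples := splitResctrlSamples("LS", 4096, map[string]uint64{"total": 200, "local": 150})
	for _, s := range samples {
		fmt.Println(s.Name, s.Labels, s.Value)
	}
}
```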

@@ -102,6 +106,12 @@ const (

MetricPropertyCPIResource MetricProperty = "cpi_resource"

MetricPropertyNodeQos MetricProperty = "node_qos"
Contributor

MetricPropertyQos MetricProperty = "qos"

Development

Successfully merging this pull request may close these issues.

[proposal] QoS class level LLC/MBA metrics collector
3 participants