
KEP-4786: direct cgroup status collection on Node #4792

Open · wants to merge 1 commit into master

Conversation

linxiulei

  • One-line PR description: Direct cgroup stats collection on Node
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 19, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: linxiulei
Once this PR has been reviewed and has the lgtm label, please assign mrunalp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Aug 19, 2024
@kannon92
Contributor

/retitle KEP-4786: direct cgroup status collection on Node

@k8s-ci-robot k8s-ci-robot changed the title from "Collect cgroup stats directly by Kubelet" to "KEP-4786: direct cgroup status collection on Node" Aug 19, 2024
After enabling feature `PodAndContainerStatsFromCRI`, only
[summary API][summary-api] invokes cAdvisor for stats of:

* Root filesystem
Contributor

Image filesystem also.

Author

Pls correct me if I'm wrong. I think imagefs is taken care of by CRI, no?

Contributor

It is not.

Contributor

Looking at that code, we are using cadvisor for Availability and Capacity.

Author

Yes, but that code path will be replaced so it no longer uses cadvisor.
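
For illustration, a minimal sketch of how filesystem Availability and Capacity could be read directly via statfs(2) rather than through cadvisor; the path and the `fsStats` helper are assumptions for this sketch, not the KEP's actual implementation:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// fsStats returns capacity and available bytes for the filesystem
// backing path, read directly via statfs(2) rather than via cadvisor.
func fsStats(path string) (capacity, available uint64, err error) {
	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	bsize := uint64(st.Bsize)
	return st.Blocks * bsize, st.Bavail * bsize, nil
}

func main() {
	// "/var/lib/kubelet" is an illustrative path; the real root
	// filesystem path would come from kubelet configuration.
	capBytes, availBytes, err := fsStats("/var/lib/kubelet")
	if err != nil {
		panic(err)
	}
	fmt.Printf("capacity=%d available=%d\n", capBytes, availBytes)
}
```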

Comment on lines +89 to +90
This KEP aims to eliminate the need to run cAdvisor with enablement of KEP-2371
for better performance and simplicity.
Contributor

So, IIUC, KEP-2371 is about implementing the missing bits from the CRI/CRI-API perspective and this KEP would actually remove the use of cadvisor wherever it's necessary?

Author

Yes, but not entirely removing the use of cadvisor. More specifically, this KEP will remove the background task that runs cadvisor routines but will still call cadvisor code on demand.
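
As a rough illustration of the on-demand model, the sketch below reads a cgroup v2 stat file at query time instead of depending on a background housekeeping loop; the `memoryCurrent` helper and the hard-coded path are illustrative assumptions, not the KEP's implementation:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// memoryCurrent reads memory usage for one cgroup on demand (cgroup v2
// layout assumed here), rather than serving it from a background cache.
func memoryCurrent(cgroupPath string) (uint64, error) {
	b, err := os.ReadFile(filepath.Join(cgroupPath, "memory.current"))
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

func main() {
	// Illustrative node-level cgroup; the kubelet would derive the
	// real paths from its cgroup driver configuration.
	usage, err := memoryCurrent("/sys/fs/cgroup/kubepods.slice")
	if err != nil {
		panic(err)
	}
	fmt.Printf("kubepods memory usage: %d bytes\n", usage)
}
```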

### Goals

* Improve performance in Kubelet without running cAdvisor.
* Do not introduce breaking changes to the Summary API or eviction function.
Member

Will this KEP cover both cgroup v1 and v2?

Contributor

it should only cover v2, as v1 is feature frozen

Author

It does support both cgroup versions, but not by explicit design: the implementation would take advantage of libraries that can collect stats for both versions, so it is agnostic to the cgroup version.
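
For illustration, one common way such libraries stay version-agnostic is to detect the cgroup mode once at startup; the sketch below checks the filesystem magic of /sys/fs/cgroup (runc's libcontainer exposes an equivalent helper), and is an assumption about the approach, not the KEP's chosen library:

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// cgroup2UnifiedMode reports whether /sys/fs/cgroup is mounted as
// cgroup v2 (unified hierarchy), by checking the filesystem magic.
func cgroup2UnifiedMode() (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false, err
	}
	return st.Type == unix.CGROUP2_SUPER_MAGIC, nil
}

func main() {
	v2, err := cgroup2UnifiedMode()
	if err != nil {
		panic(err)
	}
	fmt.Println("cgroup v2 unified mode:", v2)
}
```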

@haircommander
Contributor

I think something we have to be wary of with this approach is how metrics act as undocumented/unintended GA APIs in kubernetes. While it seems natural to exclude cgroup stats collection for all cgroups other than the ones the kubelet is aware of, there are likely users who are relying on cadvisor collecting those stats and will be upset by us dropping them.

I wonder if the same performance gains can be made by adding a tunable in the kubelet configuration that allows a user to say which cgroups we collect metrics about. Then we could get data from users on whether they want the non-kube cgroups to have stats collected for them. WDYT @linxiulei

@linxiulei
Author

> I wonder if the same performance gains can be made by adding a tunable in the kubelet configuration that allows a user to say which cgroups we collect metrics about.

The cgroups we will drop in this KEP are individual Pods' cgroups. I am not sure how configurable they are since they are all under the */kubepods/ path. Also, adding a tunable for this KEP would significantly increase the complexity, so I'd refrain from doing so.

> Then we could get data from users on whether they want the non-kube cgroups to have stats collected for them.

Alternatively, we can make this KEP an opt-in feature, so users who still want non-kube cgroup stats can opt out of this feature until they find an alternative way to collect them, which I genuinely think should not be part of the kubelet.

@haircommander
Contributor

> The cgroups we will drop in this KEP are individual Pods' cgroups. I am not sure how configurable they are since they are all under the */kubepods/ path. Also, adding a tunable for this KEP would significantly increase the complexity, so I'd refrain from doing so.

Pod cgroups like the pod slice, or the container scopes as well? Theoretically, the CRI stats KEP should cover the container scope piece (I don't think it does today, but it should). If you're talking about the pod cgroup, then I still maintain there may be users relying on these metrics (along with any others we may be dropping, unfortunately).

@linxiulei
Author

Sorry for the confusion. Let me clarify: currently, the following cgroup stats are collected by the kubelet and cAdvisor:

  • Root cgroup
  • All non-pod cgroups
    • Including non-pod cgroups as specified in --runtime-cgroups,
      --system-cgroups and --kubelet-cgroups
  • Root cgroup for pods (e.g. /sys/fs/cgroup/kubepods.slice/)
  • Pod cgroups

This KEP won't drop any of them. However, Pod cgroups are currently collected by both cAdvisor and the CRI stats KEP (if enabled) at the same time. Therefore, this KEP removes cAdvisor's collection by having the kubelet collect only what the CRI stats KEP is not yet collecting (a sketch of this division follows at the end of this comment).

So after this KEP, the kubelet collects:

  • Root cgroup
  • All non-pod cgroups
    • Including non-pod cgroups as specified in --runtime-cgroups,
      --system-cgroups and --kubelet-cgroups
  • Root cgroup for pods (e.g. /sys/fs/cgroup/kubepods.slice/)

(this is what I earlier, incorrectly, described as "dropping" pod cgroups)

And CRI stats collects:

  • Pod cgroups
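
To make that division of labor concrete, here is a hedged sketch: the kubelet keeps reading node-level cgroups from cgroupfs itself, while pod cgroup stats come from the CRI ListPodSandboxStats call. The socket path and the printed fields are illustrative assumptions, not the KEP's implementation:

```go
package main

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Node-level cgroups the kubelet keeps collecting itself; the
	// concrete set depends on flags such as --runtime-cgroups.
	for _, cg := range []string{
		"/sys/fs/cgroup",                // root cgroup
		"/sys/fs/cgroup/system.slice",   // an example non-pod cgroup
		"/sys/fs/cgroup/kubepods.slice", // root cgroup for pods
	} {
		fmt.Println("collect directly from cgroupfs:", cg)
	}

	// Pod cgroup stats come from the CRI instead of cadvisor. The
	// containerd socket path here is an illustrative assumption.
	conn, err := grpc.NewClient("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)
	resp, err := client.ListPodSandboxStats(context.Background(),
		&runtimeapi.ListPodSandboxStatsRequest{})
	if err != nil {
		panic(err)
	}
	for _, s := range resp.GetStats() {
		fmt.Println("pod sandbox stats from CRI:", s.GetAttributes().GetId())
	}
}
```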
