Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cAdvisor tries determining unavailable fields for offline CPUs. #3317

Closed
kishen-v opened this issue May 30, 2023 · 2 comments
Closed

cAdvisor tries determining unavailable fields for offline CPUs. #3317

kishen-v opened this issue May 30, 2023 · 2 comments

Comments

@kishen-v
Copy link
Contributor

kishen-v commented May 30, 2023

When CPUs are marked as offline, certain fields such as physical_package_id and core_id are set to 0 which should have been skipped as they're unavailable under the/sys/bus/cpu/devices/cpuX/topology directory, and the existing testcases too expect the same behaviour.

Note: CPUs were marked offline using the following command echo 0 > /sys/bus/cpu/devices/cpu15/online

Expected behaviour:
The expected behaviour, based on logic and unit tests, is for cAdvisor to skip determining the fields for offline CPUs. However, this behaviour is not observed due to the construction of absolute paths based on the parameters sent, which can lead to construction of non-existent paths.



Observed behaviour:
In environments where node topology is used, the constructed path /sys/devices/system/node/nodeX/online does not exist. This leads to erroneously marking the CPU online due to the file's unavailability in the environment based on the existing design.

cpuDirs, err := sysFs.GetCPUsPaths(nodeDir)
if len(cpuDirs) == 0 {
klog.Warningf("Found node without any CPU, nodeDir: %s, number of cpuDirs %d, err: %v", nodeDir, len(cpuDirs), err)
} else {
cores, err := getCoresInfo(sysFs, cpuDirs)

As a consequence of the above issue, warning log messages are raised related to missing files under the /cpuX/topology directory.

// Quick check to determine if file exists: if it does not then kernel CPU hotplug is disabled and all CPUs are online.
_, err = os.Stat(onlinePath)
if err != nil && os.IsNotExist(err) {
return true
}

Since the above logic returns true, the CPUs that are unavailable are not skipped but are also considered for retrieving information.
if !sysFs.IsCPUOnline(cpuDir) {
continue
}
rawPhysicalID, err := sysFs.GetCoreID(cpuDir)
if os.IsNotExist(err) {
klog.Warningf("Cannot read core id for %s, core_id file does not exist, err: %s", cpuDir, err)
continue


Since the path /sys/bus/cpu/devices/online does not exist , isCpuOnline function returns false along with the no such file or directory error.

func isCPUOnline(path string, cpuID uint16) (bool, error) {
fileContent, err := ioutil.ReadFile(path)
if err != nil {
return false, err
}

physical_package_id and core_id are determined for CPUs that are marked offline, compared to the expected behaviour where it needs to be skipped.

isOnline, err := isCPUOnline(onlinePath, cpuID)

!isOnline && !os.IsNotExist(err) is not satisfied to skip determining fields for CPUs that are marked offline.
if !isOnline && !os.IsNotExist(err) {
continue


Possible solution:
This issue can be mitigated by setting the cpuOnlinePath to specific file, which would enable the identification of the CPUs that are available/unavailable hence, suppressing the warning messages regarding the missing files under the CPU topology directory. This will avoid multiple warning messages raised in the case of large compute nodes and the associated noise in the logs.

Such as :
W0523 13:54:11.121383 1403010 sysinfo.go:434] Cannot read core id for /sys/devices/system/node/nodeX/cpuX, core_id file does not exist, err: open /sys/devices/system/node/nodeX/cpuX/topology/core_id: no such file or directory

W0523 13:54:11.122672 1403010 sysfs.go:512] Cannot open /sys/bus/cpu/devices/cpuX/topology/core_id, assuming 0 for core_id of CPU X

W0523 13:54:11.122701 1403010 sysfs.go:518] Cannot open /sys/bus/cpu/devices/cpuX/topology/physical_package_id, assuming 0 physical_package_id of CPU X

I'm really looking forward to any suggestions or insights from the community :)

@kishen-v
Copy link
Contributor Author

/assign

@iwankgb
Copy link
Collaborator

iwankgb commented Jun 27, 2023

Fixed in #3318.

@iwankgb iwankgb closed this as completed Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants