When CPUs are marked offline, fields such as physical_package_id and core_id are set to 0. These fields should be skipped instead, because the corresponding files are unavailable under the /sys/bus/cpu/devices/cpuX/topology directory for offline CPUs, and the existing test cases expect the same behaviour.
Note: CPUs were marked offline using the following command: echo 0 > /sys/bus/cpu/devices/cpu15/online
Expected behaviour:
The expected behaviour, based on the code logic and the unit tests, is for cAdvisor to skip these fields for offline CPUs. However, this does not happen, because absolute paths are constructed from the parameters passed in, and for some inputs the resulting paths do not exist.
Observed behaviour:
In environments where node topology is used, the constructed path /sys/devices/system/node/nodeX/online does not exist. Under the existing design, a missing online file is interpreted as "the CPU is online", so offline CPUs are erroneously treated as online.
cadvisor/utils/sysinfo/sysinfo.go
Lines 214 to 218 in 137032c
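A minimal sketch of the path construction described above, assuming (not confirmed by the embedded snippet) that the online file is derived by appending /../online to the CPU directory and passing the result through filepath.Abs on a Linux host. The helper onlinePathFor is hypothetical. Because filepath.Abs on an already-absolute path only cleans it lexically and does not resolve symlinks, only the canonical CPU directory yields the real system-wide file; the node-topology and bus paths yield files that do not exist:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// onlinePathFor is a hypothetical helper mimicking how the "online" file is
// derived from a CPU directory by walking one level up. For an absolute
// input, filepath.Abs just cleans the path; symlinks are NOT followed.
func onlinePathFor(cpuPath string) (string, error) {
	return filepath.Abs(cpuPath + "/../online")
}

func main() {
	for _, cpuPath := range []string{
		"/sys/devices/system/cpu/cpu15",        // canonical CPU directory
		"/sys/devices/system/node/node0/cpu15", // CPU link under node topology
		"/sys/bus/cpu/devices/cpu15",           // CPU link under the cpu bus
	} {
		p, err := onlinePathFor(cpuPath)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", cpuPath, p)
	}
}
```

The second line of output is /sys/devices/system/node/node0/online, which is the non-existent file described in this issue.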
As a consequence, warnings are logged about missing files under the /cpuX/topology directory.
cadvisor/utils/sysfs/sysfs.go
Lines 384 to 388 in 137032c
Since the above logic returns true, unavailable (offline) CPUs are not skipped; their topology information is read as if they were online.
cadvisor/utils/sysinfo/sysinfo.go
Lines 428 to 435 in 137032c
Since the path /sys/bus/cpu/devices/online does not exist, the isCpuOnline function returns false along with a "no such file or directory" error.
cadvisor/utils/sysfs/sysfs.go
Lines 424 to 428 in 137032c
As a result, physical_package_id and core_id are determined for CPUs that are marked offline, whereas the expected behaviour is to skip them.
cadvisor/utils/sysfs/sysfs.go
Line 501 in 137032c
The condition !isOnline && !os.IsNotExist(err), which skips these fields for offline CPUs, is not satisfied: isOnline is false, but err is a not-exist error, so the CPU is processed anyway.
cadvisor/utils/sysfs/sysfs.go
Lines 506 to 507 in 137032c
Possible solution:
This issue can be mitigated by setting cpuOnlinePath to a specific file, which would make it possible to identify which CPUs are available and which are not, and hence suppress the warning messages about missing files under the CPU topology directory. This avoids the repeated warnings raised on large compute nodes and the associated noise in the logs, such as:
W0523 13:54:11.121383 1403010 sysinfo.go:434] Cannot read core id for /sys/devices/system/node/nodeX/cpuX, core_id file does not exist, err: open /sys/devices/system/node/nodeX/cpuX/topology/core_id: no such file or directory
W0523 13:54:11.122672 1403010 sysfs.go:512] Cannot open /sys/bus/cpu/devices/cpuX/topology/core_id, assuming 0 for core_id of CPU X
W0523 13:54:11.122701 1403010 sysfs.go:518] Cannot open /sys/bus/cpu/devices/cpuX/topology/physical_package_id, assuming 0 physical_package_id of CPU X
I'm really looking forward to any suggestions or insights from the community :)