Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should process.cpu.utilization and system.cpu.utilization be opt-in? #1130

Closed
ChrsMark opened this issue Jun 7, 2024 · 9 comments · Fixed by #1308
Closed

Should process.cpu.utilization and system.cpu.utilization be opt-in? #1130

ChrsMark opened this issue Jun 7, 2024 · 9 comments · Fixed by #1308
Assignees
Labels
area:system enhancement New feature or request

Comments

@ChrsMark
Copy link
Member

ChrsMark commented Jun 7, 2024

Area(s)

area:system

Is your change request related to a problem? Please describe.

Following up from #647 (comment) we should decide on the requirement level for the process.cpu.utilization and system.cpu.utilization. At the moment these are recommended metrics but maybe we can consider if those can be opt-in since both of them can be calculated from their respective *.cpu.time metric.

JVM metrics define the jvm.cpu.recent_utilization as recommended however reading its description I'm not sure if that one can be derived from the jvm.cpu.time, so that might be a different case.

/cc @open-telemetry/semconv-system-approvers @braydonk @open-telemetry/semconv-jvm-approvers

Describe the solution you'd like

Having an alignment (if possible) on the requirement level for the *.cpu.utilization metrics across SemConv.

Describe alternatives you've considered

No response

Additional context

No response

@mx-psi
Copy link
Member

mx-psi commented Jun 7, 2024

I support making them opt-in

@trask
Copy link
Member

trask commented Jun 7, 2024

there was a semi-related discussion a while back:

open-telemetry/opentelemetry-specification#2392 (comment)

We had the same debate for system.cpu.utilization and the result of the discussion is that system.cpu.utilization was simpler for some backends than calculating from two metrics.

heyams pushed a commit to heyams/semantic-conventions that referenced this issue Jun 11, 2024
heyams pushed a commit to heyams/semantic-conventions that referenced this issue Jun 11, 2024
@ChrsMark
Copy link
Member Author

I have mixed feelings about this tbh.
I would feel better to make those opt-in if we had system.cpu.usage and process.cpu.usage metrics (marked as recommended) to provide the deltas out of the box. Then the actual triangle is cpu.usage-cpu.utilization-cpu.count where we can standardize the cpu.usage and cpu.count while leave the cpu.utilization as opt-in.

@open-telemetry/semconv-system-approvers any thoughts on this?

@braydonk
Copy link
Contributor

@ChrsMark could you clarify what cpu.usage is? There is already cpu.time which can be used along with cpu.count to get a percentage for cpu.utilization. I'm not sure what cpu.usage is representing in this scenario.

@ChrsMark
Copy link
Member Author

ChrsMark commented Jul 30, 2024

@ChrsMark could you clarify what cpu.usage is? There is already cpu.time which can be used along with cpu.count to get a percentage for cpu.utilization. I'm not sure what cpu.usage is representing in this scenario.

Please check me on this but I'm referring to the delta metric that we can calculate from the cumulative cpu.time. As it's also explained at open-telemetry/opentelemetry-specification#2392 (comment) it is:

cpu.usage = (cpu.time 2 - cpu.time 1) / (Timestamp 2 - Timestamp 1)

Where dividing this by cpu.count gives you the cpu.utilization.

If system.cpu.utilization is:

Difference in system.cpu.time since the last measurement, divided by the elapsed time and number of logical CPUs

Then system.cpu.usage would be:

Difference in system.cpu.time since the last measurement, divided by the elapsed time

My point point here is to ensure consistency with other metrics like the container's ones: #1128

However at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/c4bee406f5db233a2bc7f0c69185764fea9fb033/receiver/hostmetricsreceiver/internal/scraper/cpuscraper/ucal/cpu_utilization_calculator.go#L54 I see we don't really divide by the cpu.count 🤔 so I wonder if I miss anything here.

@dmitryax
Copy link
Member

dmitryax commented Aug 1, 2024

I support this. Thanks @ChrsMark!

@ChrsMark
Copy link
Member Author

ChrsMark commented Aug 1, 2024

However at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/c4bee406f5db233a2bc7f0c69185764fea9fb033/receiver/hostmetricsreceiver/internal/scraper/cpuscraper/ucal/cpu_utilization_calculator.go#L54 I see we don't really divide by the cpu.count 🤔 so I wonder if I miss anything here.

We discussed that during today's SIG meeting (1st Aug 2024). Since we report this per core that's why we don't divide by the number of cores (it's actually a division by 1). The correct aggregation for the total cpu.utilization would be the avg.

For processes it's also calculated properly. Divided by number of cores: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/c4bee406f5db233a2bc7f0c69185764fea9fb033/receiver/hostmetricsreceiver/internal/scraper/processscraper/ucal/cpu_utilization_calculator.go#L60-L65

Hence, we are consistent here with other areas like k8s/containers. It's also not a real need to introduce the cpu.usage now.
We can proceed with switching to opt-in here. I will file a PR for this.

@braydonk
Copy link
Contributor

braydonk commented Aug 1, 2024

Thanks for the info. I support as well, thanks!

@ChrsMark
Copy link
Member Author

ChrsMark commented Aug 2, 2024

PR raised: #1308

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:system enhancement New feature or request
Projects
Development

Successfully merging a pull request may close this issue.

6 participants