Add VCS metrics from Github receiver #1383
base: main
Conversation
model/metrics/vcs.yaml
Outdated
type: metric
metric_name: vcs.repository.change.count
brief: 'The number of changes (pull requests) in a repository, categorized by their state (either open or merged)'
instrument: gauge
it's probably interesting to have two separate metrics - one is an updowncounter
for the number of open PRs and another one (counter, or maybe up-down too since I can reopen PRs) for closed PRs depending on the state (merged or not merged)
Otherwise we're having very different scales and different types of things we measure (almost monotonically incrementing and eventually big number of closed/merged PRs vs probably very small and constant (or so) number of open ones).
After discussion in the CICD working group, we think this would be best served by a histogram and will make that change pending any objections.
After some further thought and reading Instrument selection, updowncounter seems the best instrument for the vcs.repository.ref.count and vcs.repository.change.count metrics since the measurements are additive over the attributes.
I made this change in aab9f90.
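The "additive over the attributes" property can be illustrated with a small sketch (the counts below are invented): with an updowncounter, summing the per-state series yields a meaningful total, which is exactly what additivity means here.

```python
# Hypothetical point-in-time change counts, keyed by vcs.change.state.
change_count = {"open": 12, "merged": 340, "closed": 28}

# Because the measurement is additive over the attribute, erasing the
# state attribute by summing the series gives the total number of changes.
total_changes = sum(change_count.values())
print(total_changes)  # 380
```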
@lmolkova Why is it an issue if the metric has vastly different values for the different values of the vcs.change.state attribute (e.g. open, closed, merged)?
We can filter by this attribute if we want to consider open vs closed/merged PRs separately.
model/metrics/vcs.yaml
Outdated
type: metric
metric_name: vcs.repository.change.time_open
brief: 'The amount of time a change (pull request) has been open'
instrument: gauge
it should be a histogram.
BTW if it is, then the number of closed/merged PRs can be derived from it and all that's necessary is the number of open PRs.
@lmolkova - would that mean there would be two histogram buckets based on state? i.e. a bucket for opened PRs, and a bucket for closed?
histograms have buckets per combination of attribute values, i.e. you can see percentiles per state.
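A small Python sketch of that point (bucket boundaries and measurements are made up, and this simulates the behavior rather than using an OpenTelemetry SDK): a histogram keeps an independent bucket distribution per combination of attribute values, which is what makes per-state percentiles possible.

```python
from collections import defaultdict

# Hypothetical explicit bucket boundaries, in days.
BUCKET_BOUNDS = [1, 7, 30]

def bucket_index(value, bounds):
    """Index of the first bucket whose upper bound is >= value."""
    for i, bound in enumerate(bounds):
        if value <= bound:
            return i
    return len(bounds)  # overflow bucket

def record(histograms, state, days_open):
    """Record one time-open measurement under its state attribute."""
    histograms[state][bucket_index(days_open, BUCKET_BOUNDS)] += 1

# One bucket-count list per attribute-value combination (here: per state).
histograms = defaultdict(lambda: [0] * (len(BUCKET_BOUNDS) + 1))
for state, days in [("open", 2), ("open", 12), ("merged", 0.5),
                    ("merged", 3), ("merged", 45)]:
    record(histograms, state, days)

# Each state has its own distribution, so percentiles can be computed per state.
print(dict(histograms))
```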
I see. I'd be fine with this being a histogram bucket.
After discussion in the CICD working group, we think this would be best served by a histogram and will make that change pending any objections.
In 5a9ffe7 I have combined vcs.repository.change.time_open and vcs.repository.change.time_to_merge to be instrument: histogram in the metric vcs.repository.change.time.
I distinguish them by the added vcs.change.state attribute.
Is that better or should we keep them as separate metrics?
model/metrics/vcs.yaml
Outdated
- id: metric.vcs.repository.change.time_open
  type: metric
  metric_name: vcs.repository.change.time_open
  brief: 'The amount of time a change (pull request) has been open'
is it cumulative? E.g. if I open and then close and then reopen, what would it count?
That's an interesting question because it shows the gap between interval-based scraping and event counting. These metrics originate from a scraper that has to run through all the queries necessary to populate them. As such, it's not looking back at history but solely looking at state at a point in time. It depends on whether the API in this case changes the opened date to be the re-opened date.
model/metrics/vcs.yaml
Outdated
  requirement_level: required
- id: metric.vcs.repository.change.time_open
  type: metric
  metric_name: vcs.repository.change.time_open
suggesting vcs.repository.change.time_to_close or vcs.repository.change.request.duration
It might be interesting to look into CHAOSS metrics and see what would apply: https://github.com/chaoss/wg-evolution/blob/main/focus-areas/code-development-process-quality/change-request-reviews.md
I've stayed away from aggregation metrics purposefully because getting the raw telemetry enables those calculations. Generally speaking, all these metrics come out of openo11y's leading indicators, which are closely aligned with dora.dev's capabilities.
Would you like frequency / percentage style metrics (calculations on top of existing metrics/telemetry) to be included eventually in semconv?
model/metrics/vcs.yaml
Outdated
  requirement_level: required
- ref: vcs.repository.ref.name
  requirement_level: required
- id: metric.vcs.repository.count
is it really necessary as a metric? I assume you'd report it daily and it'd be wiser/more frugal to report it as an event or just call GH API whenever you need this info
These metrics in general all come from a component that scrapes the GH API. The API returns a list of repositories to iterate on because to gather the other metrics through the API, one has to know the repository. I don't see why we wouldn't include it as a valid metric, but I would say it's not a required metric. Thoughts? See this comment for additional context.
Is this the component we own and created?
There are multiple ways to write it - e.g. using webhooks vs scraping.
All of the GH things are really rare - even huge monorepos get maybe 100s of PRs opened a day.
If you have a hook and report every event as it arrives (even per-comment!), it's maybe 1000s of events per day. If you exclude comments/issues (which are not yet in the scope of this PR), it'd probably be in the 100s order of magnitude.
So it's cheap.
Reporting them as events allows you to include high-cardinality data and do any aggregation that you want at query time. E.g. you might want to report the contributor (or their anonymized name) as an attribute, or the number of comments the PR received - it'd allow you to derive almost anything.
You'd need an event for just a few things - repo, PR, issue, comment, branch lifecycles - maybe ~10 distinct events.
Metrics, if you scrape them periodically (e.g. once an hour), would report 24 points for each metric a day, and there are 12 metrics already (not even including issues or comments). That's almost 300 points a day.
But you already pre-aggregated everything and some information got lost - you have to change the component code to create yet another metric that users would be able to report if they wanted to.
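The datapoint arithmetic in the comment above, spelled out (the comment rounds the result up to roughly 300):

```python
metrics = 12          # metrics already defined in this PR
scrapes_per_day = 24  # hourly scraping
points_per_day = metrics * scrapes_per_day
print(points_per_day)  # 288, i.e. "almost 300 points a day"
```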
TL;DR:
- if VCS things were reported as events:
  - it would still be cheap (low volume)
  - very flexible (you aggregate them at query time)
  - they could have been reported in real time from hooks (low volume)
  - you would have easy-to-understand things like "pr_opened", "pr_closed"
  - you could write a collector processor to aggregate them to metrics so that dashboards scale better. The cool part: users can write any aggregation they want, or not aggregate at all.
- if VCS things are reported as metrics:
  - it's at least as expensive as events (low volume, but metrics are reported periodically)
  - rigid (you need to envision every aggregation users want to do)
  - you'd end up with a lot of very specific metrics like repo count or number of lines of code; it's a bigger mental load to understand all that's available
  - reported rarely (API scrapes are more expensive than hooks)
So I strongly suggest revisiting the approach and considering events instead of metrics.
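The query-time-flexibility argument can be made concrete with a hedged sketch (the event names "pr_opened"/"pr_closed" and their fields are hypothetical, not proposed conventions): from one small stream of raw events, both an open-PR count and per-PR time-to-close can be derived at query time, with no pre-aggregation baked in.

```python
# Hypothetical raw events, as a hook-based receiver might emit them.
events = [
    {"name": "pr_opened", "pr": 1, "ts": 0},
    {"name": "pr_opened", "pr": 2, "ts": 10},
    {"name": "pr_closed", "pr": 1, "ts": 100, "merged": True},
    {"name": "pr_opened", "pr": 3, "ts": 120},
]

opened = {e["pr"]: e["ts"] for e in events if e["name"] == "pr_opened"}
closed = {e["pr"]: e["ts"] for e in events if e["name"] == "pr_closed"}

# Aggregation 1, at query time: how many PRs are currently open?
open_count = len(set(opened) - set(closed))

# Aggregation 2, at query time: time-to-close for every closed PR.
time_to_close = [closed[pr] - opened[pr] for pr in closed]

print(open_count, time_to_close)  # 2 [100]
```

Any other aggregation (per-contributor counts, merge ratios, percentiles) is just another query over the same events.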
Also it seems GH would suggest using webhooks here https://docs.github.com/en/webhooks/about-webhooks#choosing-webhooks-or-the-rest-api
If it's some 3rd party component that emits metrics and we can't change them - we had something similar in the past around legacy Kafka metrics which were not quite aligned with OTel - we ended up removing them from semantic conventions and recommended documenting them in the corresponding collector receiver.
Events are not a catch-all. There are valid reasons to use scraping for some of the metrics here. There are certainly metrics I've found to be effective that come from the GitHub event system.
I wrote this component & contributed it to OTel. I am happy to make adjustments (traces from events are in the works, and I would also like to add transformed events from my WebHook Event receiver + transform processor solution in the future).
To the original question, I think it'd be ok to change some to histogram. I also don't think these are "required metrics." The purpose of my comment was to bring context & seek to understand. The metrics originate out of openo11y.dev & dora capabilities. I also don't think the "approach" of how you get the metrics should hinder this pull request. I think we can make suggestions, and some aspects of one approach may be better than others. Am I wrong in understanding that semconv doesn't purport to define how you collect a metric, only defines the metric itself? I might just not have seen an example of that yet, so forgive my ignorance if I missed it.
@christophe-kamphaus-jemmic - please sign the CLA if you haven't already.
I'm suggesting to change the approach and design those as events given that they would have low volume, but would be very flexible since events can include a lot of context that can be aggregated based on user needs.
See more details in the comment #1383 (comment)
change vcs.change.time_to_approval also into a histogram
Re:
I'm not a huge fan of GitHub's insights, and most folks I've talked to aren't either. They don't get much, if any, value out of those insights as they don't give enough information to ask questions and analyse. Additionally, I have concerns about including PII (contributor emails) in the metrics. Time after time I've seen companies misuse this type of data to incorrectly pass judgement against engineers. I feel so strongly about this that I wrote an RFC 2119 specification on observing human systems. My take is that we should approach these metrics (and I think we are already) from the perspective of what can come from a VCS, not from what comes from GitHub only. I think the metrics we have closely align with dora.dev's capabilities. Additionally, I think these metrics could be considered foundational to aggregation-style metrics elsewhere, like CHAOSS.
type: metric
metric_name: vcs.repository.ref.lines_added
brief: 'The number of lines added in a ref (branch) relative to the default branch (trunk)'
instrument: gauge
should this be a histogram?
How do you feel about the claim here, which says that we can form a histogram out of any fundamental instrument type? For "lines_delta" specifically, it sounds like a Counter - if you were to aggregate a number of refs or repositories together, would you want to sum the lines added and removed?
Is there an attribute to distinguish "add" from "remove" in this situation? If you were to aggregate away this attribute, would you expect to have a single number that is added+removed?
Renamed vcs.repository.change.time to vcs.repository.change.duration
(force-pushed from 73636fd to 84b2b6a)
model/vcs/metrics.yaml
Outdated
type: metric
metric_name: vcs.change.duration
brief: 'The time duration a change (pull request/merge request/changelist) has been in a given state.'
instrument: histogram
@jmacd - I think we have an instance of a gauge histogram here.
It's nice to have a concrete "gauge-histogram" example to discuss, though we haven't exactly defined this term. I could not, w/o approval from a manager, commit to working on protocol- and specification-related questions for gauge histograms, so I'll be brief.
When a convention calls for a spatial aggregation of measurements, we expect a histogram, but we also care about the fundamental instrument involved and what instrumentation might look like in synchronous and asynchronous use-cases.
The measurement is a duration, so I believe it is impractical to model synchronous instrumentation--timing measurements are a special case that require asynchronous instrumentation. (This topic is substantially different when the quantity is not a timing measurement, for example if you ask to model the current sizes of active requests in flight, then it is possible to formulate synchronous instrumentation because the value is a constant w/ respect to time.)
One way to address the instrumentation question is to ask which asynchronous instrument you would use if there were only a single measurement: Counter, UpDownCounter, or Gauge. I would agree that timing is something you measure, not something you count, so I could model the semantic proposed above with an async gauge:
func Observe() {
	for _, pr := range pullRequests {
		asyncGauge.Observe(pr.Duration(), attribute.Int("number", pr.Number()))
	}
}
Now, the aggregation you are asking for is to erase the "number" attribute across multiple series, yielding a single histogram of Gauge measurements.
If the question had been about active amounts of memory used by running tasks, let's say, then the fundamental instrument would be an UpDownCounter not a gauge--the aggregation yields a histogram of cumulative, non-monotonic Sum measurements. If the question had been about requests made by active tasks, then the fundamental instrument would be Counter--the aggregation yields a histogram of cumulative monotonic sum measurements.
It's OK with me to call the result of every spatial aggregation a "gauge histogram", but I think it would be better if we preserved the fundamental instrument that is notionally involved. In fact, why not define semantic conventions in terms of the fundamental instrument and allow the user to request histogram aggregation, not specify the semantic convention as an aggregation.
IOW, in this case the semantic conventional instrument is Gauge. You don't have to add Histogram, it's presumed you can aggregate multiple gauges into a histogram.
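A minimal Python sketch of the spatial aggregation described above (the PR numbers, values, and bucket boundaries are invented): per-PR gauge observations carry a "number" attribute; erasing it merges all the series into one set of values, which can then be bucketed as a histogram of gauge measurements.

```python
# Per-PR async-gauge-style observations: (attributes, value) pairs,
# where the value is how long each PR has been open, in days.
observations = [
    ({"number": 101}, 2.0),
    ({"number": 102}, 9.0),
    ({"number": 103}, 0.5),
]

# Spatial aggregation: drop the "number" attribute so all series merge.
merged_values = [value for _attrs, value in observations]

# Bucket the merged gauge measurements into a single histogram.
bounds = [1, 7]  # hypothetical explicit bucket boundaries, in days
buckets = [0] * (len(bounds) + 1)
for v in merged_values:
    for i, bound in enumerate(bounds):
        if v <= bound:
            buckets[i] += 1
            break
    else:
        buckets[-1] += 1  # overflow bucket

print(buckets)  # one distribution across all PRs: [1, 1, 1]
```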
Also split these metrics for ref and change because the semantic meaning is different (what we compare these against).
I have tried to summarize the discussion we had last Monday in the Semantic Convention SIG as best I can here below:
Summary of Semantic Convention SIG meeting 2024-10-14
- The metrics data model is defined at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md. In this PR we do not do that: Gauges best correspond to the data model of the metrics. It accurately represents the data (with no aggregation).
- Let's consider usability (e.g. existing dashboards, recordings of demo) - @adrielp
- Conclusion by @jsuereth: stick with gauges for now and sort out the gauge histogram data model later -> create a SemConv CICD issue, link to existing gauge histogram issues in the spec.
Fixes #1372
Changes
This PR adds the VCS metrics from the GitHub Receiver to Semantic Conventions.
These metrics are based on https://github.com/adrielp/opentelemetry-collector-contrib/blob/903eec5382352e4d663fadd2dc7f1cc1e37f62fd/receiver/githubreceiver/documentation.md
Merge requirement checklist
- [chore] TODO: docs/cicd/cicd-metrics.md after receiving changes from Proposal: Unify metric templates #1411