
Add VCS metrics from GitHub receiver #1383

Open · wants to merge 38 commits into main

Conversation

@christophe-kamphaus-jemmic commented Sep 3, 2024

Fixes #1372

Changes

This PR adds the VCS metrics from the GitHub Receiver to Semantic Conventions.
These metrics are based on https://github.com/adrielp/opentelemetry-collector-contrib/blob/903eec5382352e4d663fadd2dc7f1cc1e37f62fd/receiver/githubreceiver/documentation.md

Merge requirement checklist

TODO


linux-foundation-easycla bot commented Sep 3, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

type: metric
metric_name: vcs.repository.change.count
brief: 'The number of changes (pull requests) in a repository, categorized by their state (either open or merged)'
instrument: gauge
Contributor:

it's probably interesting to have two separate metrics - one an UpDownCounter for the number of open PRs, and another (a Counter, or maybe also an UpDownCounter since PRs can be reopened) for closed PRs, with the state (merged or not merged) as an attribute.

Otherwise we're measuring very different scales and very different kinds of things (an almost monotonically incrementing and eventually large number of closed/merged PRs vs a probably small and roughly constant number of open ones).
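
For illustration, the split being suggested could be modeled as two definitions along these lines in semconv model YAML (the names, units, and attributes here are hypothetical, not part of this PR):

```yaml
# Hypothetical sketch only - not proposed in this PR.
- id: metric.vcs.repository.change.open_count
  type: metric
  metric_name: vcs.repository.change.open_count
  brief: 'The number of changes (pull requests) currently open in a repository'
  instrument: updowncounter
  unit: '{change}'
- id: metric.vcs.repository.change.closed_count
  type: metric
  metric_name: vcs.repository.change.closed_count
  brief: 'The number of closed changes, by outcome (merged or not merged)'
  instrument: updowncounter  # or counter; up-down because PRs can be reopened
  unit: '{change}'
  attributes:
    - ref: vcs.change.state
```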

Contributor:

After discussion in the CICD working group, we think this would be best served by a histogram and will make that change pending any objections.

Author:

After some further thought and reading Instrument selection, updowncounter seems the best instrument for vcs.repository.ref.count and vcs.repository.change.count metrics since the measurements are additive over the attributes.
I made this change in aab9f90.

Author:

@lmolkova Why is it an issue if the metric has vastly different values for the different values of the vcs.change.state attribute (eg. open, closed, merged)?
We can filter by this attribute if we want to consider open vs closed/merged PRs separately.

type: metric
metric_name: vcs.repository.change.time_open
brief: 'The amount of time a change (pull request) has been open'
instrument: gauge
Contributor @lmolkova commented Sep 4, 2024:

it should be a histogram.

BTW if it is, then number of closed/merged PRs can be derived from it and all that's necessary is the number of open PRs.

Contributor:

@lmolkova - would there be two histogram buckets based on state? i.e. a bucket for open PRs and a bucket for closed ones?

Contributor:

histograms have buckets per combination of attribute values, i.e. you can see percentiles per state.

Contributor:

I see. I'd be fine with this being a histogram bucket.

Contributor:

After discussion in the CICD working group, we think this would be best served by a histogram and will make that change pending any objections.

Author:

In 5a9ffe7 I have combined vcs.repository.change.time_open and vcs.repository.change.time_to_merge to be instrument: histogram in metric vcs.repository.change.time.

I distinguish them by the added vcs.change.state attribute.

Is that better or should we keep them as separate metrics?
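
Based on the description above (combined metric, histogram instrument, distinguished by the vcs.change.state attribute), the combined definition could look roughly like the following sketch - illustrative only, not the actual contents of 5a9ffe7:

```yaml
# Sketch; unit and requirement level are assumptions.
- id: metric.vcs.repository.change.time
  type: metric
  metric_name: vcs.repository.change.time
  brief: 'The amount of time a change (pull request) has been in a given state'
  instrument: histogram
  unit: 's'
  attributes:
    - ref: vcs.change.state  # e.g. open, merged
      requirement_level: required
```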

- id: metric.vcs.repository.change.time_open
type: metric
metric_name: vcs.repository.change.time_open
brief: 'The amount of time a change (pull request) has been open'
Contributor:

is it cumulative? E.g. if I open, then close, then reopen a PR, what would it count?

Contributor:

That's an interesting question because it shows the gap between interval-based scraping and event counting. These metrics originate from a scraper that has to run through all the queries necessary to populate them. As such, it's not looking back at history but solely looking at state at a point in time. The answer depends on whether the API in this case changes the opened date to the re-opened date.

requirement_level: required
- id: metric.vcs.repository.change.time_open
type: metric
metric_name: vcs.repository.change.time_open
Contributor:

suggesting vcs.repository.change.time_to_close or vcs.repository.change.request.duration

It might be interesting to look into CHAOSS metrics and see what would apply https://github.com/chaoss/wg-evolution/blob/main/focus-areas/code-development-process-quality/change-request-reviews.md

Contributor @adrielp commented Sep 10, 2024:

I've stayed away from aggregation metrics purposefully because getting the raw telemetry enables those calculations. Generally speaking, all these metrics come out of openo11y's leading indicators, which are closely aligned with dora.dev's capabilities.

Would you like frequency / percentage style metrics (calculations on top of existing metrics/telemetry) to be included eventually in semconv?

requirement_level: required
- ref: vcs.repository.ref.name
requirement_level: required
- id: metric.vcs.repository.count
Contributor:

is it really necessary as a metric? I assume you'd report it daily and it'd be wiser/more frugal to report it as an event or just call GH API whenever you need this info

Contributor @adrielp commented Sep 6, 2024:

These metrics in general all come from a component that scrapes the GH API. The API returns a list of repositories to iterate on because to gather the other metrics through the API, one has to know the repository. I don't see why we wouldn't include it as a valid metric, but I would say it's not a required metric. Thoughts? See this comment for additional context.

Contributor:

Is this the component we own and created?
There are multiple ways to write it - e.g. using webhooks vs scraping.

All of the GH things are really rare - even huge monorepos get maybe 100s of PRs opened a day.

If you have a hook and report every event as it arrives (even per-comment!), it's maybe 1000s of events per day. If you exclude comments/issues (which are not yet in the scope of this PR), it'd probably be on the order of 100s.
So it's cheap.

Reporting them as events allows you to include high-cardinality data and do any aggregation you want at query time. E.g. you might want to report the contributor (or their anonymized name) as an attribute, or the number of comments the PR received - that would let you derive almost anything.

You'd need an event for just a few things - repo, PR, issue, comment, branch lifecycles - maybe ~10 distinct events.

Metrics, if you scrape them periodically (e.g. once an hour), would report 24 points per metric per day, and there are 12 metrics already (not even including issues or comments). That's roughly 300 points a day.

But you have already pre-aggregated everything and some information got lost - you have to change the component code to create yet another metric whenever users want to report something new.


TL;DR:

  • if VCS things were reported as events
    • it would still be cheap (low volume)
    • very flexible (you aggregate them at query time)
    • they could be reported in real time from hooks (low volume)
    • you would have easy-to-understand things like "pr_opened", "pr_closed"
    • you could write a collector processor to aggregate them into metrics so that dashboards scale better. The cool part: users can write any aggregation they want, or not aggregate at all.
  • if VCS things are reported as metrics
    • it's at least as expensive as events (low volume, but metrics are reported periodically),
    • rigid (you need to envision every aggregation users want to do),
    • you'd end up with a lot of very specific metrics like repo count or number of lines of code - a bigger mental load to understand all that's available,
    • reported rarely (API scrapes are more expensive than hooks)

So I strongly suggest revisiting the approach and considering events instead of metrics.
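
For comparison, the event-based alternative sketched in the TL;DR above might be modeled along these lines - purely illustrative, these definitions exist neither in this PR nor in semconv:

```yaml
# Hypothetical "pr_opened" / "pr_closed" style lifecycle events.
- id: event.vcs.change.opened
  type: event
  name: vcs.change.opened
  brief: 'Reported when a change (pull request) is opened'
  attributes:
    - ref: vcs.repository.url.full
- id: event.vcs.change.closed
  type: event
  name: vcs.change.closed
  brief: 'Reported when a change is closed, whether merged or not'
  attributes:
    - ref: vcs.repository.url.full
    - ref: vcs.change.state  # merged | closed
```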

Also it seems GH would suggest using webhooks here https://docs.github.com/en/webhooks/about-webhooks#choosing-webhooks-or-the-rest-api

If it's some 3rd-party component that emits metrics and we can't change them - we had something similar in the past around legacy Kafka metrics which were not quite aligned with OTel - we ended up removing them from semantic conventions and recommending that they be documented in the corresponding collector receiver.

Contributor @adrielp commented Sep 9, 2024:

Events are not a catch-all. There are valid reasons to use scraping for some of the metrics here. There are certainly metrics I've found to be effective that come from the GitHub event system.

I wrote this component & contributed it to OTEL. I am happy to make adjustments (traces from events are in the works, and I would also like to add transformed events from my WebHook Event receiver + transform processor solution in the future).

To the original question, I think it'd be OK to change some of these to histograms. I also don't think these are "required metrics." The purpose of my comment was to bring context and seek to understand. The metrics originate out of openo11y.dev and the dora capabilities. I also don't think the "approach" of how you get the metrics should hinder this pull request. We can make suggestions, and some aspects of one approach may be better than others. Am I wrong in understanding that semconv doesn't purport to define how you collect a metric, only the metric itself? I might just not have seen an example of that yet, so forgive my ignorance if I missed it.

Contributor @adrielp commented Sep 5, 2024:

CLA Not Signed

@christophe-kamphaus-jemmic - please sign the CLA if you haven't already.

Contributor @lmolkova left a review comment:

I suggest changing the approach and designing these as events: they would have low volume but be very flexible, since events can include a lot of context that can be aggregated based on user needs.

See more details in the comment #1383 (comment)

Contributor @adrielp commented Sep 19, 2024:

Re:

Review statistics provided by Github (Pulse / Insights).
Is there overlap with the metrics defined here?
....

I'm not a huge fan of GitHub's insights, and most folks I've talked to aren't either. They don't get much, if any, value out of those insights as they don't give enough information to ask questions and analyse. Additionally, I have concerns about including PII (contributor emails) in the metrics. Time after time I've seen companies misuse this type of data to incorrectly pass judgement against engineers. I feel so strongly about this that I wrote an observing human systems RFC 2119 specification.

My take is that we should approach these metrics (and I think we already are) from the perspective of what can come from any VCS, not only from what comes from GitHub. I think the metrics we have closely align with dora.dev's capabilities. Additionally, these metrics could be considered foundational to aggregation-style metrics in other systems such as CHAOSS.

type: metric
metric_name: vcs.repository.ref.lines_added
brief: 'The number of lines added in a ref (branch) relative to the default branch (trunk)'
instrument: gauge
Contributor:

should this be a histogram?

Contributor @jmacd commented Oct 16, 2024:

How do you feel about the claim here, which says that we can form a histogram out of any fundamental instrument type? For "lines_delta" specifically, it sounds like a Counter - if you were to aggregate a number of refs or repositories together, would you want to sum the lines added and removed?

Is there an attribute to distinguish "add" from "remove" in this situation? If you were to aggregate away this attribute, would you expect to have a single number that is added+removed?
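
If the add/remove split were kept as an attribute, the definition might look like the following sketch (the attribute name vcs.line_change.type, the unit, and everything else here are assumptions, not taken from the PR):

```yaml
# Sketch: a single metric where aggregating away the attribute
# would sum added + removed lines, as asked above.
- id: metric.vcs.repository.ref.lines_delta
  type: metric
  metric_name: vcs.repository.ref.lines_delta
  brief: 'The number of lines changed in a ref (branch) relative to the default branch (trunk)'
  instrument: gauge
  unit: '{line}'
  attributes:
    - ref: vcs.line_change.type  # added | removed
      requirement_level: required
```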

type: metric
metric_name: vcs.change.duration
brief: 'The time duration a change (pull request/merge request/changelist) has been in a given state.'
instrument: histogram
Contributor:

@jmacd - I think we have an instance of a gauge histogram here.

Contributor @jmacd left a review comment:

It's nice to have a concrete "gauge-histogram" example to discuss, though we haven't exactly defined this term. I can't commit to working on protocol- and specification-related questions for gauge histograms without approval from a manager, so I'll be brief.

When a convention calls for a spatial aggregation of measurements, we expect a histogram, but we also care about the fundamental instrument involved and what instrumentation might look like in synchronous and asynchronous use-cases.

The measurement is a duration, so I believe it is impractical to model synchronous instrumentation--timing measurements are a special case that require asynchronous instrumentation. (This topic is substantially different when the quantity is not a timing measurement, for example if you ask to model the current sizes of active requests in flight, then it is possible to formulate synchronous instrumentation because the value is a constant w/ respect to time.)

One way to address the instrumentation question is to ask which asynchronous instrument you would use if there were only a single measurement: Counter, UpDownCounter, or Gauge. I would agree that timing is something you measure, not something you count, so I could model the semantic proposed above with an async gauge:

func Observe() {
    // Observe each open PR's current duration as an async gauge measurement.
    for _, pr := range pullRequests {
        asyncGauge.Observe(pr.Duration(), attribute.Int("number", pr.Number()))
    }
}

Now, the aggregation you are asking for is to erase the "number" attribute from multiple series, yielding a single histogram of Gauge measurements.

If the question had been about the active amount of memory used by running tasks, say, then the fundamental instrument would be an UpDownCounter, not a Gauge - the aggregation yields a histogram of cumulative, non-monotonic Sum measurements. If the question had been about requests made by active tasks, then the fundamental instrument would be a Counter - the aggregation yields a histogram of cumulative, monotonic Sum measurements.

It's OK with me to call the result of every spatial aggregation a "gauge histogram", but I think it would be better if we preserved the fundamental instrument that is notionally involved. In fact, why not define semantic conventions in terms of the fundamental instrument and allow the user to request histogram aggregation, not specify the semantic convention as an aggregation.

IOW, in this case the semantic conventional instrument is Gauge. You don't have to add Histogram, it's presumed you can aggregate multiple gauges into a histogram.
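
Read that way, the convention would specify only the fundamental instrument, e.g. as in this sketch (written under the assumptions above, not the PR's actual YAML):

```yaml
- id: metric.vcs.change.duration
  type: metric
  metric_name: vcs.change.duration
  brief: 'The time duration a change has been in a given state.'
  instrument: gauge  # aggregating multiple gauge series into a histogram is left to the user
  unit: 's'
```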

Author @christophe-kamphaus-jemmic commented:

I have tried to summarize the discussion we had last Monday in the Semantic Convention SIG as best I can here below:

Summary of Semantic Convention SIG meeting 2024-10-14

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md
Histograms aggregate multiple events as they occur. For each reporting time they return the aggregated distribution.

In this PR we do not do that:
If we reported the metrics in question as histograms, we would sample all data points (i.e. PR counts and duration per change state) at each reporting time, and the counts could decrease (e.g. if a PR moves from one state to another).
We would sample the entire distribution at each reporting time.
Cumulative counts would be incorrect, since we would be treating all PRs as new events at each reporting.

Gauges best correspond to the data model of these metrics. They accurately represent the data (with no aggregation).
A drawback is that the number of gauge series scales with the number of PRs instead of a single histogram aggregating that data.
I.e. we have N gauges instead of 1 gauge histogram.
There might also be a usability issue in transforming multiple gauges into a histogram (e.g. in Prometheus / Grafana).
For histograms, Prometheus has several functions available (e.g. percentiles such as the 99th and 50th, averages, counts, sums), and we can build heatmaps from histograms.
For N gauges we don't get that usability.

Let's consider usability (e.g. existing dashboards, demo recordings) @adrielp.

Conclusion by @jsuereth: stick with gauges for now and sort out the gauge histogram data model later -> create a SemConv CICD issue and link to the existing gauge histogram issues in the spec.
@lmolkova: add a comment somewhere in the Markdown or YAML saying that the instrument type might change and that this will be a prerequisite to stability, but for now it's good.

Successfully merging this pull request may close these issues.

Add VCS metrics from GitHub receiver in collector contrib to Semantic Conventions
9 participants