
Add VCS metrics from GitHub receiver #1383

Open · wants to merge 38 commits into main

Conversation

@christophe-kamphaus-jemmic commented Sep 3, 2024

Fixes #1372

Changes

This PR adds the VCS metrics from the GitHub Receiver to Semantic Conventions.
These metrics are based on https://github.com/adrielp/opentelemetry-collector-contrib/blob/903eec5382352e4d663fadd2dc7f1cc1e37f62fd/receiver/githubreceiver/documentation.md

Merge requirement checklist

TODO


linux-foundation-easycla bot commented Sep 3, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

type: metric
metric_name: vcs.repository.change.count
brief: 'The number of changes (pull requests) in a repository, categorized by their state (either open or merged)'
instrument: gauge
Contributor:

it's probably interesting to have two separate metrics - one an UpDownCounter for the number of open PRs, and another (a Counter, or maybe also an UpDownCounter since PRs can be reopened) for closed PRs, with the state (merged or not merged) as an attribute.

Otherwise we're measuring very different scales and very different kinds of things (an almost monotonically incrementing and eventually large number of closed/merged PRs vs a probably small and roughly constant number of open ones).
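
For illustration, the split being suggested could be modeled as two definitions along these lines in semconv model YAML (the names, units, and attributes here are hypothetical, not part of this PR):

```yaml
# Hypothetical sketch only - not proposed in this PR.
- id: metric.vcs.repository.change.open_count
  type: metric
  metric_name: vcs.repository.change.open_count
  brief: 'The number of changes (pull requests) currently open in a repository'
  instrument: updowncounter
  unit: '{change}'
- id: metric.vcs.repository.change.closed_count
  type: metric
  metric_name: vcs.repository.change.closed_count
  brief: 'The number of closed changes, by outcome (merged or not merged)'
  instrument: updowncounter  # or counter; up-down because PRs can be reopened
  unit: '{change}'
  attributes:
    - ref: vcs.change.state
```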

Contributor:

After discussion in the CICD working group, we think this would be best served by a histogram and will make that change pending any objections.

Author:

After some further thought and reading Instrument selection, updowncounter seems the best instrument for vcs.repository.ref.count and vcs.repository.change.count metrics since the measurements are additive over the attributes.
I made this change in aab9f90.

Author:

@lmolkova Why is it an issue if the metric has vastly different values for the different values of the vcs.change.state attribute (eg. open, closed, merged)?
We can filter by this attribute if we want to consider open vs closed/merged PRs separately.

type: metric
metric_name: vcs.repository.change.time_open
brief: 'The amount of time a change (pull request) has been open'
instrument: gauge
Contributor @lmolkova commented Sep 4, 2024:

it should be a histogram.

BTW if it is, then number of closed/merged PRs can be derived from it and all that's necessary is the number of open PRs.

Contributor:

@lmolkova - would there be two histogram buckets based on state? i.e. a bucket for open PRs and a bucket for closed ones?

Contributor:

histograms have buckets per combination of attribute values, i.e. you can see percentiles per state.

Contributor:

I see. I'd be fine with this being a histogram bucket.

Contributor:

After discussion in the CICD working group, we think this would be best served by a histogram and will make that change pending any objections.

Author:

In 5a9ffe7 I have combined vcs.repository.change.time_open and vcs.repository.change.time_to_merge to be instrument: histogram in metric vcs.repository.change.time.

I distinguish them by the added vcs.change.state attribute.

Is that better or should we keep them as separate metrics?
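
Based on the description above (combined metric, histogram instrument, distinguished by the vcs.change.state attribute), the combined definition could look roughly like the following sketch - illustrative only, not the actual contents of 5a9ffe7:

```yaml
# Sketch; unit and requirement level are assumptions.
- id: metric.vcs.repository.change.time
  type: metric
  metric_name: vcs.repository.change.time
  brief: 'The amount of time a change (pull request) has been in a given state'
  instrument: histogram
  unit: 's'
  attributes:
    - ref: vcs.change.state  # e.g. open, merged
      requirement_level: required
```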

- id: metric.vcs.repository.change.time_open
type: metric
metric_name: vcs.repository.change.time_open
brief: 'The amount of time a change (pull request) has been open'
Contributor:

is it cumulative? E.g. if I open, then close, then reopen a PR, what would it count?

Contributor:

That's an interesting question because it shows the gap between interval-based scraping and event counting. These metrics originate from a scraper that has to run through all the queries necessary to populate them. As such, it's not looking back at history but solely looking at state at a point in time. The answer depends on whether the API in this case changes the opened date to the re-opened date.

requirement_level: required
- id: metric.vcs.repository.change.time_open
type: metric
metric_name: vcs.repository.change.time_open
Contributor:

suggesting vcs.repository.change.time_to_close or vcs.repository.change.request.duration

It might be interesting to look into CHAOSS metrics and see what would apply https://github.com/chaoss/wg-evolution/blob/main/focus-areas/code-development-process-quality/change-request-reviews.md

Contributor @adrielp commented Sep 10, 2024:

I've stayed away from aggregation metrics purposefully because getting the raw telemetry enables those calculations. Generally speaking, all these metrics come out of openo11y's leading indicators, which are closely aligned with dora.dev's capabilities.

Would you like frequency / percentage style metrics (calculations on top of existing metrics/telemetry) to be included eventually in semconv?

requirement_level: required
- ref: vcs.repository.ref.name
requirement_level: required
- id: metric.vcs.repository.count
Contributor:

is it really necessary as a metric? I assume you'd report it daily and it'd be wiser/more frugal to report it as an event or just call GH API whenever you need this info

Contributor @adrielp commented Sep 6, 2024:

These metrics in general all come from a component that scrapes the GH API. The API returns a list of repositories to iterate on because to gather the other metrics through the API, one has to know the repository. I don't see why we wouldn't include it as a valid metric, but I would say it's not a required metric. Thoughts? See this comment for additional context.

Contributor:

Is this the component we own and created?
There are multiple ways to write it - e.g. using webhooks vs scraping.

All of the GH things are really rare - even huge monorepos get maybe 100s of PRs opened a day.

If you have a hook and report every event as it arrives (even per-comment!), it's maybe 1000s of events per day. If you exclude comments/issues (which are not yet in the scope of this PR), it'd probably be on the order of 100s.
So it's cheap.

Reporting them as events allows you to include high-cardinality data and do any aggregation you want at query time. E.g. you might want to report the contributor (or their anonymized name) as an attribute, or the number of comments the PR received - that would let you derive almost anything.

You'd need an event for just a few things - repo, PR, issue, comment, branch lifecycles - maybe ~10 distinct events.

Metrics, if you scrape them periodically (e.g. once an hour), would report 24 points per metric per day, and there are 12 metrics already (not even including issues or comments). That's roughly 300 points a day.

But you have already pre-aggregated everything and some information got lost - you have to change the component code to create yet another metric whenever users want to report something new.


TL;DR:

  • if VCS things were reported as events
    • it would still be cheap (low volume)
    • very flexible (you aggregate them at query time)
    • they could be reported in real time from hooks (low volume)
    • you would have easy-to-understand things like "pr_opened", "pr_closed"
    • you could write a collector processor to aggregate them into metrics so that dashboards scale better. The cool part: users can write any aggregation they want, or not aggregate at all.
  • if VCS things are reported as metrics
    • it's at least as expensive as events (low volume, but metrics are reported periodically),
    • rigid (you need to envision every aggregation users want to do),
    • you'd end up with a lot of very specific metrics like repo count or number of lines of code - a bigger mental load to understand all that's available,
    • reported rarely (API scrapes are more expensive than hooks)

So I strongly suggest revisiting the approach and considering events instead of metrics.
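
For comparison, the event-based alternative sketched in the TL;DR above might be modeled along these lines - purely illustrative, these definitions exist neither in this PR nor in semconv:

```yaml
# Hypothetical "pr_opened" / "pr_closed" style lifecycle events.
- id: event.vcs.change.opened
  type: event
  name: vcs.change.opened
  brief: 'Reported when a change (pull request) is opened'
  attributes:
    - ref: vcs.repository.url.full
- id: event.vcs.change.closed
  type: event
  name: vcs.change.closed
  brief: 'Reported when a change is closed, whether merged or not'
  attributes:
    - ref: vcs.repository.url.full
    - ref: vcs.change.state  # merged | closed
```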

Also it seems GH would suggest using webhooks here https://docs.github.com/en/webhooks/about-webhooks#choosing-webhooks-or-the-rest-api

If it's some 3rd-party component that emits metrics and we can't change them - we had something similar in the past around legacy Kafka metrics which were not quite aligned with OTel - we ended up removing them from semantic conventions and recommending that they be documented in the corresponding collector receiver.

Contributor @adrielp commented Sep 9, 2024:

Events are not a catch-all. There are valid reasons to use scraping for some of the metrics here. There are certainly metrics I've found to be effective that come from the GitHub event system.

I wrote this component & contributed it to OTEL. I am happy to make adjustments (traces from events are in the works, and I would also like to add transformed events from my WebHook Event receiver + transform processor solution in the future).

To the original question, I think it'd be OK to change some of these to histograms. I also don't think these are "required metrics." The purpose of my comment was to bring context and seek to understand. The metrics originate out of openo11y.dev and the dora capabilities. I also don't think the "approach" of how you get the metrics should hinder this pull request. We can make suggestions, and some aspects of one approach may be better than others. Am I wrong in understanding that semconv doesn't purport to define how you collect a metric, only the metric itself? I might just not have seen an example of that yet, so forgive my ignorance if I missed it.

Contributor @adrielp commented Sep 5, 2024:

CLA Not Signed

@christophe-kamphaus-jemmic - please sign the CLA if you haven't already.

Contributor @lmolkova left a review comment:

I suggest changing the approach and designing these as events: they would have low volume but be very flexible, since events can include a lot of context that can be aggregated based on user needs.

See more details in the comment #1383 (comment)

Contributor @adrielp commented Sep 19, 2024:

Re:

Review statistics provided by Github (Pulse / Insights).
Is there overlap with the metrics defined here?
....

I'm not a huge fan of GitHub's insights, and most folks I've talked to aren't either. They don't get much, if any, value out of those insights as they don't give enough information to ask questions and analyse. Additionally, I have concerns about including PII (contributor emails) in the metrics. Time after time I've seen companies misuse this type of data to incorrectly pass judgement against engineers. I feel so strongly about this that I wrote an observing human systems RFC 2119 specification.

My take is that we should approach these metrics (and I think we already are) from the perspective of what can come from any VCS, not only from what comes from GitHub. I think the metrics we have closely align with dora.dev's capabilities. Additionally, these metrics could be considered foundational to aggregation-style metrics in other systems such as CHAOSS.

type: metric
metric_name: vcs.repository.ref.lines_added
brief: 'The number of lines added in a ref (branch) relative to the default branch (trunk)'
instrument: gauge
Contributor:

should this be a histogram?

Contributor @jmacd commented Oct 16, 2024:

How do you feel about the claim here, which says that we can form a histogram out of any fundamental instrument type? For "lines_delta" specifically, it sounds like a Counter - if you were to aggregate a number of refs or repositories together, would you want to sum the lines added and removed?

Is there an attribute to distinguish "add" from "remove" in this situation? If you were to aggregate away this attribute, would you expect to have a single number that is added+removed?
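
If the add/remove split were kept as an attribute, the definition might look like the following sketch (the attribute name vcs.line_change.type, the unit, and everything else here are assumptions, not taken from the PR):

```yaml
# Sketch: a single metric where aggregating away the attribute
# would sum added + removed lines, as asked above.
- id: metric.vcs.repository.ref.lines_delta
  type: metric
  metric_name: vcs.repository.ref.lines_delta
  brief: 'The number of lines changed in a ref (branch) relative to the default branch (trunk)'
  instrument: gauge
  unit: '{line}'
  attributes:
    - ref: vcs.line_change.type  # added | removed
      requirement_level: required
```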

type: metric
metric_name: vcs.change.duration
brief: 'The time duration a change (pull request/merge request/changelist) has been in a given state.'
instrument: histogram
Contributor:

@jmacd - I think we have an instance of a gauge histogram here.

Contributor @jmacd left a review comment:

It's nice to have a concrete "gauge-histogram" example to discuss, though we haven't exactly defined this term. I can't commit to working on protocol- and specification-related questions for gauge histograms without approval from a manager, so I'll be brief.

When a convention calls for a spatial aggregation of measurements, we expect a histogram, but we also care about the fundamental instrument involved and what instrumentation might look like in synchronous and asynchronous use-cases.

The measurement is a duration, so I believe it is impractical to model synchronous instrumentation--timing measurements are a special case that require asynchronous instrumentation. (This topic is substantially different when the quantity is not a timing measurement, for example if you ask to model the current sizes of active requests in flight, then it is possible to formulate synchronous instrumentation because the value is a constant w/ respect to time.)

One way to address the instrumentation question is to ask which asynchronous instrument you would use if there were only a single measurement: Counter, UpDownCounter, or Gauge. I would agree that timing is something you measure, not something you count, so I could model the semantic proposed above with an async gauge:

func Observe() {
    // Observe each open PR's current duration as an async gauge measurement.
    for _, pr := range pullRequests {
        asyncGauge.Observe(pr.Duration(), attribute.Int("number", pr.Number()))
    }
}

Now, the aggregation you are asking for is to erase the "number" attribute from multiple series, yielding a single histogram of Gauge measurements.

If the question had been about the active amount of memory used by running tasks, say, then the fundamental instrument would be an UpDownCounter, not a Gauge - the aggregation yields a histogram of cumulative, non-monotonic Sum measurements. If the question had been about requests made by active tasks, then the fundamental instrument would be a Counter - the aggregation yields a histogram of cumulative, monotonic Sum measurements.

It's OK with me to call the result of every spatial aggregation a "gauge histogram", but I think it would be better if we preserved the fundamental instrument that is notionally involved. In fact, why not define semantic conventions in terms of the fundamental instrument and allow the user to request histogram aggregation, not specify the semantic convention as an aggregation.

IOW, in this case the semantic conventional instrument is Gauge. You don't have to add Histogram, it's presumed you can aggregate multiple gauges into a histogram.
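
Read that way, the convention would specify only the fundamental instrument, e.g. as in this sketch (written under the assumptions above, not the PR's actual YAML):

```yaml
- id: metric.vcs.change.duration
  type: metric
  metric_name: vcs.change.duration
  brief: 'The time duration a change has been in a given state.'
  instrument: gauge  # aggregating multiple gauge series into a histogram is left to the user
  unit: 's'
```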

Author @christophe-kamphaus-jemmic commented:

I have tried to summarize the discussion we had last Monday in the Semantic Convention SIG as best I can here below:

Summary of Semantic Convention SIG meeting 2024-10-14

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md
Histograms aggregate multiple events as they occur. For each reporting time they return the aggregated distribution.

In this PR we do not do that:
If we reported the metrics in question as histograms, we would sample all data points (i.e. PR counts and duration per change state) at each reporting time, and the counts could decrease (e.g. if a PR moves from one state to another).
We would sample the entire distribution at each reporting time.
Cumulative counts would be incorrect, since we would be treating all PRs as new events at each reporting.

Gauges best correspond to the data model of these metrics. They accurately represent the data (with no aggregation).
A drawback is that the number of gauge series scales with the number of PRs instead of a single histogram aggregating that data.
I.e. we have N gauges instead of 1 gauge histogram.
There might also be a usability issue in transforming multiple gauges into a histogram (e.g. in Prometheus / Grafana).
For histograms, Prometheus has several functions available (e.g. percentiles such as the 99th and 50th, averages, counts, sums), and we can build heatmaps from histograms.
For N gauges we don't get that usability.

Let's consider usability (e.g. existing dashboards, demo recordings) @adrielp.

Conclusion by @jsuereth: stick with gauges for now and sort out the gauge histogram data model later -> create a SemConv CICD issue and link to the existing gauge histogram issues in the spec.
@lmolkova: add a comment somewhere in the Markdown or YAML saying that the instrument type might change and that this will be a prerequisite to stability, but for now it's good.

Successfully merging this pull request may close these issues.

Add VCS metrics from GitHub receiver in collector contrib to Semantic Conventions
9 participants