
Add support for directly generating span graphs #745

Merged: 22 commits into grafana:main on Apr 18, 2024

Conversation

@grcevski (Contributor) commented Apr 16, 2024

Some initial code for generating span graphs directly from Beyla. The basic design is as follows:

  • Look for a matching service by its calculated svc.ID.
  • If no match is found, use Kubernetes metadata (if enabled) and DNS to resolve the name.
  • Cache all lookups so we don't flood the DNS server (see the sketch after this list).
  • Works correctly across different nodes as long as the DNS name agrees with the service name configured for OTel; if they disagree, service graphs will be partial.
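
A minimal sketch of the cached-lookup idea described above, assuming a reverse-DNS fallback. This is illustrative only: the type and method names here are hypothetical, and the actual name_resolver.go implementation may differ.

```go
package resolver

import (
	"context"
	"net"
	"sync"
)

// NameResolver caches reverse-DNS lookups so that repeated spans for the
// same peer IP do not trigger repeated DNS queries.
type NameResolver struct {
	mu    sync.Mutex
	cache map[string]string // IP -> resolved service name
}

func NewNameResolver() *NameResolver {
	return &NameResolver{cache: map[string]string{}}
}

// Resolve returns a service name for ip, consulting the cache first and
// falling back to reverse DNS. Failed lookups are cached as empty strings,
// so an unresolvable host is only queried once.
func (r *NameResolver) Resolve(ctx context.Context, ip string) string {
	r.mu.Lock()
	if name, ok := r.cache[ip]; ok {
		r.mu.Unlock()
		return name
	}
	r.mu.Unlock()

	name := ""
	if names, err := net.DefaultResolver.LookupAddr(ctx, ip); err == nil && len(names) > 0 {
		name = names[0]
	}

	r.mu.Lock()
	r.cache[ip] = name
	r.mu.Unlock()
	return name
}
```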

TODO:

  • Prometheus scrape support
  • Unit tests
  • Integration tests
  • Docs

@grcevski grcevski changed the title Draft: Add support for directly generating span graphs Add support for directly generating span graphs Apr 16, 2024
@mariomac (Contributor) commented Apr 17, 2024

The following PR merges origin/main into this branch. It just resolves the reported conflict in instrumenter.go, adapting the current code to the new pipes API. Despite the PR's large size (it pulls in all the changes from main), I have only slightly changed the pipeline.go and name_resolver.go files.

grcevski#4

@grcevski (Contributor, Author) replied:

> The following PR merges origin/main into this branch. It just resolves the reported conflict in instrumenter.go, adapting the current code to the new pipes API. Despite the PR's large size (it pulls in all the changes from main), I have only slightly changed the pipeline.go and name_resolver.go files.
>
> grcevski#4

Thanks a bunch Mario!

@grcevski grcevski marked this pull request as ready for review April 17, 2024 15:07
@codecov-commenter commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 80.60942%, with 70 lines in your changes missing coverage. Please review.

Project coverage is 67.91%. Comparing base (fe9ac27) to head (b517207).

| File | Patch % | Lines missing |
|---|---|---|
| pkg/internal/export/otel/metrics.go | 55.84% | 23 missing, 11 partials ⚠️ |
| pkg/internal/export/prom/prom.go | 76.11% | 12 missing, 4 partials ⚠️ |
| pkg/transform/name_resolver.go | 88.23% | 10 missing, 4 partials ⚠️ |
| pkg/internal/transform/kube/db.go | 90.32% | 3 missing ⚠️ |
| pkg/internal/export/otel/traces.go | 87.50% | 2 missing ⚠️ |
| pkg/beyla/network_cfg.go | 0.00% | 1 missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #745      +/-   ##
==========================================
+ Coverage   67.41%   67.91%   +0.49%
==========================================
  Files         102      103       +1
  Lines        8853     9181     +328
==========================================
+ Hits         5968     6235     +267
- Misses       2445     2489      +44
- Partials      440      457      +17
```
| Flag | Coverage Δ |
|---|---|
| integration-test | 67.91% <80.60%> (+0.49%) ⬆️ |


@mariomac (Contributor) left a review comment:

🚀🙌🏻

@grcevski grcevski merged commit 6ed384d into grafana:main Apr 18, 2024
6 checks passed
@grcevski grcevski deleted the span_graphs branch April 18, 2024 19:49
```diff
@@ -41,6 +42,10 @@ const (
	SpanMetricsCalls = "traces_spanmetrics_calls_total"
	SpanMetricsSizes = "traces_spanmetrics_size_total"
	TracesTargetInfo = "traces_target_info"
```

A review comment on this change:

From what I understood, Tempo generates traces_target_info, whereas the OTel Collector Service Graph Connector generates target_info, and we have a configuration flag to choose which one is used (screenshot below).

@domasx2, do we have best practices here?

[screenshot: configuration flag selecting between the two metric names]

@grcevski (Contributor, Author) replied:

Does that mean we may not need it? I simply went by what I saw Tempo generate for my cloud instance when using Alloy.

Another participant replied:

It's needed for sure. I think what @cyrille-leclerc is wondering is whether you should align with the Tempo metrics generator's metric names or the OTel spanmetrics / service graph connector's metric names. I don't have a strong opinion TBH, though I would lean towards OTel.

@grcevski (Contributor, Author) replied:

Oh, absolutely OTel then. Just confirming: if we rename it to target_info, App O11y will continue to work, correct? I'll create a PR to fix this.

A follow-up comment:

Note that the span metric names change as well for OTel: instead of traces_span_*, it's duration_seconds_bucket, duration_seconds_count, and duration_seconds_total.
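
For quick reference, the two naming schemes discussed in this thread map roughly as follows (compiled only from the names quoted above; not an exhaustive list):

| Tempo metrics generator | OTel connectors |
|---|---|
| traces_target_info | target_info |
| traces_span_* (duration histogram) | duration_seconds_bucket, duration_seconds_count, duration_seconds_total |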

Another reply:

As @domasx2 said, the question is OTel vs. Tempo naming conventions. I would see value in adopting the OTel conventions.
That being said, what's our recommendation if a customer is getting started by combining Beyla with a bunch of apps instrumented with OTel SDKs in the simplest possible way, relying on Tempo metrics generation?
Sorry, it's more complicated than we would like.

Another reply:

I do agree with the gang that it would be beneficial to run with the new OTel default. Given that users can have more instrumentation than just a single Beyla deployment, I would opt to make this configurable as either OTel (default) or Tempo.

If it were flexible, people could rely on Tempo-generated metrics plus the ones from Beyla.

I would also lean towards helping Tempo align with OTel long-term.
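
A minimal sketch of how such a configurable naming scheme could look. This is purely illustrative: the type, constant, and method names here are assumptions, not Beyla's actual configuration API, and only the metric names quoted in this thread are used.

```go
package config

// SpanMetricsNaming selects which naming convention is used for the
// generated span/service-graph metrics.
type SpanMetricsNaming string

const (
	// NamingOTel follows the OTel connector conventions discussed above,
	// e.g. target_info and duration_seconds_*. Proposed default.
	NamingOTel SpanMetricsNaming = "otel"
	// NamingTempo follows the Tempo metrics generator conventions,
	// e.g. traces_target_info and traces_spanmetrics_*.
	NamingTempo SpanMetricsNaming = "tempo"
)

// TargetInfoName returns the target-info metric name for the chosen
// convention, defaulting to the OTel name.
func (n SpanMetricsNaming) TargetInfoName() string {
	if n == NamingTempo {
		return "traces_target_info"
	}
	return "target_info"
}
```

Keeping both conventions selectable would let users mix Beyla's metrics with Tempo-generated ones, as suggested in the comment above.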

@grcevski (Contributor, Author) replied:

Alright, I think we have a consensus. We'll add an OTel format and make it the default.

I opened a new issue to track this: #821
