[exporter/otlp] Report runtime status #11366

Open: mwear wants to merge 4 commits into main

@mwear (Member) commented Oct 4, 2024

Description

This PR adds runtime status reporting for the otlp and otlphttp exporters. It is an updated version of #8788, one of several approaches explored for adding this functionality. That work was paused to let the consumererror work evolve, since it looked as though the same logic might be applied uniformly to all exporters (via the exporterhelper). The consumererror work has since taken a different direction, and a uniform approach no longer looks possible.

This PR implements runtime status reporting as discussed in #9957. Which statuses should count as permanent errors is up for debate, but the key point is that a permanent error is an error discovered at runtime that requires user intervention to fix.
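To make the permanent versus recoverable distinction concrete, here is a minimal sketch of how an exporter might classify HTTP response codes. The `Status` type, the `classifyHTTP` function, and the specific code-to-status choices are all hypothetical and are not taken from this PR. They only illustrate the rule of thumb that an error needing user intervention (bad credentials, wrong endpoint) is permanent, while everything else is assumed to be transient.

```go
package main

import (
	"fmt"
	"net/http"
)

// Status is a simplified stand-in for the collector's component status values.
type Status int

const (
	StatusOK Status = iota
	StatusRecoverableError
	StatusPermanentError
)

func (s Status) String() string {
	switch s {
	case StatusOK:
		return "OK"
	case StatusRecoverableError:
		return "RecoverableError"
	case StatusPermanentError:
		return "PermanentError"
	default:
		return "Unknown"
	}
}

// classifyHTTP is a hypothetical mapping from an HTTP response code to a
// status. The choice of which codes are permanent is an assumption made for
// illustration only.
func classifyHTTP(code int) Status {
	switch {
	case code >= 200 && code < 300:
		return StatusOK
	case code == http.StatusUnauthorized,
		code == http.StatusForbidden,
		code == http.StatusNotFound:
		// Likely misconfiguration: retrying will not help until a user fixes it.
		return StatusPermanentError
	default:
		// Treated as transient here, e.g. throttling or an unavailable backend.
		return StatusRecoverableError
	}
}

func main() {
	for _, code := range []int{200, 401, 404, 429, 503} {
		fmt.Printf("%d %s -> %s\n", code, http.StatusText(code), classifyHTTP(code))
	}
}
```

Any real mapping would also need to cover gRPC status codes for the otlp exporter, and the exact set of permanent codes is the part the description flags as open to debate.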

This implementation makes use of the finite state machine that underlies the status reporting system. The finite state machine will ensure that:

  • Only changes in status are reported. Repeated reports of the same status are no-ops.
  • Once a component transitions into a PermanentError, all further status reports are no-ops.
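The two guarantees above can be sketched with a small model. This is not the collector's implementation: `Reporter` and its fields are hypothetical names, and the model covers only the three runtime statuses mentioned in this description, ignoring the start and shutdown states of the real state machine.

```go
package main

import "fmt"

// Status is a simplified stand-in for the collector's component status values.
type Status int

const (
	StatusOK Status = iota
	StatusRecoverableError
	StatusPermanentError
)

func (s Status) String() string {
	switch s {
	case StatusOK:
		return "OK"
	case StatusRecoverableError:
		return "RecoverableError"
	case StatusPermanentError:
		return "PermanentError"
	default:
		return "Unknown"
	}
}

// Reporter is a hypothetical, minimal model of the status state machine.
type Reporter struct {
	current   Status
	hasStatus bool     // false until the first report arrives
	events    []Status // statuses that were actually emitted to watchers
}

// Report emits s only if it is a change from the current status and the
// component has not already entered PermanentError. It returns true when an
// event was emitted and false when the report was a no-op.
func (r *Reporter) Report(s Status) bool {
	if r.hasStatus && (r.current == StatusPermanentError || r.current == s) {
		return false
	}
	r.current = s
	r.hasStatus = true
	r.events = append(r.events, s)
	return true
}

func main() {
	r := &Reporter{}
	reports := []Status{
		StatusOK, StatusOK, StatusRecoverableError,
		StatusOK, StatusPermanentError, StatusOK,
	}
	for _, s := range reports {
		fmt.Printf("report %v: emitted=%v\n", s, r.Report(s))
	}
	fmt.Println("emitted events:", r.events)
}
```

In this model the caller never inspects the previous status. It reports what it currently observes, and duplicate or post-permanent reports are dropped by the reporter.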

This means the exporter does not have to reason about current or previous statuses. It can report status based on its current view of the world, and the status reporting system will handle the rest. Flapping between recoverable and OK statuses is meant to be handled by the watchers consuming status events. The healthcheckv2extension handles this with a time-based approach (e.g. a recovery interval). Other watchers may choose to handle this situation differently.

For more information on status reporting, the state machine, etc., see: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-status.md

While all components report status during start and shutdown (via automation), this is the first component to report runtime status. This will allow the healthcheckv2extension to replace the currently non-functioning check_collector_pipeline capability of the original healthcheckextension and should serve as an example for other components that wish to report runtime status moving forward.

Link to tracking issue

Implements #9957 for the otlp and otlphttp exporters

Testing

Unit tests and manual testing

Documentation

code comments

@mwear mwear requested a review from a team as a code owner October 4, 2024 22:07
@mwear mwear requested a review from codeboten October 4, 2024 22:07
codecov bot commented Oct 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.81%. Comparing base (c3f09f4) to head (84944b7).
Report is 29 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #11366   +/-   ##
=======================================
  Coverage   91.80%   91.81%           
=======================================
  Files         432      432           
  Lines       20423    20479   +56     
=======================================
+ Hits        18749    18802   +53     
- Misses       1300     1302    +2     
- Partials      374      375    +1     


@mwear mwear force-pushed the exp-status branch 2 times, most recently from 059934c to 646d3dc Compare October 4, 2024 22:22
2 participants