[exporter/otlp] Report runtime status #11366
Open
+482
−14
Description
This PR adds runtime status reporting for the otlp and otlphttp exporters. It is an updated version of #8788, one of several experimental approaches to adding this functionality. That work was paused to let the consumererror work evolve, since it looked as though the same logic could be applied uniformly to all exporters via the exporterhelper. The consumererror work has since taken a different direction, and a uniform approach no longer looks possible.
This PR implements runtime status reporting as discussed in #9957. The choices for which statuses represent permanent errors are up for debate, but the key point is that a permanent error is an error discovered at runtime that will require user intervention to fix.
This implementation makes use of the finite state machine that underlies the status reporting system. The finite state machine will ensure that:
This means the exporter does not have to reason about current or previous statuses. It can report status based on its current view of the world, and the status reporting system will handle the rest. Flapping between recoverable and OK is meant to be handled by watchers consuming status events. The healthcheckv2extension handles this with a time-based approach (e.g. a recovery interval). Other watchers may choose to handle this situation differently.
For more information on status reporting, the state machine, etc., see: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-status.md
While all components report status during start and shutdown (via automation), this is the first component to report runtime status. This will allow the healthcheckv2extension to replace the currently non-functioning check_collector_pipeline capability of the original healthcheckextension and should serve as an example for other components that wish to report runtime status moving forward.
Link to tracking issue
Implements #9957 for the otlp and otlphttp exporters
Testing
Unit tests and manual testing
Documentation
code comments