Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider disallowing components from starting goroutines during creation #9734

Open
crobert-1 opened this issue Mar 8, 2024 · 6 comments
Open
Labels
bug Something isn't working

Comments

@crobert-1
Copy link
Member

Describe the bug

For context, here's the collector's component documentation, specifically referencing this section:

Component is either a receiver, exporter, processor, or an extension.

A component's lifecycle has the following phases:

  1. Creation: The component is created using its respective factory, via a Create* call.
  2. Start: The component's Start method is called.
  3. Running: The component is up and running.
  4. Shutdown: The component's Shutdown method is called and the lifecycle is complete.

I've observed that some components are starting goroutines in the creation step. This can cause potential memory leaks and isn't clear to consumers. Here are my overarching points:

  1. If the creation of any component errors, Shutdown is not called on any created component. Component creation is called in the sub stack of this method, but if an error is returned no created components are shutdown.
  2. It's not immediately obvious (to me) that Create starts background work, especially when step 2 of the referenced documentation is Start. With unclear behavior we're liable to mistakenly assume things (such as it being unnecessary to call Shutdown if Start hasn't been called). The memory leak described in my first point is another case in point of this.

I haven't found any place that requires goroutines to be started in creation, all could be started in Start.

Steps to reproduce

The kafka exporter is an example of starting a running goroutine during Create{Logs/Traces/Metrics}Exporter. If any component created after this exporter in the process of building the service graph returns an error on Create, the collector will shutdown without calling Shutdown on the kafka exporter. This is a goroutine (memory) leak.

What did you expect to see?

Create creates a component, but nothing starts doing the work of the components until Start is called.

What did you see instead?

Leaked goroutines unless Shutdown is called, even without calling Start.

Additional context

Another example of lack of clarity around components and Shutdown: #9682

Alternative
We could call Shutdown on every created component if any single one fails, as a safeguard against goroutines started in Create. However, this still doesn't resolve the issue of clarity and expectations around what "Create" should do.

@crobert-1 crobert-1 added the bug Something isn't working label Mar 8, 2024
@crobert-1 crobert-1 changed the title Consider forbidding components from starting goroutines during creation Consider disallowing components from starting goroutines during creation Mar 8, 2024
@mx-psi
Copy link
Member

mx-psi commented Mar 11, 2024

I am in favor of this. Is there a way we can use goleak to enforce this? Would a TestCreate that creates but does not start or shutdown the component test this with goleak?

@crobert-1
Copy link
Member Author

Is there a way we can use goleak to enforce this? Would a TestCreate that creates but does not start or shutdown the component test this with goleak?

Yes, your proposed test would be a great way to test this.

In my mind, this should be something we test through mdatagen. The goal is to have goleak generated through mdatagen(open-telemetry/opentelemetry-collector-contrib#30483), so I think this test could be generated for every component that has goleak enabled. mdatagen already has tests that create components for each telemetry type (which would be necessary for this) so I believe this should be a pretty simple enhancement once we also have goleak tests generated there.

@mx-psi
Copy link
Member

mx-psi commented Mar 11, 2024

Makes total sense to me to take that approach! 👍

@atoulme
Copy link
Contributor

atoulme commented Mar 11, 2024

This would help greatly, agreed. Thanks Curtis!

@andrzej-stencel
Copy link
Member

@crobert-1 can you point me to the place where the Kafka exporter starts a goroutine in Create function? I cannot find it.

I suppose another way to fix this issue would be to introduce a Teardown or Destroy function that would be called after Create even if Start hasn't been called. But as long as it's not required to have any goroutines in Create functions and there's nothing else to clean up after Create, then the Teardown function is not needed.

@crobert-1
Copy link
Member Author

crobert-1 commented Mar 22, 2024

@crobert-1 can you point me to the place where the Kafka exporter starts a goroutine in Create function? I cannot find it.

Sorry the confusion there, it looks like this PR moved the goroutine start to Start, instead of Create.

I found a couple other examples, the k8s attributes processor (slightly different, but same idea), and the signalfx exporter:

ttlmap.(*TTLMap).Start (ttl_map.go:31) github.com/open-telemetry/opentelemetry-collector-contrib/internal/common/ttlmap
translation.newDeltaTranslator (delta_translator.go:23) github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/translation
translation.NewMetricTranslator (translator.go:243) github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter/internal/translation
signalfxexporter.(*Config).getMetricTranslator (config.go:169) github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter
signalfxexporter.newSignalFxExporter (exporter.go:75) github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter
signalfxexporter.createMetricsExporter (factory.go:120) github.com/open-telemetry/opentelemetry-collector-contrib/exporter/signalfxexporter

(I still haven't done a full search of this, so this isn't a definitive list 👍)

Edit:
Another example, the AWS EMF exporter:

singleflight.(*Group).DoChan (singleflight.go:90) github.com/aws/aws-sdk-go/internal/sync/singleflight
credentials.(*Credentials).GetWithContext (credentials.go:254) github.com/aws/aws-sdk-go/aws/credentials
credentials.(*Credentials).Get (credentials.go:298) github.com/aws/aws-sdk-go/aws/credentials
awsutil.getSTSCreds (conn.go:230) github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/awsutil
awsutil.(*Conn).newAWSSession (conn.go:205) github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/awsutil
awsutil.GetAWSConfigSession (conn.go:148) github.com/open-telemetry/opentelemetry-collector-contrib/internal/aws/awsutil
awsemfexporter.newEmfExporter (emf_exporter.go:53) github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter
awsemfexporter.createMetricsExporter (factory.go:58) github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter
exporter.CreateMetricsFunc.CreateMetricsExporter (exporter.go:113) go.opentelemetry.io/collector/exporter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants