Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chore]: fix go routine leaks in tests #34729

Merged
merged 25 commits into from
Oct 9, 2024

Conversation

odubajDT
Copy link
Contributor

@odubajDT odubajDT commented Aug 19, 2024

Description:

  • removing re-running parameter from gotestsum
  • fixing tests/code with go routine leaks
  • resolving race conditions (mostly caused by parallel tests)
  • placing goleak ignorers for go routine leaks from external libraries

Link to tracking Issue: #34495

@odubajDT odubajDT force-pushed the fix-test-gotestsum branch 3 times, most recently from 2fa8bfe to e1ff106 Compare August 21, 2024 08:56
@github-actions github-actions bot requested a review from pxaws August 21, 2024 11:01
@odubajDT odubajDT force-pushed the fix-test-gotestsum branch 2 times, most recently from 9d84a9d to 565ee62 Compare August 21, 2024 11:26
@crobert-1 crobert-1 mentioned this pull request Sep 12, 2024
@songy23 songy23 added the ready to merge Code review completed; ready to merge by maintainers label Sep 13, 2024
@odubajDT odubajDT requested a review from a team as a code owner September 19, 2024 05:25
Makefile.Common Outdated
GOTESTSUM_OPT?= --rerun-fails=1
GOTESTSUM_OPT?=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this option removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enabling the option led to re-running the failed tests (which mostly failed to to go routine leaks and race conditions), which always led to a passed test. This behavior was from some reason blocking the actual problems which were shown after it was removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this in #31253 because we have a problem with flaky tests and not enough resources to deal with them. I feel like this is still the case and that if we remove this we will get a more flaky CI which is detrimental to contributors. Is there any way we can detect leaks while still re-running other unit tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of any possibility here. In fact it seems very strange to me that all of the failing tests become passed after re-running them for the 2. time.

We can leave the re-run option in place until we find the root cause why the re-executed tests have very deterministic behavior in terms of go-leaks and data-races, but we should at least consider merging the fixed tests as part of this PR.

Maybe it can be a caching issue in the gotestsum wrapper, not sure here...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you report this upstream on the gotestsum repository then to see if it's actually an issue with them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reported here

But there seems to be a bigger problem with the --rerun-fails option, as I see multiple issues created on the upstream related to it https://github.com/gotestyourself/gotestsum/issues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @mx-psi

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I subscribed to gotestyourself/gotestsum/issues/442 and I would like to wait a bit more to see if upstream has anything to say here. I see some issues related to --rerun-fails, but I don't see any specifically related to this.

I am happy to merge the rest of the changes and only keep --rerun-fails to a separate PR, would that work for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, reverted the --rerun-fails and will keep an eye on the issue opened on gotestsum upstream.

When it will be resolved, will open a new PR

Makefile.Common Outdated Show resolved Hide resolved
@andrzej-stencel andrzej-stencel merged commit 5398c6b into open-telemetry:main Oct 9, 2024
155 checks passed
@github-actions github-actions bot added this to the next release milestone Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants