🐛 Add race option to detect raced codes #10899

Open
wants to merge 5 commits into base: main

sivchari
Member

What this PR does / why we need it:

I added the -race option to the go test commands. With this option, the race detector reports data races in the code under test.
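As a minimal sketch of the kind of bug this flag surfaces (illustrative only, not code from this PR; `safeCount` is a hypothetical name): several goroutines increment a shared counter. With the mutex in place the race detector stays quiet. Without it, the writes to `counter` would be unsynchronized, which is the situation `go test -race` reports as a data race.

```go
package main

import (
	"fmt"
	"sync"
)

// safeCount increments a shared counter from n goroutines.
// The mutex orders the writes to counter. Dropping mu.Lock/mu.Unlock
// would leave unsynchronized writes, which the race detector flags.
func safeCount(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait() // all increments are done before counter is read
	return counter
}

func main() {
	fmt.Println(safeCount(100))
}
```

Running a package's tests with `go test -race ./...` instruments memory accesses at run time, so a race is only reported if the racy code path actually executes during the test.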

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 18, 2024
@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-area PR is missing an area label size/S Denotes a PR that changes 10-29 lines, ignoring generated files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 18, 2024
@sivchari
Member Author

/assign

@fabriziopandini
Member

/area testing
/test pull-cluster-api-e2e-full-main

@k8s-ci-robot k8s-ci-robot added the area/testing Issues or PRs related to testing label Aug 2, 2024
@k8s-ci-robot
Contributor

@fabriziopandini: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-30-1-31-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/area testing
/test pull-cluster-api-e2e-full-main

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Aug 2, 2024
@fabriziopandini
Member

/test pull-cluster-api-e2e-main

@fabriziopandini
Member

Overall lgtm, running E2E to validate the changes in the in-memory provider.

Makefile (review thread outdated and resolved)
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 3, 2024
@chrischdi
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 5, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5931a1618319937c284bd75a36f1709a484a6c7e

@@ -50,10 +50,10 @@ func (c *cache) startSyncer(ctx context.Context) error {
 		c.syncQueue.ShutDown()
 	}()
 
-	syncLoopStarted := false
+	syncLoopStarted := make(chan struct{})
Member

I think I would drop this entirely.

This now only checks that execution reached l.56. I'm not sure I understand why we are waiting for that. At that point the only guarantee is that the log line was written, which isn't a meaningful thing to wait for.
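A minimal sketch of the channel-based signal under discussion, assuming a goroutine that closes a `started` channel as its first step (`startLoop` and `work` are hypothetical names, not the PR's code). Receiving from the channel proves only that the `close` line was reached, not that any later work has happened.

```go
package main

import "fmt"

// startLoop launches a goroutine and returns a channel that is closed
// once the goroutine has begun running. All names are hypothetical.
func startLoop(work func()) <-chan struct{} {
	started := make(chan struct{})
	go func() {
		close(started) // signals only that this line was reached
		work()         // nothing about work() is guaranteed to the waiter
	}()
	return started
}

func main() {
	done := make(chan struct{})
	started := startLoop(func() { close(done) })

	<-started // the goroutine has reached close(started)
	<-done    // a separate signal is needed to know work() ran
	fmt.Println("loop started and work finished")
}
```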

Member Author

Sounds good to me. I fixed it.

@sbueringer
Member

@sivchari 2 smaller findings. Sorry for the misunderstanding here: #10899 (comment)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 20, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from chrischdi. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sivchari
Member Author

/retest

@sivchari
Member Author

When I remove syncLoopStarted, other race errors appear.
I'll investigate later.

@sbueringer
Member

sbueringer commented Aug 20, 2024

When I remove syncLoopStarted, other race errors appear. I'll investigate later.

Hm, not sure why that is, but these errors look entirely unrelated (they are even in a different Go module).

@sivchari sivchari force-pushed the add-race-option branch 2 times, most recently from a93aa1d to ddd99a1 Compare August 29, 2024 07:30
@sbueringer
Member

@sivchari Let's please split this PR into two. In the first PR, let's add the flag to all unit tests for which it just works without further modification. And then let's try to address everything else in a second PR.

I would really like to have the flag in the test and test-junit targets. I'm working on something where it's basically mandatory to run tests with the race detector

@sbueringer sbueringer added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Sep 16, 2024
@sbueringer
Member

sbueringer commented Sep 19, 2024

Opened a PR: #11207

@sbueringer
Member

sbueringer commented Sep 20, 2024

@sivchari #11207 will enable the race detector for the main tests.
It should also fix your test errors here.

Once #11207 is merged you can rebase and we can see if we want to add the race detector everywhere. I'm mostly worried about slowing down the test ProwJob, but it's fine if the other test targets slow the job down only a bit.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2024
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 24, 2024
@sivchari
Member Author

sivchari commented Sep 24, 2024

@sbueringer
Sorry for the late reply, and thank you so much for your efforts.
I updated this branch.
The tests may run a little slower, but I think detecting data races is more important than speed.
Again, thanks for your great work.

@sbueringer
Member

@sivchari Can you re-add the -race flags to the test targets that don't have it already? Then we can compare how much longer the job takes if we enable the race detector everywhere.

@sivchari
Member Author

sivchari commented Oct 2, 2024

@sbueringer
Sorry, I was on vacation.

Can you re-add the -race flags to the test targets that don't have it already? Then we can compare how much longer the job takes if we enable the race detector everywhere.

I'm not sure what you mean. -race is already set on each target (e.g. test, test-junit). Isn't that enough? If not, please tell me what you would like me to do.

@sbueringer
Member

No worries. -race is only set on some test targets. A previous version of this PR set it on all of them. I would like to set it again on all test targets that don't have it at the moment.

@sivchari
Member Author

sivchari commented Oct 4, 2024

You mean you want to remove the !race tag in all test targets, right?

@sbueringer
Member

sbueringer commented Oct 4, 2024

No, I want to add -race to all test Makefile targets that don't have it yet, for example test-docker-infrastructure.

A previous version of your PR already had it correctly on all targets. My PR only added it to the most important test targets.

@sivchari
Member Author

sivchari commented Oct 7, 2024

Okay, I got it. Sorry for taking up your time.
I'll take care of it by the end of this week.

@sbueringer
Member

Thx! No problem at all and no rush! 😀

Signed-off-by: sivchari <shibuuuu5@gmail.com>
Signed-off-by: sivchari <shibuuuu5@gmail.com>
Signed-off-by: sivchari <shibuuuu5@gmail.com>
Signed-off-by: sivchari <shibuuuu5@gmail.com>
Signed-off-by: sivchari <shibuuuu5@gmail.com>
@sivchari
Member Author

I re-added -race to each target.

@sbueringer
Member

/test pull-cluster-api-test-main

@@ -50,10 +50,8 @@ func (c *cache) startSyncer(ctx context.Context) error {
 		c.syncQueue.ShutDown()
 	}()

Member

@sivchari I took another look. Let's keep this but make it concurrency safe
l.53

var syncLoopStarted atomic.Bool

l.56

syncLoopStarted.Store(true)

l.85

if !syncLoopStarted.Load() {

Member Author

@SubhasmitaSw
Thank you for the comment. It might be right, but I don't think it's necessary. In l.83 we check whether all workers have started with atomic.LoadInt64(&workers) < int64(c.syncConcurrency), and I believe that is enough to prevent the data race. Thanks.

Member

sbueringer commented Oct 22, 2024

The check in l.83 and following does not cover the sync loop (l.53-l.63).

Labels
area/testing Issues or PRs related to testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
5 participants