Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transfer - Remove recursive locks in snapshot node #1004

Merged
merged 2 commits into from
Oct 22, 2023

Conversation

yahavi
Copy link
Member

@yahavi yahavi commented Oct 19, 2023

  • All tests passed. If this feature is not already covered by the tests, I added new tests.
  • All static analysis checks passed.
  • This pull request is on the dev branch.
  • I used gofmt for formatting the code before submitting the pull request.

In the given situation, a deadlock could potentially arise. Let's look at the nodes A, B, and C:

Thread 1:
convertToWrapper -> acquired lock on A -> convertToWrapper -> acquired lock on B -> convertToWrapper -> Attempt to acquire lock on C

Thread 2:
CheckCompleted -> setCompleted -> acquired lock on C -> CheckCompleted -> setCompleted -> Attempt to acquire lock on B

Both Thread 1 and Thread 2 find themselves in a state of mutual waiting. Thread 1 is waiting for the mutex of C, which is held by Thread 2, while Thread 2 is waiting for the mutex of B, which is held by Thread 1.

The solution is to ensure that each thread never holds more than one mutex at a time.

Proofs from the logs - Deadlock of 680 minutes

Proof for thread 1's scenario:

goroutine 299 [sync.Mutex.Lock, 680 minutes]:
runtime.gopark(0xfca5e0?, 0xc00ccf9808?, 0xa0?, 0xbd?, 0x7fd74d470108?)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/runtime/proc.go:381 +0xd6 fp=0xc00ccf97a0 sp=0xc00ccf9780 pc=0x438176
runtime.goparkunlock(...)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/runtime/proc.go:387
runtime.semacquire1(0xc02ade29b4, 0x0?, 0x3, 0x1, 0x1f?)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/runtime/sema.go:160 +0x20f fp=0xc00ccf9808 sp=0xc00ccf97a0 pc=0x448f2f
sync.runtime_SemacquireMutex(0x1b94780?, 0xb0?, 0x40dc68?)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/runtime/sema.go:77 +0x26 fp=0xc00ccf9840 sp=0xc00ccf9808 pc=0x464066
sync.(*Mutex).lockSlow(0xc02ade29b0)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/sync/mutex.go:171 +0x165 fp=0xc00ccf9890 sp=0xc00ccf9840 pc=0x481de5
sync.(*Mutex).Lock(...)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/sync/mutex.go:90
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).action(0x41e000?, 0xc00ccf9900?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:47 +0x5f fp=0xc00ccf98e8 sp=0xc00ccf9890 pc=0xcb6cbf
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).convertToWrapper(0xc03e300000?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:55 +0x45 fp=0xc00ccf9920 sp=0xc00ccf98e8 pc=0xcb6e25
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).convertToWrapper.func1(0xc000355580)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:61 +0xcb fp=0xc00ccf9998 sp=0xc00ccf9920 pc=0xcb6f2b
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).action(0x1?, 0xc00ccf9a08?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:50 +0x9f fp=0xc00ccf99f0 sp=0xc00ccf9998 pc=0xcb6cff
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).convertToWrapper(0xcbd379?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:55 +0x45 fp=0xc00ccf9a28 sp=0xc00ccf99f0 pc=0xcb6e25
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).convertAndSaveToFile(0xc1443b63394b29a7?, {0xc000043dc0, 0x6e})
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:178 +0x27 fp=0xc00ccf9a68 sp=0xc00ccf9a28 pc=0xcb7747
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*RepoSnapshotManager).PersistRepoSnapshot(...)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/snapshotmanager.go:82

Proof for thread 2's scenario:

goroutine 276 [sync.Mutex.Lock, 680 minutes]:
runtime.gopark(0xc019c74c60?, 0xc024bc9860?, 0x60?, 0x4c?, 0xcb638c?)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/runtime/proc.go:381 +0xd6 fp=0xc00ccf7738 sp=0xc00ccf7718 pc=0x438176
runtime.goparkunlock(...)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/runtime/proc.go:387
runtime.semacquire1(0xc0003555b4, 0x0?, 0x3, 0x1, 0x1e?)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/runtime/sema.go:160 +0x20f fp=0xc00ccf77a0 sp=0xc00ccf7738 pc=0x448f2f
sync.runtime_SemacquireMutex(0x18?, 0x20?, 0xc02ade2980?)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/runtime/sema.go:77 +0x26 fp=0xc00ccf77d8 sp=0xc00ccf77a0 pc=0x464066
sync.(*Mutex).lockSlow(0xc0003555b0)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/sync/mutex.go:171 +0x165 fp=0xc00ccf7828 sp=0xc00ccf77d8 pc=0x481de5
sync.(*Mutex).Lock(...)
        /var/jenkins_home/tools/org.jenkinsci.plugins.golang.GolangInstallation/go-1.20.8/src/sync/mutex.go:90
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).action(0x0?, 0xc00ccf78a0?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:47 +0x5f fp=0xc00ccf7880 sp=0xc00ccf7828 pc=0xcb6cbf
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).CheckCompleted(0xc1443acd384ac9d5?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:126 +0x45 fp=0xc00ccf78c0 sp=0xc00ccf7880 pc=0xcb7265
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).setCompleted.func1(0xcb6da6?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:117 +0x67 fp=0xc00ccf78d8 sp=0xc00ccf78c0 pc=0xcb8807
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).action(0xc02ade29b0?, 0xc00ccf7928?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:50 +0x9f fp=0xc00ccf7930 sp=0xc00ccf78d8 pc=0xcb6cff
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).setCompleted(0x0?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:111 +0x25 fp=0xc00ccf7950 sp=0xc00ccf7930 pc=0xcb71e5
github.com/jfrog/jfrog-cli-core/v2/utils/reposnapshot.(*Node).CheckCompleted(0xc8?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/utils/reposnapshot/node.go:142 +0x65 fp=0xc00ccf7990 sp=0xc00ccf7950 pc=0xcb7285
github.com/jfrog/jfrog-cli-core/v2/artifactory/commands/transferfiles.setChunkCompletedInRepoSnapshot(0xc0000f2a80, {0xc00ba3cf00?, 0x1, 0x24?})
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/artifactory/commands/transferfiles/utils.go:289 +0x10f fp=0xc00ccf7a48 sp=0xc00ccf7990 pc=0xd89e4f
github.com/jfrog/jfrog-cli-core/v2/artifactory/commands/transferfiles.handleChunksStatuses(0xc0002a0480, 0xc00ccf7cf8, 0xc00ccf7c20, 0xc00ccf7bd8?, 0x20?)
        /root/go/pkg/mod/github.com/jfrog/jfrog-cli-core/v2@v2.45.1/artifactory/commands/transferfiles/manager.go:396 +0x407 fp=0xc00ccf7b48 sp=0xc00ccf7a48 pc=0xd7e5e7

@yahavi yahavi requested review from RobiNino and eyalbe4 October 19, 2023 13:20
@yahavi yahavi self-assigned this Oct 19, 2023
@yahavi yahavi added the bug Something isn't working label Oct 19, 2023
@yahavi yahavi changed the title Transfer - Remove recursive locks in snapshot mgr Transfer - Remove recursive locks in snapshot node Oct 19, 2023
@yahavi yahavi marked this pull request as ready for review October 19, 2023 13:39
Copy link
Contributor

@eyalbe4 eyalbe4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there's a test that's failing on this PR.

@yahavi yahavi merged commit 45dca6a into jfrog:dev Oct 22, 2023
6 of 8 checks passed
@yahavi yahavi deleted the fix-deadlock branch October 22, 2023 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants