Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NeutrinoClient data race #819

Closed
MStreet3 opened this issue Oct 14, 2022 · 0 comments · Fixed by #822
Closed

NeutrinoClient data race #819

MStreet3 opened this issue Oct 14, 2022 · 0 comments · Fixed by #822

Comments

@MStreet3
Copy link
Contributor

MStreet3 commented Oct 14, 2022

Problem

The NeturinoClient Rescan method locks, sets rescan to nil then unlocks itself before resetting the value.
As the client state is scanning = true yet rescan = nil when the lock is released there is a race between the executing Rescan method and any NotifyReceived calls that are waiting to obtain the lock. If a NotifyReceived call picks up the lock in this state, it will panic when it tries to run rescan.Update as scanning is true.

https://github.com/btcsuite/btcwallet/blob/master/chain/neutrino.go#L351

Originally analyzed and found by @guggero in lightningnetwork/lnd#6985

--- FAIL: TestLightningWallet (7.74s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7d47b7]

goroutine 6 [running]:
testing.tRunner.func1.2({0x9c34e0, 0xfdbfe0})
	/opt/hostedtoolcache/go/1.19.0/x64/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.19.0/x64/src/testing/testing.go:1399 +0x39f
panic({0x9c34e0, 0xfdbfe0})
	/opt/hostedtoolcache/go/1.19.0/x64/src/runtime/panic.go:884 +0x212
github.com/lightninglabs/neutrino.(*Rescan).Update(0x0, {0xc00064a248, 0x1, 0xc00129a4e0?})
	/home/runner/work/go/pkg/mod/github.com/lightninglabs/neutrino@v0.14.2/rescan.go:1391 +0x57
github.com/btcsuite/btcwallet/chain.(*NeutrinoClient).NotifyReceived(0xc0000fa120, {0xc0012bc060?, 0x1, 0x1})
	/home/runner/work/go/pkg/mod/github.com/btcsuite/btcwallet@v0.16.1/chain/neutrino.go:457 +0x159
github.com/btcsuite/btcwallet/wallet.(*Wallet).NewAddress(0xc0001ce480, 0x0, {0x0?, 0x9f16e0?})
	/home/runner/work/go/pkg/mod/github.com/btcsuite/btcwallet@v0.16.1/wallet/wallet.go:3020 +0x17a
github.com/lightningnetwork/lnd/lnwallet/btcwallet.(*BtcWallet).NewAddress(0xc0005a8180, 0x50?, 0x0, {0xa88a60?, 0x7?})
	/home/runner/work/lnd/lnd/lnwallet/btcwallet/btcwallet.go:556 +0x10c
github.com/lightningnetwork/lnd/lnwallet/test.loadTestCredits(0xc000078720, 0xc000340380, 0x14, 0x0?)
	/home/runner/work/lnd/lnd/lnwallet/test/test_interface.go:275 +0x287
github.com/lightningnetwork/lnd/lnwallet/test.createTestWallet({0xc00031ff80?, 0xa87ad1?}, 0x5?, 0xfe8080, {0xc43560, 0xc0001ca180}, {0xc48820, 0xc0005a8180}, {0xc43b00, 0xc0011c41c8}, ...)
	/home/runner/work/lnd/lnd/lnwallet/test/test_interface.go:378 +0x4a9
github.com/lightningnetwork/lnd/lnwallet/test.runTests(_, _, {_, _}, _, {{0xc00002eba0, 0xe}, {0xa873bc, 0x2}, {0xa879c6, ...}, ...}, ...)
	/home/runner/work/lnd/lnd/lnwallet/test/test_interface.go:3398 +0x2633
github.com/lightningnetwork/lnd/lnwallet/test.TestLightningWallet(0xc000007ba0, {0xa89361, 0x8})
	/home/runner/work/lnd/lnd/lnwallet/test/test_interface.go:3064 +0x8ef
github.com/lightningnetwork/lnd/lnwallet/test/neutrino_test.TestLightningWallet(0x0?)
	/home/runner/work/lnd/lnd/lnwallet/test/neutrino/neutrino_test.go:12 +0x25
testing.tRunner(0xc000007ba0, 0xb9f9b8)
	/opt/hostedtoolcache/go/1.19.0/x64/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/opt/hostedtoolcache/go/1.19.0/x64/src/testing/testing.go:1493 +0x35f
FAIL	github.com/lightningnetwork/lnd/lnwallet/test/neutrino	7.770s
FAIL

Related Work

See PR #695
See PR #533

Solution 1

Make a change against btcwallet to address the state management issues:

  1. make the entire Rescan method a critical path (i.e., only unlock via a single defer call)
  2. make the entire NotifyReceived method a critical path
  3. remove the locking in the notificationHandler to prevent a deadlock caused by making Rescan completely critical. This locking isn't necessary as it attempts to read the state of the rescanErr channel. But if rescanErr is nil, then there is no Rescan object created yet so no errors to read. Also, reading from a nil channel will simply block but the read is wrapped in a select statement with other exits.

Risks

  • there are no unit tests on the neutrino client in btcwallet and even a targeted change like this is hard to know if it introduced any regressions
@MStreet3 MStreet3 changed the title Found in unit test run: NeutrinoClient data race Oct 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant