-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chain: fix NeutrinoClient segfault on NotifyReceived call #822
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for re-structuring the PR! I left a few suggestions for how to improve it even more for easier review.
But I was able to understand all the changes, nice work! Sorry it took so long to get to the first pass review.
I think we can further reduce the diff by passing in the newRescanFunc
in the NewNeutrinoClient
. That removes the need for any type casting or having two versions of the cs
member around. See my diff with the proposed changes (might need some additional cleanup, this is just to get my idea across):
4d87ccb
to
45a359f
Compare
45a359f
to
54710ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, only nits left 🎉 Thanks for all your work on this!
@ellemouton could you take a look at this as well please? |
a175d3f
to
baead51
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rescanMtx
resolution seems fine, but this PR introduces some notable and breaking API changes and abstraction, which create some challenges and confusion for consumers.
The abstraction seems unnecessary, except for test stubbing, but even then I don't think the constructors and exported types need to change. Namely, the test code can and does set the fields directly with its mock stubs (without using the constructors), since e.g. newMockNeutrinoClient
manually creates the NeutrinoClient
, and the newly added NewRescanFunc
isn't used. Could the exported constructor, which consumers like the btcwallet app use, stay the same as before but deal with setting up the rescanner fields from the provided chain service?
This is only from a cursory scan, so I apologize in advance if my assessment is off base, but I don't think this improves the API in its present state, and I suspect the goals of testing with a mock chain service can still be achieved without the API changes.
0b24151
to
a6507c0
Compare
47929e7
to
9d91d11
Compare
df17f40
to
9498860
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, LGTM 🎉
9498860
to
49e8366
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for all the quick updates on this. LGTM 🎉
Before we merge this, can you please update (or create?) the PR on the lnd
side to make sure all neutrino integration tests still pass?
Yea will do. |
c5e8a86
to
53904c1
Compare
@guggero the corresponding LND PR with updated btcwallet is here: lightningnetwork/lnd#7049 |
53904c1
to
12cb2a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Love the new abstractions that make things way more testable. Great job!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still builds and runs fine in my use case.
With limited familiarity with btcwallet's code, the atomicity changes look fine too. I don't see any lock ordering pitfalls that could lead to deadlock, but it's a risk of nested locking.
I think I see some ancient data races on the rescanQuit
field in the notification handlers like onBlockConnected
, but that's unrelated to the issue being solved in this PR.
Define an abstract rescanner interface and a newRescanFunc constructor type. Define a constructor adapter that adapts a neutrino.NewRescan constructor into the required type. Replace direct calls in the NeutrinoClient to the neutrino.NewRescan constructor with calls to the abstract type.
Define NeutrinoChainService interface to provide abstract access to all the public methods defined for a *neutrino.ChainService object. The interface defines all public methods to ensure downstream consumers of the package maintain compatibility as the previous reference to a *neutrino.ChainService is exposed publicly on the NeutrinoClient.
Add TestNeutrinoClientSequentialStartStop to verify that the neutrino client can have Start() and Stop() called sequentially. Test case fails with -race flag enabled. Add TestNeutrinoClientNotifyReceived to verify that calls to NotifyReceived call the Update method of the rescan object. Verify that there is no race condition on any state variables of the NeutrinoClient. Pass test with -race flag enabled. Add a unit test that demonstrates a segmentation fault exists when concurrent calls to NotifyReceived, NotifyBlocks and Rescan methods of the NeutrinoClient are executed. The rescan property is set to nil and a segmentation fault arises.
Add a mutex to synchronize access to the rescan prop of the neutrino client. Make calls to Rescan and NotifyReceived require the rescan mutex lock to execute. Resolve the segmentation fault in the TestNeutrinoClientNotifyReceivedRescan unit test.
Resolve the data race in the sequential start stop unit test by ensuring that each goroutine launched in the Start method is properly accounted for in the client's waitgroup. Add a TODO to shutdown the rescan goroutine when the Stop method is called.
12cb2a6
to
8c31629
Compare
chain: fix NeutrinoClient segfault on NotifyReceived call
chain: fix NeutrinoClient segfault on NotifyReceived call
Fixes #819
Replaces #820
Background
There is datarace in the neutrino client implementation that is causing test
flakes in
lnd
. The sharedrescan
object and the granularlocking strategy allow for the possibility of a segmentation fault when
NotifyReceived
attempts to callUpdate
on arescan
that isnil
. Thisis because the
clientMtx
lock is released afterrescan
is set tonil
.Changes
The solution here is to be minimally invasive. Add a new
rescanMtx
mutexand require that the
NotifyReceived
method and theRescan
methodhold the
rescanMtx
lock for the duration of their execution. Locking andunlocking of the client is unaffected and changes to the
rescan
objectbecome atomic.
Testing
This PR adds two interfaces
rescanner
andnutrinoChainService
to abstract out the dependency on structs from the
neutrino
package.This PR adds a file of mock implementations for these interfaces and uses them
in a new unit test file.