Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chain/neutrino: improve rescan locking #695

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

cfromknecht
Copy link
Contributor

Fixes 5 or so issues around locking the rescan-related member variables, see commit messages for more details. The issue fixed by the first commit was triggered by an lnd itest, the others are just observed from looking through the code.

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@guggero
Copy link
Collaborator

guggero commented Nov 7, 2020

I tried running our itests against this PR (rebased onto current btcwallet master) and it looks like something's deadlocking now. The tests that do rescan don't even start properly, not sure what's going on.

This commit fixes a potential nil pointer dereference in the neutrino
chain client. Currently the Rescan method blindly overwrites the rescan
member with nil _after_ reacquiring the lock. However, the field may
have been populated with a new rescan while the mutex wasn't held,
causing a panic later on where we assume the rescan is non-nil.

To remedy, we add a simple check that asserts we are nilling the same
rescan we were shutting down while the mutex was unlocked. If we find a
differing value, we will continue to this process until we arrive at
matching rescan pointers, at which point we know we can safely proceed
in creating a new rescan.
The scanning variable is only ever set to true, and it happens in the
same paths that the rescanQuit is initialized. Since rescanQuit is also
only ever set to a non-nil value, we instead use this to determine
whether we are "scanning".
The rescanQuit and rescan objects are set under different mutex
acquisitions, hence it's possible for rescanQuit to be non-nil while
rescan is nil. This commit ensures that we only try to wait for shutdown
of non-nil rescans.
This also fixes an unprotected access of s.rescan when calling Update
without the lock held.
Currently NotifyBlock releases the clientMtx before calling a public
version of NotifyReceived that reacquires clientMtx. This can have
unexpected behavior because the value of isScanning() could change
between lock acquisitions. We switch to using the internal
notifyReceived so that our read on isScanning() is consistent for the
duration of the call.
cfromknecht added a commit to cfromknecht/lnd that referenced this pull request Dec 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants