-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hardening]: synchronously handle on-chain events during startup #8166
Comments
Today we block start up until the chain backend is synced with the wallet: Lines 648 to 672 in 64753b0
The sub-system that handles on-chain claims is created a bit later in the life cycle here: Lines 1981 to 1985 in 64753b0
To accomplish this, we'll need to defer connecting to our set of persistent peers: Lines 2091 to 2148 in 64753b0
Then implement some async routine where the chain arb tells the server that all contracts have finished resolving. I think this should only be restricted to: breaches, and commitment outputs or HTLCs that have fully expired and are ready to sweep. Otherwise, we'd stay in that mode for days potentially. |
The start/stop order of subsystems can be tricky, some relevant context and previous work,
I think before attempting fixing this issue, we should refactor |
Do you mean the Brontide Peer Manager by For example @xmrk-btc had the case where his node came online with an outdated channel.db, couple of channels had HTLCs on it (in the old channel.db state) which were timeout-out at the current blockheight, so even before the channel was reestablished initiating a SCB by the peer with the correct state (potentially), the channel-arbitrator immediately went onchain for the channel with the HTLC on it causing funds to be lost. So I would suggest starting the channelarbitrator for a channel only if our peer does not cooperate in the reestablishment msg. So I think the self-harming scenario but also the attack scenario have to be combined for the best outcome. |
There's a connection manager in
How would we detect if the peer is lying in the channel reestablish? The peer could falsely claim to have a newer state, causing our node to attempt DLP and not start the channel arbitrator. Then the peer could steal funds onchain without our node detecting it. The other problem is the attack surface widens dramatically after peers are allowed to connect -- peers can send anything they want over the connection, potentially crashing our node if there's a bug. If we always reconnect before starting channel arbitrator, the peer always gets a chance to crash us before we can detect stealing attempts. |
From lightning/bolts#1049, I think you can tell that a peer is pretending to have a fake commitment state by not supplying the right From the spec:
I agree with that, I am inclined to also always think adversarial though I would also like to see some measurements in place which would prevent us harm ourselves. For example if our peer was honest and would have signaled us to not go onchain with an older state via the channel reestablishment msg. |
Cool, thanks for explaining.
I think "safe mode" would be great for this, especially if we can enable it automatically when the node has been offline for a long time. There's been a bit of discussion about this issue over at #8148 (comment), with different ideas about what should be done. @ProofOfKeags mentioned isolating the contractcourt and sweeper entirely from the rest of LND, as a separate process. I think this is the holy grail for protecting funds under DoS, but it's a big project. It would be significantly more work than I intended to propose in this issue. @Roasbeef suggested waiting for commitments and HTLCs to confirm on chain before allowing peer connections, providing similar isolation to running a separate process, but during startup only. The major downside of this is the 10+ minute delay (potentially hours) it would add to startup times before peers could connect. I think we could get some quick incremental improvement over the current state without too much work or downside. Rather than waiting for transactions to confirm we could simply ensure they are broadcast before allowing peer connections. Concretely, we would do something like this:
With these changes, any transactions that need to be published at the current block should be published during startup. Goroutines are only spawned for future events, not things that can be handled immediately. We'd want to do some benchmarking of startup times, but I suspect the effect will be relatively small. |
When LND starts up, it should handle any immediately serviceable on-chain events (breach retribution, sweeping HTLCs, etc.) prior to starting the connection manager. Once the connection manager is running, LND's attack surface widens greatly, so it is best to handle critical on-chain things beforehand.
During startup, we already wait for the wallet to sync with the blockchain before doing anything else. But we don't do the same for
ChainArbitrator
inserver.Start()
. This creates a race where it is possible for a peer to connect to our node before on-chain events have been handled.This could be a problem if an attacker has a fast way to crash our node after connecting (e.g., ping of death). They could crash our node and broadcast a commitment tx while we're offline. Then every time our node tries to restart, the attacker may be able to crash it again before it punishes revoked commitments or sweeps HTLCs.
The text was updated successfully, but these errors were encountered: