blockchain: Decouple processing and download logic. #2518

davecgh · 2020-12-23T21:56:10Z

This completely reworks the way block index and processing works to use headers-first semantics and support out of order processing of the associated block data.

This will ultimately provide a wide variety of benefits as it means the entire shape of the block tree can be determined from the headers alone which in turn will allow much more informed decisions to be made, provides better information regarding sync status and warning conditions, and allows future work to add invalidation and reconsideration of arbitrary blocks.

It must be noted that this is a major change to the way the block index and block handling is done and is deeply integrated with the consensus rules, so it will need a significant amount of testing and extremely careful review. To that end, the changes have been split up into several commits to help ease review, but, unfortunately the vast majority of them really couldn't be reasonably split, so the bulk of hardest to review changes are in a single commit.

Also of note is that there are a lot of assumptions made in the calling code in regards to expected error attribution and the state the chain has at the moment notifications are processed, so all of those semantics have intentionally been retained, even though they might seem a bit out of place now, in order to limit the amount of changes introduced at a time and thus better ensures correctness.

For example, in the future it would probably make more sense to make the notifications entirely asynchronous and for validation failures to be reported via those asynchronous notifications. However, before that can happen, all callers would first need to be updated to ensure they do not rely on the chain being in the same state it was at the moment the notification was generated.

Another significant change is that the logic that is used to determine if the chain believes it is current is now more robust and takes advantage of the fact the best known header is now available.

In particular, it introduces two new constraints:

A latching mechanism that prevents the chain from flipping back to being non current unless no new blocks have been seen for an extended period of time
The chain only latches to current once it is synced to the header with the most cumulative work that is not known to be invalid in addition to the existing conditions

Finally, in addition to all of the existing full block tests, this introduces a comprehensive set of processing order tests which exercise all of the new logic.

High level overview of the changes:

Introduce tracking for whether a block is fully linked, meaning it builds on a branch that has block data for all of its ancestors
Add received order tracking to ensure miners are not able to gain an advantage in terms of chain selection by only advertising a header
Add several new pieces of information to the block index:
- Header with the most cumulative work that is not known to be invalid
- Header with the most cumulative work that is known to be invalid
- Map of best chain candidates to aid in efficient selection
- Map of unlinked children for more efficient linking
- Prunable cached chain tips to significantly reduce potential search space during invalidation
Introduce a compare function which determines which of two nodes should be considered better for the purposes of best chain selection
Add chain tip iteration capabilities with potential filtering
Mark all descendants invalid due to known invalid ancestor when a block is marked invalid
Add ability to determine if a block can currently be validated
Rework the chain reorganization func to work with an arbitrary target
Modify the overall chain reorg func to reorg to the block with the most cumulative work that is valid in the case it is not possible to reorg to the target block
Add a new MultiError type for keeping track of multiple errors in the same call
Add a new ErrNoBlockData error kind
Update all cases that only require all of the ancestor block data to be available to check for that condition instead of the more strict condition of already being fully validated
Introduce a new processing lock separate from the overall chain lock
Add new method and ability to independently accept block headers
Rework the block processing func to accept blocks out of order so long as their header is already known valid
Keep a cache of blocks that recently passed contextual validation checks
Cleanup and correct various comments to match reality
Rework chain current logic to improve robustness and take advantage new best known header availability
- Adds a latching mechanism that prevents the chain from flipping back to being non current unless no new blocks have been seen for an extended period of time
- Only latches the chain to current once it is synced to the header with the most cumulative work that is not known to be invalid in addition to the existing conditions
Add comprehensive tests to exercise the new processing logic and best header tracking (for both invalid and not known invalid)
- Support query block test name by hash in the testing
- Improve test harness logging
- Add separate test block generation and acceptance
Update README.md and doc.go to accommodate the changes

This is work towards #1145.

davecgh · 2020-12-23T22:10:19Z

Oh, one other thing I forgot to call out is that the NTBlockAccepted notifications are now delivered at the same time for all connected blocks as opposed to interspersed with every block as it's processed. This is acceptable because all of the information the code that uses the notification require is available in the notification.

In order to increase confidence in the notifications in general, I also added some logging for them as they're sent and compared the before and after of the full block tests as follows:

$ cat testfullblocks.txt | grep NT | cut -d' ' -f2 | sort | uniq -c
    204 NTBlockAccepted
    210 NTBlockConnected
     20 NTBlockDisconnected
     12 NTChainReorgDone
     12 NTChainReorgStarted
    179 NTNewTickets
    226 NTNewTipBlockChecked
      7 NTReorganization
    179 NTSpentAndMissedTickets

$ cat testfullblocksnew.txt | grep NT | cut -d' ' -f2 | sort | uniq -c
    204 NTBlockAccepted
    210 NTBlockConnected
     20 NTBlockDisconnected
     12 NTChainReorgDone
     12 NTChainReorgStarted
    179 NTNewTickets
    226 NTNewTipBlockChecked
      7 NTReorganization
    179 NTSpentAndMissedTickets

As can be seen the same number of and type of notifications are being produced. Closer inspection at the logs shows they are produced in the same order as expected as well.

davecgh

Comment typos.

blockchain/checkpoints.go

blockchain/blockindex.go

blockchain/chain.go

blockchain/process.go

blockchain/blockindex.go

blockchain/chain.go

blockchain/blockindex.go

blockchain/chainquery.go

blockchain/blockindex.go

blockchain/common_test.go

blockchain/process_test.go

blockchain/blockindex.go

rstaudt2

Got through the tests so far and called out a few typos. The process tests read very well and are super helpful for visualizing all of the various scenarios this takes into account!

blockchain/process_test.go

blockchain/blockindex_test.go

blockchain/common_test.go

blockchain/process_test.go

JoeGruffins

Awesome.

On testnet there is a lot of output. One warning a few times [WRN] FEES: Trying to process mined transactions at block 587476 when previous best block was at height 587476 at different heights. Also this info [INF] CHAN: FORK: Block 0000002b64225f709a777cbb2acdb011705d8f43bbf32888eeed8476c5c4ecf1 (height 587489) forks the chain at height 587488/block 0000009a0b2f93131a56d24b5151337ebcdd259515b0ef9cf204e44469d01636, but does not cause a reorganize numerous times with different values for the same block. I don't see either of those logs on mainnet, so wondering why such a difference.

All expected?

I also observed a successful reorg happen on testnet.

I am unable to use wallet with the new rpc api version yet, so unable to test sending transactions and such easily.

blockchain/chain.go

blockchain/common_test.go

blockchain/process.go

davecgh · 2020-12-28T05:56:51Z

On testnet there is a lot of output. One warning a few times [WRN] FEES: Trying to process mined transactions at block 587476 when previous best block was at height 587476 at different heights. Also this info [INF] CHAN: FORK: Block 0000002b64225f709a777cbb2acdb011705d8f43bbf32888eeed8476c5c4ecf1 (height 587489) forks the chain at height 587488/block 0000009a0b2f93131a56d24b5151337ebcdd259515b0ef9cf204e44469d01636, but does not cause a reorganize numerous times with different values for the same block. I don't see either of those logs on mainnet, so wondering why such a difference.

All expected?

Yes, testnet has a lot of competing blocks show up around the time the difficulty drops. Mainnet does not have a difficulty reduction like that, so it doesn't happen there.

The fees warning is also normal. It happens for all forked blocks. It probably doesn't actually need to be a warning, but the behavior is the same on master.

blockchain/blockindex.go

matheusd

Finished the first pass review. Aside from the minor issues, looks good. Will do another general pass after the duplicate positional checks issue is addressed. I also ran the sync a bunch of times in mainnet.

Just wanna say, this really mustn't have been fun to write, since it involves such a critical piece of the processing code for the blockchain. Definitely was hard to review, since the individual checks are all interdependent in the critical commit.

OTOH, the resulting decoupled processing is a very nice thing to achieve :P

blockchain/chain.go

blockchain/process_test.go

matheusd · 2020-12-31T00:10:53Z

blockchain/process_test.go

+// variety of fairly complex scenarios selects the expected best chain and
+// properly tracks the header with the most cumulative work that is not known to
+// be invalid as well as the one that is known to be invalid (when it exists).
+func TestProcessLogic(t *testing.T) {


Also, just wanna hightlight this is an awesome test :P

davecgh · 2020-12-31T01:53:41Z

... the "Verified checkpoint at height XXX" is duplicated.

EDIT: I tracked the duplicate print down and it's because the positional block header checks are being run twice. It's not a big deal, but I'll get this updated to avoid doing that.

This has been resolved. I refactored the checks which apply to the block data to a separate function and call it instead when accepting the block data since the headers have already been checked at that point.

View Changes.

blockchain/blockindex.go

matheusd · 2020-12-31T15:17:29Z

Did a second pass review of the code (as of now), taking note of the final structure for ProcessBlock()/ProcessBlockHeader() for the most important sub-calls and some of the checks performed. Here it is:

chain.ProcessBlockHeader()
  * chain.maybeAcceptBlockHeader(checkHeaderSanity == true)
    * checkBlockHeaderSanity()
      - checks PoW
    * chain.checkBlockHeaderPositional()
      - diff retargeting
      - correct height
    * index.AddNode()
      - create or extend header tip
      - maybe change best header
    
chain.ProcessBlock()
  # (block header check stage)
  * checkBlockSanity()
    * checkBlockHeaderSanity()
      - checks PoW
    * checkProofOfStake
    - checks block size < wire.MaxBlockPayload
    * checkTransactionSanity() 					# (regular then stake txs)
    - checks total ticket and revocation # in header
  * chain.maybeAcceptBlockHeader(checkHeaderSanity == false)
    * chain.checkBlockHeaderPositional()
      - checks diff retargeting
      - checks correct height
    * index.AddNode()
      - create or extend header tip
      - maybe change best header
      
  # (block data check stage)
  * chain.maybeAcceptBlockData()    
    * chain.checkBlockDataPositional()
      - checks for expired txs
    * index.AcceptBlockData()
  * chain.maybeAcceptBlocks()
    * chain.checkBlockContext()
      * chain.checkBlockHeaderContext()
        - checks stakediff in header 
        - checks poolsize in header
      * checkTransactionContext()         			# (repeated for every tx and stx)
      - checks stake tx counts limits
      - checks finalized txs
      - checks count sig ops
      - checks tspends only in TVI blocks
      
  # (best chain selection stage)
  * chain.reorganizeChain()
    * chain.reorganizeChainInternal()     			# (repeated until a valid tip is found)
      * utxos.disconnectBlock()					# (when reorging to a different branch)
      * chain.disconnectBlock()
      * chain.checkBlockContext()         			# (if not done yet for some block - see previous call)
      * chain.checkConnectBlock()         			# (called for every block in the new branch)
        * chain.tspendChecks()
          - checks tspend in correct block
          * chain.checkTSpendHasVotes()
          * chain.checkTSPendsExpenditure()
        * utxos.disconnectDisapprovedBlock() 			# (if block votes to disapprove parent block)
        * chain.checkTransactionsAndConnect() 			# (first stake, then regular txs)
        * checkBlockScripts() 					# (first stake, then regular txs)
        * blockOneCoinbasePaysTokens()
      * chain.connectBlock()              			# (called for every block that passes the above)
        * dbPut****()						# (store best state, utxo set, treasury, gcs filter)
        - update in-memory best state
        - send block connected & tickets ntnfs
      * index.RemoveLessWorkCandidates
    * chain.maybeUpdateIsCurrent()

matheusd

Also ran the sync again a few times and executed the test suite on every commit.

Woop!

davecgh · 2021-01-06T04:47:54Z

I updated this to ensure the index lock is not held during flushing in case pruned ticket information has to be reloaded which itself requires the index lock and to add a couple of trace logging statements for connected and disconnected blocks.

Also, rebased.

This makes the logging in the test harness more consistent and improves it to make use of the ability to look up the test name for blocks when logging issues for easier identification.

This modifies the chain test harness to allow the initial blocks that are generated to specified heights, such as stake validation height, to be independently generated without also accepting them to the chain. It also allows a custom underlying chain generator which houses those blocks to be provided when creating a test harness. These two combined provide more flexibility to the tests since they can choose when (and if) to make the generated blocks available to the chain.

sefbkn · 2021-01-07T00:36:41Z

This PR looks solid overall. Thank you for taking the time to write this along with clear comments and tests. I am still working through it, but have not seen any issues with the latest code thusfar.

—

On a mostly unrelated note with respect to this PR, while syncing I did notice a log message that appears incorrect. It cropped up while testing this PR so I figured it worth mentioning at least even though it's not coming from this set of changes.

[WRN] PEER: Misbehaving peer xxx:9108 (outbound): 0 transactions not found -- ban score increased to 80
...
[INF] SRVR: Banned peer xxx (outbound) for 24h0m0s

In server.go -> OnNotFound, the following line

		reason := fmt.Sprintf("%d %v not found", numBlocks, txStr)

seems like it should be

		reason := fmt.Sprintf("%d %v not found", numTxns, txStr)

blockchain/chain.go

This completely reworks the way block index and processing works to use headers-first semantics and support out of order processing of the associated block data. This will ultimately provide a wide variety of benefits as it means the entire shape of the block tree can be determined from the headers alone which in turn will allow much more informed decisions to be made, provides better information regarding sync status and warning conditions, and allows future work to add invalidation and reconsideration of arbitrary blocks. It must be noted that this is a major change to the way the block index and block handling is done and is deeply integrated with the consensus rules, so it will need a significant amount of testing and extremely careful review. Also of note is that there are a lot of assumptions made in the calling code in regards to expected error attribution and the state the chain has at the moment notifications are processed, so all of those semantics have intentionally been retained, even though they might seem a bit out of place now, in order to limit the amount of changes introduced at a time and thus better ensures correctness. For example, in the future it would probably make more sense to make the notifications entirely asynchronous and for validation failures to be reported via those asynchronous notifications. However, before that can happen, all callers would first need to be updated to ensure they do not rely on the chain being in the same state it was at the moment the notification was generated. In addition to all of the existing full block tests, this introduces a comprehensive set of processing order tests which exercise all of the new logic. High level overview of the changes: - Introduce tracking for whether a block is fully linked, meaning it builds on a branch that has block data for all of its ancestors - Add received order tracking to ensure miners are not able to gain an advantage in terms of chain selection by only advertising a header - Add several new pieces of information to the block index: - Header with the most cumulative work that is not known to be invalid - Header with the most cumulative work that is known to be invalid - Map of best chain candidates to aid in efficient selection - Map of unlinked children for more efficient linking - Prunable cached chain tips to significantly reduce potential search space during invalidation - Introduce a compare function which determines which of two nodes should be considered better for the purposes of best chain selection - Add chain tip iteration capabilities with potential filtering - Mark all descendants invalid due to known invalid ancestor when a block is marked invalid - Add ability to determine if a block can currently be validated - Rework the chain reorganization func to work with an arbitrary target - Modify the overall chain reorg func to reorg to the block with the most cumulative work that is valid in the case it is not possible to reorg to the target block - Add a new MultiError type for keeping track of multiple errors in the same call - Add a new ErrNoBlockData error kind - Update all cases that only require all of the ancestor block data to be available to check for that condition instead of the more strict condition of already being fully validated - Introduce a new processing lock separate from the overall chain lock - Add new method and ability to independently accept block headers - Rework the block processing func to accept blocks out of order so long as their header is already known valid - Keep a cache of blocks that recently passed contextual validation checks - Cleanup and correct various comments to match reality - Add comprehensive tests to exercise the new processing logic and best header tracking (for both invalid and not known invalid)

This reworks the logic that is used to determine if the chain believes it is current to be more robust and to take advantage of the fact that the best known header is now available. In particular, it introduces two new constraints: - A latching mechanism that prevents the chain from flipping back to being non current unless no new blocks have been seen for an extended period of time - The chain only latches to current once it is synced to the header with the most cumulative work that is not known to be invalid in addition to the existing conditions

This modifies the getblockchaininfo RPC handler to use the newly-exposed best header that is not known to be invalid from chain versus the height of the current best chain tip.

This modifies the README.md and doc.go files to match the latest code.

davecgh · 2021-01-07T16:45:39Z

Thanks to everyone for reviewing this. I know it was not a trivial one to review and took a lot of effort.

davecgh added this to the 1.7.0 milestone Dec 23, 2020

davecgh mentioned this pull request Dec 23, 2020

Multi-peer Checklist #1145

Open

33 tasks

davecgh commented Dec 24, 2020

View reviewed changes

dnldd reviewed Dec 24, 2020

View reviewed changes

blockchain/blockindex.go Outdated Show resolved Hide resolved

dnldd reviewed Dec 24, 2020

View reviewed changes

blockchain/blockindex.go Show resolved Hide resolved

dnldd reviewed Dec 24, 2020

View reviewed changes

blockchain/chain.go Outdated Show resolved Hide resolved

davecgh force-pushed the blockchain_decouple_processing branch 2 times, most recently from 4e616cc to d250928 Compare December 24, 2020 18:45

sefbkn reviewed Dec 24, 2020

View reviewed changes

blockchain/blockindex.go Outdated Show resolved Hide resolved

blockchain/blockindex.go Outdated Show resolved Hide resolved

sefbkn reviewed Dec 24, 2020

View reviewed changes

blockchain/chainquery.go Show resolved Hide resolved

sefbkn reviewed Dec 24, 2020

View reviewed changes

blockchain/blockindex.go Show resolved Hide resolved

davecgh force-pushed the blockchain_decouple_processing branch from d250928 to 6e16986 Compare December 24, 2020 22:17

dnldd reviewed Dec 24, 2020

View reviewed changes

blockchain/common_test.go Outdated Show resolved Hide resolved

davecgh force-pushed the blockchain_decouple_processing branch from 6e16986 to 248bdcd Compare December 24, 2020 22:38

davecgh commented Dec 25, 2020

View reviewed changes

blockchain/process_test.go Outdated Show resolved Hide resolved

davecgh mentioned this pull request Dec 25, 2020

blockchain: Explicit hash in estimate stake diff. #2524

Merged

davecgh force-pushed the blockchain_decouple_processing branch from 248bdcd to d64afaf Compare December 25, 2020 20:03

JoeGruffins reviewed Dec 26, 2020

View reviewed changes

blockchain/blockindex.go Outdated Show resolved Hide resolved

blockchain/blockindex.go Show resolved Hide resolved

davecgh force-pushed the blockchain_decouple_processing branch from d64afaf to e041543 Compare December 26, 2020 18:35

davecgh added the non-forking consensus Changes that involve modifying consensus code without causing any forking changes. label Dec 26, 2020

rstaudt2 reviewed Dec 27, 2020

View reviewed changes

davecgh commented Dec 27, 2020

View reviewed changes

blockchain/process_test.go Outdated Show resolved Hide resolved

davecgh force-pushed the blockchain_decouple_processing branch 2 times, most recently from ae6b99f to 434823a Compare December 27, 2020 23:57

JoeGruffins approved these changes Dec 28, 2020

View reviewed changes

davecgh force-pushed the blockchain_decouple_processing branch from 434823a to 7f73122 Compare December 28, 2020 05:58

davecgh commented Dec 28, 2020

View reviewed changes

blockchain/blockindex.go Outdated Show resolved Hide resolved

davecgh force-pushed the blockchain_decouple_processing branch from 7f73122 to c0aaa33 Compare December 29, 2020 03:16

matheusd reviewed Dec 31, 2020

View reviewed changes

blockchain/chain.go Outdated Show resolved Hide resolved

matheusd reviewed Dec 31, 2020

View reviewed changes

blockchain/process_test.go Show resolved Hide resolved

matheusd reviewed Dec 31, 2020

View reviewed changes

davecgh force-pushed the blockchain_decouple_processing branch 2 times, most recently from 600154d to bf07a55 Compare December 31, 2020 01:53

davecgh force-pushed the blockchain_decouple_processing branch from bf07a55 to a327a43 Compare December 31, 2020 01:56

davecgh commented Dec 31, 2020

View reviewed changes

blockchain/blockindex.go Outdated Show resolved Hide resolved

davecgh force-pushed the blockchain_decouple_processing branch from a327a43 to a78f426 Compare December 31, 2020 05:45

matheusd approved these changes Dec 31, 2020

View reviewed changes

dnldd approved these changes Jan 5, 2021

View reviewed changes

davecgh force-pushed the blockchain_decouple_processing branch from a78f426 to b6b2be8 Compare January 6, 2021 04:42

davecgh added 3 commits January 5, 2021 22:48

chaingen: Support querying block test name by hash.

b2f035d

blockchain: Improve test harness logging.

f7f5aa7

This makes the logging in the test harness more consistent and improves it to make use of the ability to look up the test name for blocks when logging issues for easier identification.

davecgh force-pushed the blockchain_decouple_processing branch from b6b2be8 to 76b3e5a Compare January 6, 2021 04:49

matheusd approved these changes Jan 6, 2021

View reviewed changes

sefbkn reviewed Jan 7, 2021

View reviewed changes

blockchain/chain.go Outdated Show resolved Hide resolved

sefbkn approved these changes Jan 7, 2021

View reviewed changes

davecgh added 4 commits January 7, 2021 10:43

rpcserver: Update getblockchaininfo best header.

f3890ca

This modifies the getblockchaininfo RPC handler to use the newly-exposed best header that is not known to be invalid from chain versus the height of the current best chain tip.

blockchain: Update README.md and doc.go.

12e132e

This modifies the README.md and doc.go files to match the latest code.

davecgh force-pushed the blockchain_decouple_processing branch from 76b3e5a to 12e132e Compare January 7, 2021 16:44

davecgh merged commit 12e132e into decred:master Jan 7, 2021

davecgh deleted the blockchain_decouple_processing branch January 7, 2021 16:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

blockchain: Decouple processing and download logic. #2518

blockchain: Decouple processing and download logic. #2518

davecgh commented Dec 23, 2020 •

edited

Loading

davecgh commented Dec 23, 2020

davecgh left a comment

rstaudt2 left a comment

JoeGruffins left a comment

davecgh commented Dec 28, 2020

matheusd left a comment

matheusd Dec 31, 2020

davecgh commented Dec 31, 2020 •

edited

Loading

matheusd commented Dec 31, 2020

matheusd left a comment

davecgh commented Jan 6, 2021 •

edited

Loading

sefbkn commented Jan 7, 2021

davecgh commented Jan 7, 2021

blockchain: Decouple processing and download logic. #2518

blockchain: Decouple processing and download logic. #2518

Conversation

davecgh commented Dec 23, 2020 • edited Loading

High level overview of the changes:

davecgh commented Dec 23, 2020

davecgh left a comment

Choose a reason for hiding this comment

rstaudt2 left a comment

Choose a reason for hiding this comment

JoeGruffins left a comment

Choose a reason for hiding this comment

davecgh commented Dec 28, 2020

matheusd left a comment

Choose a reason for hiding this comment

matheusd Dec 31, 2020

Choose a reason for hiding this comment

davecgh commented Dec 31, 2020 • edited Loading

matheusd commented Dec 31, 2020

matheusd left a comment

Choose a reason for hiding this comment

davecgh commented Jan 6, 2021 • edited Loading

sefbkn commented Jan 7, 2021

davecgh commented Jan 7, 2021

davecgh commented Dec 23, 2020 •

edited

Loading

davecgh commented Dec 31, 2020 •

edited

Loading

davecgh commented Jan 6, 2021 •

edited

Loading