Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blockchain: Decouple processing and download logic. #2518

Merged
merged 7 commits into from
Jan 7, 2021

Conversation

davecgh
Copy link
Member

@davecgh davecgh commented Dec 23, 2020

This completely reworks the way block index and processing works to use headers-first semantics and support out of order processing of the associated block data.

This will ultimately provide a wide variety of benefits as it means the entire shape of the block tree can be determined from the headers alone which in turn will allow much more informed decisions to be made, provides better information regarding sync status and warning conditions, and allows future work to add invalidation and reconsideration of arbitrary blocks.

It must be noted that this is a major change to the way the block index and block handling is done and is deeply integrated with the consensus rules, so it will need a significant amount of testing and extremely careful review. To that end, the changes have been split up into several commits to help ease review, but, unfortunately the vast majority of them really couldn't be reasonably split, so the bulk of hardest to review changes are in a single commit.

Also of note is that there are a lot of assumptions made in the calling code in regards to expected error attribution and the state the chain has at the moment notifications are processed, so all of those semantics have intentionally been retained, even though they might seem a bit out of place now, in order to limit the amount of changes introduced at a time and thus better ensures correctness.

For example, in the future it would probably make more sense to make the notifications entirely asynchronous and for validation failures to be reported via those asynchronous notifications. However, before that can happen, all callers would first need to be updated to ensure they do not rely on the chain being in the same state it was at the moment the notification was generated.

Another significant change is that the logic that is used to determine if the chain believes it is current is now more robust and takes advantage of the fact the best known header is now available.

In particular, it introduces two new constraints:

  • A latching mechanism that prevents the chain from flipping back to being non current unless no new blocks have been seen for an extended period of time
  • The chain only latches to current once it is synced to the header with the most cumulative work that is not known to be invalid in addition to the existing conditions

Finally, in addition to all of the existing full block tests, this introduces a comprehensive set of processing order tests which exercise all of the new logic.

High level overview of the changes:

  • Introduce tracking for whether a block is fully linked, meaning it builds on a branch that has block data for all of its ancestors
  • Add received order tracking to ensure miners are not able to gain an advantage in terms of chain selection by only advertising a header
  • Add several new pieces of information to the block index:
    • Header with the most cumulative work that is not known to be invalid
    • Header with the most cumulative work that is known to be invalid
    • Map of best chain candidates to aid in efficient selection
    • Map of unlinked children for more efficient linking
    • Prunable cached chain tips to significantly reduce potential search space during invalidation
  • Introduce a compare function which determines which of two nodes should be considered better for the purposes of best chain selection
  • Add chain tip iteration capabilities with potential filtering
  • Mark all descendants invalid due to known invalid ancestor when a block is marked invalid
  • Add ability to determine if a block can currently be validated
  • Rework the chain reorganization func to work with an arbitrary target
  • Modify the overall chain reorg func to reorg to the block with the most cumulative work that is valid in the case it is not possible to reorg to the target block
  • Add a new MultiError type for keeping track of multiple errors in the same call
  • Add a new ErrNoBlockData error kind
  • Update all cases that only require all of the ancestor block data to be available to check for that condition instead of the more strict condition of already being fully validated
  • Introduce a new processing lock separate from the overall chain lock
  • Add new method and ability to independently accept block headers
  • Rework the block processing func to accept blocks out of order so long as their header is already known valid
  • Keep a cache of blocks that recently passed contextual validation checks
  • Cleanup and correct various comments to match reality
  • Rework chain current logic to improve robustness and take advantage new best known header availability
    • Adds a latching mechanism that prevents the chain from flipping back to being non current unless no new blocks have been seen for an extended period of time
    • Only latches the chain to current once it is synced to the header with the most cumulative work that is not known to be invalid in addition to the existing conditions
  • Add comprehensive tests to exercise the new processing logic and best header tracking (for both invalid and not known invalid)
    • Support query block test name by hash in the testing
    • Improve test harness logging
    • Add separate test block generation and acceptance
  • Update README.md and doc.go to accommodate the changes

This is work towards #1145.

@davecgh davecgh added this to the 1.7.0 milestone Dec 23, 2020
@davecgh davecgh mentioned this pull request Dec 23, 2020
33 tasks
@davecgh
Copy link
Member Author

davecgh commented Dec 23, 2020

Oh, one other thing I forgot to call out is that the NTBlockAccepted notifications are now delivered at the same time for all connected blocks as opposed to interspersed with every block as it's processed. This is acceptable because all of the information the code that uses the notification require is available in the notification.

In order to increase confidence in the notifications in general, I also added some logging for them as they're sent and compared the before and after of the full block tests as follows:

$ cat testfullblocks.txt | grep NT | cut -d' ' -f2 | sort | uniq -c
    204 NTBlockAccepted
    210 NTBlockConnected
     20 NTBlockDisconnected
     12 NTChainReorgDone
     12 NTChainReorgStarted
    179 NTNewTickets
    226 NTNewTipBlockChecked
      7 NTReorganization
    179 NTSpentAndMissedTickets

$ cat testfullblocksnew.txt | grep NT | cut -d' ' -f2 | sort | uniq -c
    204 NTBlockAccepted
    210 NTBlockConnected
     20 NTBlockDisconnected
     12 NTChainReorgDone
     12 NTChainReorgStarted
    179 NTNewTickets
    226 NTNewTipBlockChecked
      7 NTReorganization
    179 NTSpentAndMissedTickets

As can be seen the same number of and type of notifications are being produced. Closer inspection at the logs shows they are produced in the same order as expected as well.

Copy link
Member Author

@davecgh davecgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment typos.

blockchain/checkpoints.go Outdated Show resolved Hide resolved
blockchain/blockindex.go Outdated Show resolved Hide resolved
blockchain/blockindex.go Outdated Show resolved Hide resolved
blockchain/chain.go Outdated Show resolved Hide resolved
blockchain/chain.go Outdated Show resolved Hide resolved
blockchain/process.go Outdated Show resolved Hide resolved
blockchain/process.go Outdated Show resolved Hide resolved
blockchain/process.go Outdated Show resolved Hide resolved
blockchain/process.go Outdated Show resolved Hide resolved
blockchain/process.go Outdated Show resolved Hide resolved
blockchain/blockindex.go Outdated Show resolved Hide resolved
blockchain/chain.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the blockchain_decouple_processing branch 2 times, most recently from 4e616cc to d250928 Compare December 24, 2020 18:45
blockchain/blockindex.go Outdated Show resolved Hide resolved
blockchain/blockindex.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the blockchain_decouple_processing branch from d250928 to 6e16986 Compare December 24, 2020 22:17
blockchain/common_test.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the blockchain_decouple_processing branch from 6e16986 to 248bdcd Compare December 24, 2020 22:38
blockchain/process_test.go Outdated Show resolved Hide resolved
blockchain/blockindex.go Outdated Show resolved Hide resolved
blockchain/blockindex.go Show resolved Hide resolved
@davecgh davecgh force-pushed the blockchain_decouple_processing branch from d64afaf to e041543 Compare December 26, 2020 18:35
@davecgh davecgh added the non-forking consensus Changes that involve modifying consensus code without causing any forking changes. label Dec 26, 2020
Copy link
Member

@rstaudt2 rstaudt2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got through the tests so far and called out a few typos. The process tests read very well and are super helpful for visualizing all of the various scenarios this takes into account!

blockchain/process_test.go Outdated Show resolved Hide resolved
blockchain/process_test.go Outdated Show resolved Hide resolved
blockchain/process_test.go Outdated Show resolved Hide resolved
blockchain/process_test.go Outdated Show resolved Hide resolved
blockchain/blockindex_test.go Outdated Show resolved Hide resolved
blockchain/common_test.go Outdated Show resolved Hide resolved
blockchain/common_test.go Outdated Show resolved Hide resolved
blockchain/process_test.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the blockchain_decouple_processing branch 2 times, most recently from ae6b99f to 434823a Compare December 27, 2020 23:57
Copy link
Member

@JoeGruffins JoeGruffins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome.

On testnet there is a lot of output. One warning a few times [WRN] FEES: Trying to process mined transactions at block 587476 when previous best block was at height 587476 at different heights. Also this info [INF] CHAN: FORK: Block 0000002b64225f709a777cbb2acdb011705d8f43bbf32888eeed8476c5c4ecf1 (height 587489) forks the chain at height 587488/block 0000009a0b2f93131a56d24b5151337ebcdd259515b0ef9cf204e44469d01636, but does not cause a reorganize numerous times with different values for the same block. I don't see either of those logs on mainnet, so wondering why such a difference.

All expected?

I also observed a successful reorg happen on testnet.

I am unable to use wallet with the new rpc api version yet, so unable to test sending transactions and such easily.

blockchain/chain.go Show resolved Hide resolved
blockchain/common_test.go Outdated Show resolved Hide resolved
blockchain/process.go Show resolved Hide resolved
blockchain/process.go Show resolved Hide resolved
blockchain/process.go Show resolved Hide resolved
blockchain/process.go Show resolved Hide resolved
blockchain/process.go Show resolved Hide resolved
@davecgh
Copy link
Member Author

davecgh commented Dec 28, 2020

On testnet there is a lot of output. One warning a few times [WRN] FEES: Trying to process mined transactions at block 587476 when previous best block was at height 587476 at different heights. Also this info [INF] CHAN: FORK: Block 0000002b64225f709a777cbb2acdb011705d8f43bbf32888eeed8476c5c4ecf1 (height 587489) forks the chain at height 587488/block 0000009a0b2f93131a56d24b5151337ebcdd259515b0ef9cf204e44469d01636, but does not cause a reorganize numerous times with different values for the same block. I don't see either of those logs on mainnet, so wondering why such a difference.

All expected?

Yes, testnet has a lot of competing blocks show up around the time the difficulty drops. Mainnet does not have a difficulty reduction like that, so it doesn't happen there.

The fees warning is also normal. It happens for all forked blocks. It probably doesn't actually need to be a warning, but the behavior is the same on master.

@davecgh davecgh force-pushed the blockchain_decouple_processing branch from 434823a to 7f73122 Compare December 28, 2020 05:58
blockchain/blockindex.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the blockchain_decouple_processing branch from 7f73122 to c0aaa33 Compare December 29, 2020 03:16
Copy link
Member

@matheusd matheusd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished the first pass review. Aside from the minor issues, looks good. Will do another general pass after the duplicate positional checks issue is addressed. I also ran the sync a bunch of times in mainnet.

Just wanna say, this really mustn't have been fun to write, since it involves such a critical piece of the processing code for the blockchain. Definitely was hard to review, since the individual checks are all interdependent in the critical commit.

OTOH, the resulting decoupled processing is a very nice thing to achieve :P

blockchain/chain.go Outdated Show resolved Hide resolved
// variety of fairly complex scenarios selects the expected best chain and
// properly tracks the header with the most cumulative work that is not known to
// be invalid as well as the one that is known to be invalid (when it exists).
func TestProcessLogic(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, just wanna hightlight this is an awesome test :P

@davecgh davecgh force-pushed the blockchain_decouple_processing branch 2 times, most recently from 600154d to bf07a55 Compare December 31, 2020 01:53
@davecgh
Copy link
Member Author

davecgh commented Dec 31, 2020

... the "Verified checkpoint at height XXX" is duplicated.

EDIT: I tracked the duplicate print down and it's because the positional block header checks are being run twice. It's not a big deal, but I'll get this updated to avoid doing that.

This has been resolved. I refactored the checks which apply to the block data to a separate function and call it instead when accepting the block data since the headers have already been checked at that point.

View Changes.

@davecgh davecgh force-pushed the blockchain_decouple_processing branch from bf07a55 to a327a43 Compare December 31, 2020 01:56
blockchain/blockindex.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the blockchain_decouple_processing branch from a327a43 to a78f426 Compare December 31, 2020 05:45
@matheusd
Copy link
Member

Did a second pass review of the code (as of now), taking note of the final structure for ProcessBlock()/ProcessBlockHeader() for the most important sub-calls and some of the checks performed. Here it is:

chain.ProcessBlockHeader()
  * chain.maybeAcceptBlockHeader(checkHeaderSanity == true)
    * checkBlockHeaderSanity()
      - checks PoW
    * chain.checkBlockHeaderPositional()
      - diff retargeting
      - correct height
    * index.AddNode()
      - create or extend header tip
      - maybe change best header
    
chain.ProcessBlock()
  # (block header check stage)
  * checkBlockSanity()
    * checkBlockHeaderSanity()
      - checks PoW
    * checkProofOfStake
    - checks block size < wire.MaxBlockPayload
    * checkTransactionSanity() 					# (regular then stake txs)
    - checks total ticket and revocation # in header
  * chain.maybeAcceptBlockHeader(checkHeaderSanity == false)
    * chain.checkBlockHeaderPositional()
      - checks diff retargeting
      - checks correct height
    * index.AddNode()
      - create or extend header tip
      - maybe change best header
      
  # (block data check stage)
  * chain.maybeAcceptBlockData()    
    * chain.checkBlockDataPositional()
      - checks for expired txs
    * index.AcceptBlockData()
  * chain.maybeAcceptBlocks()
    * chain.checkBlockContext()
      * chain.checkBlockHeaderContext()
        - checks stakediff in header 
        - checks poolsize in header
      * checkTransactionContext()         			# (repeated for every tx and stx)
      - checks stake tx counts limits
      - checks finalized txs
      - checks count sig ops
      - checks tspends only in TVI blocks
      
  # (best chain selection stage)
  * chain.reorganizeChain()
    * chain.reorganizeChainInternal()     			# (repeated until a valid tip is found)
      * utxos.disconnectBlock()					# (when reorging to a different branch)
      * chain.disconnectBlock()
      * chain.checkBlockContext()         			# (if not done yet for some block - see previous call)
      * chain.checkConnectBlock()         			# (called for every block in the new branch)
        * chain.tspendChecks()
          - checks tspend in correct block
          * chain.checkTSpendHasVotes()
          * chain.checkTSPendsExpenditure()
        * utxos.disconnectDisapprovedBlock() 			# (if block votes to disapprove parent block)
        * chain.checkTransactionsAndConnect() 			# (first stake, then regular txs)
        * checkBlockScripts() 					# (first stake, then regular txs)
        * blockOneCoinbasePaysTokens()
      * chain.connectBlock()              			# (called for every block that passes the above)
        * dbPut****()						# (store best state, utxo set, treasury, gcs filter)
        - update in-memory best state
        - send block connected & tickets ntnfs
      * index.RemoveLessWorkCandidates
    * chain.maybeUpdateIsCurrent()

Copy link
Member

@matheusd matheusd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also ran the sync again a few times and executed the test suite on every commit.

Woop!

@davecgh davecgh force-pushed the blockchain_decouple_processing branch from a78f426 to b6b2be8 Compare January 6, 2021 04:42
@davecgh
Copy link
Member Author

davecgh commented Jan 6, 2021

I updated this to ensure the index lock is not held during flushing in case pruned ticket information has to be reloaded which itself requires the index lock and to add a couple of trace logging statements for connected and disconnected blocks.

Also, rebased.

This makes the logging in the test harness more consistent and improves
it to make use of the ability to look up the test name for blocks when
logging issues for easier identification.
This modifies the chain test harness to allow the initial blocks that
are generated to specified heights, such as stake validation height, to
be independently generated without also accepting them to the chain.

It also allows a custom underlying chain generator which houses those
blocks to be provided when creating a test harness.

These two combined provide more flexibility to the tests since they can
choose when (and if) to make the generated blocks available to the
chain.
@davecgh davecgh force-pushed the blockchain_decouple_processing branch from b6b2be8 to 76b3e5a Compare January 6, 2021 04:49
@sefbkn
Copy link
Member

sefbkn commented Jan 7, 2021

This PR looks solid overall. Thank you for taking the time to write this along with clear comments and tests. I am still working through it, but have not seen any issues with the latest code thusfar.

On a mostly unrelated note with respect to this PR, while syncing I did notice a log message that appears incorrect. It cropped up while testing this PR so I figured it worth mentioning at least even though it's not coming from this set of changes.

[WRN] PEER: Misbehaving peer xxx:9108 (outbound): 0 transactions not found -- ban score increased to 80
...
[INF] SRVR: Banned peer xxx (outbound) for 24h0m0s

In server.go -> OnNotFound, the following line

		reason := fmt.Sprintf("%d %v not found", numBlocks, txStr)

seems like it should be

		reason := fmt.Sprintf("%d %v not found", numTxns, txStr)

blockchain/chain.go Outdated Show resolved Hide resolved
This completely reworks the way block index and processing works to use
headers-first semantics and support out of order processing of the
associated block data.

This will ultimately provide a wide variety of benefits as it means the
entire shape of the block tree can be determined from the headers alone
which in turn will allow much more informed decisions to be made,
provides better information regarding sync status and warning
conditions, and allows future work to add invalidation and
reconsideration of arbitrary blocks.

It must be noted that this is a major change to the way the block index
and block handling is done and is deeply integrated with the consensus
rules, so it will need a significant amount of testing and extremely
careful review.

Also of note is that there are a lot of assumptions made in the calling
code in regards to expected error attribution and the state the chain
has at the moment notifications are processed, so all of those semantics
have intentionally been retained, even though they might seem a bit out
of place now, in order to limit the amount of changes introduced at a
time and thus better ensures correctness.

For example, in the future it would probably make more sense to make the
notifications entirely asynchronous and for validation failures to be
reported via those asynchronous notifications.  However, before that can
happen, all callers would first need to be updated to ensure they do not
rely on the chain being in the same state it was at the moment the
notification was generated.

In addition to all of the existing full block tests, this introduces
a comprehensive set of processing order tests which exercise all of the
new logic.

High level overview of the changes:

- Introduce tracking for whether a block is fully linked, meaning it
  builds on a branch that has block data for all of its ancestors
- Add received order tracking to ensure miners are not able to gain an
  advantage in terms of chain selection by only advertising a header
- Add several new pieces of information to the block index:
  - Header with the most cumulative work that is not known to be invalid
  - Header with the most cumulative work that is known to be invalid
  - Map of best chain candidates to aid in efficient selection
  - Map of unlinked children for more efficient linking
  - Prunable cached chain tips to significantly reduce potential search
    space during invalidation
- Introduce a compare function which determines which of two nodes
  should be considered better for the purposes of best chain selection
- Add chain tip iteration capabilities with potential filtering
- Mark all descendants invalid due to known invalid ancestor when a
  block is marked invalid
- Add ability to determine if a block can currently be validated
- Rework the chain reorganization func to work with an arbitrary target
- Modify the overall chain reorg func to reorg to the block with the
  most cumulative work that is valid in the case it is not possible to
  reorg to the target block
- Add a new MultiError type for keeping track of multiple errors in the
  same call
- Add a new ErrNoBlockData error kind
- Update all cases that only require all of the ancestor block data to
  be available to check for that condition instead of the more strict
  condition of already being fully validated
- Introduce a new processing lock separate from the overall chain lock
- Add new method and ability to independently accept block headers
- Rework the block processing func to accept blocks out of order so long
  as their header is already known valid
- Keep a cache of blocks that recently passed contextual validation
  checks
- Cleanup and correct various comments to match reality
- Add comprehensive tests to exercise the new processing logic and best
  header tracking (for both invalid and not known invalid)
This reworks the logic that is used to determine if the chain believes
it is current to be more robust and to take advantage of the fact that
the best known header is now available.

In particular, it introduces two new constraints:
- A latching mechanism that prevents the chain from flipping back to
  being non current unless no new blocks have been seen for an extended
  period of time
- The chain only latches to current once it is synced to the header with
  the most cumulative work that is not known to be invalid in addition
  to the existing conditions
This modifies the getblockchaininfo RPC handler to use the newly-exposed
best header that is not known to be invalid from chain versus the height
of the current best chain tip.
This modifies the README.md and doc.go files to match the latest code.
@davecgh davecgh force-pushed the blockchain_decouple_processing branch from 76b3e5a to 12e132e Compare January 7, 2021 16:44
@davecgh
Copy link
Member Author

davecgh commented Jan 7, 2021

Thanks to everyone for reviewing this. I know it was not a trivial one to review and took a lot of effort.

@davecgh davecgh merged commit 12e132e into decred:master Jan 7, 2021
@davecgh davecgh deleted the blockchain_decouple_processing branch January 7, 2021 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-forking consensus Changes that involve modifying consensus code without causing any forking changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants