Multi-peer Checklist #1145

Open
29 of 33 tasks
davecgh opened this issue Mar 10, 2018 · 27 comments
Labels
non-forking consensus (changes that involve modifying consensus code without causing any forking changes), optimization

Comments

Member

davecgh commented Mar 10, 2018

This issue is intended to be a running checklist of several of the planned and necessary refactors in order to properly support efficient multi-peer parallel downloads. It will be a multi-month effort and is being provided in order to help prevent duplication of effort and to avoid merge conflicts if anyone was also thinking of working in the referenced areas.

Given how drastic the overall changes will be, some of these will very likely need to be updated as the work proceeds since unexpected things invariably come up during large development efforts such as this.

  • Refactor CheckConnectBlock to CheckConnectBlockTemplate -- Implemented by PR blockchain: CheckConnectBlockTemplate with tests. #1086
    • More accurately reflects its purpose of testing block template proposals and uses this fact to be more restrictive about the allowed inputs and to avoid performing the proof of work check. In particular, the provided block template must only build from the current tip or its parent or an error should be returned.
  • Convert to a full block index in memory -- Implemented by PR blockchain: Convert to full block index in mem. #1229
    • Since every block node will be in memory and headers can be reconstructed from block nodes, all headers will always be served from memory, which will be important since the network will be moving to header-based semantics
    • Several of the error paths can be removed since they will no longer be necessary
    • It will no longer be expensive to calculate CSV sequence locks or median times of blocks way in the past
    • It will be much less expensive to calculate the initial states for the various intervals such as the stake and voter version
    • It will be possible to create much more efficient iteration and simplified views of the overall index
  • Refactor and optimize checkpoint handling to use nodes instead of full blocks -- Implemented by PR blockchain: Optimize checkpoint handling. #1230
    • Since the entire block index will be in memory, this code can be significantly optimized to use a node that is already in memory versus requiring the entire block
  • Refactor and optimize block locator generation -- Implemented by PR blockchain: Optimize block locator generation. #1237
    • The block locator code can be significantly optimized once the full block index is in memory since it will no longer be necessary to consult the database
  • Refactor and optimize inventory discovery -- Implemented by PR multi: Refactor and optimize inv discovery. #1239
  • Refactor and optimize exported header access -- Implemented by PR blockchain: Optimize exported header access. #1273
    • Since headers can be reconstructed from a block index node and the full block index will be in memory, it will no longer be necessary to consult the database
  • Refactor all code that relies on the main chain index in the database to use the new memory block index -- Implemented by PR blockchain: Refactor db main chain idx to blk idx. #1332
    • This will be a major optimization for several functions because they will no longer have to first consult the database (which is incredibly slow in many cases due to the way leveldb splits all of the information across files) to perform lookups and determine if blocks are in the main chain
  • Perform a database migration to remove the main chain index which will no longer be used -- Implemented by PR blockchain: Remove main chain index from db. #1334
  • Implement an efficient and well-tested chain view -- Implemented by PR blockchain: Implement new chain view. #1337
    • Chain views can take advantage of the fact that all block nodes will be in memory to provide a flat view of a specific chain of blocks (a specific branch of the overall block tree) from a given tip all the way back to the genesis block
    • Some example benefits are:
      • Efficiently comparing two views
      • Quickly finding the fork point (if any) between two views
      • O(1) lookups of nodes at a specific height
      • Possibility for more efficient block locator code through the use of the aforementioned O(1) height lookups
      • Possibility for efficient skip lists for ancestor iteration
  • Refactor code to use the new chain view instead of manually tracking the best node -- Implemented by PR blockchain: Refactor to use new chain view. #1344
    • This will further speed up areas such as best chain selection, best chain membership testing, fork finding, and block locator generation
    • It will also simplify some of the more challenging parts of the code since it will more cleanly separate the chain-specific logic from the block index logic
  • Refactor reorganization logic to make use of known validation status -- Implemented by PR blockchain: Optimize reorg to use known status. #1367
    • This will help ensure that no more than one attempt is made to reorganize to an invalid chain and valid blocks are not evaluated more than once during reorganizations
    • It will avoid a lot of potential unnecessary extra work when moving to parallel insertion in the future
    • It will allow more accurate reporting on the order in which historical side chain tips (aka orphaned blocks) were handled
  • Add support to the block index for marking dirty nodes and flushing them to the database -- Implemented by PR blockchain: Make block index flushable. #1375
    • This will allow validation states to be set on arbitrary nodes and persisted
  • Refactor logic not specifically related to inputs out of CheckTransactionInputs -- Implemented by PRs blockchain: Separate tx input stake checks. #1452, blockchain: Ensure no stake opcodes in tx sanity. #1453, multi: No stake height checks in check tx inputs. #1457, and multi: Cleanup and optimize tx input check code. #1468
    • This will allow all sanity checks to be performed on blocks that do not yet connect due to arriving out of order in parallel
    • It will also reduce the time required to connect the block to the best chain
  • Change the UTXO view semantics to include the tip block unless it is disallowed by voters -- Implemented by PRs blockchain: Reverse utxo set semantics. #1471 and multi: Migration for utxo set semantics reversal. #1520
    • The logic is currently backwards in that the current UTXO set does not include the tip block until it is approved by voters which leads to a plethora of undesirable behavior
    • This will allow all UTXO-related code to be significantly simplified
    • It optimizes for the typical case
    • It will fix issue mempool utxo set is incorrect #618
    • It will provide a path to more easily supporting a UTXO cache
  • Convert chain to perform direct single-step reorganizations with reversion to original tip in case of failure -- Implemented by PR blockchain: Convert to direct single-step reorgs. #1500
    • This will significantly reduce the memory consumption of large reorgs
    • It is a much more cache-friendly approach
    • It will provide a path to more easily decoupling the chain processing and connection code from the download logic
  • Remove all chain state from block manager -- Implemented by PRs blockchain: update BestState. #1416 and multi: Handle chain ntfn callback in server. #2498
    • This will help ensure all of the chain-related state logic lives in the blockchain package
    • It will also provide a nice performance boost for various chain-state query functions since the block manager acts as a big lock
  • Decouple the chain processing and connection code from the download logic -- Implemented by PR blockchain: Decouple processing and download logic. #2518
    • This will ultimately allow blocks to be downloaded and stored based on the known headers even when all of the ancestors are not yet known
    • It will provide a path to remove orphan (unknown parents) handling from chain
    • It will allow the code related to the live tickets and UTXO set to be further simplified and optimized
  • Convert the UTXO view to store and work on a per-output basis instead of at a transaction level -- Implemented by PR multi: Rework utxoset/view to use outpoints. #2540
    • This simplifies the code and paves the way for a UTXO cache
    • It also optimizes runtime performance
  • Implement a UTXO cache -- Implemented by PR multi: Add UtxoCache. #2591
    • Since connection is necessarily linear and all inputs reference previous outputs, performing the updates in memory with periodic writes to the database will allow intermediate states to effectively be skipped
  • Refactor the block manager to a netsync package -- Implemented by PR netsync: Split blockmanager into separate package. #2500
  • Rework the existing headers-first logic to download all of the headers instead of alternating based on checkpoints -- Implemented by PR multi: Rework sync model to use hdr annoucements. #2555
    • This will allow the headers to be stored independently and ensure they all connect as well as providing all of the information needed to determine exactly which blocks comprise the chain with the most work without relying on checkpoints
  • Rework the block download logic to use the newly available full header information while still being linear -- Implemented by PR multi: Rework sync model to use hdr annoucements. #2555
    • This will break the reliance on blindly requesting newer blocks from peers
    • It will also pave the way towards implementing the logic necessary to request blocks from multiple peers
  • Modify the block download logic to request blocks from multiple peers in parallel
    • Track the best known block announced by each peer as an efficient mechanism to discover which blocks are available to download from each peer -- Implemented by PR netsync: Track best known blocks per peer. #3443
    • Update tracking of inflight blocks to include the requested peer -- Implemented by PR netsync: Track peer for requested blocks. #3444
    • Refactor the header sync peer to be part of the header sync state
    • Request blocks from multiple peers in parallel using the information added by the previous steps
    • Deprecate all code related to the notion of a sync peer
    • This will also involve several other steps such as re-requesting any data that has not been delivered after a certain time, weighting towards faster peers, and blacklisting misbehaving peers
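
For anyone following along, here is a rough sketch of what the chain view item in the checklist describes: a flat slice of in-memory nodes indexed by height, which gives O(1) height lookups and a cheap fork point search. This is only an illustration and not the code from the referenced PRs; `blockNode`, `chainView`, `buildChain`, and the string hashes are hypothetical stand-ins.

```go
package main

import "fmt"

// blockNode is a simplified, hypothetical stand-in for an in-memory block
// index entry.  Real nodes carry far more data (header fields, status, etc).
type blockNode struct {
	hash   string
	height int64
	parent *blockNode
}

// chainView is a flat view of one branch of the block tree.  The node at
// index i is the block at height i, so height lookups are O(1).
type chainView struct {
	nodes []*blockNode
}

// newChainView builds a view from the given tip back to the genesis block.
func newChainView(tip *blockNode) *chainView {
	if tip == nil {
		return &chainView{}
	}
	nodes := make([]*blockNode, tip.height+1)
	for n := tip; n != nil; n = n.parent {
		nodes[n.height] = n
	}
	return &chainView{nodes: nodes}
}

// nodeByHeight returns the node at the given height or nil when the height
// is outside of the view.
func (v *chainView) nodeByHeight(height int64) *blockNode {
	if height < 0 || height >= int64(len(v.nodes)) {
		return nil
	}
	return v.nodes[height]
}

// contains reports whether the node is part of the branch the view covers.
func (v *chainView) contains(node *blockNode) bool {
	return node != nil && v.nodeByHeight(node.height) == node
}

// findFork walks back from the node until it reaches a block that is in the
// view.  It returns nil when there is no common ancestor.
func (v *chainView) findFork(node *blockNode) *blockNode {
	for node != nil && !v.contains(node) {
		node = node.parent
	}
	return node
}

// buildChain is a test helper that appends count blocks on top of parent.
// A nil parent starts a new chain at height 0.
func buildChain(parent *blockNode, prefix string, count int) *blockNode {
	tip := parent
	for i := 0; i < count; i++ {
		height := int64(0)
		if tip != nil {
			height = tip.height + 1
		}
		hash := fmt.Sprintf("%s%d", prefix, height)
		tip = &blockNode{hash: hash, height: height, parent: tip}
	}
	return tip
}

func main() {
	view := newChainView(buildChain(nil, "a", 10))      // heights 0 through 9
	sideTip := buildChain(view.nodeByHeight(6), "b", 5) // forks after height 6
	fork := view.findFork(sideTip)
	fmt.Println(view.nodeByHeight(3).hash, fork.hash, fork.height)
}
```

Membership testing falls out of the same structure: a node is in the view exactly when the slot at its height holds that same node, which is why fork finding only needs to walk the side chain and never the main chain.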
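
The block locator item can be illustrated in a similar way. The sketch below assumes the commonly used scheme of listing the most recent blocks one by one and then doubling the step size back to genesis; the threshold of 10 dense entries and the `locatorHeights` name are assumptions made for the example and are not taken from the referenced PR. With O(1) height lookups, turning these heights into hashes needs no database access.

```go
package main

import "fmt"

// locatorHeights returns the heights a block locator would include for a
// chain whose tip is at tipHeight.  The first 10 entries are consecutive and
// the step doubles after that.  The genesis block (height 0) is always last.
//
// NOTE: The dense-entry count of 10 is an assumption for illustration.
func locatorHeights(tipHeight int64) []int64 {
	if tipHeight < 0 {
		return nil
	}
	var heights []int64
	step := int64(1)
	for h := tipHeight; h > 0; h -= step {
		heights = append(heights, h)
		if len(heights) >= 10 {
			step *= 2
		}
	}
	return append(heights, 0)
}

func main() {
	fmt.Println(locatorHeights(20))
}
```

The number of entries grows logarithmically with the chain height, so even a very long chain produces a short locator.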
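
Finally, a minimal sketch of how the last checklist item might fit together: the best known block per peer decides who can serve a block, and the inflight map records which peer each block was requested from. Everything here is hypothetical (`peer`, `downloadState`, `assign`, `delivered`, the per-peer limit), and blocks are identified by height purely to keep the example short. Real code would key on block hashes and confirm that the block is an ancestor of the peer's best known block.

```go
package main

import "fmt"

// peer is a hypothetical stand-in for a connected peer.
type peer struct {
	id        string
	bestKnown int64 // height of the best block the peer has announced
	inflight  int   // number of outstanding block requests
}

// downloadState tracks which peer each block was requested from.
type downloadState struct {
	peers      []*peer
	inflight   map[int64]*peer // requested block height -> requested peer
	maxPerPeer int
}

func newDownloadState(maxPerPeer int, peers ...*peer) *downloadState {
	return &downloadState{
		peers:      peers,
		inflight:   make(map[int64]*peer),
		maxPerPeer: maxPerPeer,
	}
}

// assign requests each needed block from the least busy peer that has
// announced a chain at least that high.  Blocks that are already in flight
// or that no peer can currently serve are skipped.
func (s *downloadState) assign(needed []int64) {
	for _, h := range needed {
		if _, ok := s.inflight[h]; ok {
			continue
		}
		var best *peer
		for _, p := range s.peers {
			if p.bestKnown < h || p.inflight >= s.maxPerPeer {
				continue
			}
			if best == nil || p.inflight < best.inflight {
				best = p
			}
		}
		if best == nil {
			continue
		}
		best.inflight++
		s.inflight[h] = best
	}
}

// delivered marks a requested block as received and frees the peer slot.
func (s *downloadState) delivered(h int64) {
	if p, ok := s.inflight[h]; ok {
		p.inflight--
		delete(s.inflight, h)
	}
}

func main() {
	a := &peer{id: "A", bestKnown: 5}
	b := &peer{id: "B", bestKnown: 3}
	s := newDownloadState(2, a, b)
	s.assign([]int64{1, 2, 3, 4, 5})
	for h := int64(1); h <= 5; h++ {
		if p, ok := s.inflight[h]; ok {
			fmt.Printf("block %d requested from peer %s\n", h, p.id)
		} else {
			fmt.Printf("block %d not yet requested\n", h)
		}
	}
}
```

The other steps mentioned in the checklist, such as timeouts with re-requests, weighting towards faster peers, and blacklisting, would hook into the same inflight map since it already records which peer is responsible for each block.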
Member Author

davecgh commented May 27, 2018

Updated issue description for PR #1229.

Member Author

davecgh commented May 27, 2018

Updated issue description for PR #1230.

Member Author

davecgh commented May 29, 2018

Updated issue description for PR #1237.

Member Author

davecgh commented May 29, 2018

Updated issue description for PR #1239.

Member Author

davecgh commented Jun 9, 2018

Updated issue description for PR #1273.

Member Author

davecgh commented Jul 2, 2018

Updated issue description for PR #1332.

Member Author

davecgh commented Jul 3, 2018

Updated issue description for PR #1334.

Member Author

davecgh commented Jul 6, 2018

Updated issue description for PR #1337.

Member Author

davecgh commented Jul 9, 2018

Updated issue description for PR #1344.

Member Author

davecgh commented Jul 21, 2018

Updated issue description for PR #1367.

Member Author

davecgh commented Jul 25, 2018

Updated issue description for PR #1375.

Member Author

davecgh commented Aug 28, 2018

Updated issue description for PR #1416.

Member Author

davecgh commented Sep 14, 2018

Updated issue description for PRs #1452 and #1453.

davecgh added the non-forking consensus label Dec 28, 2019
Member Author

davecgh commented Dec 9, 2020

Updated issue description for PR #1728 and #1735.

Member Author

davecgh commented Dec 9, 2020

Updated issue description for PR #2497.

Member Author

davecgh commented Dec 10, 2020

Updated issue description for PR #2498.

Member Author

davecgh commented Dec 10, 2020

Updated issue description for PR #2499 and #2500.

Member Author

davecgh commented Dec 23, 2020

Updated issue description for PR #2518.

Member Author

davecgh commented Jan 1, 2021

Updated issue description for PR #2540.

Member Author

davecgh commented Jan 17, 2021

Updated issue description for PR #2555.

Member Author

davecgh commented Feb 12, 2021

Updated issue description for PR #2591.

Member Author

davecgh commented Sep 10, 2024

Updated issue description for PR #3443.

Member Author

davecgh commented Sep 10, 2024

Updated issue description for PR #3444.
