Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi: Rework sync model to use hdr annoucements. #2555

Merged
merged 10 commits into from
Jan 22, 2021

Conversation

davecgh
Copy link
Member

@davecgh davecgh commented Jan 17, 2021

This overhauls the primary code that deals with synchronizing the blockchain with other peers on the network to use a model based on header announcements instead of inventory vectors.

Currently, all blocks are discovered via a combination of inventory vector announcements and the getblocks protocol message, both of which only include the hash of the blocks in question.

This is not ideal because it means blocks are blindly downloaded based on those announcements without knowing if they're actually likely to be something that is useful since there is no further information such as what blocks they connect to. It also means that extra logic is needed to deal with orphan blocks (those whose parents are not yet known) such as caching them, determining known orphan roots, and expiration.

In short, the current method generally ends up wasting bandwidth and also makes it much more difficult to detect certain classes of malicious behavior.

The recently added capability of blockchain to process headers independently from blocks while the block data is added out of order later opened the door for a much better model that addresses all of the aforementioned issues as well as paves the way for many other related enhancements.

The new model discovers and downloads all of the headers prior to any block data via the getheaders protocol message and then uses those headers to determine exactly which blocks need to be downloaded to reach the tip of the best chain. Notably, this means orphan blocks are now a thing of the past as blocks that do not connect are no longer downloaded under any circumstance.

Further, the new model also makes use of sendheaders so that all block announcements are made via the associated block header. This in turn is used to better determine if an announced block is likely to be useful prior to downloading it.

It should be noted that the changes herein are intentionally limited to those necessary to use the new sync model based on headers in an incremental fashion to help simplify review and make it easier to assert correctness. There are many more improvements that this model paves the way to support planned for future commits. For example, syncing from multiple peers in parallel and improved DoS protection.

All of the changes are split up into a series of logic commits that describe each step to help ease the review process.

High-level overview of the key features:

  • All headers are downloaded and examined prior to downloading any blocks
  • The initial header sync process:
    • Detects and recovers from stalled/malicious peers
    • Includes periodic logging with cumulative stats and a progress percentage
  • The concept of orphan blocks no longer exists
    • This means blocks which are not already known to connect are not downloaded
  • All block announcements are handled via header announcements
    • Detects and prevents malicious behavior related to orphan headers
  • The chain sync process:
    • Starts once the headers are downloaded and entails both downloading blocks as well as verifying them
    • Uses the headers to determine the best blocks to download
    • Pipelines the requests to increase throughput
    • Actively avoids downloading duplicate blocks
    • Logs the overall verification progress versus the timestamp of the last processed block
  • The sync height is dynamically updated as new headers are discovered
  • The logging behavior is changed once steady state mode is reached to instead log new blocks along with their hashes and additional stats immediately

This is work towards #1145.

Fixes #2252.

@davecgh davecgh added this to the 1.7.0 milestone Jan 17, 2021
@davecgh davecgh mentioned this pull request Jan 17, 2021
33 tasks
@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from 38758c3 to fafbfc7 Compare January 17, 2021 11:02
internal/netsync/manager.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from fafbfc7 to ee68773 Compare January 17, 2021 22:25
@davecgh
Copy link
Member Author

davecgh commented Jan 17, 2021

I added an additional commit since the original PR was opened to change the verification progress reported by the RPC server for the new model as well.

@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch 5 times, most recently from 0334b4c to efbc116 Compare January 18, 2021 06:24
blockchain/chainquery.go Outdated Show resolved Hide resolved
internal/netsync/manager.go Outdated Show resolved Hide resolved
Copy link
Member

@matheusd matheusd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(still doing some functional testing)

Got the following output while syncing on mainnet:

2021-01-18 16:03:03.837 [INF] SYNC: New valid peer [x] (outbound) (/dcrwire:0.4.0/dcrd:1.6.0(pre)/)
2021-01-18 16:03:12.919 [INF] SYNC: Processed 258000 headers in the last 10.15s (progress 94.02%)
2021-01-18 16:03:13.521 [INF] SYNC: Syncing headers to block height 521071 from peer [y] (outbound)
2021-01-18 16:03:19.309 [INF] SYNC: Processed 133074 headers in the last 6.38s (progress 100.00%)
2021-01-18 16:03:19.309 [INF] SYNC: Headers synced -- downloading blocks
2021-01-18 16:03:19.711 [INF] SYNC: Processed 2000 headers in the last 400ms (progress 100.00%)
2021-01-18 16:03:20.191 [INF] SYNC: Processed 2000 headers in the last 470ms (progress 100.00%)
2021-01-18 16:03:20.293 [INF] SYNC: Processed 2000 headers in the last 100ms (progress 100.00%)
2021-01-18 16:03:20.840 [INF] SYNC: Processed 2000 headers in the last 540ms (progress 100.00%)
2021-01-18 16:03:20.866 [INF] SYNC: Processed 2000 headers in the last 20ms (progress 100.00%)
2021-01-18 16:03:21.314 [INF] SYNC: Processed 2000 headers in the last 440ms (progress 100.00%)
2021-01-18 16:03:21.441 [INF] SYNC: Processed 2000 headers in the last 120ms (progress 100.00%)
2021-01-18 16:03:21.787 [INF] SYNC: Processed 2000 headers in the last 340ms (progress 100.00%)
2021-01-18 16:03:22.015 [INF] SYNC: Processed 2000 headers in the last 220ms (progress 100.00%)
2021-01-18 16:03:22.268 [INF] SYNC: Processed 2000 headers in the last 250ms (progress 100.00%)
2021-01-18 16:03:22.591 [INF] SYNC: Processed 2000 headers in the last 320ms (progress 100.00%)
2021-01-18 16:03:22.747 [INF] SYNC: Processed 2000 headers in the last 150ms (progress 100.00%)
2021-01-18 16:03:23.164 [INF] SYNC: Processed 2000 headers in the last 410ms (progress 100.00%)

internal/netsync/manager.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch 3 times, most recently from f249585 to ed9397a Compare January 18, 2021 19:47
@davecgh
Copy link
Member Author

davecgh commented Jan 18, 2021

Got the following output while syncing on mainnet:

...
2021-01-18 16:03:21.314 [INF] SYNC: Processed 2000 headers in the last 440ms (progress 100.00%)

That is fixed with the latest.

@matheusd
Copy link
Member

Woot :P

While downloading headers:

$ dcrctl getblockchaininfo
{
  "chain": "mainnet",
  "blocks": 0,
  "headers": 182000,
  "syncheight": 521083,
  "bestblockhash": "298e5cc3d985bfe7f81dc135f360abe089edd4396b86d2de66b0cef42b21d980",
  "difficulty": 453115903,
  "difficultyratio": 32767.74999809,
  "verificationprogress": 0,
  "chainwork": "0000000000000000000000000000000000000000000000000000800040002000",
  "initialblockdownload": true,
  "maxblocksize": 393216,
  "deployments": {
  ...

And then while downloading blocks:

$ dcrctl getblockchaininfo
  {
  "chain": "mainnet",
  "blocks": 1472,
  "headers": 521084,
  "syncheight": 521084,
  "bestblockhash": "000000000000157f9f24f293d8153b8de0465e8558f2c386f080047b67d1502c",
  "difficulty": 442693272,
  "difficultyratio": 169527.16125555,
  "verificationprogress": 0.002824880441541095,
  "chainwork": "0000000000000000000000000000000000000000000000000b668d107cb857b2",
  "initialblockdownload": true,
  "maxblocksize": 393216,
  "deployments": {
  ...

@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from ed9397a to 194420b Compare January 18, 2021 21:15
Copy link
Member

@matheusd matheusd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All issues resolved, tested reorgs on the standard simnet setup and syncing in mainnet.

Progressing nicely :P

@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from 194420b to 2e55226 Compare January 19, 2021 00:48
@davecgh
Copy link
Member Author

davecgh commented Jan 19, 2021

I updated the initial chain synced log message to also account for the case when the chain is already fully synced after the initial header sync process so that it prints immediately as expected versus waiting until a new block shows up.

Changes.

Copy link
Member

@rstaudt2 rstaudt2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing/testing but I did notice one strange behavior. It's intermittent but about 50% of the time the header progress message just keeps getting logged non-stop after the initial header sync is completed (with a fresh db on mainnet):

EDIT: Looks like this was already mentioned above as resolved, but it appears to still be occurring intermittently.

2021-01-18 20:57:49.085 [INF] CMGR: 16 addresses found from seeder mainnet-seed.dcrdata.org
2021-01-18 20:57:49.107 [INF] CMGR: 16 addresses found from seeder mainnet-seed-1.decred.org
2021-01-18 20:57:49.771 [INF] CMGR: 16 addresses found from seeder mainnet-seed-2.decred.org
2021-01-18 20:57:50.065 [INF] CMGR: 16 addresses found from seeder mainnet-seed.planetdecred.org
2021-01-18 20:57:50.478 [INF] CMGR: 16 addresses found from seeder mainnet-seed.jamieholdstock.com
2021-01-18 20:57:53.902 [INF] SYNC: New valid peer 107.152.35.163:9108 (outbound) (/dcrwire:0.4.0/dcrd:1.5.1/)
2021-01-18 20:57:53.902 [INF] SYNC: Syncing headers to block height 521176 from peer 107.152.35.163:9108 (outbound)
2021-01-18 20:57:54.128 [INF] SYNC: New valid peer 51.195.222.207:9108 (outbound) (/dcrwire:0.4.0/dcrd:1.6.0/)
2021-01-18 20:57:54.128 [INF] SYNC: New valid peer 18.198.44.215:9108 (outbound) (/dcrwire:0.4.0/dcrd:1.5.2/)
2021-01-18 20:57:54.152 [INF] SYNC: New valid peer 206.81.16.204:9108 (outbound) (/dcrwire:0.4.0/dcrd:1.7.0(pre)/)
2021-01-18 20:57:54.335 [INF] SYNC: New valid peer 104.199.150.126:9108 (outbound) (/dcrwire:0.4.0/dcrd:1.5.2/)
2021-01-18 20:57:54.335 [INF] SYNC: New valid peer 139.99.135.146:9108 (outbound) (/dcrwire:0.4.0/dcrd:1.6.0/)
2021-01-18 20:57:55.150 [INF] SYNC: New valid peer [2a0b:b7c4:1:10c1::1]:9108 (outbound) (/dcrwire:0.4.0/dcrd:1.6.0(rc4)/)
2021-01-18 20:57:55.317 [INF] SYNC: New valid peer [2001:19f0:7001:3511:5400:1ff:feb3:3d18]:9108 (outbound) (/dcrwire:0.4.0/dcrd:1.5.1/)
2021-01-18 20:57:58.976 [INF] SYNC: Processed 52000 headers in the last 10.12s (progress 9.99%)
2021-01-18 20:58:09.059 [INF] SYNC: Processed 114000 headers in the last 10.08s (progress 30.34%)
2021-01-18 20:58:19.672 [INF] SYNC: Processed 428000 headers in the last 10.61s (progress 45.68%)
2021-01-18 20:58:29.748 [INF] SYNC: Processed 540000 headers in the last 10.07s (progress 70.57%)
2021-01-18 20:58:39.785 [INF] SYNC: Processed 302000 headers in the last 10.03s (progress 87.86%)
2021-01-18 20:58:46.313 [INF] SYNC: Processed 149177 headers in the last 6.52s (progress 100.00%)
2021-01-18 20:58:46.313 [INF] SYNC: Initial headers sync complete -- downloading blocks (best header hash 000000000000000006767d42fc60cf1dba3e3ca83075a51514ff25de7d2dbe3b, height 521177)
2021-01-18 20:58:46.323 [INF] SYNC: Processed 2000 headers in the last 10ms (progress 100.00%)
2021-01-18 20:58:46.347 [INF] SYNC: Processed 2000 headers in the last 20ms (progress 100.00%)
2021-01-18 20:58:46.468 [INF] SYNC: Processed 2000 headers in the last 120ms (progress 100.00%)
2021-01-18 20:58:46.518 [INF] SYNC: Processed 2000 headers in the last 50ms (progress 100.00%)
2021-01-18 20:58:46.604 [INF] SYNC: Processed 2000 headers in the last 80ms (progress 100.00%)
2021-01-18 20:58:46.713 [INF] SYNC: Processed 2000 headers in the last 100ms (progress 100.00%)
2021-01-18 20:58:46.767 [INF] SYNC: Processed 2000 headers in the last 50ms (progress 100.00%)
2021-01-18 20:58:46.921 [INF] SYNC: Processed 2000 headers in the last 150ms (progress 100.00%)
2021-01-18 20:58:46.967 [INF] SYNC: Processed 2000 headers in the last 40ms (progress 100.00%)
2021-01-18 20:58:47.175 [INF] SYNC: Processed 2000 headers in the last 200ms (progress 100.00%)
2021-01-18 20:58:47.187 [INF] SYNC: Processed 2000 headers in the last 10ms (progress 100.00%)
2021-01-18 20:58:47.228 [INF] SYNC: Processed 2000 headers in the last 40ms (progress 100.00%)
2021-01-18 20:58:47.238 [INF] SYNC: Processed 2000 headers in the last 10ms (progress 100.00%)
2021-01-18 20:58:47.427 [INF] SYNC: Processed 2000 headers in the last 180ms (progress 100.00%)
...

internal/progresslog/logger.go Outdated Show resolved Hide resolved
internal/progresslog/logger.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from 2e55226 to 23fa9f0 Compare January 19, 2021 06:01
@davecgh
Copy link
Member Author

davecgh commented Jan 19, 2021

Still reviewing/testing but I did notice one strange behavior. It's intermittent but about 50% of the time the header progress message just keeps getting logged non-stop after the initial header sync is completed (with a fresh db on mainnet):

EDIT: Looks like this was already mentioned above as resolved, but it appears to still be occurring intermittently.

Upon closer inspection I see this is something else than what I thought it was. The culprit was a new block showing up during the initial headers sync. I've pushed a proper fix for it.

@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from 23fa9f0 to e46ff1f Compare January 19, 2021 06:12
internal/netsync/manager.go Outdated Show resolved Hide resolved
internal/netsync/manager.go Outdated Show resolved Hide resolved
internal/netsync/manager.go Outdated Show resolved Hide resolved
internal/netsync/manager.go Outdated Show resolved Hide resolved
@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from e46ff1f to 0d6a9d4 Compare January 19, 2021 07:09
@rstaudt2
Copy link
Member

Upon closer inspection I see this is something else than what I thought it was. The culprit was a new block showing up during the initial headers sync. I've pushed a proper fix for it.

I tested a bunch more and can confirm that this is resolved.

Copy link
Member

@rstaudt2 rstaudt2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I did a bunch more testing, and everything has been working as expected. I just had a few typos called out inline.

internal/netsync/manager.go Outdated Show resolved Hide resolved
internal/netsync/manager.go Outdated Show resolved Hide resolved
internal/netsync/manager.go Outdated Show resolved Hide resolved
internal/progresslog/logger.go Outdated Show resolved Hide resolved
Copy link
Member

@dnldd dnldd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from 0d6a9d4 to 96a6c20 Compare January 19, 2021 18:13
@davecgh
Copy link
Member Author

davecgh commented Jan 19, 2021

In addition to addressing all of the review items, I added the interval since the previous block to the "New block ...." log print stats once steady state is reached in the latest update.

2021-01-19 12:05:08.348 [INF] SYNC: New block 000000000000000003a539c1a64bb0619a06656df1c04d82607febf9c4262aa4 (7 transactions, 2 tickets, 4 votes, 0 revocations, height 521350, interval 10m52s)
2021-01-19 12:06:59.648 [INF] SYNC: New block 000000000000000014d7caf9e7a34f411ec30c8b53e3a72e8078c55fb29a76c4 (3 transactions, 8 tickets, 5 votes, 0 revocations, height 521351, interval 1m46s)
2021-01-19 12:07:59.969 [INF] SYNC: New block 000000000000000018253f8754457db36492a00b9700275060d29d1b4364a973 (3 transactions, 1 ticket, 5 votes, 0 revocations, height 521352, interval 1m22s)
2021-01-19 12:09:30.861 [INF] SYNC: New block 000000000000000006a9023b92575322d120d63f30e677f78e03407141493672 (4 transactions, 0 tickets, 5 votes, 0 revocations, height 521353, interval 1m20s)

@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from 96a6c20 to b17a77e Compare January 19, 2021 19:48
This modifies the progress logger to make it a bit more generic so that
it can support logging header sync progress in the future too and
modernizes the code while here.
This removes the flag from the SubmitBlock method of the sync manager
since those flags are not really intended for general block submission
which is why it is only ever called with blockchain.BFNone.

Also, the intention is to modify the way the sync manager deals with
block submission in the future which this change help simplify.
This removes the orphan return flag from the SubmitBlock and
ProcessBlock netsync methods in favor of allowing the caller to detect
it from the error.

This is a step towards the goal of removing handling of orphan blocks
altogether in favor of using block headers for sync.
This removes all orphan block handling from the syncing process and is
part of the overall effort to move to full headers-first syncing as
opposed to the current partial header-first checkpoint-based process.

It should be noted that this change is made in such a way that the
syncing process continues to work properly, however, it is only intended
to be an intermediate step, and, as such, it does result in suboptimal
behavior once it passes the final checkpoint.  That will no longer be
the case one the entire sync process conversion is complete.
@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from a8bf6e3 to 752e723 Compare January 22, 2021 05:21
This overhauls the primary code that deals with synchronizing the
blockchain with other peers on the network to use a model based on
header announcements instead of inventory vectors.

Currently, all blocks are discovered via a combination of inventory
vector announcements and the getblocks protocol message, both of which
only include the hash of the blocks in question.

This is not ideal because it means blocks are blindly downloaded based
on those announcements without knowing if they're actually likely to be
something that is useful since there is no further information such as
what blocks they connect to.  It also means that extra logic is needed
to deal with orphan blocks (those whose parents are not yet known) such
as caching them, determining known orphan roots, and expiration.

In short, the current method generally ends up wasting bandwidth and
also makes it much more difficult to detect certain classes of malicious
behavior.

The recently added capability of blockchain to process headers
independently from blocks while the block data is added out of order
later opened the door for a much better model that addresses all of the
aforementioned issues as well as paves the way for many other related
enhancements.

The new model discovers and downloads all of the headers prior to any
block data via the getheaders protocol message and then uses those
headers to determine exactly which blocks need to be downloaded to reach
the tip of the best chain.  Notably, this means orphan blocks are now a
thing of the past as blocks that do not connect are no longer
downloaded under any circumstance.

Further, the new model also makes use of sendheaders so that all block
announcements are made via the associated block header.  This in turn is
used to better determine if an announced block is likely to be useful
prior to downloading it.

It should be noted that the changes herein are intentionally limited to
those necessary to use the new sync model based on headers in an
incremental fashion to help simplify review and make it easier to assert
correctness.  There are many more improvements that this model paves the
way to support planned for future commits.  For example, syncing from
multiple peers in parallel and improved DoS protection.

The following is a high-level overview of the key features:

- All headers are downloaded and examined prior to downloading any
  blocks
- The initial header sync process:
  - Detects and recovers from stalled/malicious peers
- The concept of orphan blocks no longer exists
  - This means blocks which are not already known to connect are not
    downloaded
- All block announcements are handled via header announcements
  - Detects and prevents malicious behavior related to orphan headers
- The chain sync process:
  - Starts once the headers are downloaded and entails both downloading
    blocks as well as verifying them
  - Uses the headers to determine the best blocks to download
  - Pipelines the requests to increase throughput
  - Actively avoids downloading duplicate blocks
- The sync height is dynamically updated as new headers are discovered
This modifies the progress logger to expose a new method for logging the
progress of header syncing and adds tests to ensure proper functionality.
This modifies netsync to provide progress logging for the initial
headers sync including a progress percentage.

A new method is added that determines the progress of the headers sync
based on an algorithm that calculates the expected number of headers
there should be in the time interval since the current best known header
and the current time given the target block time for the network.  This
is the preferred approach because, unlike the sync height reported by
remote peers, it is difficult to game since it is based on the target
proof of work, but it assumes consistent mining, which is not the case
on all networks, so it is limited to the main and test networks where
that applies.

For the other networks, it falls back to the sync height reported by the
remote peers.

The best known header is now logged at initial load time in blockchain
and the netsync code is updated to make use of the new headers sync
progress logger to tally the number of processed and periodically log
the progress.
This modifies blockchain, progresslog, and netsync to provide a progress
percentage when logging the main chain verification process.

A new method is added to blockchain that determines the verification
progress of the current best chain based on how far along it towards the
best known header and blockchain is also modified to log the
verification progress in the initial chain state message when the
blockchain is first loaded.

In addition, the progess logger is modified to log the verification
progress instead of the timestamp of the last block and the netsync code
is updated to make use of the new verification progress func per above.

Finally, the netsync code now determines when the chain is considered
current and logs new blocks as they are connected along with their hash
and additional details.  This provides a cleaner distinction between the
initial verification and catchup phase and steady state operation.
This removes the response dealine for getheaders since there is no
guaranteed response if the remote peer does not have any headers for the
locator.

Also, while here, modify the debug logging to only log a deadline was
added when it actually was.
This modifies the RPC server getblockchaininfo to base the verification
progress calculation on the best header now that all headers are
determined prior to downloading and verifying blocks.
@davecgh davecgh force-pushed the netsync_header_announcement_sync_model branch from 752e723 to cbe730a Compare January 22, 2021 05:32
@davecgh davecgh merged commit cbe730a into decred:master Jan 22, 2021
@davecgh davecgh deleted the netsync_header_announcement_sync_model branch January 22, 2021 05:35
@davecgh davecgh mentioned this pull request Nov 10, 2021
57 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Valid blocks being marked as orphans and daemon getting out of sync
5 participants