Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(security): fix concurrency issues in tree key formats, and CPU usage in genesis tree roots #7392

Merged
merged 31 commits into from
Oct 19, 2023

Conversation

teor2345
Copy link
Contributor

@teor2345 teor2345 commented Aug 28, 2023

Motivation

Older Zebra versions did not cache the genesis tree roots.
Zebra currently does a lot of unnecessary writes to the anchors, sprout trees, and history trees.

Since block validation, mempool validation, and RPCs can read these trees, and block validation writes them, this is a potential remotely-triggered CPU or disk denial of service.

Part of #7664, alternative strategy 1 (stop deleting keys).
This is also part of a permanent fix for bug #7618.
Adds docs for part of #7737.

Solution

Improve the database size and performance, particularly during attacks via RPCs, mempool validation, or multiple block writes.

Format Upgrades

  • Change the sprout and history key format from tip_height to () (empty key) in the code and docs
  • Read the sprout, sapling and orchard genesis trees, then write their cached roots back to the database, and update the docs
  • Remove genesis-specific tree code, which uses the wrong format
  • Update tests now the genesis block has an entry for an empty tree (rather than using default() which is an empty tree)
  • Increment the state minor version

Format Tests

  • Check for incorrect sprout or history key formats, and duplicate tip trees
  • Check that there aren't any un-cached tree roots for sprout, sapling, and orchard
  • Update database snapshot tests
  • Disable format validity checks in some tests that create invalid states

Refactors

  • Refactor sprout, sapling, and orchard tree and anchor writes so they are all skipped if the trees are the same
    • The history tree changes with every block, so we don't do this check for it
  • Refactor sprout, sapling and orchard tree and anchor writes so they automatically cache roots (and use the newer key format)
  • This format upgrade is short, so some tests needed changes to wait for a short upgrade, rather than a long upgrade
  • Fix metrics for the genesis block

Documentation

  • Document the underlying performance bug that this PR fixes, by adding a section to the state upgrades doc.

Review

This is a routine bug fix.

This is a database format change, so it needs 2 reviews.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

@teor2345 teor2345 added C-bug Category: This is a bug P-Low ❄️ C-security Category: Security issues I-panic Zebra panics with an internal error message I-slow Problems with performance or responsiveness A-state Area: State / database changes A-concurrency Area: Async code, needs extra work to make it work properly. I-remote-trigger Remote nodes can make Zebra do something bad labels Aug 28, 2023
@teor2345 teor2345 requested a review from a team as a code owner August 28, 2023 03:45
@teor2345 teor2345 self-assigned this Aug 28, 2023
@teor2345 teor2345 requested review from upbqdn and removed request for a team August 28, 2023 03:45
@teor2345 teor2345 force-pushed the fix-genesis-tree-root branch from 65b09c7 to 6f4540f Compare August 28, 2023 03:50
@teor2345 teor2345 marked this pull request as draft August 28, 2023 04:07
@teor2345 teor2345 removed the request for review from upbqdn August 28, 2023 04:07
Base automatically changed from fix-tree-dedup-suggestion to fix-tree-dedup August 28, 2023 18:58
Base automatically changed from fix-tree-dedup to main August 28, 2023 22:59
@oxarbitrage oxarbitrage added the do-not-merge Tells Mergify not to merge this PR label Aug 31, 2023
@teor2345

This comment was marked as outdated.

@teor2345 teor2345 closed this Sep 1, 2023
@teor2345 teor2345 changed the title fix(state): Cache roots for genesis trees in legacy states, and upgrade sprout tree key format fix(security): Cache roots for genesis trees in legacy states, and upgrade sprout tree key format Sep 4, 2023
@teor2345

This comment was marked as outdated.

@upbqdn upbqdn removed the do-not-merge Tells Mergify not to merge this PR label Oct 17, 2023
arya2
arya2 previously approved these changes Oct 18, 2023
@arya2
Copy link
Contributor

arya2 commented Oct 18, 2023

lightwalletd_wallet_grpc_tests failed because it marked the database format as upgraded before opening the RPC endpoint.

@teor2345
Copy link
Contributor Author

lightwalletd_wallet_grpc_tests failed because it marked the database format as upgraded before opening the RPC endpoint.

This happened before the other way around, let me work on a better fix.

Co-authored-by: Marek <mail@marek.onl>
@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Oct 18, 2023
@teor2345
Copy link
Contributor Author

lightwalletd_wallet_grpc_tests failed because it marked the database format as upgraded before opening the RPC endpoint.

@arya2 your (deleted) suggestion here was really good and I ended up implementing it 🙂

It would be nice to have a ~expect_stdout_unordered_lines_match() method in TestChild for these cases.

zebra-test/src/command.rs Outdated Show resolved Hide resolved
zebra-test/src/command.rs Outdated Show resolved Hide resolved
upbqdn
upbqdn previously approved these changes Oct 18, 2023
zebra-test/src/command.rs Outdated Show resolved Hide resolved
zebra-test/src/command.rs Outdated Show resolved Hide resolved
Co-authored-by: Marek <mail@marek.onl>
@teor2345 teor2345 removed C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG extra-reviews This PR needs at least 2 reviews to merge labels Oct 18, 2023
@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Oct 18, 2023
@teor2345
Copy link
Contributor Author

We've had two reviews on the production code, and I've just changed docs and test code. So I've taken the extra-reviews label off.

@upbqdn upbqdn removed the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Oct 18, 2023
mergify bot added a commit that referenced this pull request Oct 19, 2023
@mergify mergify bot merged commit 64f7772 into main Oct 19, 2023
@mergify mergify bot deleted the fix-genesis-tree-root branch October 19, 2023 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-concurrency Area: Async code, needs extra work to make it work properly. A-state Area: State / database changes C-bug Category: This is a bug C-security Category: Security issues C-tech-debt Category: Code maintainability issues I-panic Zebra panics with an internal error message
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants