Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(network): Rate-limit inbound connections per IP. #7041

Merged
merged 8 commits into from
Jul 14, 2023

Conversation

arya2
Copy link
Contributor

@arya2 arya2 commented Jun 21, 2023

Motivation

This prevents some kinds of attacks by making Zebra skip the handshake for these inbound connections.

Closes #6981.

Solution

  • Count recent inbound connection attempts by IP
  • Drop connections if there were already max_connections_per_ip recent inbound connections from their IP

Review

Part of regular scheduled work.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
    • Does it change concurrent code, unsafe code, or consensus rules?
  • How do you know it works? Does it have tests?

@arya2 arya2 added C-bug Category: This is a bug P-Medium ⚡ C-security Category: Security issues I-slow Problems with performance or responsiveness A-network Area: Network protocol updates or fixes A-concurrency Area: Async code, needs extra work to make it work properly. I-remote-trigger Remote nodes can make Zebra do something bad labels Jun 21, 2023
@arya2 arya2 self-assigned this Jun 21, 2023
@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Jun 21, 2023
@arya2 arya2 changed the base branch from main to constrain-outbound-connector June 21, 2023 20:11
@codecov
Copy link

codecov bot commented Jun 21, 2023

Codecov Report

Merging #7041 (493df40) into main (77ad91c) will decrease coverage by 0.07%.
The diff coverage is 66.03%.

❗ Current head 493df40 differs from pull request most recent head 4fab026. Consider uploading reports for the commit 4fab026 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7041      +/-   ##
==========================================
- Coverage   77.53%   77.47%   -0.07%     
==========================================
  Files         310      310              
  Lines       41933    41749     -184     
==========================================
- Hits        32513    32343     -170     
+ Misses       9420     9406      -14     

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some overall design advice:

This ticket and PR are about preventing repeated inbound handshakes from the same IP. (Or a small set of IPs.)

We already rate-limit these handshakes, and drop duplicates when they reach the peer set.

Here's what I think the remaining security issues are:

A single IP can have multiple inbound connections in flight between the inbound connection task and the peer set. (About 3-4 if it uses both timeouts fully.) These connections must be successful to be sent to the peer set.

A single IP can also deny service to other inbound peers by continually making connections, and hitting the 1 second rate-limit. These connections can be successful, failed, or time out. (So there's no guarantee they will reach the peer set.)

I should have written this in the ticket, but I hadn't thought it through, because I didn't realise it was going to be scheduled so soon.

So I'd suggest keeping a list of recent inbound connection IPs in the inbound connection task, and expiring them after a timeout.

Outbound peers and the PeerSet are not involved:

I don't think these security issues can be fixed by checking the connections tracked by the peer set or address book.

The AddressBook only tracks addresses that are valid for outbound connections. The ports of inbound connections are ephemeral, so they aren't stored in the AddressBook, because we can't make outbound connections to those ports.

The PeerSet tracks both inbound and outbound connections, but only after they successfully complete a handshake.

It's also ok to leave any duplicate outbound/inbound IPs for the peer set to drop.

Related bugs:

We're currently fixing a concurrency bug in the inbound connection handler, so I'd like to avoid adding more locking to that function. Particularly when it's hidden in trait methods.

Edit: it turns out the bug was #7103, a future using 100% CPU in a busy-loop. So it's not related to locking.

@teor2345 teor2345 added do-not-merge Tells Mergify not to merge this PR and removed do-not-merge Tells Mergify not to merge this PR C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG labels Jun 30, 2023
Base automatically changed from constrain-outbound-connector to main July 6, 2023 05:54
@teor2345
Copy link
Contributor

teor2345 commented Jul 10, 2023

I updated my review to remove the note about a locking bug, because it turned out to be incorrect.

Edit: there's no need to work on this either just because I commented 😅

@arya2 arya2 force-pushed the ignore-some-inbound-conns branch from 493df40 to 7de63bb Compare July 11, 2023 23:19
@arya2 arya2 requested a review from teor2345 July 11, 2023 23:19
@arya2 arya2 marked this pull request as ready for review July 11, 2023 23:19
@arya2 arya2 requested a review from a team as a code owner July 11, 2023 23:19
@teor2345

This comment was marked as resolved.

teor2345
teor2345 previously approved these changes Jul 12, 2023
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks so much for rewriting it! I really appreciate the simpler code.

I have some questions about how this code will handle an attack where Zebra is flooded with inbound connections from different IP addresses. It would be good to fix them here or in a quick follow-up.

I was also wondering about the PR title. I think what this code does now is rate-limit inbound connections to max_connections_per_ip per MIN_PEER_RECONNECTION_DELAY, rather than ignoring them completely?
(Which still fixes the original bug where the handshaker could get overloaded.)

Otherwise all my suggestions are optional.

As a follow-up, I think we might want to hide these IpAddr and the ones in the AddressBook from the logs, in the same way as PeerSocketAddr. But I think that's a low priority, because we don't actually log those types or instrument their functions. So the IpAddrs we've added won't appear in any logs yet.

zebra-network/src/peer_set/initialize/recent_by_ip.rs Outdated Show resolved Hide resolved
zebra-network/src/peer_set/initialize/recent_by_ip.rs Outdated Show resolved Hide resolved
zebra-network/src/peer_set/initialize.rs Show resolved Hide resolved
zebra-network/src/peer_set/initialize/recent_by_ip.rs Outdated Show resolved Hide resolved
Uses partition_point & split_off

Moves tests to separate module
arya2 and others added 2 commits July 13, 2023 13:47
teor2345
teor2345 previously approved these changes Jul 13, 2023
Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go, feel free to accept or reject my suggestions in PRs #7213 and #7214.

@teor2345
Copy link
Contributor

I was also wondering about the PR title. I think what this code does now is rate-limit inbound connections to max_connections_per_ip per MIN_PEER_RECONNECTION_DELAY, rather than ignoring them completely?
(Which still fixes the original bug where the handshaker could get overloaded.)

Oh this is also an optional nitpick, I doubt most users will notice the difference between "Ignore for 2 minutes" and "rate-limit for 2 minutes".

@arya2 arya2 changed the title fix(network): Ignore inbound connections from IPs for which Zebra already has an active peer. fix(network): Rate-limit inbound connections per IP. Jul 13, 2023
@arya2 arya2 requested a review from teor2345 July 13, 2023 21:19
mergify bot added a commit that referenced this pull request Jul 14, 2023
@mergify mergify bot merged commit 7b0dedd into main Jul 14, 2023
@mergify mergify bot deleted the ignore-some-inbound-conns branch July 14, 2023 02:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-concurrency Area: Async code, needs extra work to make it work properly. A-network Area: Network protocol updates or fixes C-bug Category: This is a bug C-security Category: Security issues I-remote-trigger Remote nodes can make Zebra do something bad I-slow Problems with performance or responsiveness
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignore repeated inbound connections from the same IP
2 participants