fix(api_service): exclude health checks from throttling with ConcurrencyLimitLayer #2320

Merged
rymnc merged 7 commits into master from fix/health-concurrency
Oct 10, 2024

rymnc
Member

@rymnc rymnc commented Oct 8, 2024

Linked Issues/PRs

closes #2319

Description

Uses axum's `Router::merge` to apply the `ConcurrencyLimitLayer` to only a subset of the routes we serve, excluding `/health` (and `/v1/health`) from it.

We keep `/metrics` throttled because it acquires a lock on the `GLOBAL_REGISTER`, which may prevent Prometheus from reading metrics. I think this is a double-edged sword, though.
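A rough, std-only model of the behaviour this PR targets may help. tower's `ConcurrencyLimitLayer` makes excess requests wait for a free slot rather than rejecting them, so a `/health` probe routed through it can sit in the queue behind slow GraphQL queries. The sketch below replaces the tower layer with a hand-written counting semaphore (hypothetical `Limiter`/`Permit` types, with a single limit shared by all throttled routes for simplicity) and replaces the merged unthrottled router with a hard-coded `UNTHROTTLED` list. It illustrates the idea and is not the fuel-core code.

```rust
use std::sync::{Condvar, Mutex};

/// Minimal counting semaphore standing in for tower's ConcurrencyLimitLayer:
/// callers wait (queue) until a permit is free instead of being rejected.
struct Limiter {
    available: Mutex<usize>,
    freed: Condvar,
}

/// RAII permit: hands its slot back to the limiter when dropped.
struct Permit<'a> {
    limiter: &'a Limiter,
}

impl Limiter {
    fn new(limit: usize) -> Self {
        Limiter { available: Mutex::new(limit), freed: Condvar::new() }
    }

    /// Blocks until a permit is available (the "queued" behaviour).
    fn acquire(&self) -> Permit<'_> {
        let mut n = self.available.lock().unwrap();
        while *n == 0 {
            n = self.freed.wait(n).unwrap();
        }
        *n -= 1;
        Permit { limiter: self }
    }

    /// Non-blocking variant, used here only to inspect saturation.
    fn try_acquire(&self) -> Option<Permit<'_>> {
        let mut n = self.available.lock().unwrap();
        if *n == 0 {
            return None;
        }
        *n -= 1;
        Some(Permit { limiter: self })
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.limiter.available.lock().unwrap() += 1;
        self.limiter.freed.notify_one();
    }
}

/// Hypothetical stand-in for the unthrottled router: these paths skip the limiter.
const UNTHROTTLED: [&str; 2] = ["/health", "/v1/health"];

fn handle(limiter: &Limiter, path: &str) -> String {
    if UNTHROTTLED.contains(&path) {
        return format!("{}: ok", path);
    }
    // Throttled group: wait for a permit and hold it for the request's lifetime.
    let _permit = limiter.acquire();
    format!("{}: ok (throttled)", path)
}

fn main() {
    let limiter = Limiter::new(1);
    // Simulate a long-running GraphQL query occupying the only permit.
    let busy = limiter.try_acquire().expect("permit free at start");
    // Health checks still answer immediately while the limiter is saturated.
    println!("{}", handle(&limiter, "/health"));
    // A throttled route would queue here; try_acquire shows no permit is free.
    assert!(limiter.try_acquire().is_none());
    drop(busy);
    println!("{}", handle(&limiter, "/v1/graphql"));
}
```

With the only permit held by the simulated slow query, `/health` still returns at once, whereas a throttled route would block inside `acquire` until that permit is dropped. This is the false-positive restart scenario from the linked issue, seen from the server side.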

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

After merging, notify other teams

@rymnc rymnc added the bug Something isn't working label Oct 8, 2024
@rymnc rymnc self-assigned this Oct 8, 2024
@rymnc rymnc requested a review from a team October 8, 2024 19:16
@rymnc rymnc marked this pull request as ready for review October 8, 2024 20:14
netrome
netrome previously approved these changes Oct 8, 2024
Contributor

@netrome netrome left a comment

Nice stuff! Just a nit suggestion, but feel free to skip it.

crates/fuel-core/src/graphql_api/api_service.rs (outdated, resolved)
crates/fuel-core/src/graphql_api/api_service.rs (outdated, resolved)
netrome
netrome previously approved these changes Oct 8, 2024
netrome
netrome previously approved these changes Oct 8, 2024
Contributor

@rafal-ch rafal-ch left a comment

Looks good, I just have one minor question.

CHANGELOG.md (outdated, resolved)
Co-authored-by: Rafał Chabowski <88321181+rafal-ch@users.noreply.github.com>
rafal-ch
rafal-ch previously approved these changes Oct 8, 2024
netrome
netrome previously approved these changes Oct 9, 2024
Comment on lines 224 to 236
```rust
// Health endpoints bypass the concurrency limiter entirely.
let unthrottled_routes = Router::new()
    .route("/v1/health", get(health))
    .route("/health", get(health));

// The remaining endpoints are wrapped in the concurrency limiter.
let throttled_routes = Router::new()
    .route("/v1/playground", get(graphql_playground))
    .route("/v1/graphql", post(graphql_handler).options(ok))
    .route(
        "/v1/graphql-sub",
        post(graphql_subscription_handler).options(ok),
    )
    .route("/v1/metrics", get(metrics))
    .layer(ConcurrencyLimitLayer::new(concurrency_limit));
```
Collaborator

@xgreenx xgreenx Oct 9, 2024

I think we want something like the screenshot below, because graphql-sub is already limited by the number of subscriptions, and the playground is a one-time query that returns pre-rendered HTML. Throttling metrics prevents us from requesting data =)

[Screenshot of the suggested change]

Member Author

> Throttling metrics doesn't allow request data =)

Are you saying that because metrics are throttled, it affects how graphql and graphql-sub are throttled as well?

Why shouldn't we throttle the metrics?

Collaborator

Because if we have a lot of queries, it becomes impossible for us to track the state of the node from Grafana =)
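One reading of the reviewer's suggestion is that only `/v1/graphql` needs the limiter. Subscriptions are already capped, the playground is a one-off HTML response, and metrics and health must stay reachable under load. This reading is an assumption: the screenshot is not preserved here, and the code in 1a66422 is not shown on this page. A hypothetical `group_for` helper makes that split explicit and easy to test:

```rust
/// Which group a route would be mounted in (hypothetical; not the fuel-core code).
#[derive(Debug, PartialEq, Eq)]
enum Group {
    Throttled,
    Unthrottled,
}

fn group_for(path: &str) -> Option<Group> {
    match path {
        // Only the main GraphQL endpoint goes through the concurrency limiter.
        "/v1/graphql" => Some(Group::Throttled),
        // graphql-sub: already bounded by the subscription limit.
        // playground: one-shot, pre-rendered HTML.
        // metrics: must stay reachable so Grafana can observe a busy node.
        // health: must never queue behind slow queries.
        "/v1/graphql-sub" | "/v1/playground" | "/v1/metrics" | "/v1/health" | "/health" => {
            Some(Group::Unthrottled)
        }
        // Unknown route: handled by the 404 fallback, no limiter involved.
        _ => None,
    }
}

fn main() {
    for path in ["/v1/graphql", "/v1/metrics", "/health", "/nope"] {
        println!("{:<16} -> {:?}", path, group_for(path));
    }
}
```

Keeping the split in one match makes it obvious in review which endpoints can be starved by heavy query traffic.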

Member Author

Okay, making the change.

Member Author

addressed in 1a66422

@rymnc rymnc dismissed stale reviews from netrome and rafal-ch via 1a66422 October 9, 2024 15:56
@rymnc rymnc requested a review from xgreenx October 9, 2024 15:56
@rymnc rymnc requested review from netrome and rafal-ch October 9, 2024 15:56
@rymnc rymnc merged commit f87f59a into master Oct 10, 2024
38 checks passed
@rymnc rymnc deleted the fix/health-concurrency branch October 10, 2024 09:30
@rafal-ch rafal-ch mentioned this pull request Oct 11, 2024
rafal-ch added a commit that referenced this pull request Oct 11, 2024
## Version v0.39.0

### Added
- [2324](#2324): Added metrics
for sync, async processor and for all GraphQL queries.
- [2330](#2330): Added new CLI
flag `graphql-max-resolver-recursive-depth` to limit recursion within
the resolver. The default value is "1".
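The entry above does not say how recursion is measured. The sketch below makes an assumption: it treats the limit as the number of times the same field name may appear on one root-to-leaf path of the selection set. Under that reading, the default of 1 rejects any query that re-enters a field it is already inside. The `Field` tree, `walk`/`check` functions, and field names are all hypothetical and unrelated to fuel-core's or async-graphql's actual types.

```rust
/// Hypothetical, simplified selection set: a field and its nested selections.
struct Field {
    name: &'static str,
    children: Vec<Field>,
}

/// Convenience constructor for building example queries.
fn f(name: &'static str, children: Vec<Field>) -> Field {
    Field { name, children }
}

/// Depth-first walk that keeps the current root-to-leaf path in `path`.
/// Fails with the offending field name as soon as a field would appear
/// more than `max_depth` times on that path.
fn walk(field: &Field, path: &mut Vec<&'static str>, max_depth: usize) -> Result<(), &'static str> {
    let seen = path.iter().filter(|n| **n == field.name).count();
    if seen >= max_depth {
        return Err(field.name);
    }
    path.push(field.name);
    for child in &field.children {
        walk(child, path, max_depth)?;
    }
    path.pop();
    Ok(())
}

/// Entry point: start from an empty path at the query root.
fn check(root: &Field, max_depth: usize) -> Result<(), &'static str> {
    walk(root, &mut Vec::new(), max_depth)
}

fn main() {
    // block { transactions { id } }: no field repeats on any path.
    let flat = f("block", vec![f("transactions", vec![f("id", vec![])])]);
    // block { transactions { block { height } } }: `block` is re-entered.
    let recursive = f(
        "block",
        vec![f("transactions", vec![f("block", vec![f("height", vec![])])])],
    );
    println!("{:?}", check(&flat, 1));
    println!("{:?}", check(&recursive, 1));
    println!("{:?}", check(&recursive, 2));
}
```

Because the path is popped after each subtree, sibling selections with the same name (for example, two `id` fields under one parent) are not counted as recursion. Only nesting along a single path is.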


### Fixed
- [2320](#2320): Prevent
`/health` and `/v1/health` from being throttled by the concurrency
limiter.
- [2322](#2322): Set the
salt of genesis contracts to zero on execution.
- [2324](#2324): Ignore a peer
if we are already syncing transactions from it.

#### Breaking

- [2330](#2330): Reject
queries that are recursive during the resolution of the query.

### Changed

#### Breaking
- [2311](#2311): Changed the
text of the error returned by the executor if gas overflows.

## What's Changed
* chore(executor): instrument errors to be more meaningful by @rymnc in
#2311
* fix(dummy_da_block_costs): remove dependency on polling_interval and
use channel instead by @rymnc in
#2314
* fix(txpool): Error in GossipsubMessageAcceptance variant docstrings by
@netrome in #2316
* refactor: Eager returns in txpool_v2::service::Task::run by @netrome
in #2325
* fix(api_service): exclude health checks from throttling with
ConcurrencyLimitLayer by @rymnc in
#2320
* Remove the `normalize_rewards_and_costs()` function by @rafal-ch in
#2293
* fix(genesis): set salt of contract on execution of genesis state
configuration by @rymnc in
#2322
* Fixing the issue with duplicate transaction synchronization processes
by @xgreenx in #2324
* Reject queries that are recursive during the resolution by @xgreenx in
#2330


**Full Changelog**:
v0.38.0...v0.39.0
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

/health endpoint maybe be queued causing false positive restart for operators
4 participants