-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce direct memory OOM chances on broker with a per server query response size budget #11710
Conversation
53b16a8
to
445ab73
Compare
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
Outdated
Show resolved
Hide resolved
...on-tests/src/test/java/org/apache/pinot/integration/tests/OfflineClusterIntegrationTest.java
Outdated
Show resolved
Hide resolved
I've added some notes, but I think the PR is in good shape |
Codecov Report
@@ Coverage Diff @@
## master #11710 +/- ##
============================================
+ Coverage 63.08% 63.15% +0.06%
- Complexity 207 1141 +934
============================================
Files 2342 2343 +1
Lines 125883 126503 +620
Branches 19357 19460 +103
============================================
+ Hits 79410 79889 +479
- Misses 40822 40930 +108
- Partials 5651 5684 +33
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 155 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Cool, thanks! I think reducing the likelyhood of DM OOMs is the logical next step. How will a user determine what the threshold should be? A critical requirement for a feature like this is to introduce it such that it does suddenly start breaking queries that were working before the threshold was enabled; so, a guide on how to determine what the threshold should be is, I think, critical for safe adoption. |
@ege-st |
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM over all, I think the behavior of having both broker config and query option to set this is
pretty much like the global threshold + query option we do for timeout, easpecially when the direct memory threshold can be conservative and not considering data skew. My only concern is the user need to be instructed to not abuse the option and kill the cluster, or we can have a swich to turn it off centrally (for security).
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also consider adding a table level override for this config
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java
Outdated
Show resolved
Hide resolved
89c634c
to
889ecaf
Compare
pinot-core/src/main/java/org/apache/pinot/core/query/scheduler/QueryScheduler.java
Outdated
Show resolved
Hide resolved
I am still not convinced of how we have wired the config aspect. At a high level, I suggest we do the same thing as
Also this may be a nit but the following 2 configs in CommonConstants is not super intuitive to me.
One problem I see is that while (1) is an instance config but tells for the query and therefore we internally compute the per server threshold / budget during routing depending on how many servers we are routing to. However, (2) seems to be the queryOption way of doing things but here we are having the user directly specify the server level budget instead of overall. Why this difference ? |
@siddharthteotia Added tableConfig and overriding sequence. The reasoning behind adding a broker level instance config was - the broker ultimately should decide how much response size it should get for each query (depending on it's direct memory limits). If this broker instance config is set, we use that to set the query option to limit response size. I'm open to suggestions if the broker instance config doesn't make sense. |
48f169f
to
ccf29fe
Compare
ccf29fe
to
e67f2b1
Compare
I understand the intention of tracking broker side total memory usage, but evenly splitting it into server side memory limit might cause inconsistent behavior because the actually limit relies on the fanout, and it assumes evenly distribution of the results from each server. |
@Jackie-Jiang got it. Based on the suggestions, I've added a server side limit in the table config. Would you prefer that we add an instance config for the server in addition to the table config setting? |
@vvivekiyer I'm thinking 2 configs - We need some documentation to clear document how these 2 configs are used |
@Jackie-Jiang added perServer and perQuery configs. Please take a look. I'll update the pinot docs with this behavior once this code is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java
Outdated
Show resolved
Hide resolved
pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java
Outdated
Show resolved
Hide resolved
@vvivekiyer Have you added these configs to the pinot doc? |
@Jackie-Jiang Added documentations for broker configs, queryOption and table configs. |
When server responds for a query with a large response, the broker can potentially crash with direct memory OOM.
In PR #11496 - a fix was added to restart the Netty Channel in such scenarios. This will result in all active queries failing with an error. This is a good fix to have to protect Brokers in this extreme case.
However, to reduce the probability of such events happening and contain the impact for other queries, this PR introduces a threshold at the Server. If the serialized query response at a Server exceeds the threshold, the query is failed. The overall threshold for a query is set using a config. This budget is divided across all Servers processing the query.
cc: @siddharthteotia @jasperjiaguo @gortiz @ege-st @dinoocch