Reproduce Quota Rejections #18

Closed
wants to merge 80 commits into from
Conversation

sushantmane
Owner

Reproduce Quota Rejections

Resolves #XXX

How was this PR tested?

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

majisourav99 and others added 30 commits June 20, 2024 17:25
…nkedin#1032)

When smart long tail retry is enabled, each storage node that responds is marked as fast and is used for subsequent retries.

However, if a storage node returns a TOO_MANY_REQUESTS (429) status code, it indicates that the node has exceeded its quota and retrying would be futile.
Retrying in this scenario would increase the load on already overwhelmed servers, leading to higher latency and degraded performance. This PR prevents retries against storage nodes that return a 429 status code, improving overall system efficiency.
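
As a minimal sketch of the rule this PR introduces (the class and method names below are illustrative stand-ins, not the actual Venice router API):

```java
// Hypothetical helper illustrating the retry gate: a 429 from a storage
// node means its quota is exhausted, so a long-tail retry only adds load
// to an already overwhelmed server without improving the outcome.
public final class RetryGate {
  private static final int TOO_MANY_REQUESTS = 429;

  /** Returns true only for statuses where a long-tail retry can help. */
  public static boolean isRetriable(int statusCode) {
    if (statusCode == TOO_MANY_REQUESTS) {
      return false; // quota rejection: fail fast instead of retrying
    }
    return statusCode >= 500; // transient server-side errors may still be retried
  }
}
```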
…f store is AA enabled (linkedin#1043)

Do not use DRP for lag measurement if the store is AA-enabled. For AA-enabled
stores, consider lag from all fabrics to avoid premature version swaps in some
colos. This PR removes the DRP dependency for lag measurement in AA-enabled stores.
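
A hedged sketch of the "consider lag from all fabrics" rule (the map-based signature is illustrative, not Venice's real API):

```java
import java.util.Map;

// Illustration only: a version is ready to swap only when every fabric's
// replication lag is under the threshold, instead of trusting a single
// DRP-derived measurement that can report one colo as caught up too early.
public final class VersionSwapCheck {
  public static boolean readyToSwap(Map<String, Long> lagByFabric, long lagThreshold) {
    return lagByFabric.values().stream().allMatch(lag -> lag <= lagThreshold);
  }
}
```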
…finishes before put can be executed (linkedin#1040)

* Create InternalDaVinciRecordTransformer to ensure that onStartIngestionTask completes before any puts can be executed (see the sketch after this list)

* Add public final to recordTransformer

* Make recordTransformer private

* Handle interrupts by restoring the interrupt flag instead of throwing an exception

* Fix incorrect package ref

* Don't have two references of record transformer

* Ensure getRecordTransformer does not return null

* Don't use apply twice

* Change back recordTransformer to final

* Don't instantiate DVRT twice

* Rename record transformer to be explicit about locking and remove linting changes

* Rename InternalDaVinciRecordTransformer to BlockingDavinciRecordTransformer. Undo variable name change, and modify description
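
A minimal sketch of the blocking-wrapper idea from the commits above (the wrapper and delegate interface here are illustrative stand-ins, not the actual Venice classes):

```java
import java.util.concurrent.CountDownLatch;

// Illustration: put() blocks until onStartIngestionTask() has finished,
// so no record is transformed before startup completes.
public final class BlockingRecordTransformer<K, V> {
  interface RecordTransformer<K, V> { // hypothetical delegate interface
    void onStartIngestionTask();
    void put(K key, V value);
  }

  private final RecordTransformer<K, V> delegate;
  private final CountDownLatch startLatch = new CountDownLatch(1);

  public BlockingRecordTransformer(RecordTransformer<K, V> delegate) {
    this.delegate = delegate;
  }

  public void onStartIngestionTask() {
    try {
      delegate.onStartIngestionTask();
    } finally {
      startLatch.countDown(); // unblock pending puts even if startup failed
    }
  }

  public void put(K key, V value) {
    try {
      startLatch.await(); // wait for startup to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the flag instead of throwing, per the commit above
      return;
    }
    delegate.put(key, value);
  }
}
```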
Added a new store-level config setting `maxRecordSizeBytes`. This will be used to decide if a batch push job should fail or if nearline consumption should be paused due to the ingestion of records that are too large.
  * The default value of `maxRecordSizeBytes` is `-1`. By leaving the limit unset by default, it is easier for us to add a global config in the future should we deem it necessary.
  * This default value of `maxRecordSizeBytes` will be changed to `10MB` in-memory when it reaches its eventual destination at `VeniceWriter`, even if the config is set to `-1`.
  * The adjustment is at the store level, as we anticipate only a few stores could surpass the threshold and require an increased limit.

Added tests `testUpdateMaxRecordSize()` and `testChunkingMaxRecordSize()` in `VeniceParentHelixAdminTest`. The former verifies that the config is read and set properly, and the latter
verifies that the default value of `maxRecordSizeBytes` will be `10MB` if chunking is enabled.

Refactored tests involving "setting a value and verifying the value was set" into a reusable function `testUpdateConfig()`. All you need to do is provide `Consumer` functions for setting a value and verifying the value, which will hopefully make testing configs very straightforward.
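
A hedged sketch of that pattern (the helper signature is illustrative; the real one lives in `VeniceParentHelixAdminTest`):

```java
import java.util.function.Consumer;

// Illustration of the reusable pattern: the caller supplies one action that
// sets a config value and one that asserts the value was applied.
public final class ConfigTestHelper {
  public static <T> void testUpdateConfig(T adminClient, Consumer<T> setValue, Consumer<T> verifyValue) {
    setValue.accept(adminClient);    // e.g. update maxRecordSizeBytes via the admin client
    verifyValue.accept(adminClient); // e.g. read the store config back and assert on it
  }
}
```

A caller would then pass something like `testUpdateConfig(admin, a -> a.updateStore(...), a -> assertEquals(...))` for each config under test.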
…r data integrity validation (linkedin#1042)

[controller][server][client][test] Adds a new checksum type ADHASH for data integrity validation

This PR adds a new checksum type ADHASH for Venice's DIV. ADHASH adopts the
idea of incremental hashing from AdHash and leverages CRC32 as the underlying
hashing algorithm.

CRC32 is commonly used in networks and storage devices for detecting data corruption.
The purpose of this algorithm is not to protect against intentional
changes (cryptographic hashing), but rather to catch accidents like network errors
and disk write errors, etc. The emphasis of CRC32 is more on speed than on security.

For testing, this PR extends all the existing checksum and DIV tests to include the
new ADHASH type.
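
A minimal sketch of the incremental-hash idea (this illustrates AdHash-style combining over CRC32, not the actual ADHASH implementation):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustration: the running checksum is the sum of each record's CRC32
// (long overflow acts as addition mod 2^64), so adding a record updates
// the digest in O(1) without rehashing everything seen so far.
public final class AdHashLikeChecksum {
  private long state = 0L;

  public void add(byte[] record) {
    CRC32 crc = new CRC32();
    crc.update(record);
    state += crc.getValue();
  }

  public long getCheckSum() {
    return state;
  }

  public static void main(String[] args) {
    AdHashLikeChecksum sum = new AdHashLikeChecksum();
    sum.add("record-1".getBytes(StandardCharsets.UTF_8));
    sum.add("record-2".getBytes(StandardCharsets.UTF_8));
    System.out.println(Long.toHexString(sum.getCheckSum()));
  }
}
```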
* [server][davinci] Created interface for Blob discovery and added a server implementation

* Created the BlobSnapshotManager that takes care of hybrid snapshots

* Adjusted concurrent users logic and added a test case to test it

* Cleaned up notes

* Implemented Kai's suggestions

* Added logger error for no path found

* Implemented comments

* Implemented comments

* Implemented comments

* Implemented comments
… regions (linkedin#1049)

When target region push is enabled, after pushing to the target region, the rest of the regions do not delete the backup version before the start of the push. This increases disk usage for the duration of the push to 3x the store size (backup + current + future versions). This PR deletes the backup version before starting ingestion in non-target regions.
* [server][davinci] Created interface for Blob discovery and added a server implementation

* Fixed inconsistent unit test

* Fixed inconsistent unit test
…hot and Restore tooling (linkedin#1047)

* Implemented naive solution for getVenicePaths() and wrote tests

* Implemented+tested AdminTool command for extracting Venice-specific paths into an output file

* Implemented+tested tree data structure to filter Venice-specific paths

* Refactored ZNode to TreeNode

* Tested AdminTool command to extract venice paths

* Implemented Nisarg's comments (refactored code/files, added javadoc, removed unnecessary comments)

* Implemented comments (refactored code/tests, added javadoc)

* Replaced Venice-managed ZK paths strings with constants in multiple files

* Refactored code + refined test cases
* commit bootstrap changes

---------

Co-authored-by: "ishwarya-personal" <mishwarya76@gmail.com>
… push (linkedin#1059)

Today, the Controller checks the push status system store partition by partition in the following way:
1. Iterate partition by partition and get all the subscribed DaVinci instances per partition.
2. Check whether the instance's partition status is completed or not.
3. If not, check the liveness of the instance to decide whether the status should be ignored.

If one DaVinci store is being used in an unpartitioned way, the total number of calls to the push status system
store for liveness checks will be partition_num * host_num, which is high enough to time out the Controller
status polling request for incremental push, which still does real-time status checks against the push status
system store.
With this change, the Controller skips the duplicate liveness calls, so the total call count per status polling
request becomes partition count + unique host count.
The reason I didn't batch the requests is that I would like to reduce the KPS to the push status store to avoid
quota issues, and with this reduction the perf should be good enough.
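
A hedged sketch of the deduplication (signatures are illustrative, not the controller's real code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustration: liveness is looked up once per unique instance and cached,
// so one polling request costs roughly partition_count + unique_host_count
// calls instead of partition_count * host_count.
public final class LivenessCheckDeduper {
  public static Map<String, Boolean> checkLiveness(
      Map<Integer, List<String>> incompleteInstancesByPartition,
      Predicate<String> isInstanceLive) {
    Map<String, Boolean> livenessCache = new HashMap<>();
    for (List<String> instances : incompleteInstancesByPartition.values()) {
      for (String instance : instances) {
        // computeIfAbsent ensures at most one liveness call per instance
        livenessCache.computeIfAbsent(instance, isInstanceLive::test);
      }
    }
    return livenessCache;
  }
}
```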
…kedin#1060)

remove the feature flags from tests

Co-authored-by: "ishwarya-personal" <mishwarya76@gmail.com>
…c io by default (linkedin#950) (linkedin#1063)

Revert "[server][da-vinci] Bumped RocksDB dep and adopt multiget async io by default (linkedin#950)"

This reverts commit 02591a1.
* Fix NPE in ScatterGatherRequestHandlerImpl

thenCompose requires functions that do not return null.
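
The contract can be demonstrated in isolation (illustrative snippet, not the router code):

```java
import java.util.concurrent.CompletableFuture;

public final class ThenComposeNpeDemo {
  public static void main(String[] args) {
    // Returning null from a thenCompose function is illegal: the resulting
    // stage completes exceptionally with a NullPointerException.
    CompletableFuture<String> broken =
        CompletableFuture.completedFuture("req").thenCompose(r -> null);
    System.out.println(broken.isCompletedExceptionally()); // true

    // Fix: always return a (possibly already-completed) future instead.
    CompletableFuture<String> fixed = CompletableFuture.completedFuture("req")
        .thenCompose(r -> CompletableFuture.completedFuture("response"));
    System.out.println(fixed.join()); // response
  }
}
```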
dedup partition names
…inkedin#1066)

If users set deferred version swap for a push, they may decide to swap the current version much later than the 24-hour push timeout. Starting the next push after 24 hours would cause the previously pushed version to be killed even though that push finished successfully. This PR fixes that by failing new pushes until the previously pushed version is swapped to current.
…1068)

In handleValueSchemaLookup, the schema response was allocated as an array sized to the number of schemas, with entries set using the schema id as the index into the array. If any schemas have been deleted, the ids are no longer contiguous, so those insertions throw java.lang.ArrayIndexOutOfBoundsException. This PR fixes that.
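
A small sketch of the failure mode and the shape of the fix (names are illustrative):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration: with live schema ids {1, 2, 5} (ids 3 and 4 deleted), the
// old approach allocated an array of size 3 and wrote at index 5 - 1 = 4,
// throwing ArrayIndexOutOfBoundsException. Keying the response by schema id
// drops the assumption that ids are contiguous.
public final class SchemaLookupSketch {
  public static Map<Integer, String> buildResponse(Map<Integer, String> schemasById) {
    // Buggy shape, for contrast:
    //   String[] response = new String[schemasById.size()];
    //   schemasById.forEach((id, schema) -> response[id - 1] = schema); // AIOOBE on gaps
    return new TreeMap<>(schemasById); // id-keyed map tolerates deleted schemas
  }
}
```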
…egionMultiClusterWrapper (linkedin#1067)

There are several tests that create multi-region clusters through manual, non-standard setups. An example of this is that many tests create a multi-region cluster that shares the same Kafka cluster between them. While this might be valuable for some tests, it is a maintenance overhead for most of them. Our test suite already has "VeniceTwoLayerMultiRegionMultiClusterWrapper" that can be used directly for all current use-cases.

This commit replaces all tests that set up a multi-region test cluster through non-standard means to use "VeniceTwoLayerMultiRegionMultiClusterWrapper" instead.

This commit also fixes an issue where, if the config to enable active-active by default for batch-only stores is enabled, new store creation for system stores makes them active-active as well. This was masked previously by different configs set on child and parent controllers, so running with this config in the parent controller would have been problematic.

This change exposed another bug: storage persona validations only run in a multi-region setup, while there is nothing conceptually blocking them from running in a single-region setup. This will be addressed in a follow-up PR.
…Config (linkedin#1070)

Today, in controllers, we have two classes of configs which serve the same purpose, and it is unnecessary to maintain both. This commit merges "VeniceControllerConfig" and "VeniceControllerClusterConfig".
…e integ test (linkedin#1073)

* [dvc][test] Fix missing parts for blob transfer and add one incomplete integ test (linkedin#1073)
Repush with DELETE as the last update to a certain key will fail the job: before EOP, AASIT's processing short-circuits to LFSIT logic, and the DELETE handling there does not take the DELETE payload, since it did not expect a DELETE to carry RMD info; as a result, the drainer side throws an NPE.
…inkedin#1071)

When fetching schemas from routers in a thin client, any transient issues can cause requests to fail, leading the client to save a null schema for the schema ID, which results in further request failures. This PR addresses the issue by forcing a refresh during schema fetches and increasing both the retry count and delay for router requests.
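
A hedged sketch of that fetch loop (the signature is illustrative, not the thin client's API):

```java
import java.util.Optional;
import java.util.function.Supplier;

// Illustration: on a transient failure, force a metadata refresh and retry
// after a delay; a null schema is never cached or returned for the id.
public final class SchemaFetchRetry {
  public static <T> T fetchWithRetries(Supplier<Optional<T>> fetchOnce, Runnable forceRefresh,
      int maxAttempts, long delayMs) throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Optional<T> schema = fetchOnce.get();
      if (schema.isPresent()) {
        return schema.get(); // only non-null results are ever returned (and cached)
      }
      forceRefresh.run(); // refresh router metadata before the next attempt
      Thread.sleep(delayMs);
    }
    throw new IllegalStateException("Schema not found after " + maxAttempts + " attempts");
  }
}
```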
…edin#1052)

Enforce record size limits for batch push jobs, so push jobs with Venice records larger than `maxRecordSizeBytes` will fail, reporting a `RecordSizeTooLarge` error. Note that no push jobs will fail by default until the controller config `controller.default.max.record.size.bytes` is adjusted and deployed for each cluster. More details in the PR description.
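
A minimal sketch of the size gate (the exception here stands in for the job's `RecordSizeTooLarge` error report):

```java
// Illustration: a record whose serialized size exceeds the configured limit
// fails the batch push; -1 means the limit is unset, so nothing is rejected
// until controller.default.max.record.size.bytes is deployed.
public final class RecordSizeGate {
  public static void checkRecordSize(byte[] serializedRecord, long maxRecordSizeBytes) {
    if (maxRecordSizeBytes < 0) {
      return; // -1: enforcement disabled by default
    }
    if (serializedRecord.length > maxRecordSizeBytes) {
      throw new IllegalArgumentException("Record of " + serializedRecord.length
          + " bytes exceeds limit of " + maxRecordSizeBytes + " bytes");
    }
  }
}
```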
…kedin#1056)

* Implemented metadata migration from src zk to dest zk

* Refactored test cases and resolved spotbugs

* Added javadoc to ZkCopier class and moved cluster zk paths field to constants file

* Implemented comments and refactored getVenicePaths() that takes in a list or a tree

* Modified AdminTool command for metadata migration

- Implemented optional arguments to build ZK clients with ZK SSL config files
- Implemented more validation checks to copy over existing ZNode paths
- Refactored test cases

* Successful testing of metadata migration with ei zk server

- fixed read and write data issues in ZkClient with help from Kai+Nisarg
- removed unused test case

* Added note to migrateVenicePaths() to ensure that destination zk is new or doesn't contain metadata that needs to be migrated

* Addressed comments

* Reverted to previous iteration with minor changes

- Getting Venice-specific metadata from ZK client into a tree data structure
- Then, used pathsTreeToList() to convert tree to list
- Made source and destination ZK client SSL config files required arguments for metadata migration AdminTool command
FelixGV and others added 29 commits August 9, 2024 09:54
…din#1103)

Set the ReadRequestThrottler.storesThrottlers AtomicReference at the
definition site, containing an empty map, to fully eliminate NPEs, even
if listeners are added before the end of the constructor. Also added a
unit test to repro the issue. (A sketch of this pattern follows the list below.)

Miscellaneous:

- Reduced accesses to the ReadRequestThrottler.storesThrottlers, to make
  it easier to see whether mutations are within the scope of a lock or
  not.

- Deleted the RoutersClusterManager::unSubscribeRouterCountChangedEvent
  function since it is unused.
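
A minimal sketch of the definition-site initialization (field and method names are illustrative):

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Illustration: initializing the AtomicReference with an empty map at the
// definition site means a listener firing before the constructor finishes
// sees an empty map rather than null, eliminating the NPE.
public final class ThrottlerHolder {
  private final AtomicReference<Map<String, Long>> storesThrottlers =
      new AtomicReference<>(Collections.emptyMap()); // never null, even mid-construction

  public long getQuota(String store) {
    return storesThrottlers.get().getOrDefault(store, 0L);
  }

  public void rebuild(Map<String, Long> newThrottlers) {
    storesThrottlers.set(Map.copyOf(newThrottlers)); // single mutation point
  }
}
```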
* Reorganized DefaultIngestionBackend to ensure that bootstrap happens with a clean file
* Fix for integration test A
…ontext for better ingestion debugging. (linkedin#1090)

During the ingestion debugging process, the version topic information of all topic partitions on one consumer is still very useful, as we need the ingestion speed or last poll time for the topic partitions of all version topics when debugging slow ingestion issues. The version topic is the destination of the DataReceiver inside ConsumptionTask for each PubSubTopicPartition, so this adds a method to ConsumptionTask to get the version topic.

---------

Co-authored-by: Hao Xu <xhao@xhao-mn3.linkedin.biz>
no flush in quota enforcement handler

Use separate thread pool for quota rejected request's response
… to reduce contention

Use old nextUpdateTimeVal in refillCount
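
A hedged sketch of the separate-pool idea from these commits (illustrative only; the real handler is part of the server's request pipeline):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration: quota-rejected (429) responses are written from a small
// dedicated pool so they do not contend with the worker threads serving
// admitted requests; skipping a per-response flush further cuts overhead.
public final class QuotaRejectionResponder {
  private final ExecutorService rejectionPool = Executors.newFixedThreadPool(2);

  public void rejectWithTooManyRequests(Runnable writeRejectionResponse) {
    rejectionPool.execute(writeRejectionResponse); // off the hot path
  }
}
```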