Reproduce Quota Rejections #18

Closed
wants to merge 80 commits into from
Conversation

sushantmane
Owner

Reproduce Quota Rejections

Resolves #XXX

How was this PR tested?

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.

majisourav99 and others added 30 commits June 20, 2024 17:25
…nkedin#1032)

When smart long tail retry is enabled, each storage node that responds is marked as fast and is used for subsequent retries.

However, if a storage node returns a TOO_MANY_REQUESTS (429) status code, it indicates that the node has exceeded its quota and retrying would be futile.
Retrying in this scenario would increase the load on already overwhelmed servers, leading to higher latency and degraded performance. This PR prevents retries against storage nodes that return a 429 status code, improving overall system efficiency.
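
As a minimal sketch of the rule this PR introduces (the class and method names below are illustrative stand-ins, not the actual Venice router API):

```java
// Hypothetical helper illustrating the retry gate: a 429 from a storage
// node means its quota is exhausted, so a long-tail retry only adds load
// to an already overwhelmed server without improving the outcome.
public final class RetryGate {
  private static final int TOO_MANY_REQUESTS = 429;

  /** Returns true only for statuses where a long-tail retry can help. */
  public static boolean isRetriable(int statusCode) {
    if (statusCode == TOO_MANY_REQUESTS) {
      return false; // quota rejection: fail fast instead of retrying
    }
    return statusCode >= 500; // transient server-side errors may still be retried
  }
}
```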
…f store is AA enabled (linkedin#1043)

Do not use DRP for lag measurement if the store is AA-enabled. For AA-enabled
stores, consider lag from all fabrics to avoid premature version swaps in some
colos. This PR removes the DRP dependency for lag measurement in AA-enabled stores.
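
A hedged sketch of the "consider lag from all fabrics" rule (the map-based signature is illustrative, not Venice's real API):

```java
import java.util.Map;

// Illustration only: a version is ready to swap only when every fabric's
// replication lag is under the threshold, instead of trusting a single
// DRP-derived measurement that can report one colo as caught up too early.
public final class VersionSwapCheck {
  public static boolean readyToSwap(Map<String, Long> lagByFabric, long lagThreshold) {
    return lagByFabric.values().stream().allMatch(lag -> lag <= lagThreshold);
  }
}
```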
…finishes before put can be executed (linkedin#1040)

* Create InternalDaVinciRecordTransformer to ensure that onStartIngestionTask completes before any puts can be executed (see the sketch after this list)

* Add public final to recordTransformer

* Make recordTransformer private

* Handle interrupts by restoring the interrupt flag instead of throwing an exception

* Fix incorrect package ref

* Don't have two references of record transformer

* Ensure getRecordTransformer does not return null

* Don't use apply twice

* Change back recordTransformer to final

* Don't instantiate DVRT twice

* Rename record transformer to be explicit about locking and remove linting changes

* Rename InternalDaVinciRecordTransformer to BlockingDavinciRecordTransformer. Undo variable name change, and modify description
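
A minimal sketch of the blocking-wrapper idea from the commits above (the wrapper and delegate interface here are illustrative stand-ins, not the actual Venice classes):

```java
import java.util.concurrent.CountDownLatch;

// Illustration: put() blocks until onStartIngestionTask() has finished,
// so no record is transformed before startup completes.
public final class BlockingRecordTransformer<K, V> {
  interface RecordTransformer<K, V> { // hypothetical delegate interface
    void onStartIngestionTask();
    void put(K key, V value);
  }

  private final RecordTransformer<K, V> delegate;
  private final CountDownLatch startLatch = new CountDownLatch(1);

  public BlockingRecordTransformer(RecordTransformer<K, V> delegate) {
    this.delegate = delegate;
  }

  public void onStartIngestionTask() {
    try {
      delegate.onStartIngestionTask();
    } finally {
      startLatch.countDown(); // unblock pending puts even if startup failed
    }
  }

  public void put(K key, V value) {
    try {
      startLatch.await(); // wait for startup to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // restore the flag instead of throwing, per the commit above
      return;
    }
    delegate.put(key, value);
  }
}
```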
Added a new store-level config setting `maxRecordSizeBytes`. This will be used to decide if a batch push job should fail or if nearline consumption should be paused due to the ingestion of records that are too large.
  * The default value of `maxRecordSizeBytes` is `-1`. By leaving the limit unset by default, it is easier for us to add a global config in the future should we deem it necessary.
  * This default value of `maxRecordSizeBytes` will be changed to `10MB` in-memory when it reaches its eventual destination at `VeniceWriter`, even if the config is set to `-1`.
  * The adjustment is at the store level, as we anticipate only a few stores could surpass the threshold and require an increased limit.

Added tests `testUpdateMaxRecordSize()` and `testChunkingMaxRecordSize()` in `VeniceParentHelixAdminTest`. The former verifies that the config is read and set properly, and the latter
verifies that the default value of `maxRecordSizeBytes` will be `10MB` if chunking is enabled.

Refactored tests involving "setting a value and verifying the value was set" into a reusable function `testUpdateConfig()`. All you need to do is provide `Consumer` functions for setting a value and verifying the value, which will hopefully make testing configs very straightforward.
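
A hedged sketch of that pattern (the helper signature is illustrative; the real one lives in `VeniceParentHelixAdminTest`):

```java
import java.util.function.Consumer;

// Illustration of the reusable pattern: the caller supplies one action that
// sets a config value and one that asserts the value was applied.
public final class ConfigTestHelper {
  public static <T> void testUpdateConfig(T adminClient, Consumer<T> setValue, Consumer<T> verifyValue) {
    setValue.accept(adminClient);    // e.g. update maxRecordSizeBytes via the admin client
    verifyValue.accept(adminClient); // e.g. read the store config back and assert on it
  }
}
```

A caller would then pass something like `testUpdateConfig(admin, a -> a.updateStore(...), a -> assertEquals(...))` for each config under test.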
…r data integrity validation (linkedin#1042)

[controller][server][client][test] Adds a new checksum type ADHASH for data integrity validation

This PR adds a new checksum type ADHASH for Venice's DIV. ADHASH adopts the
idea of incremental hashing from AdHash and leverages CRC32 as the underlying
hashing algorithm.

CRC32 is commonly used in networks and storage devices for detecting data corruption.
The purpose of this algorithm is not to protect against intentional
changes (cryptographic hashing), but rather to catch accidents like network errors
and disk write errors, etc. The emphasis of CRC32 is more on speed than on security.

For testing, this PR extends all the existing checksum and DIV tests to include the
new ADHASH type.
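
A minimal sketch of the incremental-hash idea (this illustrates AdHash-style combining over CRC32, not the actual ADHASH implementation):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustration: the running checksum is the sum of each record's CRC32
// (long overflow acts as addition mod 2^64), so adding a record updates
// the digest in O(1) without rehashing everything seen so far.
public final class AdHashLikeChecksum {
  private long state = 0L;

  public void add(byte[] record) {
    CRC32 crc = new CRC32();
    crc.update(record);
    state += crc.getValue();
  }

  public long getCheckSum() {
    return state;
  }

  public static void main(String[] args) {
    AdHashLikeChecksum sum = new AdHashLikeChecksum();
    sum.add("record-1".getBytes(StandardCharsets.UTF_8));
    sum.add("record-2".getBytes(StandardCharsets.UTF_8));
    System.out.println(Long.toHexString(sum.getCheckSum()));
  }
}
```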
* [server][davinci] Created interface for Blob discovery and added a server implementation

* Created the BlobSnapshotManager that takes care of hybrid snapshots

* Adjusted concurrent users logic and added a test case to test it

* Cleaned up notes

* Implemented Kai's suggestions

* Added logger error for no path found

* Implemented comments

* Implemented comments

* Implemented comments

* Implemented comments
… regions (linkedin#1049)

When target region push is enabled, after pushing to the target region, the rest of the regions do not delete the backup version before the start of the push. This increases disk usage for the duration of the push to 3x the store size (backup + current + future versions). This PR deletes the backup version before starting ingestion in non-target regions.
* [server][davinci] Created interface for Blob discovery and added a server implementation

* Fixed inconsistent unit test

* Fixed inconsistent unit test
…hot and Restore tooling (linkedin#1047)

* Implemented naive solution for getVenicePaths() and wrote tests

* Implemented+tested AdminTool command for extracting Venice-specific paths into an output file

* Implemented+tested tree data structure to filter Venice-specific paths

* Refactored ZNode to TreeNode

* Tested AdminTool command to extract venice paths

* Implemented Nisarg's comments (refactored code/files, added javadoc, removed unnecessary comments)

* Implemented comments (refactored code/tests, added javadoc)

* Replaced Venice-managed ZK paths strings with constants in multiple files

* Refactored code + refined test cases
* commit bootstrap changes

---------

Co-authored-by: "ishwarya-personal" <mishwarya76@gmail.com>
… push (linkedin#1059)

Today, the Controller checks the push status system store partition by partition in the following way:
1. Iterate partition by partition and get all the subscribed DaVinci instances per partition.
2. Check whether the instance's partition status is completed or not.
3. If not, check the liveness of the instance to decide whether the status should be ignored.

If one DaVinci store is being used in an unpartitioned way, the total number of calls to the push status system
store for liveness checks will be partition_num * host_num, which is high enough to time out the Controller
status polling request for incremental push, which still does real-time status checks against the push status
system store.
With this change, the Controller skips the duplicate liveness calls, so the total call count per status polling
request becomes partition count + unique host count.
The reason I didn't batch the requests is that I would like to reduce the KPS to the push status store to avoid
quota issues, and with this reduction the perf should be good enough.
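
A hedged sketch of the deduplication (signatures are illustrative, not the controller's real code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Illustration: liveness is looked up once per unique instance and cached,
// so one polling request costs roughly partition_count + unique_host_count
// calls instead of partition_count * host_count.
public final class LivenessCheckDeduper {
  public static Map<String, Boolean> checkLiveness(
      Map<Integer, List<String>> incompleteInstancesByPartition,
      Predicate<String> isInstanceLive) {
    Map<String, Boolean> livenessCache = new HashMap<>();
    for (List<String> instances : incompleteInstancesByPartition.values()) {
      for (String instance : instances) {
        // computeIfAbsent ensures at most one liveness call per instance
        livenessCache.computeIfAbsent(instance, isInstanceLive::test);
      }
    }
    return livenessCache;
  }
}
```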
…kedin#1060)

remove the feature flags from tests

Co-authored-by: "ishwarya-personal" <mishwarya76@gmail.com>
…c io by default (linkedin#950) (linkedin#1063)

Revert "[server][da-vinci] Bumped RocksDB dep and adopt multiget async io by default (linkedin#950)"

This reverts commit 02591a1.
* Fix NPE in ScatterGatherRequestHandlerImpl

thenCompose requires functions that do not return null.
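
The contract can be demonstrated in isolation (illustrative snippet, not the router code):

```java
import java.util.concurrent.CompletableFuture;

public final class ThenComposeNpeDemo {
  public static void main(String[] args) {
    // Returning null from a thenCompose function is illegal: the resulting
    // stage completes exceptionally with a NullPointerException.
    CompletableFuture<String> broken =
        CompletableFuture.completedFuture("req").thenCompose(r -> null);
    System.out.println(broken.isCompletedExceptionally()); // true

    // Fix: always return a (possibly already-completed) future instead.
    CompletableFuture<String> fixed = CompletableFuture.completedFuture("req")
        .thenCompose(r -> CompletableFuture.completedFuture("response"));
    System.out.println(fixed.join()); // response
  }
}
```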
dedup partition names
…inkedin#1066)

If users set deferred version swap for a push, they may decide to swap the current version much later than the 24-hour push timeout. Starting the next push after 24 hours would cause the previously pushed version to be killed even though that push finished successfully. This PR fixes that by failing new pushes until the previously pushed version is swapped to current.
…1068)

In handleValueSchemaLookup, the schema response was allocated as an array sized to the number of schemas, with entries set using the schema id as the index into the array. If any schemas have been deleted, the ids are no longer contiguous, so those insertions throw java.lang.ArrayIndexOutOfBoundsException. This PR fixes that.
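
A small sketch of the failure mode and the shape of the fix (names are illustrative):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration: with live schema ids {1, 2, 5} (ids 3 and 4 deleted), the
// old approach allocated an array of size 3 and wrote at index 5 - 1 = 4,
// throwing ArrayIndexOutOfBoundsException. Keying the response by schema id
// drops the assumption that ids are contiguous.
public final class SchemaLookupSketch {
  public static Map<Integer, String> buildResponse(Map<Integer, String> schemasById) {
    // Buggy shape, for contrast:
    //   String[] response = new String[schemasById.size()];
    //   schemasById.forEach((id, schema) -> response[id - 1] = schema); // AIOOBE on gaps
    return new TreeMap<>(schemasById); // id-keyed map tolerates deleted schemas
  }
}
```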
…egionMultiClusterWrapper (linkedin#1067)

There are several tests that create multi-region clusters through manual, non-standard setups. An example of this is that many tests create a multi-region cluster that shares the same Kafka cluster between them. While this might be valuable for some tests, it is a maintenance overhead for most of them. Our test suite already has "VeniceTwoLayerMultiRegionMultiClusterWrapper" that can be used directly for all current use-cases.

This commit replaces all tests that set up a multi-region test cluster through non-standard means to use "VeniceTwoLayerMultiRegionMultiClusterWrapper" instead.

This commit also fixes an issue where, if the config to enable active-active by default for batch-only stores is enabled, new store creation for system stores makes them active-active as well. This was masked previously by different configs set on child and parent controllers, so running with this config in the parent controller would have been problematic.

This change exposed another bug: storage persona validations only run in a multi-region setup, while there is nothing conceptually blocking them from running in a single-region setup. This will be addressed in a follow-up PR.
…Config (linkedin#1070)

Today, in controllers, we have two classes of configs which serve the same purpose, and it is unnecessary to maintain both. This commit merges "VeniceControllerConfig" and "VeniceControllerClusterConfig".
…e integ test (linkedin#1073)

* [dvc][test] Fix missing parts for blob transfer and add one incomplete integ test (linkedin#1073)
Repush with DELETE as the last update to a certain key will fail the job: before EOP, AASIT's processing short-circuits to LFSIT logic, and the DELETE handling there does not take the DELETE payload, since it did not expect a DELETE to carry RMD info; as a result, the drainer side throws an NPE.
…inkedin#1071)

When fetching schemas from routers in a thin client, any transient issues can cause requests to fail, leading the client to save a null schema for the schema ID, which results in further request failures. This PR addresses the issue by forcing a refresh during schema fetches and increasing both the retry count and delay for router requests.
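
A hedged sketch of that fetch loop (the signature is illustrative, not the thin client's API):

```java
import java.util.Optional;
import java.util.function.Supplier;

// Illustration: on a transient failure, force a metadata refresh and retry
// after a delay; a null schema is never cached or returned for the id.
public final class SchemaFetchRetry {
  public static <T> T fetchWithRetries(Supplier<Optional<T>> fetchOnce, Runnable forceRefresh,
      int maxAttempts, long delayMs) throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Optional<T> schema = fetchOnce.get();
      if (schema.isPresent()) {
        return schema.get(); // only non-null results are ever returned (and cached)
      }
      forceRefresh.run(); // refresh router metadata before the next attempt
      Thread.sleep(delayMs);
    }
    throw new IllegalStateException("Schema not found after " + maxAttempts + " attempts");
  }
}
```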
…edin#1052)

Enforce record size limits for batch push jobs, so push jobs with Venice records larger than `maxRecordSizeBytes` will fail, reporting a `RecordSizeTooLarge` error. Note that no push jobs will fail by default until the controller config `controller.default.max.record.size.bytes` is adjusted and deployed for each cluster. More details in the PR description.
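
A minimal sketch of the size gate (the exception here stands in for the job's `RecordSizeTooLarge` error report):

```java
// Illustration: a record whose serialized size exceeds the configured limit
// fails the batch push; -1 means the limit is unset, so nothing is rejected
// until controller.default.max.record.size.bytes is deployed.
public final class RecordSizeGate {
  public static void checkRecordSize(byte[] serializedRecord, long maxRecordSizeBytes) {
    if (maxRecordSizeBytes < 0) {
      return; // -1: enforcement disabled by default
    }
    if (serializedRecord.length > maxRecordSizeBytes) {
      throw new IllegalArgumentException("Record of " + serializedRecord.length
          + " bytes exceeds limit of " + maxRecordSizeBytes + " bytes");
    }
  }
}
```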
…kedin#1056)

* Implemented metadata migration from src zk to dest zk

* Refactored test cases and resolved spotbugs

* Added javadoc to ZkCopier class and moved cluster zk paths field to constants file

* Implemented comments and refactored getVenicePaths() that takes in a list or a tree

* Modified AdminTool command for metadata migration

- Implemented optional arguments to build ZK clients with ZK SSL config files
- Implemented more validation checks to copy over existing ZNode paths
- Refactored test cases

* Successful testing of metadata migration with ei zk server

- fixed read and write data issues in ZkClient with help from Kai+Nisarg
- removed unused test case

* Added note to migrateVenicePaths() to ensure that destination zk is new or doesn't contain metadata that needs to be migrated

* Addressed comments

* Reverted to previous iteration with minor changes

- Getting Venice-specific metadata from ZK client into a tree data structure
- Then, used pathsTreeToList() to convert tree to list
- Made source and destination ZK client SSL config files required arguments for metadata migration AdminTool command
FelixGV and others added 29 commits August 9, 2024 09:54
…din#1103)

Set the ReadRequestThrottler.storesThrottlers AtomicReference at the
definition site, containing an empty map, to fully eliminate NPEs, even
if listeners are added before the end of the constructor. Also added a
unit test to repro the issue. (A sketch of this pattern follows the list below.)

Miscellaneous:

- Reduced accesses to the ReadRequestThrottler.storesThrottlers, to make
  it easier to see whether mutations are within the scope of a lock or
  not.

- Deleted the RoutersClusterManager::unSubscribeRouterCountChangedEvent
  function since it is unused.
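
A minimal sketch of the definition-site initialization (field and method names are illustrative):

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Illustration: initializing the AtomicReference with an empty map at the
// definition site means a listener firing before the constructor finishes
// sees an empty map rather than null, eliminating the NPE.
public final class ThrottlerHolder {
  private final AtomicReference<Map<String, Long>> storesThrottlers =
      new AtomicReference<>(Collections.emptyMap()); // never null, even mid-construction

  public long getQuota(String store) {
    return storesThrottlers.get().getOrDefault(store, 0L);
  }

  public void rebuild(Map<String, Long> newThrottlers) {
    storesThrottlers.set(Map.copyOf(newThrottlers)); // single mutation point
  }
}
```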
* Reorganized DefaultIngestionBackend to ensure that bootstrap happens with a clean file
* Fix for integration test A
…ontext for better ingestion debugging. (linkedin#1090)

During the ingestion debugging process, the version topic information of all topic partitions on one consumer is still very useful, as we need the ingestion speed or last poll time for the topic partitions of all version topics when debugging slow ingestion issues. The version topic is the destination of the DataReceiver inside ConsumptionTask for each PubSubTopicPartition, so this adds a method to ConsumptionTask to get the version topic.

---------

Co-authored-by: Hao Xu <xhao@xhao-mn3.linkedin.biz>
no flush in quota enforcement handler

Use separate thread pool for quota rejected request's response
… to reduce contention

Use old nextUpdateTimeVal in refillCount
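
A hedged sketch of the separate-pool idea from these commits (illustrative only; the real handler is part of the server's request pipeline):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration: quota-rejected (429) responses are written from a small
// dedicated pool so they do not contend with the worker threads serving
// admitted requests; skipping a per-response flush further cuts overhead.
public final class QuotaRejectionResponder {
  private final ExecutorService rejectionPool = Executors.newFixedThreadPool(2);

  public void rejectWithTooManyRequests(Runnable writeRejectionResponse) {
    rejectionPool.execute(writeRejectionResponse); // off the hot path
  }
}
```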