[controller] Harden update-store workflow #1091
Thanks! This looks like a good change! I didn't have time to look at it fully yet, but I left one minor comment so far.
I did a pass. I would recommend having another pair of eyes, especially on the following files, due to my limited context in the code base:
- VeniceParentHelixAdmin
- VeniceHelixAdmin
- UpdateStoreUtils
Looks good. I mostly made sure that the checks and description in the summary match the code changes.
Final comments on the strategy for releasing the changes:
- Do we have branch-based releases for complex changes/features?
- It feels like merging this into master and finding issues during certification could be difficult to deal with or mitigate, as the surface area of the changes is large and reverting such a large commit accurately (assuming other changes pile in) might be tricky.
How do you plan to certify this with our current certification flow?
No, we don't. Since the integration tests didn't have to change apart from fixing invalid configs, I think I can fix all stores so that they have valid configs before we merge this PR. There is a concern that this impacts more than just LI, but I think no one else is using most of the advanced features that now have added validations. Also, this won't impact any reads/writes, but control paths would be impacted.
That is correct. With such a large change that fundamentally modifies how update store is done, this is a risk I think is worth taking. There are lots of behavioral changes, which makes it impractical to implement them in a way that lets us flip them on and off through feature flags. I don't think we have added any change that blocks a revert, so we have that option open.
This will get merged in our code and will only get deployed in the certification cluster. It will not be exposed to user clusters and user stores until we certify the change.
Harden update-store workflow
This PR unifies the `UpdateStore` logic between child controller and parent controllers. We've faced many issues due to the structure of the code (especially `VeniceParentHelixAdmin`) where a set of store-update operations puts it into a non-healthy state. These are all the changes in this PR:
- [...] `UpdateStoreUtils` that is called by both parent and child controllers to apply the store update and perform the necessary validations.
- [...] `NON_AGGREGATE` data replication policy
- `ACTIVE_ACTIVE` data replication policy is only supported when Active-Active replication is enabled
- [...] `DataReplicationPolicy` is `AGGREGATE`
- [...] `ordinal` from `BackupStrategy` enum was being used to write to the admin channel. This is problematic when the enums evolve. Added a `getValue` function and used that instead.
- `controller.external.superset.schema.generation.enabled` has been added to replace `controller.parent.external.superset.schema.generation.enabled` because external superset schema generation must be allowed in single-region mode also. `controller.parent.external.superset.schema.generation.enabled` has been marked deprecated, but it has not been completely removed yet for backward compatibility reasons.
Some side-effects of this change are:
- `SupersetSchemaGeneratorWithCustomProp` had a bug where if the first schema has a custom prop, the future superset schema generation would fail as Avro doesn't allow overriding custom props. This got caught as the update store logic now also tries creating superset schema if a store enabled RC or WC, or if it previously had a superset schema.
There are a few other changes that we should do, but are not done in this PR:
- All major operations should only be allowed on the parent controller - Create a store, delete a store, add schemas. We should exclude some system stores from this check like we have for the check allowing batch push to admin in child
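To illustrate why writing an enum's `ordinal` to the admin channel is fragile, here is a minimal sketch. It does not reproduce the actual `BackupStrategy` code from this PR: the enum `Strategy`, its constants, and the `fromValue` helper are hypothetical names chosen for illustration. The point is only that an explicit value survives reordering of the constants, while `ordinal()` does not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the Venice BackupStrategy enum.
class StableEnumValueSketch {
  enum Strategy {
    // Declared "out of order" on purpose: ordinal() and the wire value differ.
    DROP_OLD(1),
    RETAIN_OLD(0);

    private static final Map<Integer, Strategy> BY_VALUE = new HashMap<>();

    static {
      for (Strategy s : values()) {
        BY_VALUE.put(s.value, s);
      }
    }

    private final int value;

    Strategy(int value) {
      this.value = value;
    }

    // Explicit value to serialize; unaffected by declaration order.
    int getValue() {
      return value;
    }

    // Decode a serialized value back into the enum constant.
    static Strategy fromValue(int value) {
      Strategy s = BY_VALUE.get(value);
      if (s == null) {
        throw new IllegalArgumentException("Unknown strategy value: " + value);
      }
      return s;
    }
  }

  public static void main(String[] args) {
    for (Strategy s : Strategy.values()) {
      System.out.println(s + " ordinal=" + s.ordinal() + " value=" + s.getValue());
    }
  }
}
```

With this shape, adding or reordering constants changes `ordinal()` but leaves previously written values decodable, as long as existing explicit values are never reassigned.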
Recommendation for Reviewers
I recommend going through the changes in at least two passes. In the first pass, look through everything that has been purely deleted. (Skip `ParentControllerConfigUpdateUtils`, as it has been renamed to `PrimaryControllerConfigUpdateUtils` and its contents partially moved to `UpdateStoreUtils`, so GitHub doesn't detect it as a rename.) While doing this pass, you can also glance through the various small changes that do not need much pondering over.
In the second pass, follow this review order:
- `VeniceControllerClusterConfig`, `VeniceControllerConfig`, `VeniceControllerMultiClusterConfig`
- `VeniceControllerService`
- `Admin`
- `UpdateStoreUtils`
- `PrimaryControllerConfigUpdateUtils`
- `AdminUtils`
- `UpdateStoreWrapper`
- `VeniceHelixAdmin`
- `VeniceParentHelixAdmin`
- `SupersetSchemaGeneratorWithCustomProp`
How was this PR tested?
Added new tests. Modified existing tests. GH CI
Does this PR introduce any user-facing changes?