[VIP-2] Removing Per Record Offset Metadata From Venice-Server Storage With Heartbeats #513
> HEARTBEAT: {3, 199, 4500}
>
> Record 1: {<3, 200, <4500}
For backfill information like <3, does that mean we can always guarantee that between HEARTBEAT and Record 1 there is no update from other regions, and that only region #2, at offset 200, mutated that key?
I'll include a case here for the two updates, as it's an important one. We have two choices: we can either rely on the heartbeat + previous events to build a running highwatermark from which we can apply this backfill, or we can compact key updates which occur between two heartbeats.
My intuition is that compacting key updates is the way to go, because if we apply an ever-growing highwatermark to the offsets of individual records, then we narrow the common window between two colos. I've got a spec half done which models this; I'll post back here once I've determined conclusively whether this intuition is right. But it's a good callout, because with two updates to the same key within the heartbeat interval, backfilling with the less-than bound up to the last heartbeat is an incorrect generalization and will lead to false positives in some simple cases.
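To make the two-update case concrete, here is a minimal sketch in Python. It is not the VIP's implementation: the `Bound` type, the `backfill` and `consistent` helpers, the 0-based region indexing, and the "true" offset vectors are all hypothetical, and the strict less-than reading of `<3` is taken literally from the example in the diff.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Bound:
    """Hypothetical per-region offset claim attached to a record."""
    value: int
    exact: bool  # True: offset known exactly; False: "strictly less than value"

    def admits(self, true_offset: int) -> bool:
        # An exact claim must match; a backfilled claim is an upper bound.
        return true_offset == self.value if self.exact else true_offset < self.value


def backfill(heartbeat: Tuple[int, ...], region: int, offset: int) -> Tuple[Bound, ...]:
    """Keep the writer region's offset exact and fill every other region
    with a '<' bound taken from the last heartbeat (assumed semantics)."""
    return tuple(
        Bound(offset, True) if i == region else Bound(h, False)
        for i, h in enumerate(heartbeat)
    )


def consistent(bounds: Tuple[Bound, ...], true_vector: Tuple[int, ...]) -> bool:
    """Check whether the backfilled claims agree with the real offsets."""
    return all(b.admits(t) for b, t in zip(bounds, true_vector))


heartbeat = (3, 199, 4500)
# The second region (index 1 here) writes the key at offset 200.
record_1 = backfill(heartbeat, region=1, offset=200)  # {<3, 200, <4500}

# Only the second region touched the key since the heartbeat: the claim holds.
quiet = consistent(record_1, (2, 200, 4400))

# The first region also updated the same key at offset 4, after the heartbeat:
# the "<3" claim no longer describes the record.
raced = consistent(record_1, (4, 200, 4400))

print(quiet, raced)
```

The second check is the situation raised above: once another region updates the same key inside the heartbeat interval, the backfilled bound no longer holds, so the approach needs either a running highwatermark or compaction of those updates.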
That said, it's not actually a requirement to be able to do this on every single event we consume. It's possible to meet the first two requirements at a coarser granularity of updates.

### Heartbeat Algorithm
I'm not sure my understanding is correct, so I just want to confirm: if we have 3 regions, can the heartbeat algorithm reduce RMD space from storing 3 regions' offsets to storing 1 region's offset?
You are correct, but we'll actually store none of the regions' offset metadata in RocksDB. We'll only persist it to the PubSubBroker as envelope metadata, none of which has to go into RocksDB.
layout: default
title: [VIP-2] Removing Per Record Offset Metadata From Venice-Server Storage With Heartbeats
parent: Community Guides
permalink: /docs/proposals/VIP_TEMPLATE.md
For generating a reachable URL, suggested change:
- permalink: /docs/proposals/VIP_TEMPLATE.md
+ permalink: /docs/proposals/vip-2
Please also update the Proposals table.
[VIP-2] Removing Per Record Offset Metadata From Venice-Server Storage With Heartbeats
This VIP explores a strategy for removing the offset metadata stored per record in Venice by utilizing replica heartbeats.