Pre-RFC: incremental improvements to DA #136
Comments
Do you have some data? IMO systematic recovery is the best option for large PoVs. The current alternatives are regular chunk recovery (same bandwidth but more CPU consumption) and full recovery from backers (this is the one that risks exhausting the bandwidth of a backer if everyone tries to recover from them). Systematic chunks spread the load across a different set of n/3 validators for each core.
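As a rough illustration of the trade-offs described above (the names and cost annotations are mine, not the node's actual types):

```rust
/// Illustrative summary of the recovery options discussed above; this is a
/// sketch, not the node's actual types.
enum RecoveryOption {
    /// Fetch ~n/3 arbitrary chunks and run a full reconstruction:
    /// same bandwidth as systematic recovery, but more CPU.
    RegularChunks,
    /// Fetch the whole PoV from one of the backers: cheapest on CPU, but
    /// risks exhausting a single backer's bandwidth if everyone uses it.
    FullFromBackers,
    /// Fetch the n/3 systematic chunks: no reconstruction needed, and the
    /// load is spread across a different third of the validators per core.
    SystematicChunks,
}
```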
These make sense to me, with the slight caveat that collators have already started upgrading to a version that supports the new protocol (intentionally or not), so we'd be resetting that effort.
Hopefully the omninode initiative will make the upgrade process a breeze.
Sorry, that was an incorrect statement. What I actually wanted to say is that the backing fast path only happens for small PoVs due to bandwidth requirements, and systematic recovery can only work in an almost ideal networking scenario where everyone is connected to the corresponding third of validators; as such, we need to ensure the system will sustain the load in the worst-case scenario.
I think this is only true if backers do not meet the hardware-spec bandwidth requirements, or approval voting is escalating heavily. In practice, there is more overhead in both time and bandwidth when recovering n/3 chunks versus getting the whole thing from a single node. We could always fetch from backers no matter the PoV size if backers became smarter. An honest backer can decide whether it can provide the full PoV based on its bandwidth usage. This means we need to define an upper bound on how much bandwidth backing is allowed to use in a window of time.
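To make that last idea concrete, here is a minimal sketch of a per-backer bandwidth budget over a sliding window; the window length, byte budget, and all names are placeholders I picked for illustration, not an agreed-upon spec:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sketch of a per-backer bandwidth budget over a sliding time window.
/// The concrete numbers a caller would use (e.g. a 10 s window, 50 MiB
/// budget) are placeholders, not a proposed spec.
struct BackingBandwidthBudget {
    window: Duration,
    max_bytes: u64,
    sent: VecDeque<(Instant, u64)>,
}

impl BackingBandwidthBudget {
    fn new(window: Duration, max_bytes: u64) -> Self {
        Self { window, max_bytes, sent: VecDeque::new() }
    }

    /// Returns true if serving a full PoV of `pov_size` bytes now keeps the
    /// backer within its budget, recording the transfer if accepted; on
    /// false the requester would fall back to chunk recovery.
    fn try_serve_full_pov(&mut self, now: Instant, pov_size: u64) -> bool {
        // Drop transfers that have fallen out of the window.
        while let Some(&(t, _)) = self.sent.front() {
            if now.duration_since(t) > self.window {
                self.sent.pop_front();
            } else {
                break;
            }
        }
        let used: u64 = self.sent.iter().map(|&(_, bytes)| bytes).sum();
        if used + pov_size <= self.max_bytes {
            self.sent.push_back((now, pov_size));
            true
        } else {
            false
        }
    }
}
```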
I believe the reed-solomon-simd benchmarks were reproduced in-house by someone, so yes, this sounds like a good change. A binary merkle tree is definitely better: binary costs 320 bytes per chunk on Kusama, while the ETH tree pays something like 1152 bytes. If we count by block instead of by chunk, then binary costs …
We've 3-4 recovery types both in Polkadot and JAM, right?
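For what it's worth, the 320-byte figure checks out as a back-of-the-envelope calculation if we assume 32-byte hashes and roughly 1000 chunks (one per Kusama validator); both assumptions are mine:

```rust
/// Per-chunk proof size of a balanced binary Merkle tree: one sibling hash
/// per level, i.e. ceil(log2(n_chunks)) hashes.
fn binary_merkle_proof_bytes(n_chunks: u64, hash_len: u64) -> u64 {
    let depth = 64 - (n_chunks - 1).leading_zeros() as u64; // ceil(log2(n_chunks))
    depth * hash_len
}

fn main() {
    // ~1000 chunks (one per Kusama validator), 32-byte hashes:
    // ceil(log2(1000)) = 10 levels * 32 bytes = 320 bytes per chunk.
    assert_eq!(binary_merkle_proof_bytes(1000, 32), 320);
}
```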
Only 3 pays the reconstruction CPU time, but all three pay the re-encoding CPU time. 2 fails over cleanly into 3, so 2 has minimal downsides. 2b requires that backers/guarantors be counted among the chunk suppliers for all systematic chunks, and 2b pays the merkle proof bandwidth, but 2b avoids requiring that "everyone be connected to the corresponding k of validators". At 5 backers, there are 6 nodes that hold each systematic chunk.

If 1 does full blocks, then 1 does not fail over cleanly into 2 and 3, well not without adding some hackery. If 1 works by chunks, then afaik 2b could basically replace 1. If I recall, we did 1 first primarily as a small-block optimization, so we probably cannot replace 1 by 2b, but maybe larger blocks could skip 1 and go directly to 2a+2b? This means large blocks always pay the merkle proof overhead, but avoid the wasteful failure mode of 1. We could make this choice by backer reputation too, but that feels overly complex.

Anyways, there should be 2-3 flavors of systematic recovery here, whose characteristics differ between large and small blocks, based upon whether they attach merkle proofs.
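A small sketch of the 2b supplier set described above; the types and the chunk-to-validator mapping are assumed for illustration and are not existing node code:

```rust
type ValidatorIndex = u32;

/// Option 2b above: for every systematic chunk, the candidate suppliers are
/// the validator assigned that chunk plus all backers, since backers hold
/// the full PoV and can serve any chunk. With 5 backers that gives 6 sources.
fn systematic_chunk_suppliers(
    chunk_index: usize,
    systematic_holders: &[ValidatorIndex], // the n/3 validators holding systematic chunks
    backers: &[ValidatorIndex],            // e.g. up to 5 backers of the candidate
) -> Vec<ValidatorIndex> {
    let mut suppliers = Vec::with_capacity(1 + backers.len());
    if let Some(&holder) = systematic_holders.get(chunk_index) {
        suppliers.push(holder);
    }
    suppliers.extend_from_slice(backers);
    suppliers.sort_unstable();
    suppliers.dedup();
    suppliers
}
```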
RFC-47 proposed a way to enable systematic chunk recovery for Polkadot's DA, improving efficiency and reducing CPU overhead. However, systematic recovery can only (reasonably) be used for small PoVs, given the bandwidth constraints of the validators. On top of that, enabling it requires a breaking change to the protocol (including the collator node side).
The idea of this Pre-RFC is to backport some of the changes made in JAM to Polkadot and bundle them with the breaking change of RFC-47.
In particular, two changes are being proposed:
According to these benches, a proper SIMD implementation of reed-solomon (like this one) is 3-4x faster at encoding and up to 9x faster at full decoding.
The current erasure coding format and algorithm have not been specified, and here we have an opportunity to fix that too.
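For context, a minimal encode/decode round-trip with a SIMD reed-solomon crate such as reed-solomon-simd could look like the sketch below. It is based on that crate's documented top-level API at the time of writing and is only illustrative; the shard counts and sizes are arbitrary, not the proposed DA parameters:

```rust
// Cargo.toml (assumed): reed-solomon-simd = "*"
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 3 original shards and 2 recovery shards; shards must be of equal,
    // even byte length.
    let original: Vec<Vec<u8>> = vec![
        b"foo foo foo ".to_vec(),
        b"bar bar bar ".to_vec(),
        b"baz baz baz ".to_vec(),
    ];

    // Encoding produces the recovery shards (the code is systematic: the
    // original shards are kept as-is).
    let recovery = reed_solomon_simd::encode(3, 2, &original)?;

    // Reconstruct the missing original shards 0 and 2 from original shard 1
    // plus two recovery shards. Systematic recovery skips this step entirely
    // when all original shards can be fetched directly.
    let restored = reed_solomon_simd::decode(
        3,
        2,
        [(1usize, &original[1])],
        [(0usize, &recovery[0]), (1usize, &recovery[1])],
    )?;

    assert_eq!(restored[&0usize], original[0]);
    assert_eq!(restored[&2usize], original[2]);
    Ok(())
}
```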