Pre-RFC: incremental improvements to DA #136

Open
ordian opened this issue Jan 21, 2025 · 4 comments

Comments

@ordian
Contributor

ordian commented Jan 21, 2025

RFC 47 proposed a way to enable systematic chunk recovery for Polkadot's DA, improving efficiency by reducing the CPU overhead of recovery. However, systematic recovery can only (reasonably) be used for small PoVs given the bandwidth constraints of the validators. On top of that, enabling it requires making a breaking change to the protocol (including the collator node side).

The idea of this Pre-RFC is to backport some of the changes made in JAM to Polkadot and bundle them with the breaking change of RFC-47.

In particular, 2 changes are being proposed:

  1. Switch the format and the library for reed-solomon encoding/decoding.
    According to these benches, a proper SIMD implementation (like this one) of reed-solomon is 3-4x faster in encoding and up to 9x faster in full decoding; see the sketch after this list.
    The current erasure coding format and algorithm have not been specified, and here we have an opportunity to fix that too.
  2. We are using a Merkle Patricia Trie for computing the erasure root. Switching to a simple binary Merkle tree would reduce the CPU overhead of computing chunk proofs and make the proofs smaller.
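
A minimal sketch of how the encode/decode path could look with such a library, assuming the high-level `encode`/`decode` helpers of the `reed-solomon-simd` crate (the shard counts and sizes below are illustrative, not the proposed protocol parameters, and the exact signatures should be taken from the crate docs):

```rust
// Cargo.toml (assumed): reed-solomon-simd = "3"

fn main() -> Result<(), reed_solomon_simd::Error> {
    // Illustrative split: k original (systematic) shards plus n - k recovery shards.
    let original_count = 4;
    let recovery_count = 8;

    // Equally sized data shards; real chunks would carry PoV data.
    let original: Vec<Vec<u8>> = (0..original_count as u8)
        .map(|i| vec![i; 64])
        .collect();

    // Encoding only produces the recovery shards; the systematic shards are the data itself.
    let recovery = reed_solomon_simd::encode(original_count, recovery_count, &original)?;

    // Decoding: any `original_count` shards suffice. Pretend we only kept original
    // shard 1 and three recovery shards, and reconstruct the missing originals.
    let restored = reed_solomon_simd::decode(
        original_count,
        recovery_count,
        [(1usize, &original[1])],                                       // surviving original shards
        [(0usize, &recovery[0]), (2, &recovery[2]), (5, &recovery[5])], // recovery shards
    )?;

    assert_eq!(restored[&0], original[0]);
    assert_eq!(restored[&2], original[2]);
    assert_eq!(restored[&3], original[3]);
    Ok(())
}
```

Systematic recovery is the case where all `original_count` shards survive, so `decode` (and its CPU cost) is skipped entirely.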
@alindima
Contributor

However, systematic recovery can only be (reasonably) used for small PoVs given the bandwidth constraints of the validators.

Do you have some data? IMO systematic recovery is the best option to use for large PoVs. The current alternatives are regular chunk recovery (same bandwidth but more CPU consumption) and full recovery from backers (this is the one that would risk exhausting the bandwidth of a backer if everyone is trying to recover from them). Systematic chunks spread out the load among a different set of n/3 validators for each core.

In particular, 2 changes are being proposed:

These make sense to me, with the slight caveat that collators already started upgrading to a version that supports the new protocol (intentionally or not), so we'd reset this effort.

@ordian
Contributor Author

ordian commented Jan 21, 2025

These make sense to me, with the slight caveat that collators already started upgrading to a version that supports the new protocol (intentionally or not), so we'd reset this effort.

hopefully the omninode initiative will make the upgrade process a breeze

Do you have some data? IMO systematic recovery is the best option to use for large PoVs. The current alternatives are regular chunk recovery (same bandwidth but more CPU consumption) and full recovery from backers (this is the one that would risk exhausting the bandwidth of a backer if everyone is trying to recover from them). Systematic chunks are spreading out the load among a different set of n/3 validators for each core.

Sorry, that was an incorrect statement. What I actually wanted to say is that the backing fast path only happens for small PoVs due to bandwidth requirements, and systematic recovery only works in an almost ideal networking scenario where everyone is connected to the corresponding third of validators. As such, we need to ensure the system will sustain the load in the worst-case scenario.

@sandreim
Contributor

IMO systematic recovery is the best option to use for large PoVs. The current alternatives are regular chunk recovery (same bandwidth but more CPU consumption) and full recovery from backers (this is the one that would risk exhausting the bandwidth of a backer if everyone is trying to recover from them). Systematic chunks are spreading out the load among a different set of n/3 validators for each core.

I think this is only true if backers do not meet the hardware spec bandwidth requirements, or if approval voting is escalating heavily. In practice, there is more overhead in both time and bandwidth when recovering n/3 chunks versus getting the whole thing from a single backer.

We could always fetch from backers no matter the PoV size if backers became smarter. An honest backer can decide whether it can provide the full PoV based on its bandwidth usage. This means we need to define an upper bound on how much bandwidth backing is allowed to use in a window of time.
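
A minimal sketch of how such a budget could look on the backer side (the type, the window length, and the byte limit are hypothetical, not a protocol proposal):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical sliding-window accounting of the bytes a backer has served,
/// used to decide between handing out the full PoV and redirecting to chunks.
struct BackingBandwidthBudget {
    window: Duration,
    max_bytes_per_window: u64,
    served: VecDeque<(Instant, u64)>,
}

impl BackingBandwidthBudget {
    fn new(window: Duration, max_bytes_per_window: u64) -> Self {
        Self { window, max_bytes_per_window, served: VecDeque::new() }
    }

    /// Returns true and records the transfer if serving `pov_size` more bytes
    /// stays inside the budget; otherwise the backer declines and the requester
    /// falls back to chunk recovery.
    fn try_serve(&mut self, now: Instant, pov_size: u64) -> bool {
        // Drop transfers that fell out of the window.
        while let Some(&(t, _)) = self.served.front() {
            if now.duration_since(t) > self.window {
                self.served.pop_front();
            } else {
                break;
            }
        }
        let used: u64 = self.served.iter().map(|&(_, bytes)| bytes).sum();
        if used + pov_size <= self.max_bytes_per_window {
            self.served.push_back((now, pov_size));
            true
        } else {
            false
        }
    }
}
```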

@burdges

burdges commented Jan 21, 2025

I believe the reed-solomon-simd benchmarks were reproduced in-house by someone, so yes this sounds like a good change.

A binary merkle tree is definitely better: binary costs 320 bytes per chunk on Kusama, while the ETH tree pays something like 1152 bytes. If we count by block instead of by chunk, then binary costs 32 · k · log2(n) bytes per block, where k lies between n/4 and n/3, and n is the number of validators.
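
For concreteness, assuming roughly $n \approx 1000$ active validators on Kusama (so $\log_2 n \approx 10$):

$$
32 \cdot \log_2 n \approx 32 \cdot 10 = 320 \ \text{bytes per chunk proof},
\qquad
32 \cdot k \cdot \log_2 n \approx 80\text{--}107\ \text{kB per block for } k \in [n/4,\, n/3].
$$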

We've 3-4 recovery types in both Polkadot and JAM, right?

  1. systematic recovery from backers/guarantors, optionally sending the full block, not by chunks, so no chunk merkle proofs.
  2. systematic recovery by chunks from (2a) the systematic k of validators, and ideally also (2b) backers/guarantors.
  3. non-systematic recovery by chunks from any k of validators.

Only 3 pays the reconstruction CPU time, but all three pay the reencoding CPU time.

2 fails over cleanly into 3, so 2 has minimal downsides. 2b requires that backers/guarantors be counted among the chunk suppliers for all systematic chunks, and 2b pays the merkle proof bandwidth, but 2b avoids requiring that "everyone be connected to the corresponding k of validators". With 5 backers, there are 6 nodes that hold each systematic chunk.

If 1 does full blocks, then 1 does not fail over cleanly into 2 and 3, well not without adding some hackery. If 1 works by chunks, then afaik 2b could basically replace 1.

If I recall, we did 1 first primarily as a small block optimization, so we probably cannot replace 1 by 2b, but maybe larger blocks could skip 1 and go directly to 2a+2b? This means large blocks always pay the merkle proof overhead, but avoid the wasteful failure mode for 1. We could make this choice by backer reputation too, but that feels overly complex.

Anyways, there should be 2-3 flavors of systematic recovery here, whose characteristics differ between large and small blocks, based upon whether they attach merkle proofs.
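
To make the failover structure above concrete, here is a hedged sketch of how the strategy selection could be expressed; the enum, the function, and the size threshold are hypothetical, not how the availability-recovery code is actually organized:

```rust
/// Hypothetical enumeration of the recovery flavors discussed above.
enum RecoveryStrategy {
    /// 1: full PoV from backers/guarantors, no chunk merkle proofs.
    FullFromBackers,
    /// 2a + 2b: systematic chunks from the systematic k validators plus the backers.
    SystematicChunks,
    /// 3: any k chunks, paying the reconstruction CPU cost.
    RegularChunks,
}

/// Ordered list of strategies to try, failing over left to right.
/// `small_pov_limit` is an illustrative threshold, not a protocol constant.
fn strategy_order(pov_size: usize, small_pov_limit: usize) -> Vec<RecoveryStrategy> {
    if pov_size <= small_pov_limit {
        // Small blocks: backer fast path first, then systematic chunks, then full decode.
        vec![
            RecoveryStrategy::FullFromBackers,
            RecoveryStrategy::SystematicChunks,
            RecoveryStrategy::RegularChunks,
        ]
    } else {
        // Larger blocks: skip 1, go straight to 2a+2b, and fail over cleanly into 3.
        vec![
            RecoveryStrategy::SystematicChunks,
            RecoveryStrategy::RegularChunks,
        ]
    }
}
```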
