---
description: "How do you horizontally scale capacity?"
---

# Sharding

One of the biggest open problems for blockchains is whether they can scale their capacity horizontally. Horizontal scaling refers to adding more machines to the system, rather than upgrading the existing machines in the hope of adding more capacity (which is called vertical scaling). One of the most common forms of horizontal scaling is sharding. Although sharding as a concept is considered cursed in crypto, with Ethereum pivoting to L2s and new L1s demonstrating high performance without sharding, we believe that sharding will be adopted in some form by the space in the coming years.

But why? The answer is quite simple -- at any point in time, compute utilization is concentrated in only one party in the network, the leader. This fundamentally limits the scalability of the network to the computation capacity of the leader. Sure, validators can upgrade their machines, but this harms decentralization. Sharding takes a different approach to this problem, wherein the global compute of the network becomes the combined compute resources of multiple leaders. These leaders operate over different fragments (or "shards") of state. The "root shard" or "consensus shard" ensures consistency and validity of shards and cross-shard calls.

But why are we thinking about sharding? The aim of Spicenet is to offer performant, decentralized, reliable and scalable trading services to users, and to scale current DeFi experiences by at least 10x. We quickly realized that continuously upgrading the centralized sequencer hits a roadblock in terms of cost and scalability. Moreover, we also want to decentralize the sequencer and be a truly decentralized network, while sustaining performance. Sharding allows us to do exactly that.

And how are we implementing sharding? While the spec of sharding on Spicenet is not yet final, in this section we discuss the different approaches to sharding we have identified, and their properties.

## Sequencer Clustering

Sequencer clustering refers to decoupling the various processes undertaken by a centralized sequencer and distributing them across various specialized, hyper-optimized machines. Another way to think of this is the centralized sequencer being represented by a collection of machines, with each machine handling a portion of the load. Depending on demand, more machines can be spun up.

Admission to the sequencer cluster is controlled by the protocol, so that the configuration of the machines can be closely monitored by the protocol, allowing for quick modifications and upgrades. We aim to partner with leading node platforms and firms to run machines within the sequencer cluster. But first, let's understand how the sequencer cluster fits into the transaction flow and the noticeable differences it introduces.

The sequencer cluster has a main entrypoint, which we refer to as the SequencerDAG, whose main job is to determine the ordering of transactions and direct them to different includers. SequencerDAG can be thought of as a record book that monitors the transaction flow, directs transactions to different includers (also referred to as "machines") and builds the Directed Acyclic Graph (DAG). The DAG is built as a collection of all pending transactions and the includers they're allocated to.
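
As a rough illustration, here is a minimal Rust sketch of the kind of record SequencerDAG keeps. The type and field names (`SequencerDag`, `PendingTx`, `IncluderId`) are hypothetical and not taken from the Spicenet codebase; this is a sketch of the idea, not the implementation.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical identifiers; real types would carry signatures, payloads, etc.
type TxHash = [u8; 32];
type IncluderId = u16;

/// A pending transaction node in the DAG: the transactions it depends on
/// (edges) and the includer it has been allocated to.
struct PendingTx {
    parents: Vec<TxHash>, // DAG edges: transactions this one must follow
    includer: IncluderId, // machine responsible for pre-confirming it
    timestamp_ms: u64,    // arrival time used for ordering integrity
}

/// Sketch of the SequencerDAG "record book".
#[derive(Default)]
struct SequencerDag {
    pending: HashMap<TxHash, PendingTx>,
    // Global ordering by timestamp, so the executor / primary can resolve
    // conflicts between transactions coming from different includers.
    order: BTreeMap<u64, Vec<TxHash>>,
}

impl SequencerDag {
    /// Record a new transaction, allocate it to an includer, and extend the DAG.
    fn admit(&mut self, hash: TxHash, parents: Vec<TxHash>, includer: IncluderId, ts: u64) {
        self.pending.insert(hash, PendingTx { parents, includer, timestamp_ms: ts });
        self.order.entry(ts).or_default().push(hash);
    }

    /// Which includer was this transaction allocated to? Used by the executor
    /// to cross-verify the authenticity of incoming micro batches.
    fn allocated_includer(&self, hash: &TxHash) -> Option<IncluderId> {
        self.pending.get(hash).map(|tx| tx.includer)
    }
}
```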

An includer has three broad responsibilities -- giving an inclusion pre-confirmation, preparing a mini batch to be posted to Celestia, and sending a micro batch to the executor. When an includer receives a transaction, it responds with a pre-confirmation and asynchronously sends a "micro" batch, consisting of a few transactions, to the executor. In parallel, it works on preparing a "mini" batch, consisting of slightly more transactions, which it posts to Celestia.
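
The split between micro batches (handed to the executor) and mini batches (posted to Celestia) could look roughly like the following sketch. The batch sizes, channel setup and names (`Includer`, `MICRO_BATCH_SIZE`) are illustrative assumptions, not the actual implementation.

```rust
use std::sync::mpsc::Sender;

type Tx = Vec<u8>;

// Illustrative thresholds: a micro batch holds a few transactions for the
// executor, a mini batch holds more and is posted to Celestia.
const MICRO_BATCH_SIZE: usize = 8;
const MINI_BATCH_SIZE: usize = 256;

struct Includer {
    to_executor: Sender<Vec<Tx>>, // co-located executor, low-latency path
    micro_buf: Vec<Tx>,
    mini_buf: Vec<Tx>,
}

struct PreConfirmation {
    included: bool,
}

impl Includer {
    /// Handle one inbound transaction: respond with a pre-confirmation,
    /// and buffer it for both the micro batch and the mini batch.
    fn handle(&mut self, tx: Tx) -> PreConfirmation {
        self.micro_buf.push(tx.clone());
        self.mini_buf.push(tx);

        if self.micro_buf.len() >= MICRO_BATCH_SIZE {
            // Asynchronously hand a small batch to the executor.
            let _ = self.to_executor.send(std::mem::take(&mut self.micro_buf));
        }
        if self.mini_buf.len() >= MINI_BATCH_SIZE {
            self.post_mini_batch_to_celestia();
        }
        PreConfirmation { included: true }
    }

    fn post_mini_batch_to_celestia(&mut self) {
        let batch = std::mem::take(&mut self.mini_buf);
        // Placeholder: submit `batch` as a blob to the DA layer.
        let _ = batch;
    }
}
```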

The executor receives micro batches from includers and executes them in the order specified by SequencerDAG. It also verifies the authenticity of the includer by cross-verifying with SequencerDAG. This is possible because SequencerDAG maintains a DAG of all pending transactions and the respective includers they're allocated to. Every hundred milliseconds or so, the executor produces a new global state and disseminates the result of execution to the network of RPC providers and full nodes, which finally results in the end user seeing the data update on their client.
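
The following sketch shows how an executor might cross-verify a micro batch against the SequencerDAG allocation table before executing it, and publish a new global state on a rough 100ms cadence. The types (`Executor`, `MicroBatch`) and the simplified ordering are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

type TxHash = [u8; 32];
type IncluderId = u16;

struct MicroBatch {
    from: IncluderId,
    txs: Vec<TxHash>,
}

struct Executor {
    // Mirror of the SequencerDAG allocation table: tx -> includer.
    allocations: HashMap<TxHash, IncluderId>,
    executed: Vec<TxHash>,
    last_state_publish: Instant,
}

impl Executor {
    /// Accept a micro batch only if every transaction in it was really
    /// allocated to the claimed includer by SequencerDAG.
    fn accept(&mut self, batch: MicroBatch) -> bool {
        let authentic = batch
            .txs
            .iter()
            .all(|tx| self.allocations.get(tx) == Some(&batch.from));
        if authentic {
            // Execute in the order specified by SequencerDAG (simplified here
            // to arrival order within the batch).
            self.executed.extend(batch.txs);
        }
        authentic
    }

    /// Roughly every hundred milliseconds, publish a new global state to
    /// RPC providers and full nodes.
    fn maybe_publish_state(&mut self) {
        if self.last_state_publish.elapsed() >= Duration::from_millis(100) {
            // Placeholder: compute and disseminate the new state root.
            self.last_state_publish = Instant::now();
        }
    }
}
```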

This process introduces some, but not significant, overhead in the form of includer -> executor communication, but since both parties are co-located, the latency penalty is negligible. While pre-confirmations are given instantly, it can take up to a few hundred milliseconds more for the execution outcome to be reached and for the state to refresh on the user side. The execution outcome is what matters for the end user, not the pre-confirmation (obviously, a deposit succeeds when the user's balance increases, not when the user receives a pre-confirmation). In situations where two transactions originating from different machines collide, the primary has the right to overwrite the pre-confirmation of one of the transactions to allow the other to pass. This decision is made using SequencerDAG, which provides ordering integrity defined by timestamp. For example, if a market maker is trying to cancel an order and an arbitrageur is trying to take the same order, SequencerDAG provides the ordering for these transactions and allows the primary to choose which transaction came first (although both may receive pre-confirmations).
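
To make the conflict-resolution step concrete, here is a minimal sketch of how the primary could pick between two colliding pre-confirmed transactions using SequencerDAG's timestamps, following the maker-cancel vs. arbitrageur example above. The types and values are hypothetical.

```rust
// Minimal sketch: the primary resolves two colliding pre-confirmed
// transactions using the timestamps assigned by SequencerDAG.
struct PreConfirmedTx {
    id: u64,
    timestamp_ms: u64, // ordering assigned by SequencerDAG
}

/// Both transactions touch the same order (e.g. a maker cancel vs. an
/// arbitrageur taking it). The earlier timestamp wins; the other
/// pre-confirmation is overwritten and that transaction is rejected.
fn resolve_conflict(a: PreConfirmedTx, b: PreConfirmedTx) -> (PreConfirmedTx, PreConfirmedTx) {
    if a.timestamp_ms <= b.timestamp_ms {
        (a, b) // (winner, overwritten)
    } else {
        (b, a)
    }
}

fn main() {
    let maker_cancel = PreConfirmedTx { id: 1, timestamp_ms: 1_000 };
    let arb_take = PreConfirmedTx { id: 2, timestamp_ms: 1_003 };
    let (winner, overwritten) = resolve_conflict(maker_cancel, arb_take);
    println!("executed: {}, pre-confirmation overwritten: {}", winner.id, overwritten.id);
}
```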

Here's a glossary of different roles within the Sequencer cluster:

* SequencerDAG: Ensures ordering integrity by timestamp and builds a Directed Acyclic Graph of the transactions being handled by different includers (also referred to as "machines").
* Includers: Hyper-optimized machines that receive transactions and respond with a pre-confirmation. They also send their own batches to Celestia; each such batch is referred to as a "mini batch".
* Executor (also known as the "primary"): Typically a single party, responsible for executing transactions pre-confirmed by includers in the order specified by SequencerDAG.

## Full Sharding

We also take interest in fully sharded protocols as a hyper-scalable form of horizontal scaling. Full sharding refers to running different instances of the network at the same time, horizontally scaling both computation and access to state. We are currently evaluating it as one of the approaches to achieve our vision of the "permissionless trading chain". Spicenet aims to diversify its product offerings from just crypto derivatives to private intent-based trading, auctions and betting markets, and specialized RWA markets. This significantly expands the user base of Spicenet, opening the doors to much higher network traffic. Hence, there is a need to sustainably scale the network to handle such elevated demand.

Sharding allows us to run different product verticals across different shards, while still offering a UX as if everything lived on a single, atomic chain. This allows Spicenet to leverage the biggest upside of sharding -- multiple, non-conflicting proposers working in cohesion. The network will broadly consist of two types of shards: the consensus shard and the execution shards.

The consensus shard is responsible for computing the global state and ensuring consistency of cross-shard calls. It consists of validators running a multi-proposer consensus algorithm built on top of Sui's Mysticeti protocol. The consensus shard works like a normal PoS chain: each validator has stake in the system and is subject to specific slashing conditions that, if met, lead to the validator's stake being slashed.
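
As a purely illustrative sketch of what "stake plus slashing conditions" could look like for consensus-shard validators, consider the following. The condition names and penalty fractions are assumptions made up for this example; the actual conditions and parameters are not specified here.

```rust
use std::collections::HashMap;

type ValidatorId = [u8; 32];

/// Illustrative slashing conditions for a consensus-shard validator.
/// The concrete conditions and penalty fractions are assumptions.
enum SlashingCondition {
    DoubleSign,
    InvalidShardData,
    ProlongedDowntime,
}

struct StakeTable {
    stake: HashMap<ValidatorId, u64>,
}

impl StakeTable {
    /// Reduce a validator's stake when a slashing condition is met.
    fn slash(&mut self, who: &ValidatorId, condition: SlashingCondition) {
        let fraction = match condition {
            SlashingCondition::DoubleSign => 0.50,      // hypothetical penalty
            SlashingCondition::InvalidShardData => 0.10, // hypothetical penalty
            SlashingCondition::ProlongedDowntime => 0.01, // hypothetical penalty
        };
        if let Some(s) = self.stake.get_mut(who) {
            *s -= (*s as f64 * fraction) as u64;
        }
    }
}
```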

The execution shards are a collection of shards, each containing its own state and sequenced by a set of validators. We utilize a concept called stake versatility, allowing the same stake that secures the consensus shard to also secure the execution shards. This means that the validator set that validates the consensus shard also performs the duties of the execution shards, such as execution, ordering and batching. It also means that PoS slashing rules apply to the execution shards as well, and any form of mass shard corruption is attributed to byzantine faults. Why is this required? In a sharded system, it is important to secure both the consensus and the execution shards, and hence some form of unified stake attribution is needed that works across both types of shards. This is also why a custom consensus protocol is warranted: to incorporate byzantine fault tolerance within both consensus and execution shards.
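
The sketch below illustrates the stake-versatility idea: a single bonded stake entry backs both a validator's consensus-shard duties and its execution-shard duties, so a fault on either side is charged against the same bond. The names (`UnifiedStakeRegistry`, `BondedStake`) are hypothetical.

```rust
use std::collections::HashMap;

type ValidatorId = u32;
type ShardId = u8;

/// Under "stake versatility", one bonded stake entry secures both the
/// consensus shard and the execution shards a validator serves.
struct BondedStake {
    amount: u64,
    // Execution shards this validator currently serves (sequencing,
    // ordering, batching). Slashing on any of these duties hits `amount`.
    execution_duties: Vec<ShardId>,
}

struct UnifiedStakeRegistry {
    bonds: HashMap<ValidatorId, BondedStake>,
}

impl UnifiedStakeRegistry {
    /// A byzantine fault observed on an execution shard is charged against
    /// the same bond that secures the consensus shard.
    fn slash_for_shard_fault(&mut self, validator: ValidatorId, shard: ShardId, penalty: u64) {
        if let Some(bond) = self.bonds.get_mut(&validator) {
            if bond.execution_duties.contains(&shard) {
                bond.amount = bond.amount.saturating_sub(penalty);
            }
        }
    }
}
```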

But how does the network actually function in a sharded environment? A transaction corresponds to one and only one shard, although it can invoke other shards from within the shard it belongs to. Every 50ms, one validator is appointed for each shard; for that duration, it is responsible for determining the ordering and execution of any inbound transactions for that shard. In the happy case, no cross-shard calls are required, and the sequencer (a validator) gives an instant pre-confirmation. However, when a cross-shard call is to be made, the sequencer for shard A sends a priority message via Spicenet's Unified Messaging Layer (UML) to the sequencer of shard B, prompting it to modify its state in accordance with the cross-shard transaction. This priority messaging system allows sequencers to exchange messages at very low latency, using a priority message queue present within every sequencer, specifically designed to receive messages from other sequencers. When a transaction needs to be executed by another shard, it incurs some latency penalty, as the transaction needs to be diverted to the sequencer of that shard, which then needs to include and execute it.
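
A minimal sketch of the per-sequencer priority queue for cross-shard messages is shown below. The `PriorityMessage` shape, the priority field, and the inbox structure are assumptions used to illustrate the idea of prioritized sequencer-to-sequencer messaging, not the actual UML wire format.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

type ShardId = u8;

/// A cross-shard message delivered over the Unified Messaging Layer.
/// Higher-priority messages are processed first.
struct PriorityMessage {
    priority: u8,
    from_shard: ShardId,
    payload: Vec<u8>, // encoded state change requested by the cross-shard call
}

impl PartialEq for PriorityMessage {
    fn eq(&self, other: &Self) -> bool { self.priority == other.priority }
}
impl Eq for PriorityMessage {}
impl PartialOrd for PriorityMessage {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for PriorityMessage {
    fn cmp(&self, other: &Self) -> Ordering { self.priority.cmp(&other.priority) }
}

/// Every sequencer keeps a queue dedicated to messages arriving from other
/// shards' sequencers, processed ahead of ordinary local transactions.
struct SequencerInbox {
    shard: ShardId,
    queue: BinaryHeap<PriorityMessage>,
}

impl SequencerInbox {
    fn receive(&mut self, msg: PriorityMessage) {
        self.queue.push(msg);
    }

    /// Pop the highest-priority cross-shard message and apply its requested
    /// state change on this shard (application omitted here).
    fn process_next(&mut self) -> Option<PriorityMessage> {
        self.queue.pop()
    }
}
```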

The asynchronous environment of Spicenet allows the sequencer of slot N-1 of shard A to put the transactions of slot N-1 in a batch, compute the ZK proof and send them to the consensus shard, while the sequencer of slot N receives inbound transactions and works on executing them. The consensus shard receives ZK proofs from the various execution shards, and the multi-proposer design of Mysticeti allows multiple leaders to each propose a fraction of the consensus shard block. The consensus shard then computes the global merkle tree using the various "local" merkle trees received, in the form of ZK proofs, from the execution shards.
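
The sketch below shows the last step in miniature: deterministically combining the per-shard "local" roots into one global root. A production system would build a proper merkle tree over cryptographic hashes; std's `DefaultHasher` is used here only to keep the sketch dependency-free, and all names are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type ShardId = u8;
type LocalRoot = u64; // placeholder; a real root would be a cryptographic hash

/// Combine the "local" state roots carried by each execution shard's ZK proof
/// into a single global root.
fn global_root(mut local_roots: Vec<(ShardId, LocalRoot)>) -> u64 {
    // Deterministic ordering by shard id so every proposer derives the same root.
    local_roots.sort_by_key(|(shard, _)| *shard);
    let mut hasher = DefaultHasher::new();
    for (shard, root) in &local_roots {
        shard.hash(&mut hasher);
        root.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    // Local roots as they might arrive from three execution shards for slot N-1.
    let roots = vec![(2u8, 0xdead_u64), (0u8, 0xbeef_u64), (1u8, 0xf00d_u64)];
    println!("global state root: {:#x}", global_root(roots));
}
```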

How does settlement work in a sharded system? To understand this, we first need to understand what the consensus shard adds to the L2 network. The consensus shard can be thought of as a shared sequencer (although never explicitly marketed as such) for independent, yet interoperable chains (shards). The only difference is that the chains here represent the different product verticals of Spicenet, rather than totally independent and unrelated chains. This means that the consensus shard is an intermediate finality and ordering layer for the shards, one that computes the global state root and verifies the consistency of cross-shard message passing.

However, the consensus shard is not a full blockchain on its own and sequences the shards optimistically; it therefore requires a single source of truth against which it can verify the data it receives from different shards. This single source of truth is the DA layer. Shard sequencers communicate with the consensus shard using Spicenet's Unified Messaging Layer, or UML, which is a set of TCP connections allowing seamless, single-shot communication between entities. Execution shards also post their batches / ZK proofs to Celestia, allowing the consensus shard to verify the data received from the execution shards against the data downloaded from Celestia.

If the data of an execution shard downloaded from Celestia does not match what the shard's sequencer sent, the consensus network can halt the execution shard from settling and slash the sequencer for that shard. If the data matches, the proposers of the consensus shard prepare a block and broadcast it to the network. Consensus nodes can then re-execute the block to verify correct computation. Once the consensus shard reaches consensus on the block, it is considered finalized, and the execution shards whose transactions are included in that block are also implicitly considered finalized. Execution shards are only required to send batches / ZK proofs to the consensus shard every 400ms.
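
This settlement check reduces to comparing the batch received from the shard's sequencer over UML with the batch downloaded from Celestia for the same slot, as in the sketch below. The enum and function names are hypothetical.

```rust
type ShardId = u8;
type BatchHash = [u8; 32];

/// Outcome of checking an execution shard's data against the DA layer.
enum SettlementCheck {
    /// The data matches: the shard can settle in the next consensus block.
    Settle,
    /// Mismatch: halt settlement for this shard and slash its sequencer.
    HaltAndSlash { shard: ShardId },
}

/// Compare what the shard's sequencer sent over UML against the batch the
/// consensus nodes downloaded from Celestia for the same (~400ms) interval.
fn check_shard(shard: ShardId, from_sequencer: BatchHash, from_celestia: BatchHash) -> SettlementCheck {
    if from_sequencer == from_celestia {
        SettlementCheck::Settle
    } else {
        SettlementCheck::HaltAndSlash { shard }
    }
}
```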