Skip to content

Chainweb Merkle Tree

Lars Kuhtz edited this page Oct 11, 2024 · 30 revisions

Merke Log Trees

Chainweb uses the merkle-log package for the chain Merkle tree (see. merkle-log-rs for a Rust version).

Merkle trees are balanced binary trees which can be nested by including the root of a sub-tree as leaf in a Merkle tree.

A node in a Merkle tree is represented as the hash of its subtrees.

data Node = Node Digest

There are two types of leafs in a Merkle tree:

  1. Tree nodes: the value is the root of a nested sub-Merkle tree. The value is used without additional hashing or tagging.
  2. Input nodes: the value is the tagged hash of the indexed data.

The Tag values are domain specific values that disambiguate serialized input data based on the type (cf. below).

data Input = SubTree Node | Leaf Tag ByteString

mkLeafNode :: Input -> Node
mkLeafNode (Leaf t d) hash ([0x00] <> t <> d)
mkLeafNode (SubTree n) = n

mkInnerNode :: Node -> Node -> Node
mkInnerNode l r = hash ([0x01] <> l <> r)

For a list of leaf nodes ls a Merkle tree is constructed as follows:

merkleRoot :: [Input] -> Node
merkleRoot ls = go ls []
  where
    go :: [Input] -> [(Int, Node)] -> Node

    -- empty tree
    go [] [] = hash ""

    -- Create new inner node
    go t ((h0, v0) : (h1, v1) : s) | h0 == h1 = go t ((h0 + 1, mkInnerNode(v1, v0)) : s)

    -- Add new leaf to stack
    go (l : t) !s = go t ((0, mkLeafNode l) : s)

    -- When all leafs are consumed, process remaining nodes on stack
    go [] ((h0, v0) : (_, v1) : s) = go [] ((a + 1, mkInnerNode(v1, v0)) : s)

    -- done
    go [] [root] = root

Chainweb Merkle Hash Function

The chainweb Merkle tree uses NIST SHA-512/256 from the SHA2 family as hash function.

The domain specific function tag assigns each type a unique a 16-bit value in big endian encoding before the value is hashed (in addition to the Merkle node tag as described above). The following tag values are used in Chainweb:

    ChainIdTag = 0x0002
    BlockHeightTag = 0x0003
    BlockWeightTag = 0x0004
    FeatureFlagsTag = 0x0006
    BlockCreationTimeTag = 0x0007
    ChainwebVersionTag = 0x0008
    HashTargetTag = 0x0011
    TransactionTag = 0x0013
    TransactionOutputTag = 0x0014
    MinerDataTag = 0x0017
    CoinbaseOutputTag = 0x0018
    EpochStartTimeTag = 0x0019
    BlockNonceTag = 0x0020

    -- Event Proofs (currently not indexed in the header tree)
    OutputEventsTag = 0x0030
    BlockEventsHashTag = 0x0031
    RequestKeyTag = 0x0032
    PactEventTag = 0x0034

Legend for Tree Diagrams

  • diamond nodes: Merkle tree roots of nested Merkle trees
  • rounded nodes: data leafs
  • round nodes: internal Merkle tree nodes
  • trapezoid: sub-Merkle trees with the listed leaf nodes
  • sub-graphs: nested Merkle trees

Payload Merkle Trees

Transactions and transaction outputs are indexed in separate Merkle trees. Usually, SPV proofs only use transaction outputs.

The list of transactions and transaction outputs are of the same size.

The order of transactions and outputs is the order in which they are processed within the block, which is also order in which they are returned in API queries. MinerData is the first entry in the transaction tree and Coinbase the first entry in the outputs tree.

MinerData and Coinbase are always present. The lists of tranactions and outputs can be empty for some blocks.

graph TD
  subgraph bpt [BlockPayload]
    direction TB
    bph{{BlockPayloadHash}}
    bph --> btt
    bph --> bot

    subgraph btt [BlockTransactions]
      direction TB
      bth{{BlockTransactionsHash}}
      bth --> txs[/MinerData, Transactions \]
    end

    subgraph bot [BlockOutputs]
      direction TB
      boh{{BlockOutputsHash}}
      boh --> outs[/Coinbase, Outputs\]
    end
  end
Loading

The REST API of chainweb-node returns the payload structures in base64Url encoding. For the Merkle tree the unencoded binary content (usually some unicode text) is hashed (subject to proper tagging as described above).

The format of payload data is defined by the Pact smart contract language and Pact service in chainweb-node. The content of the payload is not relevant for chainweb consensus. Details of the payload structure can be found in the wikipage about the Chain database.

Block Header Merkle Tree

The binary encoding for block headers is described on the Block Header Binary Encoding page.

The order of the fields block header fields in the Merkle tree is as follows (cf. implementation):

(flags, time, parent, target, payload, chain, weight, height, version, epoch_start, nonce) + adjacent_hashes

The adjacent_hashes are sorted numerically by chain id.

The depth of the tree and thus of Merkle proofs is logarithmic in the size of the block (header + payload).

The following diagram shows the structure of a Block Header Merkle tree for a twenty chain graph.

graph TD
  subgraph bht [BlockHeader]
    direction TB
  
    %% root
    bhash{{BlockHash}}

    %% leafs
    bf([BlockFlags])
    bc([BlockCreationTime])
    bp{{BlockParentHash}}
    bt([BlockTarget])
    bpt{{BlockPayloadHash}}
    bi([BlockChainId])
    bw([BlockWeight])
    bh([BlockHeight])
    bv([BlockChainwebVersion])
    be([BlockEpochStart])
    bn([BlockNonce])
    ba1{{BlockAdjacentParentHash_1}}
    ba2{{BlockAdjacentParentHash_2}}
    ba3{{BlockAdjacentParentHash_3}}

    %% inner nodes
    n0( ) --> bf & bc
    n1( ) --> bp & bt
    n2( ) --> n0 & n1
    n3( ) --> bpt
    n3( ) --> bi
    n4( ) --> bw & bh
    n5( ) --> n3 & n4
    n6( ) --> n2 & n5
    n7( ) --> bv & be
    n8( ) --> bn & ba1
    n9( ) --> n7 & n8
    n10( ) --> ba2 & ba3
    n11( ) --> n9 & n10
    bhash --> n6 & n11
  end
Loading

Chainweb Merkle Tree

Block header Merkle trees for a single Chainweb chain are linked linearly. The depth of the tree is thus linear in the block height. Hence the size of Merkle proofs for a range of blocks are linear in the size of the range.

Merkle trees for the chainweb chains are linked together in a DAG according to the Chainweb graph. Therefor with increasing depth there is an increasingly (exponentially) large number of different path between any two blocks.

A benefit of a linear tree is that a block at height n can be created and verified by using only blocks at height n - 1, which enables shallow nodes both for mining and validation. Even with full nodes, a balanced Merkle tree over block history would require random access history lookup during block creation and validation. Making the tree creation and validation inductive (so that only constant depth history lookups are needed), would add logarithmic (in the block history size) storage overhead to the block headers.

The following diagram shows the overall structure of the chain Merkle tree:

graph TB
  subgraph b0[Block_0]
    direction TB
    r0{{BlockHash_0}}
    h0[/BlockHeader_0\]
    subgraph p0[Payload_0]
      x0[/ \]
    end
    r0 --> h0
    h0 --> p0
    %% h0 --> r0
  end
  subgraph b1[Block_1]
    direction TB
    r1{{BlockHash_1}}
    subgraph p1[Payload_1]
       x1[/ \]
    end
    h1[/BlockHeader_1\]
    r1 --> h1
    h1 --> p1
  end
  subgraph bn[Block_n]
    direction TB
    rn{{BlockHash_n}}
    subgraph pn[Payload_n]
       xn[/ \]
    end
    hn[/BlockHeader_n\]
    rn --> hn
    hn --> pn
  end
  subgraph bn1[Block_n-1]
    direction TB
    rn1{{BlockHash_n-1}}
    subgraph pn1[Payload_n-1]
       xn1[/ \]
    end
    hn1[/BlockHeader_n-1\]
    rn1 --> hn1
    hn1 --> pn1
  end
  subgraph bn2[Block_n-2]
    direction TB
    rn2{{BlockHash_n-2}}
    subgraph pn2[Payload_n-2]
       xn2[/ \]
    end
    hn2[/BlockHeader_n-2\]
    rn2 --> hn2
    hn2 --> pn2
  end

  an((OtherChains_n))
  an1((OtherChains_n-1))
  an2((OtherChains_n-2))
  a1((OtherChains_1))
  a0((OtherChains_0))

  hn ---> rn1
  hn1 ---> rn2
  hn2 -.....->|....| r1
  h1 ---> r0

  an ----> rn1
  hn ----> an1
  an1 ----> rn2
  hn1 ----> an2
  an2 -......->|....| r1
  hn2 -......->|....| a1
  a1 ----> r0
  h1 ----> a0

  an ~~~~~ an1
  a1 ~~~~~ a0
Loading

Chainweb Graphs

At genesis the Kadena mainnet was using the Petersen graph as Chainweb graph:

petersen-graph

Since block height 852,054 (~2020-08-20 16:00:00) the Chainweb on the Kadena mainnet graph is the twenty chain graph:

twenty-chain-graph

The graphs have the following properties:

Petersen graph twenty chain graph
order 10 20
size 30 60
degree 3 3
diameter 2 3

Appendix: Chain Graph Sources

Petersen graph:

\documentclass[tikz,crop]{standalone}
\usetikzlibrary{math}
\usetikzlibrary{graphs.standard}
\begin{document}
\begin{tikzpicture}
  \graph [clockwise,n=5, nodes={draw, circle}] {
    { 0; 1; 2; 3; 4; } --
    subgraph C_n [name=a, V={5, 6, 7, 8, 9}];
    0 -- 2 -- 4 -- 1 -- 3 -- 0;
  };  
\end{tikzpicture}
\end{document}

Twenty chain graph:

\documentclass[tikz,crop]{standalone}
\usetikzlibrary{math}
\usetikzlibrary{graphs.standard}
\begin{document}
\begin{tikzpicture}
  \graph [clockwise,n=5, nodes={draw, circle}] {
    { 5; 6; 7; 8; 9; } --
    subgraph I_n [V={0,1,2,3,4}] --[matching]
    {
        subgraph I_n [phase=100, V={10,16,12,18,14}];
        % how can we draw edges here?
        subgraph I_n [phase=80, V={15,11,17,13,19}];
    };
    subgraph I_n [V={0,1,2,3,4}] --
    subgraph I_n [phase=80, V={15,11,17,13,19}];
    5 -- 7 -- 9 -- 6 -- 8 -- 5;
    10 -- 11 -- 12 -- 13 -- 14 --
    15 -- 16 -- 17 -- 18 -- 19 -- 10;  
  };  
\end{tikzpicture}
\end{document}