Chain Database Format

The chain data, i.e. block headers and payloads, are stored in RocksDb.

Key format summary:

<Database namespace> '-' (<Table name component> '/' <Table name component> ...) '$' <Binary key data>

For production databases the namespace is the empty string "".
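
For example, in a production database (empty namespace) the keys of the block header table for chain 0 (described below) have the form:

-BlockHeader/0/header$<binary key data>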

Databases and Tables

Database

It is possible to store more than one chain database in a single RocksDb. This is used during testing, but may also become useful for implementing database migrations and other advanced features. Each chain database uses a unique database namespace, which is implemented as a key prefix that is separated by the character - from the remainder of the key.

Database namespace must not contain the character -.

Production Chainweb chain databases use the empty namespace "".

The key space of each chain database is terminated by the key <NAMESPACE> followed by the character ., the successor of - in the ASCII ordering.

This means that for all keys k of a chain database with namespace s it holds that

"s-" <= k < "s."

Tables

Each chain database contains a number of tables that are identified by a list of table name components.

Table name components must NOT contain any of the characters $, %, and /. A user error is raised if a table name component contains any of these characters.

For keys k in database namespace s and a table with name components ns it holds that

("s-" + intercalate "/" ns + "$") <= k < ("s-" + intercalate "/" ns + "%")

Chainweb Chain Database

Block Header Table

For each chain c there is a table with table name components ["BlockHeader", c, "header"] that maps a tuple of block height and block hash to the corresponding block header for the respective chain.

Keys are binary encoded as height + hash, where height is the block height encoded as a big endian unsigned 64 bit word and hash is the binary encoding of the block hash.

(Usually, little endian encoding is used throughout chainweb. Here big endian encoding is used in order to preserve lexicographic ordering of keys.)

Values are binary encoded block headers.
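
As a sketch of how this key layout can be used, the following looks up a single header of chain 0 by height and block hash with rocksdb_ldb (the height, the hex encoded block hash, and the database path are placeholders):

height=$(printf '%016x' 1819125)
hash="..." # the block hash, hex encoded (64 hex characters)
k="0x$(echo -n "-BlockHeader/0/header$" | xxd -p)${height}${hash}"
rocksdb_ldb --db=db/0/rocksDb get --hex $k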

Block Rank Table

For each chain c there is a table with table name components ["BlockHeader", c, "rank"] that maps a block hash to the corresponding block height for the respective chain.

Keys are the binary encoded hash.

Values are block heights encoded as little endian unsigned 64 bit words, as described for binary encoded block headers.
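
A lookup sketch analogous to the one above (hash and database path are placeholders); note that the returned 8 byte value is little endian, so its bytes must be reversed before reading it as a height:

hash="..." # the block hash, hex encoded (64 hex characters)
k="0x$(echo -n "-BlockHeader/0/rank$" | xxd -p)${hash}"
rocksdb_ldb --db=db/0/rocksDb get --hex $k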

Payload Tables

Payloads are stored in a normalized format in the following tables. * marks the primary key.

  • ["BlockPayload"]: *BlockPayloadHash, BlockTransactionsHash, BlockOutputsHash
  • ["BlockTransactions"]: *BlockTransactionsHash, Vector Transactions, MinerData
  • ["BlockOutputs"]: *BlockOutputsHash, Vector TransactionOutput, CoinbaseOutput
  • ["TransactionTree"]: *BlockTransactionsHash, MerkleTree
  • ["OutputTree"]: *BlockOutputsHash, MerkleTree

BlockPayloadHashes are unique up to ChainId and BlockHeight. I.e. if two different blocks have the same BlockPayloadHash, these blocks have the same ChainId and BlockHeight, which means that at least one of the blocks is an orphan. The only exception are genesis blocks, which can share BlockPayloadHashes across chains.

The BlockOutputsHash is unique for the respective BlockPayloadHash. BlockTransactionsHash is not unique, i.e. different BlockPayload entries can share a BlockTransactionsHash.

Keys are the binary encoded Merkle root hashes of the respective payload component. Values are JSON structures with the respective Merkle hashes or base64 encoded payload components as provided by Pact. The database makes no assumption about the content of the transactions, outputs, and other payload components provided by Pact. Details can be found in the module Chainweb.Payload.

The format is chosen to support the creation of certain SPV proofs without having to reevaluate the outputs.

Block outputs are not required for mining and consensus. They are currently only used by the payloads/outputs REST endpoint and for off-chain SPV proof construction via the respective Pact REST endpoint. A node that doesn't serve these endpoints can safely delete old outputs (and also the output trees).

CutHashes Table

The ["CutHashes"] table stores the all validated cut hashes in json format. At node startup the node traverses stored cut hashes starting from the most recent entry until it finds a cut hash for which it can load all dependencies.

Key: height + weight + cutId, where height is the cut height encoded as a big endian unsigned 64 bit integer, weight is the weight of the cut encoded as a big endian unsigned 256 bit integer, and cutId is encoded as a plain 32 byte value (it is a SHA512_256 hash of the block hashes that make up the cut).
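
A key is thus 8 + 32 + 32 = 72 bytes long. Because the height comes first and is big endian, a lower bound for scanning all cuts of at least a given height can be built from the height alone, for example (production namespace, height value is just an illustration):

H=36000000 # minimum cut height
l="0x$(echo -n "-CutHashes$" | xxd -p)$(printf '%016x' $H)"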

Value: JSON encoded CutHashes structure as in the following example:

{
  "origin": null,
  "height": 36382508,
  "weight": "HJfMUHbV72UuGwAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
  "hashes": {
    "0": {
      "height": 1819125,
      "hash": "_aPVKsLYVSmwnjzyEBRBlKZZj1_x4LHqy-xqyrTG-QQ"
    },
    "1": {
      "height": 1819126,
      "hash": "7oTGD6hx0PmZm7sl4I_OrDTf-cfTfeL_fd63b-P_MAU"
    },
    "2": {
      "height": 1819126,
      "hash": "z_ELoH9E2WemIu7BN9xFzGIP7N8O1Z4KkHc8b4YczUg"
    },
    "3": {
      "height": 1819126,
      "hash": "6ltOuuZllJR-68t9W4DJM_Iwod2cIS2jq7Vv3gRudMg"
    },
    "4": {
      "height": 1819126,
      "hash": "iacRP4rVnas8CgCt6ThNjjyDvqwJp7eA4mJUvBf53JA"
    },
    "5": {
      "height": 1819126,
      "hash": "G3awWoOoVd96wX29GMqjO9utURPu0ieHuZfl4UKT71o"
    },
    "6": {
      "height": 1819126,
      "hash": "DKGzpcmc5TmYhA_diHnfuXw4HDETSDT-ClANZY-92W8"
    },
    "7": {
      "height": 1819125,
      "hash": "Kl6_NQgzFwUuZbxNSPSMSXRFopM0w1cRtfxn-k48aN0"
    },
    "8": {
      "height": 1819126,
      "hash": "6m9PZXibgzcgh487-zUIBmqoTBjMZLdYiEH0ty_mP80"
    },
    "9": {
      "height": 1819125,
      "hash": "Z9AwUQVEATpMiumyHzFtrC8R4zzmtFx9IzH4UmLph98"
    },
    "10": {
      "height": 1819125,
      "hash": "17zaXOjddEwDmPt2d5EEJ_erKyfZJE0oUy81iX2Dcdw"
    },
    "11": {
      "height": 1819125,
      "hash": "LgpL2YOjSdqWjIpeli-HNuZNLN7xRF3cdz0sARzKAqI"
    },
    "12": {
      "height": 1819126,
      "hash": "VcsvP0G7gO5qtcGkNy5WlvuKykwVnQe6eT0oPxU5v6w"
    },
    "13": {
      "height": 1819125,
      "hash": "Oo64d1aycceDerlJCpCffSYmgOGZTFmZLWX3frRhhyY"
    },
    "14": {
      "height": 1819125,
      "hash": "oMy8RThSV1bhANfBh7b9gGr44BpPYUNvTn3wVytbCBo"
    },
    "15": {
      "height": 1819124,
      "hash": "08rT1NsUT9ALJA9yJp41oNnc8-uwrNDVVCwHU0vPyFY"
    },
    "16": {
      "height": 1819125,
      "hash": "jxtrwB8CuooOrTs7t0vL-72AFc8nmXZU0kczpRZiHPQ"
    },
    "17": {
      "height": 1819125,
      "hash": "h2I7py0rXFY0yQSuikJMuiaiC15GcFRJ30_I2MZtWjc"
    },
    "18": {
      "height": 1819126,
      "hash": "wUns6xvrWx7qe0a3k0kr-Nfm_76Eb5_tzRo5yCFHmXw"
    },
    "19": {
      "height": 1819125,
      "hash": "m8vofpy2xyTcyxpVQPlGxpkJk0Ou68N0ffSw5afTKmM"
    }
  },
  "id": "ONT6DXvCQ9c4QyHJfY-PArUJUfV_Z8GGInQibu1FMyA",
  "instance": "mainnet01"
}

CutHashes can take a large portion of the storage space of a chainweb chain database. Because they are only used to resolve possible forks at startup, it is usually safe to delete cuts once they have reached a depth in the chain history at which chain reorgs are no longer an issue. Deleting all cuts that are more than a couple of weeks old can reduce the overall size of the RocksDb by more than 30%.

Recipes

Dump all binary hex encoded block headers for all chains from the RocksDB:

l=$(echo -n "0x" ; echo -n "-BlockHeader/" | xxd -p)
u=$(echo -n "0x" ; echo "-BlockHeader~" | xxd -p -g0)
rocksdb_ldb --db=../chainweb-master/tmp/mainnet/db/rocksDb dump --hex --from=$l --to=$u | cut -d ' ' -f 3 | grep -v '^$'

Extract some fields in csv format from the block header (see here for the offsets):

# This example extracts the chain, block height, and payload hash
#
# (requires gnu cut, if on MacOS install gnu-utils with brew and use gcut)
#
function extract_hdr () { cut --output-delimiter=',' -c $((222*2+3))-$((225*2+4)),$((258*2+3))-$((265*2+4)),$((190*2+3))-$((221*2+4)) ; }
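
For example, combined with the block header dump above (reusing the same $l and $u bounds):

rocksdb_ldb --db=../chainweb-master/tmp/mainnet/db/rocksDb dump --hex --from=$l --to=$u | cut -d ' ' -f 3 | grep -v '^$' | extract_hdr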

Dump BlockPayload table in csv format from the RocksDB:

l=$(echo -n "0x"; echo -n "-BlockPayload$" | xxd -p)
u=$(echo -n "0x"; echo -n "-BlockPayload%" | xxd -p)
rocksdb_ldb --db=db/0/rocksDb dump --key_hex --from=$l --to=$u | cut -d ' ' -f 3 | jq -r 'to_entries | [ .[] | .value ] | @csv'

On macOS one may have to increase the ulimit via ulimit -n 10000 in order to use rocksdb_ldb.

Useful jq functions

Paste the following into ~/.jq:

# ############################################################################ #
# Base64

def base64dUrl: .
    | gsub("(?<x>[-_])"; if .x == "-" then "+" else "/" end)
    | "\(.)==="
    | @base64d
    ;

def fromjson64: .
    | base64dUrl
    | fromjson
    ;

# ############################################################################ #
# Chainweb Debugging tools

# Parse Payload With Outputs
def pWo: .
    | .minerData = (.minerData | fromjson64)
    | .coinbase = (.coinbase | fromjson64)
    | .transactions =
        [ .transactions[]
        |
            { tx: (.[0] | fromjson64 | .cmd = (.cmd | fromjson))
            , result: (.[1] | fromjson64)
            }
        ]
    ;

# Extract Transfer Events
# (Note that Pact events are available only for block heights >= 1138000)
def transfers: .
    | .transactions[]
    | .tx.cmd.meta.creationTime as $creationTime
    | .tx.cmd.meta.chainId as $chainId
    | .tx.hash as $requestKey
    | .result.events[]?
    | select(.name == "TRANSFER")
    |
        { sender: .params[0]
        , receiver: .params[1]
        , amount: .params[2]
        , module: .module
        , creationTime: $creationTime
        , requestKey: $requestKey
        , chainId: $chainId
        }
    ;
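
With these definitions in ~/.jq, the TRANSFER events of a block can, for example, be listed from the output of the payloads/outputs REST endpoint mentioned above (node host and payload hash are placeholders):

curl -sk "https://<node>/chainweb/0.0/mainnet01/chain/0/payload/<payloadHash>/outputs" | jq 'pWo | transfers'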

Examples

Keys of the first three cuts in the database:

u=$(echo -n "0x"; echo -n "-CutHashes%" | xxd -p)
l=$(echo -n "0x"; echo -n "-CutHashes$" | xxd -p)
ldb --db=/db/0/rocksDb scan --from=$l --to=$u --max_keys=3 --no_value --key_hex | 
cut -c 26-40

Number of bytes used for storing cut hashes:

u=$(echo -n "0x"; echo -n "-CutHashes%" | xxd -p)
l=$(echo -n "0x"; echo -n "-CutHashes$" | xxd -p)
ldb --db=/db/0/rocksDb approxsize --from=$l --to=$u --key_hex

Key of a testnet cut from 10 days ago:

u=$(echo -n "0x$(echo -n "-CutHashes$" | xxd -p)$(printf '%016x' $(($(curl -sk "https://34.89.134.90:1789/chainweb/0.0/testnet04/cut" | jq ".height") - 28800)))")