Append-Only Merkle tree with O(1) append gas cost.
⚠️ Please DO NOT use this code in production level yet. This code is not audited.
I implemented an append-only merkle tree data structure with
- Appending new data with
$O(1)$ writes.- always 2-3
SSTORE
s no matter how many data it has (1 non-zero slot, 1-2 zero slots)
- always 2-3
- Querying merkle root with
$O(log{n})$ - Less efficient than original merkle tree but it's reasonable complexity.
- Even though it has
$O(logn)$ complexity, you can reuse it with$O(1)$ if caching. - You don't need to save merkle root histories. You can calculate them anytime without saving them.
- Auto-Incremental level (depth) of merkle tree. (pre-defined depth not required)
- It is also possible to modify, but it needs about
$O(log{n})$ SSTORE
s same as original merkle tree.
I call it as CMT(Confirmed Merkle Tree) for convenience. I think it is useful for maintaining merkle trees especially in EVM based smart contracts.
I implemented CMT by solidity for testing. I made a merkle tree with dummy data and test its append and root query performance. It gets average gas costs by appending and querying 128 times. I use keccak256 algorithm but, of course, using other hash would work well.
Please do not use production level yet because it's not audited.
dummy size | average gas fee |
---|---|
2^16 - 1 | 79804 |
2^32 - 1 | 79804 |
2^64 - 1 | 79804 |
As you can see, append costs small gas fee no matter how many data it has. it uses 1-2 zero slot SSTORE
and
1 non-zero slot SSTORE
.
dummy size | average gas fee |
---|---|
2^8 - 1 | 92160 |
2^16 - 1 | 146426 |
2^32 - 1 | 281750 |
2^64 - 1 | 569950 |
You can see it costs incrementally with
My implementation is not optimized. It calls query function by recursive. I think there are additional optimizations by reducing memory and stack size.
Merkle trees are widely used for data validation in smart contracts. It could be achieved with low gas fee because we just put only merkle roots to smart contracts.
In some cases, the reliability of merkle root is also important. Especially, contracts which are using ZKP(Zero-Knowledge Proof) (ex. zk-SNARKs), they should maintain merkle trees. However, to maintain all merkle tree nodes in smart contract, it requires O(log n) for every appendation and it's expensive in EVM.
So, I've done deep dive into how to maintain merkle trees efficiently in smart contract. Not only in smart contracts but in other systems (such as centralized servers), it is useful if there are many appending.
The total number of nodes in merkle tree is
If node has two child nodes, we can calculate its value with
Every append, we should insert merkle tree nodes with constraints below.
- CMT inserts only "confirmable node".
- CMT inserts maximum two "confirmable nodes" per an append.
There is a problem : where to insert nodes.
We should insert one or two confirmed nodes for every append. There is a trivial node that can be confirmed for every append: it's leaf node. We should insert one leaf node for every append first.
For example, assume that there is a merkle tree such like that.
Let's insert new node. We should insert third node in level 0 (=leaf node level).
Next, we should confirm other node. (second constraint: insert two nodes per an append) But we can't insert anymore since there is no node that can be confirmed (first constraint: only inserting node can be confirmed). So, in this case, finish appendation.
Let's see other example.
In this case, there are two node that can be confirmed. So, we can insert just two nodes. After inserting, root node can be also confirmed but we don't confirm it because of second constraint.
How about this example?
In this case, there are three confirmable nodes. By second constraint, we should insert two of them. Because we have to insert leaf node for every append, we should select one of two nodes (excluding leaf node) to insert.
So, we have to make a rule for getting confirmable node index to insert. I suggest a good formula working well.
for level
The result of
By this, you can maintain merkle tree with O(1) append and O(log n) querying merkle tree.
We can know when the node inserted from formula.
If
We can get
So,
Therefore,
So, it means the node
When we insert n-th data to CMT, we also insert
First, assume that there exists
The
So, if
Now, let's prove that
If
If
Assume that
Assume that
By mathematical induction, when leaf node
Because we only insert confirmable nodes, there are some unconfirmed nodes in CMT. When
Therefore, it requires
Fortunately, in most systems (including ethereum smart contracts), loading from storage is cheaper than writing. So,
it's more efficient. Besides, merkle root can be cached! So, if we store it when we read once, we can get it with
Also, there are some optimizations for querying merkle tree.
- we can pre-calculate null-value (
$\phi$ ) hash list.
Also, we can check whether all leaf nodes from node
- We don't have to read unconfirmed nodes for checking it's unconfirmed since we know it by
$\mathbf{A}$ in proof. By this, we can reduce the number of read operations.
Because we save only confirmed node, you can get any merkle root history even if you don't save them.
As we can see
Even though you can get any merkle root histories, you can cache them for every append by getting them with
Though, it's more efficient if read operation is much cheaper than write operation
(