In order to create a secure blockchain consensus algorithm using disk space, a Proof of Space scheme is necessary. This document describes a practical construction of Proofs of Space, based on Beyond Hellman's Time-Memory Trade-Offs with Applications to Proofs of Space [1]. We use the techniques laid out in that paper, extend it from 2 to 7 tables, and tweak it to make it efficient and secure, for use in the Chia Blockchain. The document is divided into three main sections: What (mathematical definition of a proof of space), How (how to implement proof of space), and Why (motivation and explanation of the construction). The Beyond Hellman paper can be read first for more mathematical background.
# Chia Proof of Space Construction

We use the following notation:

- $[X]$ denotes the set $\{0, 1, ..., X-1\}$
- $AES_{256}(c; K) : [2^{128}] \rightarrow [2^{128}]$ is a 14 (13+1) round AES256 encryption with a 256-bit key $K$
- $AES_{128}(c; K) : [2^{128}] \rightarrow [2^{128}]$ is a 2 (2+0) round AES128 encryption with a 128-bit key $K$
- $e$ is the base of the natural logarithm
- $\ll, \gg$ are bitwise left-shift and right-shift operators
- $\%$ is the modulus operator, with $\text{divmod}(x, m) = (\lfloor \frac{x}{m} \rfloor, x \,\%\, m)$
- $\oplus$ denotes the bitwise-XOR operation
- $x \mathbin{|} y$ denotes zero-padded concatenation: for the implied domain $y \in [2^z]$, it is $(x \ll z) + y$
- $\underset{b}{\text{trunc}}(x)$ is the first (most significant) $b$ bits of $x$. If $x \in [2^z]$ is the implied domain, this equals $x \gg (z - b)$
- $x[a:b]$: if $x$ has implied domain $[2^z]$, then this is the $a^{\text{th}}$ to $(b-1)^{\text{th}}$ bits of $x$ taken as a number, where the first (most significant) bit is considered the $0^{\text{th}}$. Equal to $(x \gg (z - b)) \,\%\, (1 \ll (b-a))$. For example, $(0b100100)[2:5]$ on implied domain $[2^6]$ is $0b010 = 2$. The default values are $a=0$ and $b=z$ if they are not specified
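As an illustration of this notation, here is a minimal Python sketch of the truncation and bit-slice operators (the function names are ours, not part of the construction):

```python
def trunc(x: int, b: int, z: int) -> int:
    """First (most significant) b bits of x, where x has implied domain [2^z]."""
    return x >> (z - b)

def bit_slice(x: int, a: int, b: int, z: int) -> int:
    """Bits a..b-1 of x (most significant bit is bit 0), domain [2^z]."""
    return (x >> (z - b)) % (1 << (b - a))

# The example from above: (0b100100)[2:5] on implied domain [2^6] is 0b010 = 2.
assert bit_slice(0b100100, 2, 5, 6) == 0b010
```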
For a given $\text{plot_seed} \in [2^{256}]$, a space parameter $k$, and a challenge $\text{Chall} \in [2^{256}]$ chosen by a verifier, a valid proof of space

$${\Large\pi}_{\text{Chall; plot_seed}, k} = x_1, x_2, \dots, x_{64}$$

(with each $x_i \in [2^k]$) satisfies the following conditions:

$$ \begin{aligned} \mathbf{M}(f_1(x_1), f_1(x_2)), \mathbf{M}(f_1(x_3), f_1(x_4)), \dots &\small\text{ [32 matches: } (x_1, x_2), (x_3, x_4), (x_5, x_6), \cdots]\\
\mathbf{M}(f_2(x_1, x_2), f_2(x_3, x_4)), \mathbf{M}(f_2(x_5, x_6), f_2(x_7, x_8)), \dots &\small\text{ [16 matches: } (x_1, \dots, x_4), (x_5, \dots, x_8), \cdots]\\
\mathbf{M}(f_3(x_1, x_2, x_3, x_4), f_3(x_5, x_6, x_7, x_8)), \dots &\small\text{ [8 matches: } (x_1, \dots, x_8), (x_9, \dots, x_{16}), \cdots]\\
\mathbf{M}(f_4(x_1, \dots, x_8), f_4(x_9, \dots, x_{16})), \dots &\small\text{ [4 matches: } (x_1, \dots, x_{16}), (x_{17}, \dots, x_{32}), \cdots]\\
\mathbf{M}(f_5(x_1, \dots, x_{16}), f_5(x_{17}, \dots, x_{32})), \dots &\small\text{ [2 matches: } (x_1, \dots, x_{32}), (x_{33}, \dots, x_{64})]\\
\mathbf{M}(f_6(x_1, \dots, x_{32}), f_6(x_{33}, \dots, x_{64})) &\small\text{ [1 match]}\\
\underset{k}{\text{trunc}}\big(\text{Chall}\big) = \underset{k}{\text{trunc}}\big(f_7(x_1, x_2, \dots, x_{64})\big) & \end{aligned} $$
(All of the matching conditions, and the final truncation equality, must hold.)

Here:

- $\mathbf{M}(x, y)$ is a matching function $\mathbf{M} : [2^{k+\text{param_EXT}}] \times [2^{k+\text{param_EXT}}] \rightarrow \{\text{True}, \text{False}\}$;
- $f$ is a high level function that composes $\mathcal{A}$ with $\mathcal{C}$;
- $\mathcal{A}'$ and $\mathcal{A}_t$ are high level hash functions that call $AES_{256}$ or $AES_{128}$ on their inputs in some way, using the plot seed as a key;
- $\mathcal{C}_t$ is a collation function $\mathcal{C}_t : \underset{(2^{t-2} \text{ times})}{[2^k] \times \cdots \times [2^k]} \rightarrow [2^{bk}]$ for $b = \text{colla_size}(t)$;
- plot_seed is an initial seed that determines the plot and the proofs of space.
The quality string (which can be used for blockchain consensus) for a valid proof ${\Large\pi}_{\text{Chall; plot_seed}, k}$ is defined as $x_{2a+1} \mathbin{|} x_{2a+2}$, where $a = \text{Chall} \,\%\, 32$ and the $x$ values are first reordered into plot ordering: $L$ and $R$ are swapped whenever $L > R$, for

$$ \text{for } s \text{ in } [1, 2, 3, 4, 5] \text{ for } i \text{ in } [0, \dots, 2^{6-s}-1], \text{ in order} $$
$$ L = [x_{i \cdot 2^{s}+1}, \dots, x_{i \cdot 2^{s}+2^{s-1}}], \quad R = [x_{i \cdot 2^{s}+2^{s-1}+1}, \dots, x_{(i+1) \cdot 2^{s}}] $$
We have the following parameters, precalculated values, and functions:

$$ \begin{aligned} \text{param_EXT} &= 5\\
\text{param_M} &= 1 \ll \text{param_EXT} = 32\\
\text{param_B} &= 60\\
\text{param_C} &= 509\\
\text{param_BC} &= \text{param_B} \cdot \text{param_C}\\
\text{param_c1} = \text{param_c2} &= 10000\\
\text{bucket_id}(x) &= \big\lfloor\frac{x}{\small\text{param_BC}}\big\rfloor\\
\text{b_id}(x), \text{c_id}(x) &= \text{divmod}(x \,\%\, \text{param_BC}, \text{param_C})\\
\text{colla_size}(t) &= \begin{cases} 1 &\text{if } t = 2\\ 2 &\text{if } t = 3 \text{ or } t = 7\\ 3 &\text{if } t = 6\\ 4 &\text{if } t = 4 \text{ or } t = 5\\ \text{undefined} &\text{otherwise} \end{cases} \end{aligned} $$
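These index functions translate directly into code; a small illustrative Python sketch (function names ours):

```python
param_EXT = 5
param_M = 1 << param_EXT      # 32
param_B = 60
param_C = 509
param_BC = param_B * param_C  # 30540

def bucket_id(x: int) -> int:
    return x // param_BC

def b_id(x: int) -> int:
    return (x % param_BC) // param_C

def c_id(x: int) -> int:
    return (x % param_BC) % param_C
```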
For convenience of notation, we have functions:

$$ \begin{aligned}
f_t &: \underset{(2^{t-1} \text{ times})}{[2^k] \times \cdots \times [2^k]} \rightarrow [2^{k + \small\text{param_EXT}}] \\
f_1(x_1) &= \mathcal{A}'(x_1)\\
f_2(x_1, x_2) &= \mathcal{A}_2(\mathcal{C}_2(x_1), \mathcal{C}_2(x_2)) \oplus f_1(x_1)\\
f_3(x_1, x_2, x_3, x_4) &= \mathcal{A}_3(\mathcal{C}_3(x_1, x_2), \mathcal{C}_3(x_3, x_4)) \oplus f_2(x_1, x_2)\\
f_4(x_1, \dots, x_8) &= \mathcal{A}_4(\mathcal{C}_4(x_1, \dots, x_4), \mathcal{C}_4(x_5, \dots, x_8)) \oplus f_3(x_1, \dots, x_4)\\
f_5(x_1, \dots, x_{16}) &= \mathcal{A}_5(\mathcal{C}_5(x_1, \dots, x_8), \mathcal{C}_5(x_9, \dots, x_{16})) \oplus f_4(x_1, \dots, x_8)\\
f_6(x_1, \dots, x_{32}) &= \mathcal{A}_6(\mathcal{C}_6(x_1, \dots, x_{16}), \mathcal{C}_6(x_{17}, \dots, x_{32})) \oplus f_5(x_1, \dots, x_{16})\\
f_7(x_1, \dots, x_{64}) &= \mathcal{A}_7(\mathcal{C}_7(x_1, \dots, x_{32}), \mathcal{C}_7(x_{33}, \dots, x_{64})) \oplus f_6(x_1, \dots, x_{32})
\end{aligned} $$
Then the matching function $\mathbf{M}(l, r)$ is defined as:

$$ \begin{aligned} \mathbf{M}(l, r) = \begin{cases} \text{True} & \text{if } \text{bucket_id}(l) + 1 = \text{bucket_id}(r) \text{ and, for some } m \in [\text{param_M}]\text{:}\\
& \quad \text{b_id}(r) - \text{b_id}(l) \equiv m \pmod{\text{param_B}} \text{, and}\\
& \quad \text{c_id}(r) - \text{c_id}(l) \equiv (2m + (\text{bucket_id}(l) \,\%\, 2))^2 \pmod{\text{param_C}}\\
\text{False} & \text{else} \end{cases} \end{aligned} $$
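A direct Python sketch of this matching condition, reusing the index helpers sketched above (a naive check; real implementations precompute the match targets per bucket):

```python
def matches(l: int, r: int) -> bool:
    """Check M(l, r) by trying all param_M values of m."""
    if bucket_id(l) + 1 != bucket_id(r):
        return False
    parity = bucket_id(l) % 2
    for m in range(param_M):
        if (b_id(r) - b_id(l) - m) % param_B == 0 and \
           (c_id(r) - c_id(l) - (2 * m + parity) ** 2) % param_C == 0:
            return True
    return False
```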
For $t \in \{2, \dots, 7\}$, the high level hash functions $\mathcal{A}_t$ are defined as:
$$ \begin{aligned} \mathcal{A}_t(x, y) := \begin{cases} \underset{k+\text{param_EXT}}{\text{trunc}}\big( A(x \mathbin{|} y) \big) &\text{if } 0 \leq \text{size} \leq 128 \\
\underset{k+\text{param_EXT}}{\text{trunc}}\big( A(A(x) \oplus y) \big) &\text{if } 129 \leq \text{size} \leq 256 \\
\underset{k+\text{param_EXT}}{\text{trunc}}\big( A(A(x_{high}) \oplus A(y_{high}) \oplus A(x_{low} \mathbin{|} y_{low})) \big) &\text{if } 257 \leq \text{size} \leq 384 \\
\underset{k+\text{param_EXT}}{\text{trunc}}\big( A(A(A(x_{high}) \oplus x_{low}) \oplus A(y_{high}) \oplus y_{low}) \big) &\text{if } 385 \leq \text{size} \leq 512 \end{cases} \end{aligned} $$

where

$$ \begin{aligned} A(c) &= AES_{128}(c; \underset{128}{\text{trunc}}(\text{plot_seed}))\\
x_{high}, x_{low} &= \text{divmod}(x, 1 \ll 128)\\
y_{high}, y_{low} &= \text{divmod}(y, 1 \ll 128)\\
\text{size} &= 2k \cdot \text{colla_size}(t) \end{aligned} $$
$$ \begin{aligned} \mathcal{C}_{t}(x_1, ..., x_{2^{t-2}}) := \begin{cases} x_1 &\text{if } t = 2\\
x_1 \mathbin{|} x_2 &\text{if } t = 3\\
x_1 \mathbin{|} x_2 \mathbin{|} x_3 \mathbin{|} x_4 &\text{if } t = 4\\
(x_1 \mathbin{|} \dots \mathbin{|} x_4) \oplus (x_5 \mathbin{|} \dots \mathbin{|} x_8) &\text{if } t = 5\\
\underset{3k}{\text{trunc}}\big( (x_1 \mathbin{|} \dots \mathbin{|} x_4) \oplus (x_5 \mathbin{|} \dots \mathbin{|} x_8) \oplus (x_9 \mathbin{|} \dots \mathbin{|} x_{12}) \oplus (x_{13} \mathbin{|} \dots \mathbin{|} x_{16}) \big)\\
\quad = (x_1 \mathbin{|} x_2 \mathbin{|} x_3) \oplus (x_5 \mathbin{|} x_6 \mathbin{|} x_7) \oplus (x_9 \mathbin{|} x_{10} \mathbin{|} x_{11}) \oplus (x_{13} \mathbin{|} x_{14} \mathbin{|} x_{15}) &\text{if } t = 6\\
\underset{2k}{\text{trunc}}\big( (x_1 \mathbin{|} \dots \mathbin{|} x_4) \oplus \cdots \oplus (x_{29} \mathbin{|} \dots \mathbin{|} x_{32}) \big)\\
\quad = (x_1 \mathbin{|} x_2) \oplus (x_5 \mathbin{|} x_6) \oplus \cdots \oplus (x_{29} \mathbin{|} x_{30}) &\text{if } t = 7 \end{cases} \end{aligned} $$
Proof of space is composed of three algorithms: plotting, proving, and verification.
Plotting is our term for the method of writing data to disk such that we can quickly retrieve 1 or more proofs (if they exist) for a given challenge.
The plot refers to the contents of the disk. While proof retrieval must be fast, the plotting process can take time, as it is only done once, by the prover.
In the Chia Blockchain, these provers are referred to as farmers, since they create and maintain the plots.
That is, a farmer creates and efficiently stores data on disk, for a given plot seed, that allows them to quickly find proofs for any given challenge.
There are 7 tables, each with approximately $2^k$ entries, where $k$ is the space parameter.
For convenience, we will denote the new matching condition introduced at each table:

$$ \mathbf{M}_t(x_1, \dots, x_{2^t}) = \mathbf{M}\big(f_t(x_1, \dots, x_{2^{t-1}}), f_t(x_{2^{t-1}+1}, \dots, x_{2^t})\big) $$
We will also refer to the total matching conditions of a table and all previous tables:

$$ \widetilde{\mathbf{M}}_t(x_1, \dots, x_{2^t}) = \widetilde{\mathbf{M}}_{t-1}(x_1, \dots, x_{2^{t-1}}) \land \widetilde{\mathbf{M}}_{t-1}(x_{2^{t-1} + 1}, \dots, x_{2^{t}}) \land \mathbf{M}_t(x_1, \dots, x_{2^t}) $$
Conceptually, tables are broken down into a sequence of entries, each representing a tuple of $x$ values that satisfies all matching conditions up to that table.
We also refer to the left and right halves of an entry: the two tuples whose $f$ outputs were matched to form it.
These entries can also be ordered, so that (relative to some ordering $\sigma$) we can write:

$$ \text{Table}_{t; \sigma} = E_{t, 0}, E_{t, 1}, \cdots, E_{t, N_t - 1} $$

where $N_t$ is the number of entries in the $t$-th table.

The final table, $\text{Table}_7$, is different: it stores the final outputs $f_7$ rather than matches that feed a next table.

For each entry, instead of writing the full tuple of $x$ values, we write two pointers to the corresponding entries in the previous table.
For example, in the third table, each entry conceptually is a tuple $(x_1, x_2, x_3, x_4)$, but we store it as two pointers into the second table, whose entries are $(x_1, x_2)$ and $(x_3, x_4)$.
This makes it clear that (given we have stored all previous tables), storing just the two pointers per entry suffices to recover all the $x$ values.

We have points (positions into the previous table) $(x, y)$ with $x > y \geq 0$; that is, points in a triangle $\text{Triangle}_t$.
We can exhibit a bijection from this space to a line $\text{Line}_t$:
$$ \begin{aligned} (x, y) \in \text{Triangle}_t \Rightarrow& \frac{x(x-1)}{2} + y \in \text{Line}_t\\
p \in \text{Line}_t \Rightarrow& \Big(x, p - \frac{x(x-1)}{2}\Big) \in \text{Triangle}_t, \text{ where } x = \Big\lfloor \frac{\sqrt{8p + 1} + 1}{2} \Big\rfloor
\end{aligned} $$
(Note: this data is not actually in a discrete uniform distribution (e.g. no line points are repeated), though we will see that this has a negligible effect.)
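This bijection is easy to implement exactly with integer arithmetic; a minimal Python sketch:

```python
from math import isqrt

def to_line_point(x: int, y: int) -> int:
    """Map a point (x, y) with x > y >= 0 into Line_t."""
    assert x > y >= 0
    return x * (x - 1) // 2 + y

def from_line_point(p: int) -> tuple[int, int]:
    """Invert to_line_point, using x = floor((sqrt(8p + 1) + 1) / 2)."""
    x = (isqrt(8 * p + 1) + 1) // 2
    return x, p - x * (x - 1) // 2

# Round trip over a few small points.
for point in range(100):
    assert to_line_point(*from_line_point(point)) == point
```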
We can compress these points in $\text{Line}_t$, since they are distributed approximately like a discrete uniform sample.
Suppose we have a set of $N$ points drawn from a discrete uniform distribution.
We can sort these points and store only the deltas between adjacent points.
We can encode these deltas (from points in a discrete uniform distribution) using an Asymmetric Numeral System (ANS [4]) encoding, an encoding scheme that is essentially lossless and offers extremely fast compression and decompression speeds for fixed distributions. A requirement to do so is that we know the distribution of the deltas in advance: deltas of uniformly distributed points are approximately exponentially distributed.
In the previous discussion, each delta $\Delta$ can be split into two parts: a stub (the low bits, which are approximately uniform and thus incompressible) and a small delta $\delta$ (the remaining high bits, which carry the compressible part of the distribution).
Afterwards, we can write the stubs directly, and an ANS encoding of the $\delta$ values.
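A hypothetical sketch of this split (the cut point of $k-2$ low bits mirrors the park layout described below; the function name is ours):

```python
def split_delta(delta: int, k: int) -> tuple[int, int]:
    """Split a line point delta into (stub, small_delta).

    The stub (low k-2 bits) is roughly uniform and is stored raw; the
    small delta (the remaining high bits) is roughly exponentially
    distributed and is ANS encoded.
    """
    stub = delta % (1 << (k - 2))
    small_delta = delta >> (k - 2)
    return stub, small_delta
```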
We can also calculate the appropriate space-to-entries ratio parameter for the current table, which the ANS encoding needs in order to model the delta distribution.
For each table, we must store this large sequence of entries on disk in a way that still permits fast random access.
Our strategy is to group these entries into parks:
$$ \text{Park}_i \text{ encodes } E_{t, i \cdot \text{param_EPP}}, E_{t, i \cdot \text{param_EPP} + 1}, \dots, E_{t, i \cdot \text{param_EPP} + \text{param_EPP} - 1} $$
Each park encodes information about its entries ($\text{param_EPP}$ entries per park):

- The value of the first entry $e_0 \in \text{Line}_t$;
- The $(\text{param_EPP} - 1)$ $\Delta$'s, stored as:
  - The $(\text{param_EPP} - 1)$ stubs, written directly [$k-2$ bits per stub];
  - The $(\text{param_EPP} - 1)$ $\delta$'s, ANS encoded.
Then, to find the $i$-th entry of a table, we seek to $\text{Park}_{\lfloor i / \text{param_EPP} \rfloor}$, read its first entry value, and add to it the first $(i \,\%\, \text{param_EPP})$ deltas (each recovered from a stub and a $\delta$).
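A structural sketch of that lookup, assuming a `decode_park` helper (ours, hypothetical) that returns a park's first entry and its decoded deltas:

```python
param_EPP = 2048  # entries per park (illustrative value)

def get_entry(i: int, decode_park) -> int:
    """Recover the i-th entry of a table from its park."""
    park_index, offset = divmod(i, param_EPP)
    first_entry, deltas = decode_park(park_index)  # deltas: list of ints
    return first_entry + sum(deltas[:offset])
```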
One wrinkle with this system is that it is highly desired that each park starts at a fixed location on disk. One simple way to achieve this is to allocate more than enough space to store the ANS encoding of the $\delta$ values in each park, and leave the remainder unused.
We can bound the probability that a park has insufficient space for its entries by calculating the mean and standard deviation of the information content in an entire park.
For example, allocating the mean plus a small multiple of the standard deviation per park makes overflow extremely unlikely.
More complicated systems of writing parks to disk are possible that offer minor compression improvements, based on recovering the endpoints of parks that have been coalesced. For a more thorough discussion, see Variable Sized Parks.
Retrieving a proof of space for a challenge, as explained in Proving, requires efficiently looking up a final output $f_7$ that matches the challenge.
We can efficiently encode these deltas with a variable-length scheme like Huffman encoding. For simplicity, we allocate a fixed amount of space for encoding the deltas of each group of $\text{param_c1}$ entries.
However, every $\text{param_c1}$-th $f_7$ value is also stored directly as a checkpoint, so a lookup only needs to decode one group of deltas.
Let the checkpoint tables $C_1$ and $C_2$ store these checkpoints: $C_1$ holds every $\text{param_c1}$-th $f_7$ value, and $C_2$ every $\text{param_c2}$-th $C_1$ value, so that $C_2$ is small enough to keep in memory.
The final disk format is the following:
| Table | Data | Disk storage | Approximate N |
|---|---|---|---|
| $table_1$ | Line points (pairs of $x$ values) | Mapped to line point, deltafied, encoded, and parked | $2^k$ |
| $table_2$ | Line points (pairs of positions into $table_1$) | Mapped to line point, deltafied, encoded, and parked | $2^k$ |
| $table_3$ | Line points (pairs of positions into $table_2$) | Mapped to line point, deltafied, encoded, and parked | $2^k$ |
| $table_4$ | Line points (pairs of positions into $table_3$) | Mapped to line point, deltafied, encoded, and parked | $2^k$ |
| $table_5$ | Line points (pairs of positions into $table_4$) | Mapped to line point, deltafied, encoded, and parked | $2^k$ |
| $table_6$ | Line points (pairs of positions into $table_5$) | Mapped to line point, deltafied, encoded, and parked | $2^k$ |
| $table_7$ | Positions into $table_6$ | Parked | $2^k$ |
| $C_1$ | Every $\text{param_c1}$-th $f_7$ | Byte aligned storage | $2^k / \text{param_c1}$ |
| $C_2$ | Every $\text{param_c2}$-th $C_1$ entry | Byte aligned storage | $2^k / (\text{param_c1} \cdot \text{param_c2})$ |
| $C_3$ | $f_7$ deltas between $C_1$ checkpoints | Deltafied, encoded, and parked | $2^k / \text{param_c1}$ |
The plotting algorithm is composed of four phases:
```
# Phase 1: Forward propagation
For x in 0...2^k - 1:
    Compute f1(x)
    Write (f1(x), x) to table_1
For i in 1..6:
    Sort table_i by (f_i, pos, offset)        # for table_1, sort by f_1(x)
    For entry in table_i:
        If entries L and R match:
            Compute f_{i+1}(C_L, C_R)         # for table_1, C = x
            C = Collation_{i+1}(C_L, C_R)
            Store (f_{i+1}, pos, offset, C) in table_{i+1}

# Phase 2: Backpropagation
For i in 6..1:
    Iterate through table_i and table_{i+1}:
        Drop unused entries in table_i
        sort_key = (table == 7) ? f_7 : the position at which the next table references the entry
        Rewrite used entries in table_i as (sort_key, pos, offset)
        Rewrite entries in table_{i+1} as (sort_key, pos, offset)
    If i > 1:
        Sort table_i by (pos, offset)

# Phase 3: Compression
For i in 1..6:
    Iterate through table_i and table_{i+1}:
        Read (sort_key, pos, offset) from table_{i+1}
        Read the two table_i entries at that pos and offset: e_L, e_R
        y, x = sort(e_L.new_pos, e_R.new_pos)
        line_point = x*(x-1)//2 + y
        Rewrite table_{i+1} entry as (line_point, sort_key)
    Sort table_{i+1} by line_point
    For idx, e in enumerate(table_{i+1}):
        new_pos = idx
        Rewrite e as (sort_key, new_pos)
        Write compressed e.line_point deltas to table P_i
    Sort table_{i+1} by sort_key

# Phase 4: Checkpoints
For entries in table_7:
    Compress the f_7 values into the C1, C2, C3 tables
    Write pos_6 values to table P_7
```
The purpose of the forward propagation phase is to compute all tables where the matching conditions are met, and to compute all final outputs $f_7$.
Then, the later phases compress this data. After the forward propagation phase, the temporary file contains the following:

| Table | Data | Disk storage |
|---|---|---|
| $table_1$ | $(f_1(x), x)$ | Sorted by $f_1$ |
| $table_2$ | $(f_2, pos, offset, C)$ | Sorted by $(f_2, pos, offset)$ |
| $table_3$ | $(f_3, pos, offset, C)$ | Sorted by $(f_3, pos, offset)$ |
| $table_4$ | $(f_4, pos, offset, C)$ | Sorted by $(f_4, pos, offset)$ |
| $table_5$ | $(f_5, pos, offset, C)$ | Sorted by $(f_5, pos, offset)$ |
| $table_6$ | $(f_6, pos, offset, C)$ | Sorted by $(f_6, pos, offset)$ |
| $table_7$ | $(f_7, pos, offset)$ | Sorted by $(f_7, pos, offset)$ |
Pointers in Phases 1 and 2 are stored in pos, offset format.
Pos-offset format represents a match by storing the position (in the previous table) of the left entry, along with the small offset from that position to the matching right entry.
During phase 1, entries are close to their matches, so storing the offset is efficient.
In phase 3 this changes into double pointer format, which stores two positions into the previous table, compressed as a line point.
Note that after phase 1, the temporary file is sufficient to create proofs of space.
The purpose of the backpropagation step is to remove data which is not useful for finding proofs.
During this phase, all entries which do not form part of the matching condition in the next table, are dropped.
Positions in the next table are adjusted to account for the dropped entries.
The collation outputs ($C$ values) are also dropped, since they are only needed during forward propagation to compute the next table.
Implementation of this phase requires iterating through two tables at once, a left and a right table, checking which left entries are used in the right table, and dropping the ones that are not. Finally, a sort by position is required before looking at the right table, so it's possible to read a window into both tables at once, in a cache-local manner, without reading random locations on disk.
After the backpropagation phase, the temporary file contains the following:

| Table | Data | Disk storage |
|---|---|---|
| $table_1$ | $(f_1(x), x)$ | Sorted by $f_1$ |
| $table_2$ | $(sort\_key, pos, offset)$ | Sorted by $(pos, offset)$ |
| $table_3$ | $(sort\_key, pos, offset)$ | Sorted by $(pos, offset)$ |
| $table_4$ | $(sort\_key, pos, offset)$ | Sorted by $(pos, offset)$ |
| $table_5$ | $(sort\_key, pos, offset)$ | Sorted by $(pos, offset)$ |
| $table_6$ | $(sort\_key, pos, offset)$ | Sorted by $(pos, offset)$ |
| $table_7$ | $(f_7, pos, offset)$ | Sorted by $f_7$ |
The purpose of phase 3 is to convert from the (pos, offset) format, which requires each table to stay sorted in the order its successor references it, to the double pointer format, which can be sorted and compressed independently.
With the compressed format, two pointers can be stored in around $k$ bits: a $2k$-bit line point whose sorted deltas compress to roughly $k$ bits per entry.
The goal of the double pointer format is thus to store two positions into the previous table as a single compressed line point.
As we iterate from the first to the last table, each table's pointer pairs are converted to line points, sorted, deltafied, and written out as parks.
After the compression phase, the file contains the following:

| Table | Data | Disk storage |
|---|---|---|
| $table_1$ | $(f_1(x), x)$ | Sorted by $f_1$ |
| $table_2$ | $(sort\_key, new\_pos)$; line point deltas written to $P_1$ | Sorted by $sort\_key$ |
| $table_3$ | $(sort\_key, new\_pos)$; line point deltas written to $P_2$ | Sorted by $sort\_key$ |
| $table_4$ | $(sort\_key, new\_pos)$; line point deltas written to $P_3$ | Sorted by $sort\_key$ |
| $table_5$ | $(sort\_key, new\_pos)$; line point deltas written to $P_4$ | Sorted by $sort\_key$ |
| $table_6$ | $(sort\_key, new\_pos)$; line point deltas written to $P_5$ | Sorted by $sort\_key$ |
| $table_7$ | $(sort\_key, new\_pos)$; line point deltas written to $P_6$ | Sorted by $sort\_key$ |
The final file is then composed of the compressed tables $P_1, \dots, P_7$ and the checkpoint tables $C_1, C_2, C_3$, as in the final disk format above.
The main bottlenecks in the reference implementation are searching for matches, and sorting on disk.
Phase 4 involves iterating through $table_7$, compressing the $f_7$ values into the $C_1$, $C_2$, $C_3$ checkpoint tables, and parking the $pos_6$ values.
Sort on disk is an external sorting algorithm that sorts large amounts of data with significantly less memory than the data size.
On large plots, this function quickly becomes the bottleneck of the plotting algorithm.
It works by recursively dividing data into smaller partitions, until a partition is small enough to sort in memory.
Data is grouped into partitions by the first 4 unsorted bits of each entry, giving 16 buckets.
```
SortOnDisk(data, bits_begin):
    If enough memory to sort data:
        SortInMemory(data, bits_begin)
    Else:
        For entry in data:
            Move entry to partition entry[bits_begin:bits_begin+4]
        For i in 0..15:
            SortOnDisk(partition[i], bits_begin+4)
```
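The recursion is easy to model in memory; a minimal runnable Python sketch where lists stand in for disk regions:

```python
def sort_on_disk(data, bits_begin, entry_bits, memory_limit=1024):
    """Recursive 4-bit radix partition sort over fixed-width integers."""
    if len(data) <= memory_limit:
        return sorted(data)
    shift = entry_bits - bits_begin - 4   # the next 4 unsorted bits pick a bucket
    buckets = [[] for _ in range(16)]
    for entry in data:
        buckets[(entry >> shift) & 15].append(entry)
    result = []
    for bucket in buckets:
        result.extend(sort_on_disk(bucket, bits_begin + 4, entry_bits, memory_limit))
    return result

import random
data = [random.getrandbits(32) for _ in range(10000)]
assert sort_on_disk(data, 0, 32) == sorted(data)
```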
While partitioning the data, not everything fits into memory, so we continuously extract a few entries from each of the 16 buckets into memory, and write entries back from memory into their target bucket locations.
Let bstore be the in-memory store of entries, grouped by their target bucket.
The algorithm works by constantly filling memory by reading a few entries from each bucket, then extracting from memory and writing into the correct disk location.
At any step, the number of entries written back to bucket $b$ never exceeds the number of entries already consumed (read) from bucket $b$, so writes never overwrite unread data.
Bstore is implemented as concatenated linked lists, each bucket having its own linked list. Moreover, the empty positions inside the memory form their own linked lists as well.
```
SortOnDisk(disk_begin, bits_begin, entry_len):
    Sort in memory if possible, otherwise do this:
    total = 0
    For i in 0..15:
        bucket_begins[i] = total
        total = total + bucket_sizes[i]
        disk_position[i] = disk_begin + bucket_begins[i] * entry_len
        Read some entries starting at disk_position[i]
        For each read entry:
            Move the entry into a spare (temporary) zone of the file
            consumed_per_bucket[i] += 1
    While (spare zone has entries and bstore is not full):
        Read one entry from the spare zone
        Store the entry into bstore
    While (bstore is not empty):
        bucket_order = range(0, 16)        # all 16 buckets
        Sort bucket_order decreasingly by bucket sizes in bstore
        For b in bucket_order:
            to_extract = min(bstore.bucket_size(b),
                             consumed_per_bucket[b] - written_per_bucket[b])
            Extract to_extract entries from bucket b of bstore
            Write them to disk starting at
                disk_begin + (bucket_begins[b] + written_per_bucket[b]) * entry_len
            written_per_bucket[b] += to_extract
        Sort bucket_order increasingly by consumed_per_bucket[b] - written_per_bucket[b]
        While (bstore is not full):
            Populate bstore from buckets, using the bucket_order ordering
            If all buckets are fully consumed but bstore is not full,
                populate from the spare zone
    For i in 0..15:
        SortOnDisk(disk_position[i], bits_begin + 4, entry_len)
```
Sorting in memory is done in two ways: quicksort (when the data is not random) or our custom SortInMemory algorithm (when the data is random).
The custom SortInMemory algorithm is similar to the SortOnDisk, as each entry is put into a bucket.
The main difference is that we use twice as many buckets as the number of entries, so the table is sparse and collisions, which would slow down the algorithm, are rare.
Since we know that the data is evenly distributed, we know approximately where each entry should go, and can therefore sort in time approximately linear in the number of entries.
Let num_entries be the number of entries to sort, and bits_begin the position of the first unsorted bit.
```
SortInMemory():
    Let b be the smallest number such that 2^b >= 2 * num_entries
    Read entries in batches
    For each entry:
        Let bucket = entry[bits_begin : bits_begin + b]
        While (position bucket is not free in memory):
            Let other_entry be the entry in memory at position bucket
            Store min(entry, other_entry) in memory at position bucket
            entry = max(entry, other_entry)
            bucket += 1
        Store entry at position bucket in memory
    Iterate over memory and find occupied positions
    Retrieve entries from them and write them out in batches
```
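A runnable Python model of this bucket-with-linear-probing sort, assuming uniformly distributed fixed-width integer entries and bits_begin = 0:

```python
def sort_in_memory(entries, entry_bits):
    """Sort uniform random integers by scattering them near their expected
    positions, resolving collisions with min/max linear probing."""
    b = 0
    while (1 << b) < 2 * len(entries):
        b += 1
    memory = [None] * ((1 << b) + 64)       # slack for probing past the end
    for entry in entries:
        bucket = entry >> (entry_bits - b)  # top b bits estimate the position
        while memory[bucket] is not None:
            other = memory[bucket]
            memory[bucket] = min(entry, other)
            entry = max(entry, other)       # carry the larger value rightward
            bucket += 1
        memory[bucket] = entry
    return [e for e in memory if e is not None]

import random
data = [random.getrandbits(32) for _ in range(1000)]
assert sort_in_memory(data, 32) == sorted(data)
```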
The algorithm is designed with cache locality in mind, so that random disk lookups are not required, and a large amount of memory is not necessary. The sort on disk algorithm will perform better if more memory is allocated.
However, multiple passes on disk are required, both in the sorts, and in the other parts of the algorithm. On slow CPUs, plotting can be CPU bound, especially since AES256 must be computed on $O(2^k)$ values. On newer CPUs, AES instruction sets are available, which makes that part extremely fast. Still, comparisons, arithmetic, and memory allocation are required during plotting, which can cause the algorithm to be CPU bound. Multi-threading can be used to split up the work, as well as to separate CPU from disk I/O.
In addition, the usage of an SSD or RAM drive can significantly speed up the plotting process.
During backpropagation, each left-table entry is dropped unless at least one right-table entry points to it. Each of the approximately $2^k$ right-table entries points to 2 of the $2^k$ left-table entries, so the probability that a given left entry is used is, for example for $table_5$:

$$
p_5 = 1 - \Big(\frac{2^k-2}{2^k}\Big)^{2^k}
$$
Therefore, the proportion of entries not dropped from $table_5$ is approximately $1 - e^{-2} \approx 0.86$.
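A quick numeric check of this approximation:

```python
# The survival probability converges to 1 - e^-2 as k grows.
from math import exp
k = 32
p = 1 - (1 - 2 / 2**k) ** (2**k)
print(p, 1 - exp(-2))  # both ~0.8647
```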
Assuming we encode each line point delta in exactly $k$ bits, that the checkpoint tables use negligible space, and that Hellman attacks are not performed, the total size of the final file is roughly $7k \cdot 2^k$ bits: about $k$ bits per entry for each of the seven tables.
The total size of the temporary file is:

$$ \begin{aligned} S_{\text{temp}} = 2^k \cdot \big( &(k + \text{param_EXT}) + k\\
&+ 5 \cdot \big(k + \text{param_EXT} + (k+1) + \text{param_offset_size}\big)\\
&+ (2 + 4 + 4 + 3 + 2) \cdot k\\
&+ 2k + k \big)\\
= 2^k \cdot (30k &+ 6 \cdot \text{param_EXT} + 5 \cdot \text{param_offset_size} + 5) \end{aligned} $$
For the proposed constants, the temporary file is around 6 times the size of the final file.
The proving process is the process by which a farmer, someone who stores a plot, efficiently retrieves proofs from disk, given a challenge.
Suppose a farmer Alice wants to prove to a verifier Bob that she is using 1TB of data.
Bob generates a random challenge, gives it to Alice, and if Alice can immediately create a valid proof of space for that challenge and the claimed $k$, Bob can be convinced that Alice is indeed storing roughly the corresponding amount of data.
In the blockchain use case, there is no interactivity, but the concept of fast proving for a random challenge still applies.
In the compression step, when two pointers to a previous table are converted to a line point, the pair is stored in sorted order, so the on-disk ordering of the two halves (plot ordering) can differ from the ordering required by the proof format (proof ordering).
The proof retrieval algorithm takes in a plot file $F$ and a challenge $chall$, and outputs a list of proofs (each a list of 64 $x$ values):

1. Initialization step: load $k$ and $table_{c2}$ into memory.
2. $target \gets \underset{k}{\text{trunc}}(chall)$.
3. Find the index of the final $c_2$ entry where $target < c_2$.
4. Seek to $table_{c_1}$, based on the index from the previous step, and read $\text{param_c2}$ entries.
5. Find the index of the final $c_1$ entry where $target < c_1$.
6. Read a park from $c_3$ based on the index from the previous step.
7. Find all indices where $f_7 = target$.
8. If there are no such indices, return [].
9. For each proof, recursively follow all table pointers from $table_7$ to $table_1$ and return the 64 $x$ values.
10. For each proof, reorder the proof from plot ordering to proof ordering.
The quality string retrieval algorithm takes in a plot file $F$ and a challenge $chall$, and outputs a list of quality strings:

1. Perform the proof retrieval algorithm up to step 8.
2. Calculate $qual\_id \gets chall \,\%\, 32$.
3. Follow table pointers from $table_6$ to $table_1$. For each bit of $qual\_id$, if the bit is 0, follow the smaller position, and if the bit is 1, follow the larger position of the pair.
4. Retrieve two $x$ values from the $table_1$ entry, $x_1$ and $x_2$, where $x_1 < x_2$.
5. Return $x_1 \mathbin{|} x_2$. Note that this is equivalent to the quality string defined in the What section.
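A structural sketch of that pointer walk, assuming a hypothetical `get_pair(t, pos)` helper that returns the (smaller, larger) previous-table positions for an entry, and an assumed bit ordering for $qual\_id$:

```python
def quality_xs(qual_id: int, pos6: int, get_pair) -> tuple[int, int]:
    """Follow one branch of the proof tree, chosen by the bits of qual_id."""
    pos = pos6                           # position into table 6, read from table 7
    for t in range(6, 1, -1):            # tables 6 down to 2: 5 choices
        bit = (qual_id >> (t - 2)) & 1   # illustrative bit ordering
        pos = get_pair(t, pos)[bit]      # 0: smaller position, 1: larger
    x1, x2 = get_pair(1, pos)            # a table_1 entry holds two x values
    return (x1, x2) if x1 < x2 else (x2, x1)
```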
The prover needs to be able to create proofs immediately, for blockchain consensus applications. For challenges that don't have any proofs of space, steps 2-7 in Proof Retrieval require only two disk lookups.
When proofs exist, only 6 additional disk lookups are required to find the quality, per proof. If the whole proof needs to be fetched, it will cost around 64 disk lookups. Using the quality allows us to avoid fetching the full proof, until a good quality is found.
Given a challenge $chall$ and a proof ${\Large\pi}$, the verifier can efficiently check the proof by evaluating the $f$ functions on the 64 $x$ values, checking every matching condition, and checking that $\underset{k}{\text{trunc}}(chall) = \underset{k}{\text{trunc}}(f_7(x_1, \dots, x_{64}))$, exactly as in the definition from the What section.
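A structural sketch of verification, assuming callables implementing $f_1$, $\mathcal{A}_t$, $\mathcal{C}_t$, $\mathbf{M}$, and $k$-truncation as defined in the What section:

```python
def verify(chall_trunc, xs, f1, A, C, matches, trunc_k):
    """Recompute the match tree over 64 x values and check every condition."""
    ys = [f1(x) for x in xs]            # f_1 outputs
    metas = [(x,) for x in xs]          # x tuples, inputs to collation
    for t in range(2, 8):               # tables 2..7
        next_ys, next_metas = [], []
        for i in range(0, len(ys), 2):
            if not matches(ys[i], ys[i + 1]):
                return False
            # f_t(left + right) = A_t(C_t(left), C_t(right)) XOR f_{t-1}(left)
            next_ys.append(A[t](C[t](metas[i]), C[t](metas[i + 1])) ^ ys[i])
            next_metas.append(metas[i] + metas[i + 1])
        ys, metas = next_ys, next_metas
    return trunc_k(ys[0]) == chall_trunc
```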
The motivation for Chia's proof of space construction is for use as an alternative resource to Proof of Work, in a Nakamoto-consensus style blockchain. Space miners (also referred to as farmers) allocate space on their hard drives, respond to challenges, and occasionally win rewards. Challenges are random 256-bit values that result from each block. In Chia's blockchain, this challenge is the unpredictable output of a Verifiable Delay Function that is required for each block.
In order for such a protocol to be secure against 51% attacks and other consensus attacks, it is clear that an attacker should not be able to create a valid proof of space without actually storing (close to) the claimed amount of data.
Furthermore, the initialization phase (plotting), should be as efficient as possible, but farmers should not be able to perform this phase within the average block interval.
Finally, prover efficiency, verifier efficiency, proof size, noninteractivity of the initialization phase, and simplicity of honest operation, are all concerns that must be taken into account, and these are some reasons for a lot of the details of the construction, explained in depth below. We will build up from the simple ideas, up to the full definition of proof of space from section 1.
This construction by Abusalah, Alwen, Cohen, Khilko, Pietrzak and Reyzin [1] aims to create a simple and efficient proof of space scheme, compared to existing pebbling-based PoSpace schemes.
One of the simplest ways to think about constructing a proof of space is for the verifier to specify a random function $f : [2^k] \rightarrow [2^k]$, and ask the prover to invert it on a challenge point; the prover stores a table of all $(f(x), x)$ pairs sorted by output, and can answer with a single fast lookup.
The problem with the above construction is that the prover can perform a Hellman Attack (time space tradeoff), where only $O(2^{2k/3})$ values are stored, and $O(2^{2k/3})$ computation is performed to invert the function on each challenge.
The new idea in the paper is to construct a function that cannot be attacked this way: a function $f_2(x_1, x_2)$ that is defined only on pairs $(x_1, x_2)$ whose $f_1$ outputs match.
The function that has to be inverted is $f_2$, but evaluating $f_2$ at even a single point is expensive unless the table of $f_1$ outputs is stored.
In order to implement such a proof of space, we can first compute the entire $f_1$ table, sort it by output, find all matching pairs, and use them to compute the $f_2$ table.
While function $f_1$ on its own can be attacked with time-memory tradeoffs, inverting $f_2$ without the stored tables requires finding matching pairs, which requires evaluating $f_1$ over a large fraction of its domain.
For example, for an attacker who stores only Hellman tables for $f_1$, each attempted inversion of $f_2$ requires many $f_1$ evaluations to locate a matching pair.
Therefore, the time-space tradeoff is far less favorable to the attacker than for a single random function.
Even in the cases where the attacker can invert $f_1$ cheaply, stacking more levels of matched functions increases the cost of recomputing data that was not stored.
The drawback of increasing the number of levels is that the proof size doubles with each additional table: with 7 tables, a proof is 64 $x$ values.
Since increasing the number of levels also means carrying forward the $x$ tuples needed to evaluate each $f_t$, the metadata stored with each entry would grow exponentially if it were simply concatenated.
This can be improved by collating the left and right values, using some other function than just concatenation, as in collation. By limiting the output of the collation function to 4 values (and the domain of the $\mathcal{A}_t$ functions to at most 512 bits), the cost of evaluating the high level functions stays bounded.
The collation function output size can be reduced to $3k$ bits at table 6 and $2k$ bits at table 7, since fewer tables remain to be computed and less metadata is needed.
Why the numbers taper off at the end (colla_size of 4, 4, 3, 2): the collated metadata exists only to compute later tables, so as we approach the final table, less of it needs to be carried forward.
Using a very simple matching function, such as matching $l$ with $r$ whenever $\text{bucket_id}(l) + 1 = \text{bucket_id}(r)$ alone, would make the underlying match graph highly structured, enabling compression attacks based on islands, as follows.
Consider a plot stored in pos, offset format. Let's assume there is an island: a small group of entries whose matches all stay within the group.
This saves around $k$ bits per island, since one entry of the island can be dropped and recovered from the others by re-running the matching function.
The number of islands in a random graph with $2^k$ vertices and $2^k$ edges is substantial, so this attack matters.
Although not random, the matching function below adds more "randomness" to matches, making the number of islands very close to that of the random graph.
Unfortunately, during backpropagation up to 20% of entries in a table are dropped, and thus it is no longer a graph with $2^k$ vertices and $2^k$ edges, which shifts the island statistics somewhat.
Let's consider an underlying graph of a table: the digraph where each node is an entry, and an edge exists if that pair of entries is a match.
There are two main motivations for the matching function:
- Efficiency: All matches can be found efficiently. Generally this means that:
- the set of matches to search is quasilinear to the number of entries, and
- the matches can be made to be local to some region of the disk, so as to minimize disk seeks.
- No small-length cycles: In the underlying graph (considering each edge as undirected), 4-length cycles must not exist, since small-length cycles can be exploited for compression later (see: Cycles attack).
The first requirement can be fulfilled by only matching things close together under some ordering of entries.
The second requirement is fulfilled by requiring the underlying graph to be a subset of some overlay graph that doesn't have small-length cycles (as any subset of this graph will also not have small-length cycles).
Focusing on the second requirement, consider, for parameters $B$, $C$, and $M$, the overlay graph whose nodes are tuples $(i, b, c)$ with parity $i \in [2]$, $b \in [B]$, $c \in [C]$, and which has an edge from $(i, b, c)$ to $(1 - i, b', c')$
for all $m \in [M]$, where $b' \equiv b + m \pmod{B}$ and $c' \equiv c + (2m + i)^2 \pmod{C}$.
Consider a cycle in this graph of length 4, with the parameters of the two "forward" edges denoted $r_1, r_2$ and those of the two "backward" edges $s_1, s_2$: going around the cycle, the $b$ differences must sum to $0 \pmod{B}$ and the $c$ differences must sum to $0 \pmod{C}$.
With a simple program (or by careful analysis), we can verify that no such cycle exists, so there are no 4-cycles:
```python
from itertools import product

# Exhaustive check that no 4-cycles exist in the overlay graph:
# r1, r2 are the m parameters of the two forward edges, s1, s2 of the
# two backward edges, and p1, p2 the parities of the two levels involved.
B, C, M = 60, 509, 32
for r1, r2, s1, s2 in product(range(M), repeat=4):
    if r1 != s1 and (r1, r2) != (s2, s1) and (r1 - s1 + r2 - s2) % B == 0:
        for p1, p2 in product(range(2), repeat=2):
            assert ((2*r1 + p1)**2 - (2*s1 + p1)**2 +
                    (2*r2 + p2)**2 - (2*s2 + p2)**2) % C != 0
```
The minimum plot size is important to prevent grinding attacks.
If $k$ were small, an attacker could create a plot on the fly after seeing the challenge, defeating the purpose of allocating space in advance.
The maximum plot size is chosen at $k = 59$: a conservative maximum even accounting for future hard drive improvements, still allowing everything to be represented in 64 bits, and keeping $k$-truncated AES ciphertexts difficult to invert.
Call a tuple $(x_1, \dots, x_{64})$ weakly distinct if, at every level of the tree of matches, the left half and the right half of each pair are different tuples.
Given a weakly distinct tuple drawn at random, each matching condition and the final truncation equality behave approximately like independent events over the random functions.
Furthermore, a tuple that is not weakly distinct cannot be a valid proof, since matches never occur for the same left and right halves (i.e. $\mathbf{M}(x, x)$ is always False, as $\text{bucket_id}(x) + 1 \neq \text{bucket_id}(x)$).
Thus, the expected number of proofs per challenge is approximately 1.
A random function cannot be described succinctly, so we emulate one by keying AES with the plot seed.
As written in the $\mathcal{A}'$ function, $f_1$ is computed using $AES_{256}$ in counter mode, with each 128-bit ciphertext block providing several $f_1$ outputs.
Using counter mode with multiple outputs saves CPU time, and also allows us to evaluate $f_1$ cheaply at a single point, which the verifier needs.
Note that even though the key and plaintext are fixed and known, the construction does not need AES to hide anything; it only needs the outputs to be distributed like those of a random function.
For the $\mathcal{A}_t$ functions, only 2 rounds of $AES_{128}$ are used, since full cryptographic security is not required and fewer rounds are significantly faster.
Note that the inputs to $\mathcal{A}_t$ are the collated left and right metadata, and the same left value is reused across all of an entry's candidate matches.
Since matches are computed sequentially, if an entry participates in multiple match checks, the AES evaluations on its unchanged inputs can be cached and reused.
Compare this with CBC, where the first time a bit changes, it affects the entire ciphertext, and therefore cached values can no longer be used. Since the $\mathcal{A}_t$ construction keeps each input in its own AES block where possible, such caching remains effective.
A potential optimization in the reference code is to take advantage of this, and cache AES values where appropriate.
Finally, AES is hardware accelerated on most modern CPUs (AES instruction sets), which makes these functions extremely fast to evaluate.
The quality string is what blockchain consensus uses to compare plots on each challenge, so retrieving it must be as cheap as possible.
The reason why only two out of the 64 proof values are used is that reading two adjacent $x$ values requires following just one branch of pointers down the tables (about 6 disk lookups), rather than fetching the entire proof (around 64 lookups).
It is important that the choice of which two $x$ values are used depends on the challenge; otherwise, a farmer could store only part of each plot and still produce qualities.
Furthermore, since the farmer does not know the challenge in advance, the qualities cannot be precomputed.
During the plotting process, entry data is compressed by reordering tuples, which means the on-disk ordering of the proof's $x$ values (plot ordering) differs from the ordering in the proof format; the prover converts between the two when returning full proofs.
Since proof of space plots are meant to be very large (due to the minimum plot size), the plotting algorithm cannot assume the data fits in memory, and must rely on sorting on disk.
This is particularly important in the Forward Propagation phase, where pairs of entries must be compared to find matches: sorting by $f$ output brings all candidate matches (adjacent bucket ids) close together on disk.
In the Compression Phase, positions to previous table entries are based on each entry's position after sorting by line point, so tables must be re-sorted and the positions updated as each table is rewritten.
The current implementation puts sections of empty space between parks, which is wasteful and can be reclaimed. In the following discussion, a "house" prepends each park and contains additional metadata for that park; recall that houses in the original scheme contain the value of the first entry $e_0$.
In an alternate scheme that has "park alignment", we'll have houses also containing a fixed number of bits to store "park alignment", and parks are written sequentially, "skipping" over the allocated space of any house. This saves space as it mitigates the per-park padding, at the cost of storing the alignment information.
When seeking to a park, we will read some extra bits (the park alignment in its house) to locate where the park's data actually begins.
The issue with this approach is that the maximum (absolute value) deviation of a random walk of $N$ steps grows roughly as $\sqrt{N}$, so the alignments drift arbitrarily far as the table grows.
One way to solve this is to have a variable number of entries per park. Specifically, suppose each park (except possibly the last) either has $\text{param_EPP}$ entries or one extra entry, chosen to steer the running size deviation back toward zero.
Additionally on the side, we need “house hinting”. Suppose EPP = 2048, and we want the 2500th entry. We would guess that it is in the second park, since roughly speaking entries 1-2048 are in the first park, 2049-4096 are in the second park, and so on. Our idea is that whenever our guess would be wrong, we create a hint in memory that specifies an interval of entries for which we should modify our guess by some number. We can binary search this hint table before seeking to a park.
In reality, the hint table will be very small, as the size differences need to have an absolute value exceeding EPP for us to record a hint. As an implementation level detail, we actually need to read two adjacent parks worth of information to ensure our hint table is small - if we don’t, our hint table will be very big as every change of sign between the running sum of size differences will cause a new hint for entries near the borders of each park.
Considering the underlying graph of a table (see Matching Function Requirements), it's possible to write, say, 3 of the entries corresponding to edges of a 4-cycle to disk, plus a specification of how to walk the edges of this cycle (possibly some edges walked in reverse) to recover the last edge (entry).
Because there are no 4-cycles (or 5-cycles, because each edge changes the parity of the first coordinate), the minimum length cycle is a 6-cycle.
For every cycle on an island in the graph of a table, it's possible to save one of the edges by specifying how to walk the cycle instead of giving the counterpart directly. There are two types of cycles which occur: 4-cycles, which happen when there's a bucket which has two things in it and has at least two things it connects to, and 6-cycles, which can happen with two buckets with two things in them being connected, but are more common as happenstance through the graph as a whole. The cycles which depend on multiple things in the same bucket can be made much less common by increasing the degree of the graph (and correspondingly the B-group size) at the direct linear expense of CPU time. The length of other types of cycles can be increased by increasing the C-group size, at the expense of requiring correspondingly more cache space when plotting.
Hellman attacks, or time space tradeoffs, refer to techniques where certain checkpoints are stored, allowing reversal of random functions using only $O(N^{2/3})$ time and $O(N^{2/3})$ space.
The basic Hellman attack on this construction targets $table_1$: instead of storing the table, the attacker stores chains of repeated $f_1$ evaluations and inverts $f_1$ on demand.
When looking up a quality or proof, we can then evaluate the inverse of $f_1$ for the needed values instead of reading them from disk.
To invert, we store checkpoint chains as in Rainbow Tables [2], check which chain we're on, and evaluate it.
There can also be collisions and false positives, i.e. we invert and obtain an $x$ with the right $f_1$ output that is nevertheless not the stored value; such candidates must be filtered out using the higher tables' matching conditions.
If the inversion is cheap enough per lookup, nearly all of $table_1$'s space can be saved in exchange for CPU time.
A caveat of this optimization is that it potentially requires a large amount of memory to set up the rainbow tables.
This is due to the fact that the algorithm needs to check, for each $x$ value, whether it is already covered by an existing chain.
Potentially, some tradeoffs can be performed to make the attack require less CPU and less memory, which slightly decrease the efficiency of the plot.
A fraction of the stored line points in a table can also be dropped, and recovered at lookup time by brute-forcing the missing information.
This optimization, like Hellman attacks, relies on using CPU to minimize the amount of stored data.
During the Forward Propagation phase, the metadata is only needed to compute the next table. Once the next table is computed, the metadata can be dropped, to save disk space.
Furthermore, for each pair of tables during Forward Propagation, the left table entries which don't lead to any matches can be immediately dropped.
Although backpropagation drops any entry which doesn't lead to an $f_7$ output, a portion of this dropping can already be performed during forward propagation, as described above, reducing working space earlier.
Both of these optimizations save working space, but increase the number of passes which must be performed, and thus increase plot size. For very large plots, where sort times matter more than number of passes, these optimizations may also yield faster plot times.
Most of the time when a sort on disk is completed the final pass in memory is written out to disk. If there is an operation which must be performed on the sorted table (such as during Forward Propagation), it is more efficient to perform this operation while sorting, to minimize disk seeks, as opposed to waiting until the entire table is sorted.
Our method does not take into account that, after backpropagation, every entry in a table is pointed to by at least one entry in the next table; this constraint slightly lowers the true information content of the tables.
To give a good approximation of the theoretical loss from not considering this information, consider a table of $N$ entries, each holding two pointers into a previous table of $N$ entries.
Let us count only the pointer configurations in which every previous-table position is referenced at least once, as backpropagation guarantees.
The information content of tables where every position is pointed to by at least one entry is slightly smaller than that of unconstrained tables, and our encoding does not exploit the difference; the resulting loss is small.
Most entries stored in the plot cannot be removed without reducing the per-byte effectiveness of the plot. However, removing entries which lead to few or only one final solution might lead to an increase in the per-byte effectiveness. This will be a small number of entries, since the chance of finding multiple such entries that contribute to the same final output gets exponentially lower from table to table.
Some plots will by random noise have greater per-byte effectiveness. The amount of this is proportional to the square root of the size of the plot with a small constant factor, so the potential gains aren't high. Furthermore, this requires evaluating the entire forward propagation phase. If a farmer has multiple plots and wants to delete one of them, the least effective one can be deleted first.
By using a more efficient encoding for storing the deltas of line points (closer to their entropy), a small amount of additional space can be saved.
Plotting on an SSD or a ramdisk will provide massive performance improvements, especially for large plots, where sort on disk becomes the bottleneck.
Multithreading can provide significant performance improvements. The CPU usage of most phases is completely parallelizable, and disk I/O blocking can be minimized.
[1] Hamza Abusalah, Joel Alwen, Bram Cohen, Danylo Khilko, Krzysztof Pietrzak, Leonid Reyzin: Beyond Hellman’s Time-Memory Trade-Offs with Applications to Proofs of Space (2017) https://eprint.iacr.org/2017/893.pdf
[2] Philippe Oechslin: Making a Faster Cryptanalytic Time-Memory Trade-Off (2003) https://lasec.epfl.ch/pub/lasec/doc/Oech03.pdf
[3] Krzysztof Pietrzak, Bram Cohen: Chia Greenpaper (2019)
[4] Mart Simisker: Asymmetric Numeral Systems (2017) https://courses.cs.ut.ee/MTAT.07.022/2017_fall/uploads/Main/mart-report-f17.pdf