diff --git a/assets/img/5-Substrate/dev-4-8-qr-radix-tree-visualization.png b/assets/img/5-Substrate/dev-4-8-qr-radix-tree-visualization.png new file mode 100644 index 000000000..fd4bcfea2 Binary files /dev/null and b/assets/img/5-Substrate/dev-4-8-qr-radix-tree-visualization.png differ diff --git a/assets/img/5-Substrate/dev-storage-externalities-full.svg b/assets/img/5-Substrate/dev-storage-externalities-full.svg new file mode 100644 index 000000000..770b5998d --- /dev/null +++ b/assets/img/5-Substrate/dev-storage-externalities-full.svg @@ -0,0 +1,320 @@ [SVG figure markup omitted] diff --git a/assets/img/5-Substrate/dev-trie-backend-proof.svg b/assets/img/5-Substrate/dev-trie-backend-proof.svg index 5ad63fa43..07c985452 100644 --- a/assets/img/5-Substrate/dev-trie-backend-proof.svg +++ b/assets/img/5-Substrate/dev-trie-backend-proof.svg @@ -1 +1 @@ - \ No newline at end of file + \ No newline at end of file diff --git a/syllabus/5-Substrate/1-Intro-to-Substrate_Slides.md b/syllabus/5-Substrate/1-Intro-to-Substrate_Slides.md index 089ef918e..99a8f2b17 100644 --- a/syllabus/5-Substrate/1-Intro-to-Substrate_Slides.md +++ b/syllabus/5-Substrate/1-Intro-to-Substrate_Slides.md @@ -350,9 +350,11 @@ _The way to make a protocol truly upgradeable is to design a meta-protocol._ Note: + In this figure, the meta-protocol, the substrate node, is not forklessly upgrade-able. It can only be upgraded with a fork.
+ ---v ### 🏦 Governance + Upgradeability diff --git a/syllabus/5-Substrate/2-WASM-Meta-Protocol-Slides.md b/syllabus/5-Substrate/2-Wasm-Meta-Protocol-Slides.md similarity index 100% rename from syllabus/5-Substrate/2-WASM-Meta-Protocol-Slides.md rename to syllabus/5-Substrate/2-Wasm-Meta-Protocol-Slides.md diff --git a/syllabus/5-Substrate/3-Merklized-Storage_Slides.md b/syllabus/5-Substrate/3-Merklized-Storage_Slides.md index d97dbbfa3..2eaa3f689 100644 --- a/syllabus/5-Substrate/3-Merklized-Storage_Slides.md +++ b/syllabus/5-Substrate/3-Merklized-Storage_Slides.md @@ -1,6 +1,6 @@ --- title: Substrate Merklized Storage -duration: 60mins +duration: 90mins --- # Substrate Storage @@ -11,51 +11,64 @@ duration: 60mins ----v - -### What We Know So Far - -- Recall that at the `sp_io` layer, you have **opaque keys and values**. +Notes: -- `sp_io::storage::get(vec![8, 2])`; - - `vec![8, 2]` is a "storage key". -- `sp_io::storage::set(vec![2, 9], vec![42, 33])`; +- Runtime interacts with Client/Host using Host functions. +- sp_io::storage helps with saving runtime state in the client using these host functions. ---v -### What We Know So Far - -Nomenclature (with some simplification): +#### Externalities -> Environment providing host functions, namely storage ones: "`Externalities` Environment". +> Externalities: An environment in which the runtime can access host functions, namely storage ones. Notes: - In Substrate, a type needs to provide the environment in which host functions are provided, and can be executed. -- We call this an "externality environment", represented by [`trait Externalities`](https://paritytech.github.io/substrate/master/sp_externalities/trait.Externalities.html). +- We call this an "externality environment", represented + by [`trait Externalities`](https://paritytech.github.io/substrate/master/sp_externalities/trait.Externalities.html). - By convention, an externality has a "**backend**" that is in charge of dealing with storage. 
+- Externality is a trait that provides functionality to interact with storage and other extensions registered in the + node. ---v ### What We Know So Far + + +---v + +### What We Know So Far + +- Recall that at the `sp_io` layer, you have **opaque keys and values**. + ```rust -sp_io::TestExternalities::new_empty().execute_with(|| { - sp_io::storage::get(..); -}); + let storage_key = vec![8, 2]; + sp_io::storage::get(&storage_key); + sp_io::storage::set(&storage_key, &[42, 33]); + ``` ---v ### What We Know So Far - +```rust + sp_io::TestExternalities::new_empty().execute_with(|| { + sp_io::storage::get(..); + }); +``` + +Notes: + +- `TestExternalities` mimics a client. --- ## Key Value -- How about a key-value storage externality? why not? 🙈 +> All this seems to indicate our storage externality is a simple key value database. ---v @@ -67,22 +80,28 @@ sp_io::TestExternalities::new_empty().execute_with(|| { ### Key Value -- "_Storage keys_" (whatever you pass to `sp_io::storage`) directly maps to "_database keys_". +- Concatenate all data and hash to get the root. - O(1) read and write. -- Hash all the data to get the root. + +> Spoiler, that is not how data is stored internally in the database. + Notes: -Good time to hammer down what you mean by storage key and what you mean by database key. +- "_Storage keys_" (whatever you pass to `sp_io::storage`) directly maps to "_database keys_". +- Probably don't wanna introduce storage key and db key right now. + Good time to hammer down what you mean by storage key and what you mean by database key. literally imagine that in the implementation of `sp_io::storage::set`, we write it to a key-value database. ---v -### Key Value +### Key Value: Proof sizes -- If alice only has this root, how can I prove to her how much balance she has? +- Suppose there is a large database. +- Alice has the state root of this database, and wants to look up her balance from this database.
+- How can Alice verify the balance she receives from a full node is correct? SEND HER THE WHOLE DATABASE 😱. @@ -94,39 +113,37 @@ Alice is representing a light client, I represent a full node. ---v -### Key Value +### Key Value: State Root -- Moreover, if you change a single key-value, we need to re-hash the whole thing again to get the updated state root 🤦. +- If you change a single key-value, we need to re-hash the whole thing again to get the updated state root 🤦. --- ## Substrate Storage: Merklized -- This brings us again to why blockchain based systems tend to "merkelize" their storage. - ----v +> Substrate uses a base-16, (patricia) radix merkle tree. -### Merklized +Notes: -> Substrate uses a base-16, (patricia) radix merkle trie. +- Find the code at [paritytech/trie](https://github.com/paritytech/trie). ---v +### Recap + %%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true }}}%% flowchart TD - A["A \n value: Hash(B|C)"] --> B["B \n value: Hash(B|E)"] + A["A \n value: Hash(B|C)"] --> B["B \n value: Hash(D|E)"] A --> C["C \n value: Hash(F) \n"] B --> D["D \n value: 0x12"] B --> E["E \n value: 0x23"] C --> F["F \n value: 0x34"] - - @@ -138,36 +155,36 @@ flowchart TD ---v +### Recap + %%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true }}}%% flowchart TD - A --b--> C["C \n Hash(F) \n"] - A["A \n value: Hash(B|C)"] -- a --> B["B \n value: Hash(B|E)"] - B --c--> D["D \n value: 0x12"] - B --d--> E["E \n value: 0x23"] - C --e--> F["F \n value: 0x34"] + A["A \n value: Hash(B|C)"] -- v --> B["B \n value: Hash(D|E)"] + A --w--> C["C \n Hash(F) \n"] + B --"x"--> D["D \n value: 0x12"] + B --y--> E["E \n value: 0x23"] + C --z--> F["F \n value: 0x34"] - - -- Trie. +- Trie - Assuming only leafs have data, this is encoding: - + - + - +
"ac" => 0x12 "vx" => 0x12
"ad" => 0x23 "vy" => 0x23
"be" => 0x34 "wz" => 0x34
@@ -176,24 +193,25 @@ flowchart TD Notes: -this is how we encode key value based data in a trie. +- this is how we encode key value based data in a trie. +- Optimization of simple trie, ---v +### Recap + %%{init: {'theme': 'dark', 'themeVariables': { 'darkMode': true }}}%% flowchart TD - A["A \n Hash(B|C)"] -- a --> B["B \n Hash(B|E)"] - A --be--> F["F \n value: 0x34"] - B --c--> D["D \n value: 0x12"] - B --d--> E["E \n value: 0x23"] + A["A \n Hash(B|C)"] -- v --> B["B \n Hash(D|E)"] + A --wz--> F["F \n value: 0x34"] + B --"x"--> D["D \n value: 0x12"] + B --y--> E["E \n value: 0x23"] - - @@ -202,13 +220,13 @@ flowchart TD - + - + - +
"ac" => 0x1234 "vx" => 0x12
"ad" => 0x1234 "vy" => 0x23
"be" => 0x1234 "wz" => 0x34
@@ -217,7 +235,7 @@ flowchart TD Notes: -more resources: +More resources: - https://en.wikipedia.org/wiki/Merkle_tree - https://en.wikipedia.org/wiki/Radix_tree @@ -230,10 +248,12 @@ Namely: > Donald Knuth, pages 498-500 in Volume III of The Art of Computer Programming, calls these > "Patricia's trees", presumably after the acronym in the title of Morrison's paper: "PATRICIA - -> Practical Algorithm to Retrieve Information Coded in Alphanumeric". Today, Patricia tries are seen +> Practical Algorithm to Retrieve Information Coded in Alphanumeric". +> Today, Patricia tries are seen > as radix trees with radix equals 2, which means that each bit of the key is compared individually > and each node is a two-way (i.e., left versus right) branch. -> ---v + +---v ### Merklized @@ -243,10 +263,9 @@ Namely: ### Merklized -- Substrate does in fact use a key-value based database under the hood.. -- In order to store trie nodes, not direct storage keys! - -
+- Substrate does in fact use a key-value based database under the hood. +- But this KV based DB is used to store the trie nodes, not directly the storage keys. +
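A minimal sketch of the two bullets above, assuming a toy node layout: the database maps a node hash to a node, and a storage key is only ever used to walk from the root. `NodeDb`, `Node`, and the use of `DefaultHasher` are illustrative stand-ins (a real chain would use a cryptographic hash and the actual trie node encoding), not Substrate's API. The example reuses the `vx`/`vy`/`wz` keys from the recap slides.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a cryptographic hash output (assumption: 64 bits is enough for a demo).
type NodeHash = u64;

#[derive(Hash, Clone, Debug, PartialEq)]
enum Node {
    /// A leaf holds a value.
    Leaf(Vec<u8>),
    /// A branch maps one key symbol to the hash of a child node.
    Branch(Vec<(char, NodeHash)>),
}

/// The "database": node hash -> node. Storage keys never appear as database keys.
#[derive(Default)]
struct NodeDb {
    nodes: HashMap<NodeHash, Node>,
}

impl NodeDb {
    /// Store a node under its own hash and return that hash.
    fn put(&mut self, node: Node) -> NodeHash {
        let mut hasher = DefaultHasher::new();
        node.hash(&mut hasher);
        let hash = hasher.finish();
        self.nodes.insert(hash, node);
        hash
    }

    /// Read a storage key by walking from the root: one database read per node on the path.
    fn get(&self, root: NodeHash, key: &str) -> Option<&[u8]> {
        let mut current = self.nodes.get(&root)?;
        for symbol in key.chars() {
            match current {
                Node::Branch(children) => {
                    let (_, child_hash) = children.iter().find(|(s, _)| *s == symbol)?;
                    current = self.nodes.get(child_hash)?;
                }
                // The key is longer than the path in the trie.
                Node::Leaf(_) => return None,
            }
        }
        match current {
            Node::Leaf(value) => Some(value.as_slice()),
            Node::Branch(_) => None,
        }
    }
}

/// Build the trie from the recap slides: "vx" => 0x12, "vy" => 0x23, "wz" => 0x34.
fn build_example() -> (NodeDb, NodeHash) {
    let mut db = NodeDb::default();
    let d = db.put(Node::Leaf(vec![0x12]));
    let e = db.put(Node::Leaf(vec![0x23]));
    let f = db.put(Node::Leaf(vec![0x34]));
    let b = db.put(Node::Branch(vec![('x', d), ('y', e)]));
    let c = db.put(Node::Branch(vec![('z', f)]));
    let root = db.put(Node::Branch(vec![('v', b), ('w', c)]));
    (db, root)
}

fn main() {
    let (db, root) = build_example();
    println!("state root (toy hash): {:x}", root);
    println!("vy => {:?}", db.get(root, "vy"));
}
```

Note that every parent commits to its children by hash, so changing any leaf changes the root, while a read touches only the nodes on one path.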
@@ -270,9 +289,9 @@ Namely: Notes: -imagine: +- imagine: `sp_io::storage::get(b"ad")` +- We will traverse the path later. -sp_io::storage::get(b"ad") ---v @@ -287,20 +306,31 @@ simplification. --- -## Trie Walking Example +## Traversing the Trie - We know the state-root at a given block `n`. -- assume this is a base-26, patricia trie. English alphabet is the key-scope. +- Assume this is a base-27, patricia trie. + English alphabet along with '_' is the key-scope. - Let's see the steps needed to read `balances_alice` from the storage. ---v +Notes: + +- We start with the state root node. +- Read its children. + ---v +Notes: + +- We are interested in "balances_" so we read that node from database. +- Did you notice the mistake in the slide? "_" technically would not be allowed in base-26, so it really is base-27. + ---v @@ -313,11 +343,21 @@ simplification. +---v + +## Q/A Break + + + +Try inserting (and deleting) bunch of keys and see how you fill up the trie in +the [radix tree visualization](https://www.cs.usfca.edu/~galles/visualization/RadixTree.html). + --- ## Merklized: Proofs -- If alice only has this root, how can I prove to her how much balance she has? +Back to our question +> If alice only has this state root, how can she verify her balance is correct? ---v @@ -325,21 +365,22 @@ simplification. Notes: -The important point is that for example the whole data under `_system` is not hidden away behind one hash. +Give 30 seconds to students to make sense of the image by themselves. -Dark blue are the proof, light blue's hashes are present. +The important point is that for example the whole data under `_system` is hidden away behind one hash. Receiver will hash the root node, and check it against a publicly known storage root. This differs slightly from how actual proof generation might work in the code. -In general, you have a tradeoff: send more data, but require less hashing on Alice, or opposite (this is what we call "compact proof"). 
+In general, you have a tradeoff: send more data, but require less hashing on Alice, or opposite (this is what we call " +compact proof"). ---v ### Merklized: Proofs -- 🏆 Small proof size is a big win for light clients, _and_ **Polkadot**. +- 🏆 Small proof size is a big win for light clients. --- @@ -347,7 +388,7 @@ In general, you have a tradeoff: send more data, but require less hashing on Ali
-- Storage key (whatever you pass to `sp_io`) is the path on the trie. +- Storage key (`balances_alice`) is the path on the trie.
@@ -355,13 +396,14 @@ In general, you have a tradeoff: send more data, but require less hashing on Ali
- Storage key is arbitrary length. +
- Intermediary (branch) nodes could contain values. - - `:code` contains some value, `:code:more` can also contain value. + - `:code` contains some value, `:code:more` can also contain value.
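As a rough illustration of the three points above, the sketch below turns a storage key into its path through a base-16 trie, one nibble (hex digit) per step. The helper names `key_to_nibbles` and `nibbles_to_hex` are made up for this example. It also shows why `:code` can hold a value while `:code:more` exists: the shorter key's path is a prefix of the longer one, so its node is a branch on the way to the other.

```rust
/// Split each byte of a storage key into two base-16 digits (nibbles), high nibble first.
/// Each nibble selects one of up to 16 children at one level of the trie.
fn key_to_nibbles(key: &[u8]) -> Vec<u8> {
    key.iter().flat_map(|&b| [b >> 4, b & 0x0f]).collect()
}

/// Render a nibble path as a lowercase hex string for display.
fn nibbles_to_hex(nibbles: &[u8]) -> String {
    nibbles.iter().map(|n| format!("{:x}", n)).collect()
}

fn main() {
    let code = key_to_nibbles(b":code");
    let code_more = key_to_nibbles(b":code:more");

    println!(":code      -> {}", nibbles_to_hex(&code));
    println!(":code:more -> {}", nibbles_to_hex(&code_more));

    // Keys have arbitrary length, so one key's path can be a prefix of another's.
    // The node at the end of the shorter path is then a branch that also carries a value.
    println!("is prefix: {}", code_more.starts_with(&code));
}
```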
@@ -384,64 +426,54 @@ it will be `O(LOG_n)`. --- -## Base 2, Base 16, Base-26? - -- Instead of alphabet, we use the base-16 representation of everything. - -> Base-16 (Patricia) Merkle Trie. - -- `System` -> `73797374656d` -- `:code` -> `3a636f646500` - ----v - -### Base 2, Base 16, Base-26? - - - -Tradeoff: "_IO count vs. Node size_" - - - -Between a light clint and a full node, which one cares more about which? +## Substrate Storage: The Updated Picture - + -Notes: +--- -TODO: update figure to represent node size. +## Large data nodes 🤔 -Light client cares about node size. When proof is being sent, there is no IO. +- Two common problems that merkle proofs have: + - If the one of the parent nodes has some large data. + - If you want to prove the deletion/non-existence of a leaf node. -First glance, the radix-8 seems better: you will typically have less DB access to reach a key. -For example, with binary, with 3 IO, we can reach only 8 items, but with radix-8 512. +---v -So why should not chose a very wide tree? because the wider you make the tree, the bigger each node -gets, because it has to store more hashes. At some point, this start to screw with both the proof -size and the cost of reading/writing/encoding/decoding all these nodes. + ---v -### Base 2, Base 16, Base-26? +## Large data nodes 🤔 - +New "trie format" 🌈: -Note: +- All data containing more than 32 bytes are replaced with their hash (pointer to the actual value). +- The (larger than 32 bytes) value itself stored in the database under this hash. -Here's a different way to represent it; the nodes are bigger on the base-16 trie. +```rust +struct RuntimeVersion { + ... + state_version: 0, +} +``` ----v + -### Base 2, Base 16, Base-26? +---v -- base-2: Small proofs, more nodes. -- base-8: Bigger proofs, less nodes. + + -✅ 16 has been benchmarked and studies years ago as a good middle-ground. +What is the ramification of this for full nodes, and light clients? 
Notes: -Anyone interested in blockchain and research stuff should look into this. +Both read and write have an extra step now, but proof are easier. + +Note from emeric: the green node is not really a "real" node, it is just `{ value: BIG_STUFF }` stored in the database. +I will skip this detail for the sake of simplicity. +One can assume that the green node is like any other node in the trie. --- @@ -453,7 +485,8 @@ Anyone interested in blockchain and research stuff should look into this. ### Unbalanced Tree -- Unbalanced tree means unbalanced performance. An attack vector, if done right. +- Unbalanced tree means unbalanced performance. + An attack vector, if done right. - More about this in FRAME storage, and how it is prevented there. Notes: @@ -465,59 +498,7 @@ Notes: --- -## WAIT A MINUTE... 🤔 - -- Two common scenarios that merkle proofs are kinda unfair: - - If the one of the parent nodes has some large data. - - If you want to prove the deletion/non-existence of a leaf node. - ----v - - - ----v - -## WAIT A MINUTE... 🤔 - -New "tie format" 🌈: - -- All data containing more than 32 bytes are replaced with their hash. -- The (larger than 32 bytes) value itself stored in the database under this hash. - -```rust -struct RuntimeVersion { - ... - state_version: 0, -} -``` - - - ----v - - - -What is the ramification of this for full nodes, and light clients? - -Notes: - -TODO: update figure. - -Both read and write have an extra step now, but proof are easier. - -Note from emeric: the green node is not really a "real" node, it is just `{ value: BIG_STUFF }` -stored in the database. I will skip this detail for the sake of simplicity. One can assume that the -green node is like any other node in the trie. - ---- - -## Substrate Storage: The Updated Picture - - - ---- - -## WAIT A MINUTE... 🤔 +## Trie Caching 🤔 - We rarely care about state root and all the trie shenanigans before the end of the block... @@ -527,10 +508,11 @@ green node is like any other node in the trie. 
Notes: -in other words, one should one care too much about updating a "trie" and all of its hashing details -while the block is still being executed? all of that can be delayed. +In other words, why should one care too much about updating a "trie" and all of its hashing details while the block is still +being executed? +All of that can be delayed. ---- +---v ## Overlay @@ -542,10 +524,10 @@ while the block is still being executed? all of that can be delayed. ### Overlay - Almost identical semantic to your CPU cache: - - Once you read a value, it stays here, and can be re-read for cheap. - - Once you write a value, it will only be written here. - - It can be read for cheap. - - All writes are flushed at the end of the runtime api call. + - Once you read a value, it stays here, and can be re-read for cheap. + - Once you write a value, it will only be written here. + - It can be read for cheap. + - All writes are flushed at the end of the runtime api call. - No race conditions as runtime is single-threaded. ---v @@ -585,7 +567,7 @@ while the block is still being executed? all of that can be delayed. - + @@ -593,16 +575,16 @@ while the block is still being executed? all of that can be delayed. Notes: -- In your code, you often have an option to either pass stack variables around, or re-read code from - `sp-io`. Most often, this is a micro-optimization that won't matter too much, but in general you - should know that the former is more performant, as won't go the the host at all. +- In your code, you often have an option to either pass stack variables around, or re-read values from `sp-io`. + Most often, this is a micro-optimization that won't matter too much, but in general you should know that the former is + more performant, as it won't go to the host at all. - A deletion is basically a write to `null`. ---v ### Overlay -- The overlay is also able to spawn child-overlays, know as "_storage layer_".
+- The overlay is also able to spawn child-overlays, known as "_storage layers_". - Useful for having a _transactional_ block of code. ```rust @@ -637,46 +619,21 @@ Notes: ### Overlay -- There is a limit to how many nested layers you can spawn -- It is not free, thus it is attack-able. - -```rust -with_storage_layer(|| { - let foo = sp_io::storage::read(b"foo"); - with_storage_layer(|| { - sp_io::storage::set(b"foo", b"foo"); - with_storage_layer(|| { - sp_io::storage::set(b"bar", foo); - with_storage_layer(|| { - sp_io::storage::set(b"foo", "damn"); - Err("damn") - }) - Ok("what") - }) - Err("the") - }); - Ok("hell") -}) -``` - ----v - -### Overlay - - What if I call `sp_io::storage::root()` in the middle of the block? - Can the overlay respond to this? Notes: -NO! overlay works on the level on key-values, ot knows nothing of trie nodes, and to compute the -root we have to go to the trie layer and pull a whole lot of data back from the disk and build all -the nodes etc. etc. +NO! +The overlay works at the level of key-values and knows nothing of trie nodes. To compute the root, we have to go to +the trie layer, pull a whole lot of data back from the disk, and build all the nodes. ---v ### Overlay: More Caches -- There are more caches in the trie layer as well. But outside of the scope of this lecture. +- There are more caches in the trie layer as well. + But they are outside the scope of this lecture.
```bash ./substrate --help | grep cache @@ -690,7 +647,7 @@ https://www.youtube.com/embed/OoMPlJKUULY ### Substrate Storage: Final Figure - + ---v @@ -699,11 +656,11 @@ https://www.youtube.com/embed/OoMPlJKUULY There are multiple implementations of `Externalities`: - [`TestExternalities`](https://paritytech.github.io/substrate/master/sp_state_machine/struct.TestExternalities.html): - - `Overlay` - - `TrieDb` with `InMemoryBackend` + - `Overlay` + - `TrieDb` with `InMemoryBackend` - [`Ext`](https://paritytech.github.io/substrate/master/sp_state_machine/struct.Ext.html) (the real thing 🫡) - - `Overlay` - - `TrieDb` with a real database being the backend + - `Overlay` + - `TrieDb` with a real database being the backend ---v @@ -723,7 +680,7 @@ let x = sp_io::storage::get(b"foo"); ```rust // ✅ SomeExternalities.execute_with(|| { - let x = sp_io::storage::get(b"foo"); + let x = sp_io::storage::get(b"foo"); }); ``` @@ -786,16 +743,62 @@ Notes: --- -## Trie Format Matters! +## Base 2, Base 16, Base-26? -- Recall that in our "trie walking", we took the state root, and got the root node from the DB. -- The state root of any substrate-based chain, including Polkadot, is the hash of the "Trie Node". +- Instead of the alphabet, we use the base-16 representation of everything. -> Trie format matters! and therefore it is part of [the polkadot spec](https://spec.polkadot.network). +> Base-16 (Patricia) Merkle Trie. + +- `System` -> `53797374656d` +- `:code` -> `3a636f6465` + +---v + +### Base 2, Base 16, Base-26? + + + + +Tradeoff: "_IO count vs. Node size_" + + + +Between a light client and a full node, who cares more about which? + + Notes: -Meaning, if another client wants to sync polkadot, it should know the details of the trie format. +A light client cares about node size. +When a proof is being sent, there is no IO. + +At first glance, radix-8 seems better: you will typically need fewer DB accesses to reach a key.
+For example, with binary, with 3 IO, we can reach only 8 items, but with radix-8 512. + +So why should we not choose a very wide tree? +Because the wider you make the tree, the bigger each node gets, because it has to store more hashes. +At some point, this starts to hurt both the proof size and the cost of reading/writing/encoding/decoding all these +nodes. + +---v + +### Base 2, Base 16, Base-26? + + + +Note: + +Here's a different way to represent it; the nodes are bigger on the base-8 trie. + +---v + +### Base 2, Base 16, Base-26? + +- base-2: Small proofs, more nodes. +- base-26: Bigger proofs, fewer nodes. + +Notes: +Anyone interested in blockchain and research stuff should look into this. --- @@ -809,7 +812,7 @@ Meaning, if another client wants to sync polkadot, it should know the details of - Merklized storage, and proofs - Large nodes - Radix order consequences -- Unbalanced trie +- Unbalanced tree - State pruning @@ -824,7 +827,8 @@ Meaning, if another client wants to sync polkadot, it should know the details of ## Additional Resources! 😋 -> Check speaker notes (click "s" 😉) +- Check speaker notes (click "s" 😉). +- Some additional content that is not covered in the lecture follows.
@@ -836,10 +840,11 @@ Notes: - About state version: - - https://github.com/paritytech/substrate/pull/9732 - - https://github.com/paritytech/substrate/discussions/11824 + - https://github.com/paritytech/substrate/pull/9732 + - https://github.com/paritytech/substrate/discussions/11824 -- An "old but gold" read about trie in Ethereum: https://medium.com/shyft-network/understanding-trie-databases-in-ethereum-9f03d2c3325d +- An "old but gold" read about trie in + Ethereum: https://medium.com/shyft-network/understanding-trie-databases-in-ethereum-9f03d2c3325d - On optimizing substrate storage proofs: https://github.com/paritytech/substrate/issues/3782 - Underlying trie library maintained by Parity: https://github.com/paritytech/trie @@ -850,13 +855,56 @@ Notes: - https://research.polytope.technology/state-(machine)-proofs -- An interesting, but heretical idea: can the runtime of block N, access state of block N-1? HELL - NO. It might sound like a "but why nooooot" type of situation, but it breaks down all assumptions - about what a state transition is. The runtime is the state transition function. Recall the formula - of that, and then you will know why this is not allowed. +- An interesting, but heretical idea: can the runtime of block N, access state of block N-1? + HELL. + NO. + It might sound like a "but why nooooot" type of situation, but it breaks down all assumptions + about what a state transition is. + The runtime is the state transition function. + Recall the formula of that, and then you will know why this is not allowed. ### Post Lecture Feedback -Double check the narrative and example of the `BIG_STUFF` node. An example/exercise of some sort +Double check the narrative and example of the `BIG_STUFF` node. +An example/exercise of some sort would be great, where students call a bunch of `sp_io` functions, visualize the trie, and invoke -proof recorder, and see which pars of the trie is exactly part of the proof. 
+proof recorder, and see which parts of the trie is exactly part of the proof. + +--- + +### Overlay + +- There is a limit to how many nested layers you can spawn +- It is not free, thus it is attack-able. + +```rust +with_storage_layer(|| { + let foo = sp_io::storage::read(b"foo"); + with_storage_layer(|| { + sp_io::storage::set(b"foo", b"foo"); + with_storage_layer(|| { + sp_io::storage::set(b"bar", foo); + with_storage_layer(|| { + sp_io::storage::set(b"foo", "damn"); + Err("damn") + }) + Ok("what") + }) + Err("the") + }); + Ok("hell") +}) +``` + +--- + +## Trie Format Matters! + +- Recall that in our "trie walking", we took the state root, and got the root node from the DB. +- The state root of any substrate-based chain, including Polkadot, is the hash of the "Trie Node". + +> Trie format matters! and therefore it is part of [the polkadot spec](https://spec.polkadot.network). + +Notes: + +Meaning, if another client wants to sync polkadot, it should know the details of the trie format. diff --git a/syllabus/5-Substrate/9-SCALE_Slides.md b/syllabus/5-Substrate/9-SCALE_Slides.md index dc8128cb1..ce038a9bd 100644 --- a/syllabus/5-Substrate/9-SCALE_Slides.md +++ b/syllabus/5-Substrate/9-SCALE_Slides.md @@ -20,6 +20,10 @@ Simple Concatenated Aggregate Little-Endian SCALE is a light-weight format which allows encoding (and decoding) which makes it highly suitable for resource-constrained execution environments like blockchain runtimes and low-power, low-memory devices. +Notes: +- It is a encoding format used to communicate over the wire. Similar to json, protobuf. +- Extremely light weight, we will see how. + --- ### Little-Endian @@ -42,6 +46,11 @@ Wasm is a little endian system, which makes SCALE very performant. +Notes: +- Endianness is the order of bytes. +- Big Endian => Most significant byte at the smallest address. Similar to English. Generally used in network protocols. +- Little Endian => Least significant byte at the smallest address. + --- ### Why SCALE? 
Why not X? @@ -55,6 +64,11 @@ Wasm is a little endian system, which makes SCALE very performant. - Supports a copy-free decode for basic types on LE architectures. - It is about as thin and lightweight as can be. +Notes: +- MaxEncodedLen: Maximum encoded size to make some runtime guarantees about computation. +- TypeInfo: Used to generate metadata. +- Bijective exception later. + --- ### SCALE is NOT Self-Descriptive @@ -176,11 +190,11 @@ fn main() { [45] [45] 101010 -[2a, 00] [00, 2a] +[2a, 00] 111111111111111111111111 -[ff, ff, ff, 00] [00, ff, ff, ff] +[ff, ff, ff, 00] ``` diff --git a/syllabus/5-Substrate/9-Substrate-Interactions_Slides.md b/syllabus/5-Substrate/9-Substrate-Interactions_Slides.md index 5fe6c4edd..fb9a915f1 100644 --- a/syllabus/5-Substrate/9-Substrate-Interactions_Slides.md +++ b/syllabus/5-Substrate/9-Substrate-Interactions_Slides.md @@ -7,17 +7,56 @@ duration: 60 minutes --- +## Before we start + +Find all the commands that will be used in this workshop: +[tinyurl.com/hk24-substrate](https://hackmd.io/@ak0n/hk24-substrate-interaction) + +--- + +## Before we start + +- Clone polkadot-sdk + +```sh +git clone https://github.com/paritytech/polkadot-sdk.git +``` + +
+ +- Compile your node + +```sh +cargo build --release -p minimal-node +``` + +--- + ## Interacting With a Substrate Blockchain - +> How does a user or an application interact with a blockchain? Notes: -Many of these interactions land in a wasm blob. +- Wait for 1 answer from students or at least for 10 seconds. -So what question you need to ask yourself there? which runtime blob. +---v + +## Interacting With a Substrate Blockchain -almost all external communication happens over JSPN-RPC, so let's take a closer look. +- Usually they connect to a public RPC server, i.e. a substrate node that exposes its RPC interface publicly. + +
+ +- Run their own node. + + + +---v + +## Interacting With a Substrate Blockchain + + --- @@ -34,7 +73,10 @@ almost all external communication happens over JSPN-RPC, so let's take a closer { "jsonrpc": "2.0", "method": "subtract", - "params": { "minuend": 42, "subtrahend": 23 }, + "params": { + "minuend": 42, + "subtrahend": 23 + }, "id": 3 } ``` @@ -42,7 +84,11 @@ almost all external communication happens over JSPN-RPC, so let's take a closer
```json -{ "jsonrpc": "2.0", "result": 19, "id": 3 } +{ + "jsonrpc": "2.0", + "result": 19, + "id": 3 +} ``` @@ -54,79 +100,209 @@ almost all external communication happens over JSPN-RPC, so let's take a closer - Entirely transport agnostic. - Substrate based chains expose both `websocket` and `http` (or `wss` and `https`, if desired). -> with `--ws-port` and `--rpc-port`, 9944 and 9934 respectively. +Notes: + +- You can choose which port to run the ws or http server on by using the flags `--ws-port` and `--rpc-port` + respectively. By default, port 9944 is used. ---v ### JSON-RPC -- JSON-RPC methods are conventionally written as `scope_method` +The RPC methods that a substrate node exposes are scoped and follow the pattern `"scope_method"`. + +```sh + wscat \ + -c ws://localhost:9944 \ + -x '{"jsonrpc":"2.0", "id": 42, "method":"rpc_methods" }' \ + | jq +``` - - e.g. `rpc_methods`, `state_call` +---v -- ­ `author`: for submitting stuff to the chain. +### JSON-RPC: Scopes + +- ­ `author`: for submitting extrinsics to the chain. - ­ `chain`: for retrieving information about the _blockchain_ data. - ­ `state`: for retrieving information about the _state_ data. - ­ `system`: information about the chain. - ­ `rpc`: information about the RPC endpoints. Notes: - recall: https://paritytech.github.io/substrate/master/sc_rpc_api/index.html https://paritytech.github.io/substrate/master/sc_rpc/index.html -The full list can also be seen here: https://polkadot.js.org/docs/substrate/rpc/ +- The full list can also be seen here: https://polkadot.js.org/docs/substrate/rpc/ +- Specs: https://paritytech.github.io/json-rpc-interface-spec/introduction.html +- Upcoming changes to JSON-RPC api: https://forum.polkadot.network/t/new-json-rpc-api-mega-q-a/3048 + +--- + +### Workshop: Intro + +- Transfer tokens from Alice to Bob. + +
+ +- We will cheat a bit and take help sometimes from [Polkadot.js app](https://polkadot.js.org/apps/#/explorer). + + + +Notes: + +- When we start up a dev chain, some well known accounts are already minted some balance at genesis. We will use Alice + and Bob which are well known accounts. +- The parts we cheat is because we will need to know more about FRAME to be able to calculate some storage keys. ---v -### JSON-RPC +### Workshop: Spin up your node + +- Check out cli docs +```sh +./target/release/minimal-node --help +``` + +
+ +- Spin up your dev node. +```sh +./target/release/minimal-node --chain=dev --tmp +``` + + -- Let's look at a few examples: +Notes: -- `system_name`, `system_chain`, `system_chainType`, `system_health`, `system_version`, `system_nodeRoles`, `rpc_methods`, `state_getRuntimeVersion`, `state_getMetadata` +- What does --chain=dev and --tmp do? What other flag can you use? + +---v + +### Workshop: Check balance + +- Query current balance of Alice and Bob. ```sh wscat \ - -c wss://kusama-rpc.polkadot.io \ - -x '{"jsonrpc":"2.0", "id": 42, "method":"rpc_methods" }' \ + -c ws://localhost:9944 \ + -x '{"jsonrpc":"2.0", "id": 42, "method":"state_getStorage", "params": [""] }' \ | jq ``` +Notes: + +- You will learn how the storage key is calculated in FRAME based substrate chains in the FRAME module. +- What do you get? + ---v -### JSON-RPC: Runtime Agnostic +### Workshop: Metadata -- Needless to say, RPC methods are runtime agnostic. Nothing in the above tells you if FRAME is - being used or not. -- Except... metadata, to some extent. +- Recall type information is lost in SCALE encoded data. +- Substrate exposes type information using metadata. ----v +```sh +wscat \ + -c ws://localhost:9944 \ + -x '{"jsonrpc":"2.0", "id": 42, "method":"state_getMetadata" }' \ + | jq +``` -### JSON-RPC: Runtime API +
-- While agnostic, many RPC calls land in a runtime API. -- ­ RPC Endpoints have an `at: Option`, runtime APIs do too, what a coincidence! 🌈 - - ­ Recall the scope `state`? +- The metadata itself is SCALE encoded. See [frame-metadata](https://github.com/paritytech/frame-metadata). +- Derive the type of `AccountInfo` using this metadata. + + + +Notes: + +- Use PJS app to get frame-metadata: Developer > RPC Calls > state > getMetadata. +- [Metadata](https://hackmd.io/@ak0n/rJUhmXmK6) with most irrelevant details stripped out. +- Read more about + metadata: https://docs.substrate.io/build/application-development/#exposing-runtime-information-as-metadata. ---v -### JSON-RPC: Extending +### Workshop: Decoding balance -- The runtime can extend more custom RPC methods, but the new trend is to move toward using `state_call`. +- Use the [scale decoder](https://www.shawntabrizi.com/substrate-js-utilities/codec/) to decode the balance. +- Use the following type information for `AccountInfo`. + +```json +{ + "info": { + "nonce": "u32", + "ignore": "(u32, u32, u32)", + "balance": { + "free": "u64", + "ignore": "(u64, u64, u128)" + } + } +} +``` + +Notes: +The actual type is: + +```json +{ + "info": { + "nonce": "u32", + "consumers": "u32", + "providers": "u32", + "sufficients": "u32", + "balance": { + "free": "u64", + "reserved": "u64", + "frozen": "u64", + "flags": "u128" + } + } +} +``` ---v -### JSON-RPC: Safety +### Workshop: Transfer some tokens + +- ­ Use PJS to get the signed extrinsic. +- ­ Use the following command to submit the extrinsic. +```sh +wscat \ + -c ws://localhost:9944 \ + -x '{"jsonrpc":"2.0", "id": 42, "method":"author_submitExtrinsic", "params": [""] }' \ + | jq +``` +- ­ Check the balance again for both accounts. +- ­ What happens to Alice's nonce? + +Notes: -- Some PRC methods are unsafe 😱. +- Students will learn how to build the signed extrinsic themselves in their assignment. +- Let students do the second part themselves. 
---v -### JSON-RPC: Resilience +### Workshop: Versions + +- Find the runtime version of the Polkadot and Westend chains using `state_getRuntimeVersion`. +- Find the node version of the Polkadot and Westend chains using `system_version`. +- Change the RPC provider and see whether any of the above values change. + +
+ +- For runtime version, you read `specVersion` : `1,005,000` as `1.5.0`. -RPC-Server vs. Light Client + + +Notes: + +- `wscat -c wss://polkadot-rpc.dwellir.com -x '{"jsonrpc":"2.0", "id":1, "method":"state_getRuntimeVersion"}' | jq`. +- `wscat -c wss://polkadot-rpc.dwellir.com -x '{"jsonrpc":"2.0", "id":1, "method":"system_version"}' | jq`. +- Show polkadot telemetry: https://telemetry.polkadot.io/. --- @@ -135,7 +311,6 @@ RPC-Server vs. Light Client - On top of `SCALE` and `JSON-RPC`, a large array of libraries have been built. - ­ `PJS-API` / `PJS-APPS` -- ­ `capi` - ­ `subxt` - ­ Any many more! @@ -144,6 +319,19 @@ Notes: https://github.com/JFJun/go-substrate-rpc-client https://github.com/polkascan/py-substrate-interface more here: https://project-awesome.org/substrate-developer-hub/awesome-substrate +Listen to James Wilson introducing subxt: https://www.youtube.com/watch?v=aFk6We_Ke1I + +--- + +## Additional Resources! 😋 + +> Check speaker notes (click "s" 😉) + +Notes: + +- see "Client Libraries" here: https://project-awesome.org/substrate-developer-hub/awesome-substrate +- https://paritytech.github.io/json-rpc-interface-spec/introduction.html +- Full subxt guide: https://docs.rs/subxt/latest/subxt/book/index.html --- @@ -154,7 +342,7 @@ In Kusama: - Find the genesis hash.. - Number of extrinsics at block 10,000,000. - The block number is stored under `twox128("System") ++ twox128("Number")`. - - Find it now, and at block 10,000,000. +- Find it now, and at block 10,000,000.
@@ -183,99 +371,9 @@ Notice that this number that we get back is the little endian (SCALE) encoded va --- -## Polkadot JS API - -A brief introduction. - -Excellent tutorial at: http://polkadot.js.org/docs - ----v - -## Polkadot JS API - - - ----v - -### PJS: Overview - -- `api.registry` -- `api.rpc` - ----v - -### PJS: Overview - -Almost everything else basically builds on top of `api.rpc`. - -- `api.tx` -- `api.query` -- `api.consts` -- `api.derive` - -Please revise this while you learn FRAME, and they will make perfect sense! - ----v - -### PJS: Workshop 🧑‍💻 - -Notes: - -```ts - -import { ApiPromise, WsProvider } from "@polkadot/api"; -const provider = new WsProvider("wss://rpc.polkadot.io"); -const api = await ApiPromise.create({ provider }); -api.stats; -api.isConnected; - // where doe this come from? -api.runtimeVersion; -// where does this come from? -api.registry.chainDecimals; -api.registry.chainTokens; -api.registry.chainSS58; -// where does this come from? -api.registry.metadata; -api.registry.metadata.pallets.map(p => p.toHuman()); -api.registry.createType(); -api.rpc.chain.getBlock() -api.rpc.system.health() -await api.rpc.system.version() -await api.rpc.state.getRuntimeVersion() -await api.rpc.state.getPairs("0x") -await api.rpc.state.getKeysPaged("0x", 100) -await api.rpc.state.getStorage() -https://polkadot.js.org/docs/substrate/rpc#getstoragekey-storagekey-at-blockhash-storagedata -await api.rpc.state.getStorageSize("0x3A636F6465"), -``` - -A few random other things: - -```ts -api.createType("Balance", new Uint8Array([1, 2, 3, 4])); - -import { blake2AsHex, xxHashAsHex } from "@polkadot/util-crypto"; -blake2AsHex("Foo"); -xxHashAsHex("Foo"); -``` - ---- - ## `subxt` -- Something analogous to `PJS` for Rust. +- Something analogous to `PJS api` for Rust. - The real magic is that it generates the types by fetching the metadata at compile time, or linking it statically. - ..It might need manual updates when the code, and therefore the metadata changes. 
- ---- - -## Additional Resources! 😋 - -> Check speaker notes (click "s" 😉) - -Notes: - -- see "Client Libraries" here: https://project-awesome.org/substrate-developer-hub/awesome-substrate -- https://paritytech.github.io/json-rpc-interface-spec/introduction.html -- Full subxt guide: https://docs.rs/subxt/latest/subxt/book/index.html diff --git a/syllabus/5-Substrate/README.md b/syllabus/5-Substrate/README.md index 4f78b9904..1f5abe36f 100644 --- a/syllabus/5-Substrate/README.md +++ b/syllabus/5-Substrate/README.md @@ -31,7 +31,7 @@ Ensure the `main` branch is write protected, by required a PR first`- no one sho #### Morning 1. [Introduction](./1-Intro-to-Substrate_Slides.md) (60m) -1. [WASM Meta Protocol](./2-WASM-Meta-Protocol-Slides.md) (90m) +1. [Wasm Meta Protocol](./2-Wasm-Meta-Protocol-Slides.md) (90m) 1. Activity: Finding Runtime APIs and Host Functions in Substrate