D-Day Governance #5588
Acknowledging the issue. Not sure how much availability I have, but I can def mentor someone. Depends on urgency. If not urgent, I could probably get small pieces of this story done over the weeks.
This falls into the more long-term requirements of Asset Hub, as it is not needed until the very final days. In that sense, I was going to suggest you start working on it after your current project is done, roughly by the end of DevCon?
This is using a binary Merkle tree, while the chain uses a base-16 Patricia Merkle trie. They are not compatible. We already have other code in historical session that checks proofs. Generally, with the development of JAM, we will not have the luxury of an extra governance system sitting on the relay chain. So when all chains stop in JAM, we don't have governance either. It is therefore a little bit questionable whether we need this pallet at all. Or do you just want it for the period where governance switches over to AH and we are afraid of it not working properly on AH?
Afaik XCMP needs real state proofs into other parachains' state, so one could abstract that somewhat. We'll avoid starving "true system parachains" ala #4632 (comment). We've not concretely defined that term yet, maybe audited like polkadot itself and no flexible execution aka no smart contracts. Also maybe no advanced collator communication, which maybe forbids elastic scaling. We've discussed reverting code upgrades automagically too, but afaik nothing is currently in progress, and it maybe imposes design restrictions. We've more ways individual parachains can brick of course. Also JAM should bring much new brickage, but a "true system parachain" could forbid non-trivial accumulation, which again maybe forbids elastic scaling. Anyways, if collectives were kept relatively simple, then maybe collectives alone could provide this? Or maybe some simpler multi-sig derived from collectives? AH doing governance directly is maybe a design mistake too, because doing so adds tension between different concerns.
@bkchr I actually switched to a compact base-16 trie because the binary tree libraries were unusable in the runtime currently.
Exactly for this period.
I assume with this comment, there is no blocker to implement this, right? It would be great to get a prototype of a pallet that tightly couples with the parachain pallets (e.g. can only work in RC), and can request to read the state of a parachain based on its latest state root: as in, have an extrinsic where anyone can provide a state proof of a parachain, and it would verify it based on the last known state root of the given para.

```rust
/// Provide the state `proof` for `id` at `block`, or the latest block if not provided
fn poc_read_para_state(id: ParaId, proof: Vec<Vec<u8>>, block: Option<BlockNumber>)
```

@bkchr do you know if this exists anywhere? If this can be built, I will have no doubts that the rest of this issue can also be done.
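To make the shape of such a check concrete, here is a minimal, self-contained sketch. Only the name `poc_read_para_state` comes from the comment above; `LatestRoots`, `Step`, `ReadError`, the extra `key`/`value` parameters and the dropped `block` parameter are all hypothetical. It deliberately uses a toy binary hash path and the standard library's non-cryptographic `DefaultHasher`, so it is not the base-16 trie proof format discussed in this thread; it only illustrates "recompute a root from the proof and compare it with the last known root of the para".

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy 64-bit digest. NOT cryptographic; a real pallet would use the
/// parachain's own hasher (assumption: something like Blake2-256).
pub type Digest = u64;
pub type ParaId = u32;

fn hash_bytes(data: &[u8]) -> Digest {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

fn hash_pair(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// One step of a toy binary Merkle path: the sibling digest and its side.
#[derive(Clone, Copy)]
pub struct Step {
    pub sibling: Digest,
    pub sibling_is_left: bool,
}

#[derive(Debug, PartialEq)]
pub enum ReadError {
    UnknownPara,
    RootMismatch,
}

/// Hypothetical stand-in for the relay chain's view of each para's latest state root.
pub struct LatestRoots(pub HashMap<ParaId, Digest>);

impl LatestRoots {
    /// Return `value` only if `(key, value)` hashes up to the last known root of `id`.
    pub fn poc_read_para_state(
        &self,
        id: ParaId,
        key: &[u8],
        value: &[u8],
        proof: &[Step],
    ) -> Result<Vec<u8>, ReadError> {
        let root = *self.0.get(&id).ok_or(ReadError::UnknownPara)?;
        // Leaf digest commits to both the key and the value.
        let mut acc = hash_pair(hash_bytes(key), hash_bytes(value));
        // Fold the path from the leaf up to the root.
        for step in proof {
            acc = if step.sibling_is_left {
                hash_pair(step.sibling, acc)
            } else {
                hash_pair(acc, step.sibling)
            };
        }
        if acc == root {
            Ok(value.to_vec())
        } else {
            Err(ReadError::RootMismatch)
        }
    }
}
```

A caller would submit the key, the claimed value and the path; anything that does not hash up to the stored root is rejected, so the pallet never has to trust the submitter.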
It doesn't exist yet. However, building it should be straightforward, but it would also not support every parachain. Parachains are not required to use any specific state layout. But for the system chains we can make it work.
Some code to demonstrate:
https://github.com/paritytech/polkadot-sdk/compare/kiz-dday-demo?expand=1
We'll want parachains that never stall for PJR tests and DKGs, but they'd avoid censorship vectors like smart contracts, and never make too many blocks either, aka no elastic scaling. In principle, relay chain governance could always take place on some non-stallable parachain, so not AssetHub, but using proofs into AssetHub state.
I guess what is missing there maybe is a double map:
then people call the Then we should be able to do all local operations on that Head. We also probably want a way to migrate the total issuance number over for things like the voting curves, so we know when we reach certain levels of voter thresholds.
What is non-stall-able? It has no bugs + gets infinite PoV limit? I am not sure if we have such a thing or can build it fast enough. Although, if this is easier to build, I agree that we should still build the pallet I described above, but instead of the RC, put it in this special parachain, and let it work on-demand: it will only start working when it detects AH is in trouble. This is more JAM-compatible. @eskimor any comments from you?
This is only relevant if we want to do multiple votes on the same frozen AH, right? I hadn't thought of this, as I assumed the only voting will be for something that will un-block AH. It is a good optimization.
Indeed, it can be provided with the same mechanism quite trivially.
We don't have anything like that. However, if a separate parachain has only the rescue pallet, the failure surface is quite small. The chain also would not really need any kind of state, apart from the one proposal that would need to be executed there.
Yes, stall-able is a metric, not a yes or no. At a high level, fewer features means harder to stall. You could make an almost-impossible-to-stall PJR check chain by replacing the parachain state root with just the score, and allowing another block that improves the score. This means a staking miner could advance the state of the PJR check chain knowing only the relay chain state, not the previous PJR check results. This removes the feature of having state in order to make the PJR check chain harder to stall. It's harder to make DKG chains similarly hard to stall, but somewhat possible. Fully utilized chains would permit partial functionality stalls, because being fully utilized means not reserving anything. Smart contracts would typically open attack vectors that partially stall chains, because adversaries could find tricks that consume all the resources. Elastic scaling would often permit chain takeovers by not giving other collators enough sync time. We should expect that AH can be stalled more easily, because AH shall have all three. All that is why you're proposing d-day governance, but... fallbacks suck. Why not always do RC governance on some parachain that's harder to stall than AH? We could leave treasury on AH, because treasury stalling doesn't break anything, but do system code upgrades and parameters somewhere safer.
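As a rough illustration of the "state is just the score" idea, the sketch below models the chain head as a single number and accepts a new block only if it strictly improves on it. The names `Head`, `NotAnImprovement` and `try_advance`, and the convention that a higher score is better, are assumptions made for this example and do not come from any existing pallet.

```rust
/// Hypothetical head of a "PJR check chain": nothing but the best score so far.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Head(pub u64);

#[derive(Debug, PartialEq, Eq)]
pub struct NotAnImprovement;

/// Accept a candidate block only if its score strictly beats the current head.
/// No other chain state is read, so a block author needs nothing but the head
/// (plus, in the real setting, the relay chain state used to compute the score).
pub fn try_advance(current: Head, candidate_score: u64) -> Result<Head, NotAnImprovement> {
    if candidate_score > current.0 {
        Ok(Head(candidate_score))
    } else {
        Err(NotAnImprovement)
    }
}
```

Because each block depends only on the previous score, any honest author can produce the next block without syncing earlier results, which is what makes such a chain hard to stall.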
I see. Let's first discuss the failure-surface. Note, the relay chain will have some code in its runtime that handles parachains (para-runtime). I assume there is in principle the possibility to also have a bug in this, in which case all parachains could stop working, no matter their code.
Putting the rescue pallet in another parachain has the benefit that it is more JAM-compatible, but it does not help with the second failure. Putting it in the RC is not JAM-compatible, but handles both failures. I might be paranoid in thinking the second failure is actually a feasible one. @eskimor implied in an off-band conversation that I might be wrong to worry about this. In that case, having a similar rescue system in a separate on-demand parachain makes more sense. Also cc @ordian
While we could break the relay chain runtime in a way that only parachain consensus is entirely broken, I doubt that the risk is much higher than messing up the relay chain runtime in some other way (preventing relay chain governance from working). If this happened, we would need a hard fork to fix it, just as if we messed up a relay chain upgrade right now. Asset Hub no longer making progress is disastrous enough that we should work hard to make this as unlikely as possible. Also purely hypothetical: if all of parachain consensus broke, then we would want to have this fixed as quickly as possible and not do some governance dance; instead, a hard fork will likely be demanded by pretty much everybody. The same is likely true if Asset Hub breaks.
We've fixed bad upgrades before using on-chain governance, and not hard forks, although sometimes only barely, and maybe we no longer make those mistakes. I'm assuming the RC continues running correctly, including elves/approvals and grandpa. I suppose AH might continue running correctly-ish too. Yet, we have problems backing honest AH parachain blocks, maybe because of malicious actors, or maybe unintentionally like from high or weird usage. In particular, we'll seemingly want AH to push a high tps for bragging rights, but this requires that full AH blocks get used by transactions, meaning no reserved space for the election. That's problematic. It's not bugs per se, but parachain choices that trade away resilience for throughput and flexibility. In theory, a parachain project could always run "better" infrastructure, and that may be how you land insane tps, but we're the L1, so their "better" might feel centralized to us. Also, there may be similar robustness arguments going the opposite way, like the governance chain needing reliable infrastructure. If that's the case, then maybe a separate d-day chain makes sense? It's unclear if AH failures could be detected though, so maybe activating the d-day chain should be the d-day chain's first act? Anyways, I worried mostly that we were going to have a fallback that barely worked, or required double the debugging time, when we should be doing it right in one place, but maybe that's not an easy choice to make right away.
After a conversation with @burdges and @eskimor at the retreat about this, and specifically about where the functionality could live, we wondered about the option of having a track on the collectives chain which can achieve root in specific scenarios, or some subset of operations with root perms, or at the very least restart AH if it stalls. If we have the ability to restart AH from Collectives and vice versa, and then add a constraint that these chains need to be upgraded at different times, then we solve part of the problem, missing only a problem which takes a long time to show itself on both chains. This is not along the lines of a "minimal system upgrades chain", but it's a clean solution to part of the problem. The backup option, if we want a dedicated chain, is fairly straightforward: a system parachain which is registered but dormant, and can be spun up with on-demand coretime in case we need to recover one or both of AH/collectives if we bork them. This means the hard fork sledgehammer is only needed for relay chain problems or where all parachains are not making progress (which is likely also a relay chain problem). I think my takeaway from this conversation is that the functionality does not need to be on the relay chain.
Yeah, I'd worried about the maintenance costs of having two different governance systems, but actually maintenance need not be problematic if we debug and run the same code in both places. We could turn off conviction for code upgrades maybe, so then only dot ownership and fellowship status matter, which simplifies using remote dot ownership proofs. As a first step, we could re-engineer the storage interface, so that remote storage proofs can be first-class citizens, alongside local storage proofs. That's a huge win for polkadot overall regardless. It doesn't matter if this re-engineering cannot work within the macro DSL, because we could port the governance code to the new storage interface. I suppose the macro DSL could differ in different build units, but afaik the fellowship lives in collectives, so every governance vote needs proofs into both collectives and AH, with one remote and one local. That's amazingly cool, but that's also some scary complexity. I suppose folks here envision this being fellowship or collectives, instead of fellowship and collectives. That's fine, but that's a bigger governance change than merely dropping conviction, no? Anyways, we should break down the single chain stall conditions:
In other words, AH has far more functionality than collectives, like contracts, so any exploit or weakness of collectives implies an exploit of AH, but not conversely.
Write a new governance pallet that should reside in the relay chain, while the main governance apparatus resides on Asset Hub.
The main usage of this pallet is when AH and/or Collectives are not producing blocks, and therefore can no longer access `Root` on the relay chain.
The assumption of this pallet is that it can have access to the latest state root of both Collectives and AH, and also has some notion of "soft metadata" of Collectives and AH. As in, it knows that a state proof corresponding to a specific hard-coded key prefix is associated with e.g. the balance of a user in AH.
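One way to picture this "soft metadata" is a small hard-coded table from key prefixes to the meaning of the value stored under them. The sketch below is illustrative only: `SoftMetadata`, `Item`, and the placeholder prefixes `ah/balance/` and `col/member/` are invented for the example (real storage keys are hashed pallet and item names), and treating the proven value as a bare little-endian `u128` is a simplification of how account data is actually laid out.

```rust
/// What a proven key is understood to refer to (hypothetical categories).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Item {
    AhAccountBalance,
    CollectivesMemberRank,
}

/// Hard-coded table of key prefixes and their meaning.
pub struct SoftMetadata {
    entries: Vec<(Vec<u8>, Item)>,
}

impl SoftMetadata {
    /// Placeholder prefixes, NOT the real hashed storage prefixes.
    pub fn placeholder() -> Self {
        SoftMetadata {
            entries: vec![
                (b"ah/balance/".to_vec(), Item::AhAccountBalance),
                (b"col/member/".to_vec(), Item::CollectivesMemberRank),
            ],
        }
    }

    /// Map a proven storage key to its meaning plus the remaining key bytes
    /// (for example the account the entry belongs to).
    pub fn classify<'a>(&self, key: &'a [u8]) -> Option<(Item, &'a [u8])> {
        for (prefix, item) in &self.entries {
            if let Some(rest) = key.strip_prefix(prefix.as_slice()) {
                return Some((*item, rest));
            }
        }
        None
    }
}

/// Decode a 16-byte little-endian integer, the fixed-width layout SCALE uses for `u128`.
pub fn decode_u128_le(value: &[u8]) -> Option<u128> {
    if value.len() != 16 {
        return None;
    }
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(value);
    Some(u128::from_le_bytes(bytes))
}
```

Keeping the table hard-coded means the rescue pallet does not need to parse the stalled chain's full metadata, at the cost of having to be updated whenever that chain's storage layout changes.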
A few key properties of this pallet:
Proposals creation:
`origin` sends a proof of `fellowshipCollective::members(who) -> rank`, then ensure that the `origin` was `who`, and `rank` is high enough.
`pallet-referenda`.
Voting
`Tally` and then linked to `pallet-referenda`.
`aye/nay`).
Option 1: Simple
`type MinimumVotingPower`.
transaction pool validation step, which will require at least reading one storage item.
Option 2: Meta Transaction Style
`who` regarding the vote (`signed(aye/nay)`), allowing `origin` to vote on behalf of `who`.
`origin` if it is the first valid vote of `who`
`who` changed their mind) must pay `origin` proportional to the claimed voting power.
`origin` is slashed if invalid.
Relies on #5400. @shawntabrizi would you like to work on this after your current work? It seems to fit your aptitude very well.
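The proposal-creation check and the simple voting option can be sketched as plain functions over data that is assumed to have already been verified against the Collectives and AH state roots. Everything here is hypothetical scaffolding: `ProvenMember`, `ProvenBalance`, `ensure_can_propose` and `record_vote` are invented names, only `Tally` and `MinimumVotingPower` echo terms from the description, and using a proven AH balance as the voting power is an assumption. Double-vote tracking and the meta-transaction option are left out.

```rust
pub type AccountId = u32;
pub type Rank = u16;
pub type Balance = u128;

/// A `fellowshipCollective::members(who) -> rank` entry, assumed already
/// verified against the Collectives state root.
pub struct ProvenMember {
    pub who: AccountId,
    pub rank: Rank,
}

/// A balance entry, assumed already verified against the AH state root.
pub struct ProvenBalance {
    pub who: AccountId,
    pub free: Balance,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    OriginMismatch,
    RankTooLow,
    BelowMinimumVotingPower,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Tally {
    pub ayes: Balance,
    pub nays: Balance,
}

/// Proposal creation: the proof must be about `origin`, and the rank must be high enough.
pub fn ensure_can_propose(
    origin: AccountId,
    member: &ProvenMember,
    min_rank: Rank,
) -> Result<(), Error> {
    if member.who != origin {
        return Err(Error::OriginMismatch);
    }
    if member.rank < min_rank {
        return Err(Error::RankTooLow);
    }
    Ok(())
}

/// Option 1 style vote: reject small votes early, then add the proven balance to the tally.
pub fn record_vote(
    tally: &mut Tally,
    origin: AccountId,
    balance: &ProvenBalance,
    aye: bool,
    minimum_voting_power: Balance,
) -> Result<(), Error> {
    if balance.who != origin {
        return Err(Error::OriginMismatch);
    }
    if balance.free < minimum_voting_power {
        return Err(Error::BelowMinimumVotingPower);
    }
    if aye {
        tally.ayes = tally.ayes.saturating_add(balance.free);
    } else {
        tally.nays = tally.nays.saturating_add(balance.free);
    }
    Ok(())
}
```

The minimum voting power threshold exists to keep cheap spam votes out; in this sketch it is checked before the tally is touched, so a rejected vote leaves the tally unchanged.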
Demo branch: https://github.com/paritytech/polkadot-sdk/compare/kiz-dday-demo?expand=1