State cleanup / efficiency proposal: replace Code CIDs in actors with stable ids #1090

rvagg · 2024-12-12T10:27:13Z

Maintainer

Introduction

The current Filecoin actor object has a Code field holding a CID that links to a raw IPLD block: the WASM for that actor's builtin actor type.

This is used in two ways:

When the FVM needs to execute a method on an actor, it uses the Code field to load the WASM to execute, and then runs it against the actor's state, in particular its balance and its state root (the state field in Rust, Head in Go). That is, code and data come together in the Actor object. See: https://github.com/filecoin-project/ref-fvm/blob/551e24a9b7b4c8b1e42731b497d12202732beed1/fvm/src/call_manager/default.rs#L749-L756
When inspecting state, particularly in Lotus, we use the Code CID to determine the network version by searching the historical actor code CIDs for a match; from that we can determine the schema of the actor's state. For example: https://github.com/filecoin-project/lotus/blob/ae5b84503cea5b996d4a9d3ed46c4bdabd4ddea8/chain/actors/builtin/miner/miner.go#L33-L39
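The second use can be pictured as a lookup from code CID to actor type and version. A minimal Go sketch, assuming a hypothetical in-memory registry; actorMeta, historicCodes and lookupCode are illustrative names rather than Lotus APIs, and the CID strings are placeholders:

```go
package main

import "fmt"

// actorMeta is what a code CID tells us: the actor type and the actors
// version it was built for (hypothetical type, for illustration).
type actorMeta struct {
	Name    string
	Version int
}

// historicCodes is a hypothetical registry of every code CID ever shipped.
// The keys are placeholders, not real code CIDs.
var historicCodes = map[string]actorMeta{
	"bafk2bzace-placeholder-miner-v14":    {Name: "storageminer", Version: 14},
	"bafk2bzace-placeholder-miner-v15":    {Name: "storageminer", Version: 15},
	"bafk2bzace-placeholder-multisig-v15": {Name: "multisig", Version: 15},
}

// lookupCode finds the registry entry matching an actor's Code CID; the
// version then selects the schema used to decode the actor's Head.
func lookupCode(code string) (actorMeta, bool) {
	m, ok := historicCodes[code]
	return m, ok
}

func main() {
	if m, ok := lookupCode("bafk2bzace-placeholder-miner-v15"); ok {
		fmt.Printf("%s actor, decode state with the v%d schema\n", m.Name, m.Version)
	}
}
```

Everything downstream of this lookup depends on the Code CID being present in the actor object, which is what the rest of this proposal has to preserve.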

Problem

One negative consequence of storing the current code CID in each actor object is that whenever we replace an actor's code, we have to update every actor object in the state tree to point to its new code CID. For most network version upgrade migrations, this accounts for the majority of the time, and that cost keeps growing as we add more actors.

The latest migration benchmark shows growth of ~75k actors (to 3,266,879) and an increase in migration time of ~2.5s. On roughly average hardware we're nearing the 30s mark for migrations, almost all of which is spent updating actor code CIDs. There's also a lot of state churn in this operation, as we have to rebuild the entire actors tree.

There's also a size cost: the code CID takes 39 bytes in every actor object, yet all of them point to one of the same 16 IPLD blocks.
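As a sanity check on the 39-byte figure, here is a small Go calculation. It assumes the code CIDs are CIDv1 with the raw codec and a blake2b-256 multihash (the familiar bafk2bz... form), and counts the 0x00 prefix byte DAG-CBOR places before CID bytes; uvarintLen and codeCIDFieldLen are illustrative helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen returns the size of v as an unsigned varint, the encoding a CID
// uses for its version, codec and multihash header fields.
func uvarintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

// codeCIDFieldLen computes the bytes a code CID occupies in an actor object,
// assuming CIDv1 + raw codec + blake2b-256 and the 0x00 prefix that DAG-CBOR
// writes before the CID bytes inside tag 42.
func codeCIDFieldLen() int {
	const (
		cidVersion = 1
		rawCodec   = 0x55   // raw multicodec: the block is the WASM bytes
		blake2b256 = 0xb220 // blake2b-256 multihash code
		digestLen  = 32
	)
	cid := uvarintLen(cidVersion) + uvarintLen(rawCodec) +
		uvarintLen(blake2b256) + uvarintLen(digestLen) + digestLen
	return 1 + cid // leading 0x00 byte
}

func main() {
	const actorCount = 3266879 // from the latest migration benchmark
	per := codeCIDFieldLen()
	total := per * actorCount
	fmt.Printf("%d bytes per actor, %d bytes (~%.1f MB) across all actors\n",
		per, total, float64(total)/1e6)
}
```

Multiplied out, that is 127,408,281 bytes (~127 MB) of code CIDs, before counting the CBOR tag and byte-string headers around each one. For comparison, a CBOR unsigned integer in the 24-255 range encodes in 2 bytes.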

Unfortunately, actors have worked like this from the beginning; even before WASM builtin actors, they used identity CIDs holding the actor type and version as a string. So a lot of code assumes that an actor object carries the code+data pairing.

Proposal

Let's replace the Code field in the actor object with a stable identifier for the actor type. The identifier could be an integer for efficiency, and we'd establish a stable mapping of integer to actor type that stays consistent over time (we could avoid the 0-99 range to steer clear of confusion with the singleton actor IDs [e.g. f01, etc.], which confused me when I started looking at this, tbh). Then we just need a place to store the mapping from those identifiers to code CIDs, somewhere we can easily fetch it when needed, so we can reconstruct the code+data pairing wherever we need it.
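A minimal Go sketch of what such stable identifiers could look like. ActorTypeID, the constant names and the specific numbers are hypothetical assignments for a subset of the builtin actors, chosen only to illustrate starting above the reserved 0-99 range:

```go
package main

import "fmt"

// ActorTypeID is a hypothetical stable identifier for a builtin actor type.
// An ID never changes meaning; only the code CID it maps to changes per upgrade.
type ActorTypeID uint64

// Illustrative assignments for a subset of the builtin actors. Numbering
// starts at 100 so type IDs can't be mistaken for singleton actor IDs
// such as f00 (system) or f01 (init).
const (
	SystemActorType ActorTypeID = 100 + iota
	InitActorType
	CronActorType
	AccountActorType
	StorageMinerActorType
	MultisigActorType
)

var typeNames = map[ActorTypeID]string{
	SystemActorType:       "system",
	InitActorType:         "init",
	CronActorType:         "cron",
	AccountActorType:      "account",
	StorageMinerActorType: "storageminer",
	MultisigActorType:     "multisig",
}

// String returns the actor type name, for logs and debugging.
func (t ActorTypeID) String() string {
	if n, ok := typeNames[t]; ok {
		return n
	}
	return fmt.Sprintf("unknown(%d)", uint64(t))
}

// Valid reports whether t is an assigned ID outside the reserved 0-99 range.
func (t ActorTypeID) Valid() bool {
	_, ok := typeNames[t]
	return t >= 100 && ok
}

func main() {
	fmt.Println(StorageMinerActorType, uint64(StorageMinerActorType), ActorTypeID(1).Valid())
}
```

An integer keeps the field small and cheap to compare; the names exist only for humans reading logs.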

Option: Use the system actor's state

The system actor's state is currently a map of actor string names to code CIDs, which we update at each network version upgrade.

Two options exist here if we wanted to repurpose this:

Use a string identifier in the actor objects that allows us to look up the code CID in the system actor's state.
Change the system actor state schema to contain a mapping of actor integer identifiers to code CIDs and then use integers to look up the code CID in the system actor's state.

Unfortunately, this is slightly recursive, or at least has a recursive smell: we'd have to look up a specific actor, f00, and load its state to find the code of every other actor.
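The second variant amounts to re-keying the system actor's state. A minimal Go sketch, assuming the name-keyed shape described above and keeping CIDs as plain strings; SystemStateByName, SystemStateByID, toByID and CodeFor are hypothetical names:

```go
package main

import "fmt"

// SystemStateByName is today's shape as described above: name -> code CID.
type SystemStateByName map[string]string

// SystemStateByID is the second option: stable integer ID -> code CID.
type SystemStateByID map[uint64]string

// toByID rebuilds the name-keyed state with integer keys, using a fixed
// name -> ID table that would itself never change.
func toByID(old SystemStateByName, ids map[string]uint64) (SystemStateByID, error) {
	out := make(SystemStateByID, len(old))
	for name, code := range old {
		id, ok := ids[name]
		if !ok {
			return nil, fmt.Errorf("no stable ID assigned for actor %q", name)
		}
		out[id] = code
	}
	return out, nil
}

// CodeFor is the lookup every actor load would need; it requires loading
// the system actor (f00) state first, hence the recursive smell.
func (s SystemStateByID) CodeFor(id uint64) (string, error) {
	code, ok := s[id]
	if !ok {
		return "", fmt.Errorf("unknown actor type %d", id)
	}
	return code, nil
}

func main() {
	old := SystemStateByName{"storageminer": "bafk2bzace-placeholder-miner"}
	st, err := toByID(old, map[string]uint64{"storageminer": 104})
	if err != nil {
		panic(err)
	}
	code, _ := st.CodeFor(104)
	fmt.Println(code)
}
```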

Option: Place the mapping at the top of the state root

The current state root object has a version, a link to the actors HAMT (where the 3M actors live), and a link to an "info" object, which is simply the empty-array CBOR block []; i.e., it has never been used for anything interesting.

We could park a mapping of actor integer identifiers to code CIDs off the state root block, as a separate block, and we could even do it without changing the state root's schema by repurposing the info CID as an actor_mapping CID. At each network version upgrade we'd write a new one of these blocks, and its CID would be stored in every state root from that point until the next upgrade.
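A minimal Go sketch of resolving code through the repurposed link. StateRoot mirrors the three fields described above with the third renamed; ActorMapping, Blockstore and CodeFor are hypothetical, a map stands in for the IPLD blockstore, and CIDs are plain placeholder strings:

```go
package main

import "fmt"

// StateRoot mirrors the three fields of the current state root. The third
// field is today's Info link, hypothetically repurposed to point at the
// actor type mapping.
type StateRoot struct {
	Version      uint64
	Actors       string // CID of the actors HAMT
	ActorMapping string // formerly Info, which always pointed at an empty []
}

// ActorMapping is the separate block: stable type ID -> code CID.
type ActorMapping map[uint64]string

// Blockstore stands in for an IPLD blockstore holding mapping blocks.
type Blockstore map[string]ActorMapping

// CodeFor loads the mapping block referenced by root and resolves a type ID.
func (bs Blockstore) CodeFor(root StateRoot, typeID uint64) (string, error) {
	m, ok := bs[root.ActorMapping]
	if !ok {
		return "", fmt.Errorf("mapping block %s not found", root.ActorMapping)
	}
	code, ok := m[typeID]
	if !ok {
		return "", fmt.Errorf("type %d not in mapping %s", typeID, root.ActorMapping)
	}
	return code, nil
}

func main() {
	bs := Blockstore{"mapping-A": {104: "cid-miner-A"}, "mapping-B": {104: "cid-miner-B"}}
	// State roots either side of an upgrade share the type ID, not the code CID.
	before := StateRoot{Actors: "hamt-1", ActorMapping: "mapping-A"}
	after := StateRoot{Actors: "hamt-2", ActorMapping: "mapping-B"}
	for _, r := range []StateRoot{before, after} {
		code, err := bs.CodeFor(r, 104)
		fmt.Println(code, err)
	}
}
```

One extra block load per state root buys the code CID for every actor under it, which avoids the f00 round trip of the previous option.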

Usage

With either option, the mapping is accessible from the state root, so whenever we want to make use of an actor, we need to either be accessing the actor via the state root or have the state root at hand.

FVM

The FVM has no concept of historical state; it's invoked on a particular state root. When we execute an actor's method, we start from an actor ID and load the actor from the state root that the FVM "engine" is instantiated with: https://github.com/filecoin-project/ref-fvm/blob/551e24a9b7b4c8b1e42731b497d12202732beed1/fvm/src/call_manager/default.rs#L665-L668

Currently we simply use that actor object as the code+data pair. But we could just as easily load the code CID from the mapping in the state root, build a new code+data pair when we load the actor, and execute from that.
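The FVM is Rust, but to keep one language here, a minimal Go sketch of the idea follows. Actor, Invocation, Machine and loadInvocation are hypothetical names, not ref-fvm types; the point is only that the code+data pair is assembled at load time from the actor and the state root's mapping:

```go
package main

import "fmt"

// Actor is a hypothetical actor object with Code replaced by a type ID.
type Actor struct {
	TypeID  uint64
	Head    string // CID of the actor's state root
	Nonce   uint64
	Balance uint64 // simplified; real balances are arbitrary-precision
}

// Invocation is the code+data pair the call manager executes.
type Invocation struct {
	Code  string // code CID used to fetch the WASM
	Actor Actor
}

// Machine holds what a state root gives us: the actors and the type mapping.
type Machine struct {
	actors  map[uint64]Actor  // actor ID -> actor object
	mapping map[uint64]string // type ID -> code CID for this state root
}

// loadInvocation looks up the actor, then resolves its code through the
// mapping instead of reading a Code field from the actor object.
func (m *Machine) loadInvocation(actorID uint64) (Invocation, error) {
	a, ok := m.actors[actorID]
	if !ok {
		return Invocation{}, fmt.Errorf("actor f0%d not found", actorID)
	}
	code, ok := m.mapping[a.TypeID]
	if !ok {
		return Invocation{}, fmt.Errorf("no code for actor type %d", a.TypeID)
	}
	return Invocation{Code: code, Actor: a}, nil
}

func main() {
	m := &Machine{
		actors:  map[uint64]Actor{1000: {TypeID: 104, Head: "bafy-placeholder-head"}},
		mapping: map[uint64]string{104: "bafk2bzace-placeholder-miner"},
	}
	inv, err := m.loadInvocation(1000)
	fmt.Println(inv.Code, err)
}
```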

Lotus (etc.)

Typically we use actors in Lotus APIs by ID or address. We're either finding them in the latest state root (no TipSetKey specified) or in a specific state root (a specific TipSetKey specified). In both cases we start from a state root, find the actor, use that actor to determine the state schema, and then use the state data for whatever the API requires.

For example, StateMinerSectors works like this:

LoadActorTsk loads the actor from a given TipSetKey and an actor address. The LoadActor* family of methods all start from a state root and drill down to find the actor and just return that.
Calls miner.Load(actor) to do a miner-specific load of the actor's state data; this in turn uses the actor's Code to work out which network version it belongs to, and from that the schema of the state data it needs to load.
Calls a LoadSectors method on the object returned by miner.Load which abstracts the loading of the sectors from the state data across all network versions such that they act the same way regardless of the underlying schema.

This is fairly typical of actor use in Lotus: start from the state root and use the code of the code+data pair to determine the schema of the data. Once we have the actor object, we could simply fetch its code from the mapping in the state root, using the actor's type ID, instead of reading it from the actor object itself.
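A minimal Go sketch of that flow, assuming hypothetical lookup tables and a placeholder schema type; MinerState, minerV15, mapping, codeVersions, loaders and loadMiner are illustrative names, not the Lotus miner package. Only the first step changes; the code CID to version to schema dispatch stays as it is today:

```go
package main

import "fmt"

// MinerState is the version-agnostic view Lotus-style code works against.
type MinerState interface {
	LoadSectors() ([]uint64, error)
}

// minerV15 is a placeholder for one version-specific schema implementation;
// it returns fixed data instead of decoding real state.
type minerV15 struct{ head string }

func (m minerV15) LoadSectors() ([]uint64, error) { return []uint64{0, 1, 2}, nil }

// Hypothetical lookups: type ID -> code CID for this state root, and
// code CID -> actors version (the existing historic-CID search).
var (
	mapping      = map[uint64]string{104: "code-miner-v15"}
	codeVersions = map[string]int{"code-miner-v15": 15}
	loaders      = map[int]func(head string) MinerState{
		15: func(head string) MinerState { return minerV15{head: head} },
	}
)

// loadMiner replaces reading actor.Code with a mapping lookup; everything
// after the code CID is resolved stays the same.
func loadMiner(typeID uint64, head string) (MinerState, error) {
	code, ok := mapping[typeID]
	if !ok {
		return nil, fmt.Errorf("unknown actor type %d", typeID)
	}
	version, ok := codeVersions[code]
	if !ok {
		return nil, fmt.Errorf("unrecognised code CID %s", code)
	}
	load, ok := loaders[version]
	if !ok {
		return nil, fmt.Errorf("no miner schema for version %d", version)
	}
	return load(head), nil
}

func main() {
	st, err := loadMiner(104, "bafy-placeholder-head")
	if err != nil {
		panic(err)
	}
	sectors, _ := st.LoadSectors()
	fmt.Println(len(sectors), "sectors")
}
```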

Migrations

Migrations would skip individual actor migrations except where something about an actor's state needs to change, so we wouldn't mutate actors unless a FIP touches them. We would need to update the mapping in the state root, but that's a single new IPLD object and a new CID for it in the state root.
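A minimal Go sketch of that upgrade shape, assuming hypothetical State, Actor and StateMigration types and a plain map in place of the actors HAMT. When no FIP touches actor state, only the mapping link changes; otherwise only actors of the affected types are rewritten:

```go
package main

import "fmt"

// Actor is a hypothetical actor object keyed by a stable type ID.
type Actor struct {
	TypeID uint64
	Head   string
}

// State is a simplified state root: actors plus the mapping block's CID.
type State struct {
	Actors  map[uint64]Actor
	Mapping string // CID of the type ID -> code CID block
}

// StateMigration rewrites one actor's state for a FIP that changed its schema.
type StateMigration func(Actor) Actor

// upgrade installs the new mapping and rewrites only actors whose type has a
// state migration. It returns the new state and how many actors changed.
func upgrade(s State, newMapping string, migrations map[uint64]StateMigration) (State, int) {
	if len(migrations) == 0 {
		// No FIP touched actor state: the actors tree is reused as-is.
		return State{Actors: s.Actors, Mapping: newMapping}, 0
	}
	// The copy stands in for the structural sharing a HAMT would give.
	out := make(map[uint64]Actor, len(s.Actors))
	rewritten := 0
	for id, a := range s.Actors {
		if mig, ok := migrations[a.TypeID]; ok {
			a = mig(a)
			rewritten++
		}
		out[id] = a
	}
	return State{Actors: out, Mapping: newMapping}, rewritten
}

func main() {
	s := State{
		Actors:  map[uint64]Actor{1000: {TypeID: 104, Head: "h1"}, 1001: {TypeID: 103, Head: "h2"}},
		Mapping: "mapping-old",
	}
	next, n := upgrade(s, "mapping-new", nil)
	fmt.Println(next.Mapping, n)

	// Suppose a FIP changed the miner (type 104) state schema.
	bump := func(a Actor) Actor { a.Head += "-migrated"; return a }
	next, n = upgrade(s, "mapping-new", map[uint64]StateMigration{104: bump})
	fmt.Println(next.Mapping, n)
}
```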

jennijuju · 2024-12-12T22:39:16Z

Maintainer

This proposal SGTM. However, please be careful to communicate breaking changes to users: I know there is tooling that currently depends on Code CIDs to work out what kind of actor it is, and executes logic based on that, e.g. only fetching traces if the code CIDs match msig's.

3 replies

Stebalien Dec 12, 2024
Collaborator

One option we discussed was to hide this change away by performing translations at API layers. E.g., when the user asks for an "actor root", we'd pre-resolve the code ID to a code CID. So users wouldn't see any changes.
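A minimal Go sketch of that translation, assuming hypothetical Actor and APIActor types and a mapping taken from the same state root the actor was read from:

```go
package main

import "fmt"

// Actor is the hypothetical on-chain shape, carrying a stable type ID.
type Actor struct {
	TypeID  uint64
	Head    string
	Nonce   uint64
	Balance string
}

// APIActor is what API consumers keep seeing: a Code CID, as today.
type APIActor struct {
	Code    string
	Head    string
	Nonce   uint64
	Balance string
}

// toAPI pre-resolves the type ID to the code CID of the state root the
// actor was read from, so the API response shape doesn't change.
func toAPI(a Actor, mapping map[uint64]string) (APIActor, error) {
	code, ok := mapping[a.TypeID]
	if !ok {
		return APIActor{}, fmt.Errorf("actor type %d missing from mapping", a.TypeID)
	}
	return APIActor{Code: code, Head: a.Head, Nonce: a.Nonce, Balance: a.Balance}, nil
}

func main() {
	mapping := map[uint64]string{105: "bafk2bzace-placeholder-msig"}
	out, err := toAPI(Actor{TypeID: 105, Head: "bafy-placeholder-head", Balance: "0"}, mapping)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Code)
}
```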

jennijuju Dec 12, 2024
Maintainer

I guess the question is whether we plan to migrate the existing code CIDs to IDs during the upgrade migration? If yes, this option may not work? If not, that works, but users who (1) depend on that field and (2) deal with info across multiple network versions will need to maintain two code paths in their systems.

Stebalien Dec 12, 2024
Collaborator

I think the assumption is that:

Code "IDs" stay the same between upgrades.
Code CIDs change between upgrades. That is, we change the ID -> CID mapping each time.
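A minimal Go sketch of that assumption, using placeholder network versions, a hypothetical multisig type ID and placeholder CIDs. The same ID resolves to a different code CID in each upgrade's mapping, while an ID-based check (like the msig-only tracing mentioned above) needs no per-version table:

```go
package main

import "fmt"

const multisigType uint64 = 105 // hypothetical stable ID

// mappings holds one type ID -> code CID block per upgrade; the keys are
// placeholder network versions and the values placeholder CIDs.
var mappings = map[int]map[uint64]string{
	24: {multisigType: "code-msig-a"},
	25: {multisigType: "code-msig-b"},
}

// isMultisig works identically in every state root because the ID is stable.
func isMultisig(typeID uint64) bool { return typeID == multisigType }

// codeAt resolves the code CID an ID maps to at a given network version.
func codeAt(nv int, typeID uint64) (string, bool) {
	m, ok := mappings[nv]
	if !ok {
		return "", false
	}
	code, ok := m[typeID]
	return code, ok
}

func main() {
	for _, nv := range []int{24, 25} {
		code, _ := codeAt(nv, multisigType)
		fmt.Printf("nv%d: multisig=%v code=%s\n", nv, isMultisig(multisigType), code)
	}
}
```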
