Built-in market cron risk and mitigations (FIP-0060) #638
-
Thank you for the write-up. I'm very glad we finally looked at this problem in depth, and am thrilled we have an easy solution that drastically reduces the problem without any significant impact on UX or product quality. I'm in favour of including the short-term mitigation in nv19. Regarding its implementation, I would be interested in at least considering a migration-based approach. I suspect we might be able to do so relatively efficiently, since the result might be almost entirely pre-computable ahead of time. This will depend on the actual proposed change, of course.
-
In the long-term solution, does this mean the SP/client doing the deal will now be the ones paying for the gas consumed by their deal payment processing? Is that the solution, to have the network no longer subsidize this process? This appears to be a very gas-demanding process, exceeding the block gas limit, so would this essentially be dealing with the problem through congestion pricing, that is, pricing out excessive deal payment processing? (I understand the short-term solution alleviates this demand problem for the moment, making it 1/30 as gas-demanding?)
-
Great stuff! @anorth I have a question about how this change interacts with "PSD sent but never sealed a sector with that DealID". At present my logic to detect this situation is as follows (courtesy of @ZenGround0)
I suspect
-
A draft FIP is in progress: #648
-
Any estimates on how much gas is expected to rise once cron gas is properly computed?
-
FIP-0060 appears to have had the hoped-for impact on block validation times in nv19. We slightly under-estimated the urgency, as the processing time started growing significantly shortly after the discussions above. We still need to implement a permanent fix for this or the issue will recur.
-
A large part of the market cron gas cost was related to massive state utilization of the AMT (~60 GB written per 2000 epochs). The following is a summary of the analysis behind this. The TL;DR is that the evidence supports the long-term proposal of removing cron processing of deal states entirely. We will also save ~4 GB of snapshot space today if we do this.
AMT Churn Explanation
Background
Market Churn
Churn is defined to be the size in bytes of the data in a datastructure that is overwritten over some period of time. Because GC in lotus is somewhat difficult, lots of churn in filecoin state has a high system cost.
We learned after nv19 that AMT churn is > 100x higher than optimal. AMT size was 500 MB. Churn was 50 GB in 2000 epochs, so a full rewrite of AMT values takes over 100x the space of just rewriting all of those values.
AMT
The AMT is a simple hash-linked tree datastructure used to implement an array. It has the structure:
type AMTRoot struct {
	BitWidth uint64  // log2 of the maximum number of entries per node
	Height   uint64  // number of link levels above the leaves
	Count    uint64  // number of values stored in the array
	AMTNode  AMTNode // root node of the tree
}
type AMTNode struct {
	Bmap   []byte          // bitmap of populated slots
	Links  []cid.Cid       // child links (link nodes only)
	Values []*cbg.Deferred // stored values (leaf nodes only)
}
Either Links or Values is populated (link nodes vs leaf nodes), not both. The bitwidth defines the max length of Links or Values (2^BitWidth entries), setting both the leaf node size and the internal node branching factor. All leaf nodes have the same height.
Explaining States AMT Churn
Empirical Investigation
Running
Rerunning this with an update interval of every 30 days to match post-nv19 behavior we get
Raw output:
./lotus-bench amt 20.2s Wed Jul 12 23:24:34 2023
2023-07-12T23:24:40.871-0600 INFO lotus-bench lotus-bench/main.go:110 Starting lotus-bench
Populating AMT
Measuring AMT
------------
Link Count: 634922
Value Count: 40000000
9923 link nodes 27430652 bytes
625000 value nodes 527993648 bytes
Total bytes: 555424300
------------
Overwrite 1 out of 2880 values for 20 rounds
round: 0
round: 10
Measuring 20 rounds of churn
------------
Link Count: 634922
Value Count: 888896
9923 link nodes 27430652 bytes
13889 value nodes 11733173 bytes
Total bytes: 39163825
------------
Full rewrite and Bitwidth
An AMT with bitwidth 6 has 64 entries per node. This high bitwidth explains why most of the states AMT is rewritten when only a fraction of values are overwritten. For an evenly spaced overwrite workload to rewrite all link nodes, all bottom-level link nodes must change. For that to happen, at least 1 out of every 64*64 = 4096 values must change, since each bottom-level link node addresses 64 leaf nodes of 64 values each. Since 4096 > 2880, the pre-nv19 market cron rewrote all link nodes.
Additionally, high bitwidth means that much more data in the leaves is churned through than the actual data being updated. About 12 MB of value data is churned through in one update at interval 2880. Over the course of a full overwrite that puts us at ~34 GB, a churn amplification on the order of 100x.
Datastructure mitigations for churn
A smaller bitwidth can help with churn, but empirically it does not help that much. One round of churn with the same parameters and an AMT bitwidth of 2 gives a reduction of less than a factor of 4, from 39 MB down to 12 MB. Additionally, there is the downside that other AMT operations unrelated to cron churn will incur more overhead with lower bitwidth, which makes this approach less promising.
A more specialized datastructure with ipld node locality for deal states grouped by interval would probably work ok for this. An array of AMTs, one per interval, would work for smaller interval sizes. For the current interval of 86400 epochs that array is too big for top-level state, so perhaps an AMT of low bitwidth pointing to AMTs of high bitwidth would do the trick.
In summary, this can't be seriously addressed by just tweaking a parameter. More sophisticated approaches will need to keep in mind the additional constraints for the States AMT from PSD / sector activation alongside cron churn, and hence it will be non-trivial to get this right.
All this supports the proposed approach ("Long-term fix" section of #638) of mitigating churn issues by skipping cron churn altogether and pushing market payments into user-triggered messages.
-
See #800 for a proposal for the long-term resolution to this problem.
-
This is primarily the work of @Kubuxu and @ZenGround0, I just happened to be the one to discover the built-in market actor is at fault.
Background
Filecoin tipset execution includes a cron-like facility for scheduled execution of actor code at the end of every epoch. This cron activity is real work, which we can account in gas units, but is not paid for by any external party (no tokens are burnt as a gas fee). Cron is very convenient for some maintenance operations, but is essentially a subsidy from the network to whichever actors get the free execution (which is only ever built-in actors).
As network activity has grown, the amount of work done in cron has increased. Recent analysis shows that cron execution is frequently consuming 80 billion gas units each epoch. For context, the block gas limit is 5 billion, so a tipset with the expected five blocks is <25 billion. Cron is consuming a multiple of the target computational demands for all block validation.
A fast and predictable block validation time is important to the ability of validating nodes to sync the chain quickly (especially when catching up), and critical for block producers to be able to produce new blocks for timely inclusion. Although there is significant buffer to account for network delays and the variability of expected consensus, cron execution is beginning to threaten chain quality and minimum hardware specs for validators.
The built-in market actor is consuming 85% of cron execution (73B gas / epoch). It performs deal maintenance (mainly transferring incremental payments) on a regular interval of 1 day for each deal. This is far from the kind of critical network service for which cron was intended, and a complete waste for the 98% of deals that have zero fee.
The number of deals brokered by this built-in market has increased greatly over the past year. This is a 🍾 problem, but we must address this risk to chain quality on the expectation that the deal count will grow a lot more.
Proposal
The proposed resolution to this problem is in two steps: a quick short-term mitigation to buy time, then a permanent reworking of the built-in market actor.
Short-term mitigation
Increase the interval at which the built-in market performs deal maintenance from 1 day to 30 days (or perhaps longer). We expect this to reduce the per-epoch gas consumption by a factor of ~30 (73B → 2.4B).
We think we can do this without a migration of the market actor’s state by performing the rescheduling in actor execution during the first day after code changes are released in a network upgrade. The algorithm for doing this is TBC, but it must maintain the property of uniformly distributing the work over the period, and be robust to client or provider attempts to manipulate the schedule.
Long-term fix
Add new deal settlement methods to the built-in market actor and remove automatic deal maintenance, and hence use of cron, entirely. This is a permanent fix to the market actor’s cron costs. Automatic deal payment processing is not something that Filecoin can support at greater scale.
Note that other uses of cron (e.g. miner deadline maintenance) are also growing with time, and we expect to address them too in the future.
Discussion
Urgency for short-term mitigation
As the cron workload is expected to increase as more deals come on board, a short-term mitigation is necessary. Dividing the problem by 30 will give us some months to develop good APIs and code for the longer-term solution, whereas attempting to implement that long-term solution now would take longer than we are comfortable with at the current growth rates.
As a short-term fix, we’re aiming for maximum simplicity. Leaving the state schema and all essential operations intact makes this a very tightly scoped change that we can deliver with minimum risk.
We strongly recommend that the short-term mitigation be scheduled for network version 19, the upgrade immediately following the introduction of FEVM.
Choice of update period
One idea might be to make the deal processing period much longer, in the hope of avoiding the need to implement a more permanent fix. We’ve declined this option because
Simplification
In addition to resolving a growing risk and removing a privilege enjoyed by the built-in market actor, removing cron processing from the built-in market will simplify that actor’s code, reducing maintenance burdens and the possibility of error.
Alternatives
We also considered an alternative of splitting the market’s deals into two groups: those that have non-zero payments to process (about 2% of deals today), and those that don’t.
We rejected this option because, like the proposed short-term fix, it’s not a permanent solution to the problem. We might expect the fraction of paid deals to increase over time. It’s also more complex than either the short-term fix or the proposed permanent resolution. This alternative would have the advantage of maintaining the built-in market’s current service of automatic payment processing, but we don’t believe that service is sustainable or appropriate for the future anyway.