Document requirements on miner SW to enable retrieval testing #1086
bajtos started this conversation in Enhancements - Organizational
-
#1027 is an example of ongoing FRC development if that helps.
My advice: just start an FRC roughly conforming to the examples already in that directory, leaving the ID unassigned (make up a filename that you can rename later), and we'll go from there. IMO an FRC should be a much more relaxed process than a FIP, aiming for a clear description of a proposed convention, getting a healthy amount of discussion from stakeholders, and surfacing important divergences of opinion where they exist.
-
I submitted the first draft of my FRC proposal: #1089
-
Context
When we set out to build Spark, a protocol for testing whether the payload¹ of Filecoin deals can be retrieved, we designed it based on how Boost worked at that time (mid-2023). Soon after FIL+ allocator compliance started using the Spark retrieval success rate (Spark RSR) in mid-2024, we learned that Venus Droplet, an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Since there was no formal specification outlining what needed to be implemented, we relied on the Venus team to reverse-engineer the Boost implementation. Fortunately, the requirements were relatively trivial, so this worked out well.
Things have evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding (DDO) deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release.
I want to create a document specifying what Spark needs from miner software and collaborate with the community to find solutions that work well for all parties involved. A FIP/FRC document seems like the right place for such a discussion.
I also want this spec to be useful beyond Spark. It would be awesome if the spec and the building blocks empowered other builders to design & implement their own retrieval-checking networks as alternatives to Spark.
Goals
What I would like to get out of this GH discussion as the first step: feedback on the spec outline below.
Spec outline
In Spark v2, we need to:
1. Map `MinerID` (like `f0123`) to the IPNI index publisher `PeerID` (like `12D3KooWLGTZ(...)`). Right now, we assume the miner uses the same libp2p identity for blockchain-related communication and for IPNI advertisements. This seems to work well for Boost and Venus, but Curio is asking for a different approach. I'd like to use the spec review process to discuss other options Spark can support. (A lookup sketch follows after this list.)
2. Use the IPNI Reverse Index to discover payload¹ blocks inside the DDO deal we are checking. To support that, we need miners to construct the IPNI `ContextID` deterministically from deal metadata, namely the tuple `(PieceCID, PieceSize)`. (See the `ContextID` sketch after this list.)
3. Drop support for the Graphsync protocol. We will require miners to serve payload¹ retrievals using the IPFS Trustless HTTP Gateway protocol. (See the retrieval sketch after this list.)
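To make requirement 1 concrete, here is a minimal sketch of the lookup Spark effectively performs today. It assumes a public Lotus JSON-RPC endpoint (the Glif endpoint below is purely illustrative) and Node 18+ for the global `fetch`; `Filecoin.StateMinerInfo` returns the miner's on-chain peer ID, which Spark currently assumes is also the IPNI index publisher identity.

```ts
// Sketch: resolve a MinerID like `f0123` to the libp2p PeerID registered
// on chain. The Glif RPC endpoint is an illustrative choice, not a requirement.
async function minerPeerId (minerId: string): Promise<string | null> {
  const res = await fetch('https://api.node.glif.io/rpc/v1', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: 1,
      method: 'Filecoin.StateMinerInfo',
      params: [minerId, null] // null tipset key = current chain head
    })
  })
  const { result, error } = await res.json()
  if (error) throw new Error(`StateMinerInfo failed: ${error.message}`)
  // Spark currently assumes this PeerID is also the IPNI index publisher
  // identity; that assumption is exactly what this spec needs to pin down.
  return result.PeerId ?? null
}
```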
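For requirement 2, the key property is determinism: anyone who knows a deal's `(PieceCID, PieceSize)` must be able to recompute the same `ContextID` the miner used in its IPNI advertisements. The exact byte layout is one of the things the spec has to settle; the sketch below shows one hypothetical construction (hashing the piece CID bytes together with the big-endian piece size), not an agreed format.

```ts
import { createHash } from 'node:crypto'
import { CID } from 'multiformats/cid'

// Hypothetical deterministic ContextID derivation from deal metadata:
// same (PieceCID, PieceSize) in, same ContextID out, with no per-deal
// or per-miner randomness. The real encoding must be defined by the spec.
function deriveContextId (pieceCid: string, pieceSize: bigint): Uint8Array {
  const sizeBytes = Buffer.alloc(8)
  sizeBytes.writeBigUInt64BE(pieceSize) // PieceSize as unsigned 64-bit big-endian
  return createHash('sha256')
    .update(CID.parse(pieceCid).bytes)  // binary form of the PieceCID
    .update(sizeBytes)
    .digest()
}
```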
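For requirement 3, a retrieval check over the IPFS Trustless HTTP Gateway protocol can be as small as fetching a single raw block. In this sketch, `providerBaseUrl` is a placeholder for the HTTP address the miner announces via IPNI; a production checker must also hash the returned bytes and verify they match the requested CID before counting the retrieval as successful.

```ts
// Sketch: fetch one raw block from a miner's HTTP endpoint using the
// IPFS Trustless Gateway protocol (`Accept: application/vnd.ipld.raw`).
async function fetchRawBlock (providerBaseUrl: string, cid: string): Promise<Uint8Array> {
  const res = await fetch(`${providerBaseUrl}/ipfs/${cid}`, {
    headers: { accept: 'application/vnd.ipld.raw' } // ask for verbatim block bytes
  })
  if (!res.ok) throw new Error(`retrieval failed: HTTP ${res.status}`)
  // NOTE: verify that the returned bytes match the multihash in `cid`
  // before treating this retrieval as successful.
  return new Uint8Array(await res.arrayBuffer())
}
```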
¹ Say you have IPFS content identified by a CID like `bafybei(...)` and you store it on Filecoin in a piece identified by a PieceCID (`baga...`). In Spark, we define the payload as the Merkle tree of blocks rooted in the CID of the original content, i.e. `bafybei(...)`.

PDP and non-IPLD deals
I am aware that at some point in the near future (3-9 months?), we will need to figure out how to test retrievals of PDP deals and of deals where the Piece stores arbitrary payload that is not necessarily IPFS/IPLD content. I would prefer to leave such topics outside the scope of this proposal & spec unless a retrievability score limited to deals storing IPLD data (i.e. our current vision for Spark v2) would not be useful.
For context, our current approach is to improve Spark incrementally. Spark v1 checks StorageMarket (f05) deals only. Spark v2 will add support for DDO deals. We plan to implement a Spark variant to check PDP deals in 2025. Similarly, we can later enhance Spark or create a Spark alternative to check deals storing non-IPLD data.