-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support byte ranges #306
Conversation
The failures in here are:
The overlap they have with this is that they use a |
That code is super old you don't need to use |
Looking at the IPIP-402 proposal, this is gonna be a bit tricky, cause when the node is a directory, |
One suggestion: maybe preload should only do directories. That's all you actually need it for. Then if there is no byte range selector, we simply match the whole file. That would be a breaking change to preload, so it'd need to get to boost for the purposes of GraphSync. |
This now depends on ipfs/go-unixfsnode#52 and also ipld/go-ipld-prime#529; those are what are being pointed to in the current go.mod. The test fixtures now include gobs of ranges reading from a file, including lots of boundary cases. Things still to do:
|
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #306 +/- ##
==========================================
+ Coverage 76.24% 76.67% +0.42%
==========================================
Files 85 86 +1
Lines 6378 6674 +296
==========================================
+ Hits 4863 5117 +254
- Misses 1245 1274 +29
- Partials 270 283 +13
|
Added support for the "or" part of 402 - when specifying entity-bytes and the terminus isn't a file, it should behave as if you didn't specify entity-bytes. This required removing the hard error in the slice matcher; see ipld/go-ipld-prime#529. |
Read aside from needing these merged:
The missing piece here is negative byte ranges. Those will be rejected as unsupported at the moment. Further work in a later PR. Important to note that only Bitswap requests (and some HTTP!) are going to work with this at the moment:
BUT ipld/frisbii#9 uses this, and I have a Frisbii node online with that, serving wikipedia and also birb.mp4 (bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4) and the fixture file in here (bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq) so all of the queries in the file |
ed5ecc5
to
ae84940
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need want to be putting a 20meg test fixture into this repo? it'll make the download quite a bit more for anyone including lassie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking this was going to be handled by the go pkg install tooling and ignored since it's in testdata
, but I confirmed that it's in my ~/go/pkg when used as a dependency, which is .. unfortunate. I don't really want this to be a burden for downloaders.
We could git submodule it and put it somewhere else (I have more to make for this). Or do you have a better suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you have code for regenerating it, can you put the fixture itself in gitignore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit bespoke; I might be able to regenerate it exactly as is but it was made to be fixed so I could inspect it and manually make fixtures that point to it.
I've been thinking that maybe putting this into ipld/ipld along with other fixtures might be the way to deal with this. Then either export a golang package from there or submodule it like we do with go-ipld-prime. We can have a section about paths traversal and unixfs and the IPLD specifics of the trustless gateway spec as it relates to these matters. I was toying with the idea of an ipld/ipld-fixtures repo, but that's really what ipld/ipld was meant to be so maybe try and use that. I'll experiment a bit and see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment in my review: my question is: how would we add to this fixture set? It seems like if we can't find a way to regenerate, that makes the work a bit harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like I think we can safely assume there will be more test cases we haven't thought of yet coming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I wanted to build a corpus of actual data and descriptive tests that can be run against them—so they can be run any place, including Frisbii and others. We currently have a lot of random-generated on-the-fly data (which I keep on having to squash flakies on because of edge cases) but the intent here is to have something fixed that can be reasoned about and inspected separately to the code that generated it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I don't see any big red flags, but I think we need to talk through the implications of deploying this (esp in Saturn) before we merge.
@@ -0,0 +1,985 @@ | |||
# unixfs_20m_variety (test fixture) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the test cases results generated with your unixfs utility?
how would someone who wanted to make a change reproduce this or modify it? I'm concerned a little bit about lock in.
@@ -55,8 +55,6 @@ type Config struct { | |||
MaxBlocks uint64 // set a budget for the traversal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eek so, I just realized that we're gonna need to figure out what to do with .storage, as they are not yet supporting ranges. I can pop over and try to add it but... in the meantime, maybe we need to figure out some content negotiation in the http retriever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise we're looking at 100% failure here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100%. Bitswap will work, Frisbii will work, but yeah the rest will bork if we start pushing them through.
I had some thoughts about negotiation over in 412: ipfs/specs#412 (comment)
If we were to request a byte range but then be told the server doesn't support it, at least we could switch modes to parsing their response. What we do on the way out of Lassie is another matter—do we give the user more than they asked for and just give our own negotiation signal that we don't support range cause we couldn't find anyone else that does? Either way this is really hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically if we knew that we were getting a full response, we could modify the verifier to just skip blocks till you got to the expected blocks in a range request (since technically a full request contains all the blocks for a range request in the correct order, just with other blocks in between). In fact, turning this on when entity-bytes is present by default might be the hack needed to get this over the line.
Rebased to main and I extracted the large fixture file to ipld/ipld#290 (open for discussion!). I removed and squashed it out of the commits on this branch so it shouldn't weigh it down:
|
Ready for deeper review, current blockers for merge are these:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Initial exploration work; it seems like this might not be too tricky after all.