-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-86809: Add support for HTTP Range header in HTTPServer #118949
base: main
Are you sure you want to change the base?
Conversation
I didn't use it, but LGTM. Except I would prefer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I've just reviewed at the same time you marked some comments as outdated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments (I won't be available afterwards but I may comment tomorrow)
Thanks for the suggestions, I've made some changes based on them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aaaand some final comments I think. Sorry to break down my reviews like that but I usually comment incrementally but I don't want people to be frustrated by my nitpicking (also, I don't see the point of reviewing something that could change after addressing other comments).
No worries, and thanks a lot for the detailed suggestions for my code! Your suggestions are very helpful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't have more comments to add (but as always, it's easy to miss something obvious when you've reviewd the same thing multiple times).
@arhadthedev Please review if possible. |
Ping! (I hope it's ok to ping every 6 months or so, especially if it seems to be so nearly done :) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I've got more insights on how CPython's workflow works, I'd like you to address the following changes.
Misc/NEWS.d/next/Library/2024-05-12-00-15-44.gh-issue-86809._5vdGa.rst
Outdated
Show resolved
Hide resolved
Additional work is needed (work that I wasn't aware of in May...)
… change function name
# parse_range() collapses (None, None) to None | ||
# and thus `end` can not be None here | ||
start = max(0, fs.st_size - end) | ||
end = fs.st_size - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, we can have fs.st_size = 0
. With bytes=-N
for
start = max(0, fs.st_size - N) # = 0
end = fs.st_size - 1 # = -1
So, we should do something in this case (I don't know what the RFC says in this case). We could just sent something empty (mimicking ""[:-N]
in Python) but maybe we should also sent a 416 instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation would hit the start >= fs.st_size condition below and send 416 Range Not Satisfiable for empty files. This was one reason why I wanted “Content-Range: bytes */0” to be included.
Another option would be to ignore the range request and send 200 Okay for empty files, or even all files where the requested range encompasses the whole file. This could be more complicated to implement, but would allow including fields like Content-Type in the response for clients that just want to limit the response size (denoland/std#3354). And it would be more compliant with RFC 9110, which explicitly says “Range: bytes=-N” (N > 0) is “satisfiable” for a zero-length representation, and implies 416 Range Not Satisfiable is mainly for when no ranges are “satisfiable”. (206 Partial Content is not possible with an empty range, because RFC 9110 requires Content-Range with 0 <= start <= end < size.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To complete my previous comment:
For GET requets, the RFC (https://datatracker.ietf.org/doc/html/rfc9110#section-14.1.2-12) says:
When a selected representation has zero length, the only satisfiable form of range-spec in a GET request is a suffix-range with a non-zero suffix-length.
So, bytes=-0
is indeed not allowed (and must lead to 416). As you observed, this is stressed by https://datatracker.ietf.org/doc/html/rfc9110#section-14.1.2-7:
If the selected representation is shorter than the specified suffix-length, the entire representation is used.
So, bytes=-N
for
Now, for readers, the current implementation would unfold as follows for bytes=-10
:
if self._range:
start, end = self._range # start = None, end = -10
if start is None: # <---------- branched
start = max(0, fs.st_size - end) # start = 0, end = -10
end = fs.st_size - 1 # end = -1
if start >= fs.st_size: # 0 >= 0
f.close()
headers = [('Content-Range', f'bytes */{fs.st_size}')]
self.send_error(HTTPStatus.REQUESTED_RANGE_NOT_SATISFIABLE,
extra_headers=headers)
return None
if end is None or end >= fs.st_size:
end = fs.st_size - 1
We should add another if-block before start >= fs.st_size
and send a regular 200 response with the entire content (no partial content since it's not allowed in this case, and not a 416 since it's satisfiable).
For that, how about we check that start
was None (and not that end == fs.st_size
) and that end - start + 1 >= fs.st_size
? I think it would branch correctly and this would be as if no range request was specified at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn’t follow your conditions for sending the full non-range response. I’m thinking something like the following:
if self._range:
if start is None:
start = max(0, fs.st_size - end)
end = fs.st_size - 1
elif end is None or end >= fs.st_size:
end = fs.st_size - 1
if start == 0 and end >= fs.st_size - 1:
self._range = None # Send entire file
elif start >= fs.st_size:
# 416 Range No Satisfiable
return None
if self._range:
self._range = (start, end)
# 206 Partial Content
else:
# 200 Okay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m thinking something like the following
This approach avoids sending the 200 response in several different places, I've used it and added some testcases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR, can we actually a link that that comment to indicate the rationale of this choice? Since the PR has quite a lot of comments, it'd be easier if we want to find the rationale in the future.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the special fs.st_size == 0
and a bit more coverage, I think we're almost there.
We probably changed it historically though I'm not sure we are still changing it for every minor feature. Versioning is now delegated to git and Python versions themselve. We can increase the number but I wouldn't bother (check git blame to see when it was last updated for instance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The coverage is good but the test function is a bit big. Maybe it's better to split into multiple functions (for instance, we can also name it test_single_range_get_*
and have a test_multi_range_get
case which only verifies that we ignore that part for now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there. The remaining are just nitpicks to ease future extensions and for tracking future issues if any.
self.check_status_and_reason(response, HTTPStatus.REQUESTED_RANGE_NOT_SATISFIABLE) | ||
|
||
response = self.request(route, headers={'Range': 'bytes=4-3'}) | ||
self.check_status_and_reason(response, HTTPStatus.OK, data=self.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a final look at the RFC to check that everything was fine. The RFC says:
An int-range is invalid if the last-pos value is present and less than the first-pos.
A ranges-specifier is invalid if it contains any range-spec that is invalid or undefined for the indicated range-unit.
A valid ranges-specifier is "satisfiable" if it contains at least one range-spec that is satisfiable, as defined by the indicated range-unit. Otherwise, the ranges-specifier is "unsatisfiable".
Now, the RFC recommends:
A server that supports range requests MAY ignore or reject a Range header field that contains an invalid ranges-specifier (Section 14.1.1), a ranges-specifier with more than two overlapping ranges, or a set of many small ranges that are not listed in ascending order, since these are indications of either a broken client or a deliberate denial-of-service attack (Section 17.15).
and
A server that supports range requests MAY ignore a Range header field when the selected representation has no content (i.e., the selected representation's data is of zero length).
We decided to ignore range specifiers that are invalid (for instance bytes=5-4
is invalid but not unsatisfiable). Can you perhaps throw a comment in the code (in parse_range()
) (not the test) saying that this was Python's choice please? (and link to https://datatracker.ietf.org/doc/html/rfc9110#name-range)
response = self.request(empty_path, headers={'Range': 'bytes=-512'}) | ||
self.check_status_and_reason(response, HTTPStatus.OK, data=b'') | ||
|
||
response = self.request(empty_path, headers={'Range': 'bytes=1-2'}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to also add a test when using bytes=5-4
since for non-empty files, we ignore them (the range specifier is invalid as 5 > 4 and we decided to ignore and not reject invalid ranges). The reason I'm asking this is that parse_range()
would return None in that case instead (so we'll never enter the if self._range:
block)
# parse_range() collapses (None, None) to None | ||
# and thus `end` can not be None here | ||
start = max(0, fs.st_size - end) | ||
end = fs.st_size - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR, can we actually a link that that comment to indicate the rationale of this choice? Since the PR has quite a lot of comments, it'd be easier if we want to find the rationale in the future.
Add support for the HTTP
Range
header to SimpleHTTPServer to solve #86809