Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-86809: Add support for HTTP Range header in HTTPServer #118949

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

lyc8503
Copy link
Contributor

@lyc8503 lyc8503 commented May 11, 2024

Add support for the HTTP Range header to SimpleHTTPServer to solve #86809

@imba-tjd
Copy link
Contributor

imba-tjd commented May 23, 2024

I didn't use it, but LGTM. Except I would prefer parse_range than get_range.

Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
@lyc8503 lyc8503 requested a review from picnixz May 23, 2024 14:30
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I've just reviewed at the same time you marked some comments as outdated)

@lyc8503 lyc8503 requested a review from picnixz May 23, 2024 15:36
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments (I won't be available afterwards but I may comment tomorrow)

Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
@lyc8503
Copy link
Contributor Author

lyc8503 commented May 23, 2024

Thanks for the suggestions, I've made some changes based on them.

@lyc8503 lyc8503 requested a review from picnixz May 23, 2024 16:37
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaaand some final comments I think. Sorry to break down my reviews like that but I usually comment incrementally but I don't want people to be frustrated by my nitpicking (also, I don't see the point of reviewing something that could change after addressing other comments).

Lib/http/server.py Outdated Show resolved Hide resolved
Lib/test/test_httpservers.py Outdated Show resolved Hide resolved
Lib/test/test_httpservers.py Show resolved Hide resolved
@lyc8503
Copy link
Contributor Author

lyc8503 commented May 24, 2024

Aaaand some final comments I think. Sorry to break down my reviews like that but I usually comment incrementally but I don't want people to be frustrated by my nitpicking (also, I don't see the point of reviewing something that could change after addressing other comments).

No worries, and thanks a lot for the detailed suggestions for my code! Your suggestions are very helpful.

@lyc8503 lyc8503 requested a review from picnixz May 24, 2024 07:40
picnixz
picnixz previously approved these changes May 24, 2024
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't have more comments to add (but as always, it's easy to miss something obvious when you've reviewd the same thing multiple times).

@hondogitsune
Copy link

hondogitsune commented Jun 28, 2024

@arhadthedev
@JelleZijlstra
This would close a nearly 4 year old issue and enhance cpython with one of the most basic features of HTTP servers, range requests.

Please review if possible.

@JCash
Copy link

JCash commented Dec 8, 2024

Ping! (I hope it's ok to ping every 6 months or so, especially if it seems to be so nearly done :) )

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I've got more insights on how CPython's workflow works, I'd like you to address the following changes.

Lib/http/server.py Outdated Show resolved Hide resolved
Lib/test/test_httpservers.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Show resolved Hide resolved
@picnixz picnixz dismissed their stale review December 8, 2024 23:37

Additional work is needed (work that I wasn't aware of in May...)

Lib/http/server.py Outdated Show resolved Hide resolved
Copy link

cpython-cla-bot bot commented Dec 15, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Doc/library/http.server.rst Outdated Show resolved Hide resolved
@lyc8503
Copy link
Contributor Author

lyc8503 commented Jan 1, 2025

All comments solved. Another ping here :) @picnixz @vadmium

@picnixz picnixz self-requested a review January 1, 2025 17:36
Doc/library/http.server.rst Outdated Show resolved Hide resolved
Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved
Doc/whatsnew/3.14.rst Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
@lyc8503 lyc8503 requested review from picnixz and vadmium January 2, 2025 08:19
Doc/library/http.server.rst Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/http/server.py Outdated Show resolved Hide resolved
Lib/test/test_httpservers.py Show resolved Hide resolved
# parse_range() collapses (None, None) to None
# and thus `end` can not be None here
start = max(0, fs.st_size - end)
end = fs.st_size - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, we can have fs.st_size = 0. With bytes=-N for $N\geq0$, we have:

start = max(0, fs.st_size - N)  # = 0
end = fs.st_size - 1            # = -1

So, we should do something in this case (I don't know what the RFC says in this case). We could just sent something empty (mimicking ""[:-N] in Python) but maybe we should also sent a 416 instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation would hit the start >= fs.st_size condition below and send 416 Range Not Satisfiable for empty files. This was one reason why I wanted “Content-Range: bytes */0” to be included.

Another option would be to ignore the range request and send 200 Okay for empty files, or even all files where the requested range encompasses the whole file. This could be more complicated to implement, but would allow including fields like Content-Type in the response for clients that just want to limit the response size (denoland/std#3354). And it would be more compliant with RFC 9110, which explicitly says “Range: bytes=-N” (N > 0) is “satisfiable” for a zero-length representation, and implies 416 Range Not Satisfiable is mainly for when no ranges are “satisfiable”. (206 Partial Content is not possible with an empty range, because RFC 9110 requires Content-Range with 0 <= start <= end < size.)

Copy link
Member

@picnixz picnixz Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To complete my previous comment:

For GET requets, the RFC (https://datatracker.ietf.org/doc/html/rfc9110#section-14.1.2-12) says:

When a selected representation has zero length, the only satisfiable form of range-spec in a GET request is a suffix-range with a non-zero suffix-length.

So, bytes=-0 is indeed not allowed (and must lead to 416). As you observed, this is stressed by https://datatracker.ietf.org/doc/html/rfc9110#section-14.1.2-7:

If the selected representation is shorter than the specified suffix-length, the entire representation is used.

So, bytes=-N for $N&gt;0$ should be accepted and we should return the entire content (which in this case would be nothing, but it's still fine).


Now, for readers, the current implementation would unfold as follows for bytes=-10:

if self._range:
    start, end = self._range  # start = None, end = -10
    if start is None:  # <---------- branched
        start = max(0, fs.st_size - end)  # start = 0, end = -10
        end = fs.st_size - 1  # end = -1
    if start >= fs.st_size:  # 0 >= 0
        f.close()
        headers = [('Content-Range', f'bytes */{fs.st_size}')]
        self.send_error(HTTPStatus.REQUESTED_RANGE_NOT_SATISFIABLE,
                        extra_headers=headers)
        return None
    if end is None or end >= fs.st_size:
        end = fs.st_size - 1

We should add another if-block before start >= fs.st_size and send a regular 200 response with the entire content (no partial content since it's not allowed in this case, and not a 416 since it's satisfiable).

For that, how about we check that start was None (and not that end == fs.st_size) and that end - start + 1 >= fs.st_size? I think it would branch correctly and this would be as if no range request was specified at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t follow your conditions for sending the full non-range response. I’m thinking something like the following:

if self._range:
    if start is None:
        start = max(0, fs.st_size - end)
        end = fs.st_size - 1
    elif end is None or end >= fs.st_size:
        end = fs.st_size - 1
    if start == 0 and end >= fs.st_size - 1:
        self._range = None  # Send entire file
    elif start >= fs.st_size:
        # 416 Range No Satisfiable
        return None
if self._range:
    self._range = (start, end)
    # 206 Partial Content
else:
    # 200 Okay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m thinking something like the following

This approach avoids sending the 200 response in several different places, I've used it and added some testcases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR, can we actually a link that that comment to indicate the rationale of this choice? Since the PR has quite a lot of comments, it'd be easier if we want to find the rationale in the future.

Lib/http/server.py Outdated Show resolved Hide resolved
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the special fs.st_size == 0 and a bit more coverage, I think we're almost there.

Lib/http/server.py Outdated Show resolved Hide resolved
@imba-tjd
Copy link
Contributor

imba-tjd commented Jan 3, 2025

图片

Does this need to be changed?

@picnixz
Copy link
Member

picnixz commented Jan 3, 2025

We probably changed it historically though I'm not sure we are still changing it for every minor feature. Versioning is now delegated to git and Python versions themselve. We can increase the number but I wouldn't bother (check git blame to see when it was last updated for instance)

Doc/library/http.server.rst Outdated Show resolved Hide resolved
@lyc8503
Copy link
Contributor Author

lyc8503 commented Jan 7, 2025

Does this need to be changed?
We can increase the number but I wouldn't bother (check git blame to see when it was last updated for instance)

This field has not been changed for 17 years😂. I guess we can leave it as is.

image

@lyc8503 lyc8503 requested review from picnixz and vadmium January 9, 2025 05:45
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coverage is good but the test function is a bit big. Maybe it's better to split into multiple functions (for instance, we can also name it test_single_range_get_* and have a test_multi_range_get case which only verifies that we ignore that part for now)

Lib/test/test_httpservers.py Outdated Show resolved Hide resolved
Lib/test/test_httpservers.py Outdated Show resolved Hide resolved
Lib/test/test_httpservers.py Outdated Show resolved Hide resolved
@lyc8503 lyc8503 requested a review from picnixz January 12, 2025 06:14
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. The remaining are just nitpicks to ease future extensions and for tracking future issues if any.

self.check_status_and_reason(response, HTTPStatus.REQUESTED_RANGE_NOT_SATISFIABLE)

response = self.request(route, headers={'Range': 'bytes=4-3'})
self.check_status_and_reason(response, HTTPStatus.OK, data=self.data)
Copy link
Member

@picnixz picnixz Jan 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a final look at the RFC to check that everything was fine. The RFC says:

An int-range is invalid if the last-pos value is present and less than the first-pos.

A ranges-specifier is invalid if it contains any range-spec that is invalid or undefined for the indicated range-unit.

A valid ranges-specifier is "satisfiable" if it contains at least one range-spec that is satisfiable, as defined by the indicated range-unit. Otherwise, the ranges-specifier is "unsatisfiable".

Now, the RFC recommends:

A server that supports range requests MAY ignore or reject a Range header field that contains an invalid ranges-specifier (Section 14.1.1), a ranges-specifier with more than two overlapping ranges, or a set of many small ranges that are not listed in ascending order, since these are indications of either a broken client or a deliberate denial-of-service attack (Section 17.15).

and

A server that supports range requests MAY ignore a Range header field when the selected representation has no content (i.e., the selected representation's data is of zero length).

We decided to ignore range specifiers that are invalid (for instance bytes=5-4 is invalid but not unsatisfiable). Can you perhaps throw a comment in the code (in parse_range()) (not the test) saying that this was Python's choice please? (and link to https://datatracker.ietf.org/doc/html/rfc9110#name-range)

response = self.request(empty_path, headers={'Range': 'bytes=-512'})
self.check_status_and_reason(response, HTTPStatus.OK, data=b'')

response = self.request(empty_path, headers={'Range': 'bytes=1-2'})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to also add a test when using bytes=5-4 since for non-empty files, we ignore them (the range specifier is invalid as 5 > 4 and we decided to ignore and not reject invalid ranges). The reason I'm asking this is that parse_range() would return None in that case instead (so we'll never enter the if self._range: block)

# parse_range() collapses (None, None) to None
# and thus `end` can not be None here
start = max(0, fs.st_size - end)
end = fs.st_size - 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR, can we actually a link that that comment to indicate the rationale of this choice? Since the PR has quite a lot of comments, it'd be easier if we want to find the rationale in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants