Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include Content-Length response header #151

Conversation

paulmillar
Copy link
Contributor

Motivation:

The 'Content-Length' response header is expected by many clients. While it is not possible to specify the content length for on-the-fly compression, it is possible when sending an individual file.

Modification:

Update DataSelection to optionally return the content length.

Update the 'getData' Response to include the Content-Length header when the request targets a single file that is not placed in a zip archive. If the content is dynamically generated (e.g., a zip file) then no Content-Length header is provided.

Result:

IDS now includes the Content-Length response header when this is possible

@paulmillar paulmillar marked this pull request as draft January 26, 2024 22:18
@RKrahl
Copy link
Member

RKrahl commented Jan 29, 2024

Many thanks for submitting this!

However, I do see an issue: if I remember the relevant HTTP RFCs correctly, it is not allowed to use chunked transfer encoding if the Content-Length header is provided. But I'd need to verify that.

As far as I can see in your changes, you only set the Content-Length header, but you don't change the way the file is sent, e.g. IDS would still be using chunked transfer encoding. Again, I didn't check this. If I'm not mistaken, that would be illegal.

@paulmillar
Copy link
Contributor Author

Yes, you're right: the patch only sets the Content-Length response header, and you're also right that sending the Content-Length response header is incompatible with using chunked encoding.

With the frameworks I've seen, setting the length is sufficient to trigger the switch from chunked encoding to normal streaming (e.g., no encoding). In effect, the frameworks default to streaming if the length is known and uses chunked encoding only if the length isn't known. For this framework, it looks like the response entity length is set simply by specifying the Content-Length header.

Unfortunately, I haven't found documentation describing how the framework selects between streaming and chunked encoding. Without this, it's hard to make concrete statements. Therefore, it would be a very good idea to test whether the patch works as expected.

That said, I haven't tried out the patch on a real IDS server, as I'm not sure how much effort would be required (I got IDS installed some time ago, but it was quite involved). I'd be very grateful if you could try out the patch and report back if there are problems.

@RKrahl
Copy link
Member

RKrahl commented Jan 29, 2024

In any case, I guess we should take your PR as an opportunity to review our test suite: there should be some sort of HTTP RFC conformance checking in the tests and I'm afraid, there is none at the moment. If all my assumptions from above turn out to be correct, then the test suite should have noticed the issue and fail.

@paulmillar
Copy link
Contributor Author

Personally, I think you should be a little careful, here.

It would certainly make sense to test that, when downloading a single file, the Content-Length header is sent with the correct value.

However, if you're testing that the bytes sent over the network are RFC compliant then (I would say) you are actually testing Jakata's implementation of JAX-RS, and not IDS.

@paulmillar paulmillar force-pushed the development/support-content-length-response branch from 88f869d to 0569b31 Compare January 29, 2024 22:01
@paulmillar
Copy link
Contributor Author

OK, I've updated the integration tests to verify the Content-Length and Transfer-Encoding response headers for the getData resource.

This should check the patch is working as expected, targetting your concern about generating a valid response.

@paulmillar paulmillar marked this pull request as ready for review January 30, 2024 06:13
@RKrahl
Copy link
Member

RKrahl commented Jan 30, 2024

Hi @paulmillar, it seems you are just too fast for us. When I wrote that we should review the test suite, I didn't meant to say that you need to do all the work alone. I don't have the capacity to do that myself at the moment, but I already asked @MLewerenzHZB to have a look.

@paulmillar
Copy link
Contributor Author

This is speculation, but what seems to be happening is the files are rather small (less than 2 KiB) so they fit within the Payara send buffer. Because of this, Payara knows the entity's length, so can set the Content-Length to the correct value and avoid using chunked encoding.

This is unfortunate, as it means we can't test whether the patch works like this.

Presumably, if the content was much larger (>16 KiB?), Payara would have to start sending data (over the network) before it had read all the OutputStream, forcing it to use chunked encoding.

If this is correct, then the tests would need to be updated, either to support sending "large" files or Payara configuration would need to be adjusted.

@LewerenzM
Copy link
Contributor

LewerenzM commented Jan 31, 2024

Hi,
i investigated a bit to find a reason for the failing tests. If my understanding is correct, the Content-Length header is added anyway for these test cases, because the zip file in the payload is short enough for not being chunked. So the condition that there is no such header isn’t fulfilled and the test fails.
Like you said, @paulmillar.
Even if i remove the manual adding of the header it is added by the framework.
Maybe creating an extra test class on such HTTP properties depending on different payloads makes it a bit easier to focus on these problems.

@RKrahl
Copy link
Member

RKrahl commented Feb 1, 2024

Maybe my previous comment was not very clear: I didn't meant to put a requirement that any zipped response should be chunked. If in any case, where we don't provide the length beforehand, the Jakarta framework somehow figures it out and adds a Content-Length instead of chunking the response, I'm perfectly happy with that.

I only had the concern (ill-founded as it seems) that your change might have the result that we get a response that has both, a Content-Length header and a chunked body, which would be illegal as I understand the relevant RFC. In order to be on the safe side, we might want to test that this does not happen.

So the test that I would suggest should not check that any zipped response comes without a Content-Length header. It should rather verify that any HTTP response (having a body) that we get from IDS, regardless of the original request, either has a Content-Length or a chunked body, but not both.

I just discussed that with @MLewerenzHZB: he will add such a check to the test suite independently of this PR.

Motivation:

The 'Content-Length' response header is expected by many clients.  While
it is not possible to specify the content length for on-the-fly
compression, it is possible when sending an individual file.

Modification:

Update DataSelection to optionally return the content length.

Update the 'getData' Response to include the `Content-Length` header
when the request targets a single file that is not placed in a zip
archive.  If the content is dynamically generated (e.g., a zip file)
then no `Content-Length` header is provided.

Result:

IDS now includes the `Content-Length` response header when this is
possible

Signed-off-by: Paul Millar <paul.millar@desy.de>
@paulmillar
Copy link
Contributor Author

I've added a basic framework to allow test-specific assertions on the HTTP responses. This is intended for situations where the TestingClient is interacting with IDS in a way that returns some content (e.g., getData methods).

You can see that framework in pull request #154. The pull request also adds HTTP response assertions some existing getData tests, to demonstrate that is is useful and feature-complete.

The framework in #154 would replace the rather rudimentary framework in this patch with something a bit more reasonable.

To summarise, the proposed strategy is to:

  1. Add the new testing framework by accepting Add framework to allow test-specific assertions on HTTP response. #154.
  2. Write extra unit tests to exercise the case when IDS/Payara doesn't cache the HTTP responses (the HTTP responses are chunked encoded). This would be an extra pull request.
  3. Update this pull request so that it applies the intended changes and updates only the direct file download tests (from step 2.).

I believe this approach would provide a high level of confidence that compressed and zipped output continues to be chunked encoded, while the direct file download is now transferred with the Content-Length header.

@paulmillar paulmillar force-pushed the development/support-content-length-response branch from 0569b31 to 7a3a28b Compare February 5, 2024 12:43
@paulmillar
Copy link
Contributor Author

I realised that there isn't an issue tied to this pull-request. Therefore, I've created #155 to capture the discussion.

@paulmillar
Copy link
Contributor Author

I'm closing this pull-request, in favour of the discussion in #155. This is because the best strategy for accepting patches isn't clear, and (I think) should be discussed in #155.

@paulmillar paulmillar closed this Feb 7, 2024
@paulmillar paulmillar deleted the development/support-content-length-response branch February 12, 2024 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants