HTTP filter manager: Support a "continue only the current filter" mode #15798

gbrail · 2021-04-01T01:45:48Z

Title: Support a way to continue encoding/decoding, but only for the current filter

Description:
In the ext_proc filter, the protocol first sends a "request_headers" message to the external processing server, waits for a response, and then decides what to do with the request body. It then may or may not send a separate message to handle the body, and so on for the response path and perhaps the trailers.

Here is one typical series of calls:

decodeHeaders() called
Dispatch async gRPC to server
Return StopAllIterationAndWatermark
Receive async gRPC reply
continueDecoding()
decodeData() called with end_stream = false -- return StopIterationAndBuffer
decodeData() called with end_stream = true
Dispatch async gRPC with body
Return StopIterationAndBuffer
Receive async gRPC reply
continueDecoding()
... Same thing on the response path ...

The problem is that at step 5 (continueDecoding) the headers are dispatched down the filter chain and can no longer be modified. What I'd like, instead, is that if the filter is configured to buffer the body that the headers will be retained until the response to the body gRPC (sent in step 8) is returned, and then everything continues. That way an external processor can modify the headers after inspecting and possibly modifying the body too.

I can almost make this happen by returning StopEncoding in step 3. But in that case, decodeData may be called before the async gRPC returns, and at that point we don't know for sure whether the server wants us to handle the body at all, or just stream it through. So if we do that, then we have to compromise on other important parts of the protocol.

In other words we get this:

decodeHeaders() called
Dispatch async gRPC to server
Return StopIteration
decodeData() called -- here we must assume that the server wants to process the body even if it doesn't. That may mean buffering things that we don't want to buffer, or forcing developers to pre-configure the processing mode for the filter rather than change it on the fly.
... after this we can probably make it work, but not without changing some important aspects of the protocol, or making the filter a lot more complicated...

What I'd rather do is replace step 5 -- the first continueDecoding -- with something like "continueLocalDecoding". This should result in my decodeData() callbacks being enabled, but should NOT result in sending the headers further down (or up) the filter chain until decodeData() returns a continue status or continueDecoding() is called properly. This is kind of like changing the "StopAllIterationAndWatermark" status into "StopIteration" status.

I THINK that this might be a simple change in filter_manager.cc to have some more options to "commonContinue". However, I'm not totally sure -- perhaps other things need to change too.

Relevant Links:

ext_proc users are asking about this:

#15569

wbpcode · 2021-04-01T02:28:02Z

May be we can just enhance ext_proc to support mode that send entire request and let developers to pre-configure it:

decodeHeaders() called
Return Stop StopIteration.
decodeData() called with end_stream = false
return StopIterationAndBuffer
decodeData() called with end_stream = true
Dispatch async gRPC (entire request) to server
Receive async gRPC (entire request) reply
continueDecoding.

It should be an acceptable overhead to always cache the body when the headers need to be updated based on the results of the body GRPC reply or user need entire request.

here is similar issues: #15793

snowp · 2021-04-01T19:22:51Z

Do we not already support this via the StopIteration return value? The current filter will block header iteration until continue is called, but will still receive data + trailers. It means that the ext_proc filter will need to manage receiving data etc. while there might be a pending ext_proc gRPC call, but I think this would give you the flexibility to do what you want here?

@alyssawilk might have thoughts here as well

gbrail · 2021-04-01T19:37:50Z

The problem I have now is that I don't necessarily know what to do with the body chunks that come in to the filter until the reply comes back from the "request headers" message, and I don't want to just buffer the body in case it's too large for the current buffer size.

I was hoping for an easy way out, by returning StopAllIterationAndWatermark from the headers callback, and then adding some way to only partially "continue" encoding or decoding by changing the filter state so that it now behaves as if I just returned StopIteration from that callback.

I'll keep working out how ext_proc might work without this. But I also tried experimenting with changes to FilterManager and it's not as easy as I had hoped!

PiotrSikora · 2021-04-02T01:47:38Z

The whole "store-and-forward" flow doesn't really work for request/responses with unbounded sizes, so I'd highly discourage it unless you have to transcode body in a non-streaming way (e.g. resize image), but even then you usually don't need to block headers.

Also, unless I'm missing something obvious, returning StopIterationAndBuffer means that ext_proc is flaky and doesn't work in a "real world", since request/response are terminated with 413/500 once they reach the buffer limit, which Envoy recommends to set to 32 KiB at the edge.

gbrail · 2021-04-02T17:47:12Z

ext_proc is deliberately designed to give the server control over what callbacks and data it gets. Receiving a buffered body is one of the options that the server can choose. It's not the only option the we designed (although right now it's the only one implemented) and it's not the default -- by default ext_proc doesn't do anything with the bodies at all and doesn't buffer them.

In a lot of use cases when it's required to transform an HTTP body, it's often the case that buffering is the only choice -- there are very few data transformation technologies that operate on streaming data. I think it's important that ext_proc offer this option, but only as an option.

dragonbone81 · 2021-04-15T17:20:59Z

Hi, thanks for working on this :)
@PiotrSikora I'm curious about your comment. We have a use-case where we want to transform the body of the request, however since we are changing the body of the request we are required to change a field in the header (a crc32 hash of the body). Would there be any way to approach this besides buffering the body + headers?
Would you also discourage doing it this way in lua, which does support buffering the whole body and headers?

@gbrail We are also trying to decide between using the ext_proc filter and using lua, which does support the above. Is there any indication on how long this enhancement would take? Thanks!

gbrail · 2021-04-16T00:47:22Z

I have a PR in flight that lets ext_proc process headers along with buffered bodies by using the existing callback structure, so I'm going to close this issue.

Here's the PR:

#15935

it seems to be working OK so I suspect that it'll be ready in the next few days.

PiotrSikora · 2021-04-16T08:06:58Z

@dragonbone81 for something like checksum, the better approach woud be to send it in HTTP trailers, after the body, assuming that the client supports them.

dragonbone81 · 2021-04-16T16:17:04Z

@PiotrSikora Trailers would be a good solution I think, but I don't think any of the envoy filters like wasm or ext_proc support them right now (except lua).
However do you think it would be a huge gain by using them? We'd still need to completely buffer the body in the proxy in order to read and modify it. In the ext_proc filter we could buffer the body inside the grpc server, but it would still need to be buffered. And in the lua case it just buffered inside envoy.

So is getHeaders->sendHeaders->getBody/modifyBody/sendBody->sendTrailers faster than
getHeaders->getBody/modifyBody/sendBody/sendHeaders
Forgive if my understanding of this is extremely rudimentary.

PiotrSikora · 2021-04-18T08:44:35Z

@PiotrSikora Trailers would be a good solution I think, but I don't think any of the envoy filters like wasm or ext_proc support them right now (except lua).

Proxy-Wasm supports trailers, but you're right, it looks that ext_proc doesn't support them yet.

However do you think it would be a huge gain by using them? We'd still need to completely buffer the body in the proxy in order to read and modify it. In the ext_proc filter we could buffer the body inside the grpc server, but it would still need to be buffered. And in the lua case it just buffered inside envoy.

So is getHeaders->sendHeaders->getBody/modifyBody/sendBody->sendTrailers faster than
getHeaders->getBody/modifyBody/sendBody/sendHeaders
Forgive if my understanding of this is extremely rudimentary.

That really depends on what kind of transformations are you doing. A lot of things can be done in a streaming fashing (e.g. (de)compression, chunk-by-chunk transcoding, HTML rewriting, etc.), in which case you don't need to buffer the complete body, and you could calculate the checksum in a streaming fashion as well.

But if for you need to buffer the complete body (e.g. you're doing on-the-fly image optimzation), then it won't matter much.

dragonbone81 · 2021-04-19T17:05:21Z

Awesome thank you for the help and insights!

gbrail added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Apr 1, 2021

bladedancer mentioned this issue Apr 6, 2021

OnRequestBody not called if onRequestHeaders returns stopIteration proxy-wasm/proxy-wasm-cpp-host#143

Open

gbrail closed this as completed Apr 16, 2021

TAOXUY mentioned this issue Feb 4, 2023

Support StopIteration Headers until continueDecoding from data for ExtProc #25275

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTTP filter manager: Support a "continue only the current filter" mode #15798

HTTP filter manager: Support a "continue only the current filter" mode #15798

gbrail commented Apr 1, 2021

wbpcode commented Apr 1, 2021 •

edited

Loading

snowp commented Apr 1, 2021

gbrail commented Apr 1, 2021 •

edited

Loading

PiotrSikora commented Apr 2, 2021

gbrail commented Apr 2, 2021

dragonbone81 commented Apr 15, 2021

gbrail commented Apr 16, 2021

PiotrSikora commented Apr 16, 2021

dragonbone81 commented Apr 16, 2021 •

edited

Loading

PiotrSikora commented Apr 18, 2021

dragonbone81 commented Apr 19, 2021

HTTP filter manager: Support a "continue only the current filter" mode #15798

HTTP filter manager: Support a "continue only the current filter" mode #15798

Comments

gbrail commented Apr 1, 2021

wbpcode commented Apr 1, 2021 • edited Loading

snowp commented Apr 1, 2021

gbrail commented Apr 1, 2021 • edited Loading

PiotrSikora commented Apr 2, 2021

gbrail commented Apr 2, 2021

dragonbone81 commented Apr 15, 2021

gbrail commented Apr 16, 2021

PiotrSikora commented Apr 16, 2021

dragonbone81 commented Apr 16, 2021 • edited Loading

PiotrSikora commented Apr 18, 2021

dragonbone81 commented Apr 19, 2021

wbpcode commented Apr 1, 2021 •

edited

Loading

gbrail commented Apr 1, 2021 •

edited

Loading

dragonbone81 commented Apr 16, 2021 •

edited

Loading