
Increase grpc max message size #212

Open · wants to merge 1 commit into master
Conversation

@fhoering (Contributor) commented Apr 2, 2020

• Currently the limit in Python is 4 MB, which fails when retrieving large application logs.

@fhoering (Contributor, Author) commented May 18, 2020

@jcrist
Any news on this? What about making it configurable?

grpc_max_receive_message_length = int(os.environ.get("SKEIN_MAX_MSG_SIZE", 50 * 1024 * 1024))
..
('grpc.max_receive_message_length', grpc_max_receive_message_length)
..
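
For illustration, a minimal sketch of where such an option could be wired in when the gRPC channel is created (the address, the 50 MiB default, and the SKEIN_MAX_MSG_SIZE name are placeholders for discussion, not existing skein settings):

import os
import grpc

# Proposed override: fall back to 50 MiB when SKEIN_MAX_MSG_SIZE is unset.
# (This variable does not exist in skein today.)
max_msg_size = int(os.environ.get("SKEIN_MAX_MSG_SIZE", 50 * 1024 * 1024))

# Illustrative channel creation; the real skein client builds its channel
# internally, this only shows where the options would be passed.
channel = grpc.insecure_channel(
    "localhost:11000",
    options=[
        ("grpc.max_receive_message_length", max_msg_size),
        ("grpc.max_send_message_length", max_msg_size),
    ],
)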

@jcrist (Owner) commented May 18, 2020

Apologies, things have been a bit chaotic. Thinking about this a bit more, I think the method returning logs should be a streaming output method, so the msg size matters less. We might stream one response per container, which would hopefully stay below this limit. Thoughts?

@fhoering (Contributor, Author)

Why not, but I think one response per container could still easily go above 4 MB. It turns out even 50 MB is not enough in some cases. We have jobs that produce logs for 24 hours, and each spawned container produces the same amount of logs.

In that case we would even need to stream the response for each container in chunks. But that also means we have to merge it back on the client side to have a cleaner API, right? I'm not sure a list of strings would be a nice API to use.

@jcrist (Owner) commented May 18, 2020

Hmmm, yeah. In that case we may want to stream in fixed byte-size chunks (line-by-line seems like too much to me). To the end user I think this should keep the same Python API, so we'd have to reassemble in the client before returning.

I'm fine merging this as a stop-gap for now, but I think we should redesign the logs API to avoid this issue entirely.
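
A rough sketch of the fixed-size chunking idea on the server side (the chunk size and helper name are illustrative, not part of skein):

CHUNK_SIZE = 1024 * 1024  # 1 MiB per streamed message, comfortably under the 4 MB default

def iter_log_chunks(container_id, log_text, chunk_size=CHUNK_SIZE):
    # Yield one (container_id, chunk) pair per fixed-size slice of the log,
    # so no single gRPC message exceeds the configured limit.
    for start in range(0, len(log_text), chunk_size):
        yield container_id, log_text[start:start + chunk_size]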

@fhoering (Contributor, Author) commented May 19, 2020

OK. I agree it would be nice not to have to specify this setting.

Having had a quick look, we could change the proto like this:

rpc getLogs (LogsRequest) returns (stream LogsResponse);

message LogsResponse {
  string key = 1;   // the container_id
  string data = 2;  // one chunk of the log
}

Then send n LogsResponse messages and aggregate them back on the client. It seems easy on the Python side to just iterate over the messages and merge them. The only complication is propagating this all the way down to the YARN API on the Java side.
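
A minimal sketch of that client-side merge, assuming the stream of LogsResponse messages from the proto above (the helper name is illustrative, not existing skein code):

from collections import defaultdict

def collect_logs(response_stream):
    # Merge the streamed LogsResponse messages back into one string per container.
    chunks = defaultdict(list)
    for msg in response_stream:
        chunks[msg.key].append(msg.data)
    return {key: "".join(parts) for key, parts in chunks.items()}

# e.g. logs = collect_logs(stub.getLogs(LogsRequest(...)))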

@jcrist (Owner) commented May 19, 2020

👍 That rpc design makes sense to me.

@fhoering (Contributor, Author)

OK. I also had a look at LogClient, and it could just produce a stream with the Java 8 Stream API. I will work on it and propose something in another PR.

In the meantime, if you have time, please have a look at #213, as it also touches the log API and could produce conflicts.

@xabriel commented Jul 1, 2022

Hi folks,

I've been hitting this issue consistently, with my YARN jobs generating around 20 MB of logs.

It looks like there was agreement on merging the stop-gap measure proposed in this PR. If that is still the case, can we please merge?
