test_helper requests stop the server before logging can finish when the logger uses async methods #242

kybishop · 2013-05-11T20:17:58Z

I have a server that uses the Goliath::Request.log_block to do some post-response work. In that log block I eventually make redis calls with redis-rb (synchrony driver). The log block stops processing as soon it fires off the request to Redis.

Here is what I believe is happening:

The log block yields to the reactor after making the async call
The reactor defers to the test_helper request, which then stops the eventmachine, regardless of the log block's ongoing request

I've come up with this ugly hack to get the log block to finish, but I despise its implications:

with_api(RtbNicerApi) { get_request(request.dup) { EM::Synchrony.sleep 0.1 } }

This forces the callback to wait 1/10 of a second before terminating the eventmachine, giving the log block just enough time to finish . . but damn is it gross (sleeps in a test and depends on the log block finishing in under 0.1 seconds!)

Any ideas on how to check that logging has finished before terminating the eventmachine?

nolman · 2013-05-12T20:20:19Z

Ya, I believe you are right as to what is happening, here's a gist reproducing it:
https://gist.github.com/nolman/5564680

+1 to that implementation feeling icky, I can't think of a clean way to do it but I may be missing an obvious EventMachine handle. The only method I can think of is: http://eventmachine.rubyforge.org/EventMachine.html#connection_count-class_method which I am not sure we would want to use. I would have hoped that stop_event_loop would have waited for those callbacks to complete but it does not (it just calls EM::stop).

In the past I have dealt with this kind of need the reactor to keep running code by doing something like this:
https://gist.github.com/nolman/5564693
It basically makes the happy path fast but makes the failure a slow fail as it is waiting for some condition to be met. I wasn't super happy with it but it but happier than I was with the sleep $0.02.

I believe the right way to fix this would be a method in EventMachine that stopped the reactor from accepting new connections and bleed the existing callbacks down to 0 with an optional timeout.

kybishop · 2013-05-12T21:36:00Z

EDIT: I deleted my earlier comments on EM.connection_count as understand it a little better now, and think it might even be a reasonable way of checking if EM is actually done serving the request, but would like someone with more eventmachine experience to chime in. I'm trying to dig into why EM.connection_count == 1 when you call with_api, and why it jumps to 3 when called from within get_request when no async methods are called in the API. If I can establish a reliable pattern, something like EM.connection_count == 3 after a request truly finishes, I think EM.connection_count is a worthwhile approach.

I'll dig into eventmachine to see if I can come up with anything generalized; any words of advice greatly appreciated. Send a rescue party if I don't return in a few days :)

kybishop · 2013-05-13T00:18:36Z

The first connection comes from starting the server itself
The second comes from establishing the connection for the test_helper_request
~~The third comes from, and I don't fully understand this, something having to do with the callback of the em-http-request~~
EDIT: I believe the third is actually from the EM server receiving the request made to itself (one connection from the sender, one from the receiver)

Seeing how @igrigorik is the author of em-http-request 😃, perhaps he could chime in on the third, and whether or not there would be any gotchas to using EM.connection_count to see if the test request is truly finished. If my understanding is correct, EM.connection_count should indeed equal 3 when the request is truly finished . . unless I'm missing some way connections could be established.

I'll happily whip up a pull request if it seems reasonable to wait on EM.connection_count == 3 before calling EM.stop_server, implementing some form of user-alterable maximum wait time with a low default.

nolman · 2013-05-13T04:42:13Z

I wonder in testing a goliath server in https will start more connections? I suspect it will. I think the connection_count can be really dangerous to rely on, but I may very well be wrong. I am just not sure that the number will always be 3, it does seem a bit scary to me to add this code into goliath (even if only in test_helper). Curious what @dj2 and @igrigorik think.

Having played with netty a bit their default behavior when shutting down their reactor is to block until the descriptors have all been closed, it sucks if you have a bug in your code and you are leaking descriptors (but it is arguably worse if that bug makes it to production). I do think this is the right approach, either adding a new method or changing stop_event_loop to first wait for all in flight requests to complete.

I feel like this is something that should be fixed in eventmachine but I don't believe it would be a simple fix.

kybishop · 2013-05-13T17:00:08Z

Ah crap, the use of connection pools in the log block results in varied EM.connection_counts. I'm not quite sure what the options are from here, hopefully others will chime in. I'll keep looking for a solution in the meanwhile.

dj2 · 2013-05-13T17:03:46Z

I'd have to agree with @nolman, it sounds like the right solution is to fix this in EM itself. Possibly by adding an exit_when_all_connections_closed, or something similar?

kybishop · 2013-05-13T20:44:23Z

Sorry, didn't mean to close and re-open. My googling shows that many people opt to keep track of the connections themselves. I'm going to go with this solution in the short-term, but see it as a hack-job that doesn't address the true issue in test_helper/eventmachine.

I'd like to contribute back to the community, so I'll dig into eventmachine in my free time to see if I can come up with a graceful shutdown. I'm hoping eventmachine keeps track of registered callbacks and I can go from there. I can see how a lot of issues could arise from simply closing open connections, as that could stop internal requests mid-flight . . . need to somehow allow current requests to keep processing while denying new ones.

kybishop · 2013-05-14T18:19:45Z

Hey all, I thought of a rather simple workaround that doesn't require messing with the internals of eventmachine:

What if ~~test_helper requests~~ TestHelper#server wrapped Goliath::Request.log_block with a parent proc? The parent proc could set a variable in the test_helper class when the request is truly finished. From there, the callback could just call a #wait_until_log_block_finishes method, which invokes itself in a next_tick block until the log block is finished (or some maximum amount of ticks/time elapses?)

I propose this as a solution since it seems rather difficult to get eventmachine to perform a graceful shutdown.

kybishop closed this as completed May 13, 2013

kybishop reopened this May 13, 2013

This was referenced May 14, 2013

ensured the log block finishes in test helper requests #245

Closed

Can't stop eventmachine without killing in-flight requests eventmachine/eventmachine#440

Open

kybishop mentioned this issue Jun 11, 2013

Getting DynamoDB responses with status 0 for requests that work with the default net_http_handler JoshMcKin/em_aws#20

Closed

kybishop mentioned this issue Oct 22, 2013

Ensure the log block finishes when issuing test requests #261

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test_helper requests stop the server before logging can finish when the logger uses async methods #242

test_helper requests stop the server before logging can finish when the logger uses async methods #242

kybishop commented May 11, 2013

nolman commented May 12, 2013

kybishop commented May 12, 2013

kybishop commented May 13, 2013

nolman commented May 13, 2013

kybishop commented May 13, 2013

dj2 commented May 13, 2013

kybishop commented May 13, 2013

kybishop commented May 14, 2013

test_helper requests stop the server before logging can finish when the logger uses async methods #242

test_helper requests stop the server before logging can finish when the logger uses async methods #242

Comments

kybishop commented May 11, 2013

nolman commented May 12, 2013

kybishop commented May 12, 2013

kybishop commented May 13, 2013

nolman commented May 13, 2013

kybishop commented May 13, 2013

dj2 commented May 13, 2013

kybishop commented May 13, 2013

kybishop commented May 14, 2013