Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exceptions in Probe.run_command_on_host context are shadowing exceptions from command on host #611

Open
approxit opened this issue Mar 3, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@approxit
Copy link

approxit commented Mar 3, 2023

While firefighting with @lucekdudek goth problems in golemfactory/yapapi#1086 we found a typo, where test should encounter error when starting command on host, but somehow we were getting pattern awaiting timeouts. Pattern awaiting should not even happen, because command could not even start properly. We started searching and we have interesting finding.

This will be a complex one, buckle up!

Goth tests have this portion of code:

requestor = runner.get_probes(probe_type=RequestorProbe)[0]

async with requestor.run_command_on_host("some command") as (_cmd_task, cmd_monitor, _process_monitor):
    cmd_monitor.add_assertion(assert_no_errors)

    await cmd_monitor.wait_for_pattern("some pattern", timeout=20)

run_command_on_host() starts command on host asynchronously, it give us few bits to control that task in context manager. Let's see how context manager is constructed.

try:
    task = asyncio.create_task(cmd_task)
    yield task
    await task
except Exception:
    # do some cleaning about exceptions
    raise
finally:
    # close cmd fully

Let's assume that running cmd_task coroutine will raise some startup exception instantly. Because we're calling create_task, we're using asyncio.Task API, so to check for errors we should call task.result(), task.exception() or await that task. We're not doing this for now, and we're yielding task, giving out control from context manager "startup" to context manager's body. There, we're using .wait_for_pattern() stuff, that after some time will raise timeout exception. Raising uncaught exception in context manager body exists that context manager. So, we're back in context manager at the line with yield, but at this point we're not going further with code execution, we're raising timeout exception. This timeout exception is handled by try block, and finally our whole context manager exists with an timeout exception.

But wait, were is our task startup exception handling? Well, its inside of task, but we never checked it. Python's Task exception was never retrieved sound familiar? Well, this is the case. tasks startup exception will be retrieved only when context manager's body will exit itself without any exception. Then await task would happen, and we'll got startup exception.

Possible solutions

We have few options here, one of them is to change context manager to function decorator. In that case, cmd_task and function body could run asynchronously, and by using asyncio.wait(..., return_when=FIRST_EXCEPTION) we could control exceptions from both sides, and behave properly.

Other solution would be to check cmd_task error each time we are using any assertion-like function from goth.

@approxit approxit added the bug Something isn't working label Mar 3, 2023
@approxit
Copy link
Author

approxit commented Mar 7, 2023

I've just randomly stumbled upon de-facto solution for this problem.

Python 3.11 introduced asyncio.timeout, which is a context manager that can interrupt execution context body by... cancelling current task.

By reusing pattern of canceling current task, we could merge it with asyncio.wait(..., return_when=FIRST_EXCEPTION) idea from first proposed solution, where context body would be interrupted on cmd_task exceptions. In this way, we could archive the fix without any API changes, which is great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant