Minor test fixes to improve debuggability and readability #4862

Merged
merged 9 commits into from
Oct 22, 2024

roypat
Contributor

@roypat roypat commented Oct 21, 2024

Changes

  • more debug output if running SSH commands in the guest fails
  • relax timeouts
  • minor code cleanup

Reason

  • more debug information on SSH failures, and fewer spurious timeout failures

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@roypat roypat added the Status: Awaiting review label Oct 21, 2024

codecov bot commented Oct 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.07%. Comparing base (e2c787f) to head (c6bffea).
Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4862   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files         251      251           
  Lines       28052    28052           
=======================================
  Hits        23586    23586           
  Misses       4466     4466           
Flag              Coverage Δ
5.10-c5n.metal    84.71% <ø> (+<0.01%) ⬆️
5.10-m5n.metal    84.69% <ø> (ø)
5.10-m6a.metal    84.00% <ø> (+<0.01%) ⬆️
5.10-m6g.metal    80.70% <ø> (ø)
5.10-m6i.metal    84.69% <ø> (ø)
5.10-m7g.metal    80.70% <ø> (ø)
6.1-c5n.metal     84.71% <ø> (+<0.01%) ⬆️
6.1-m5n.metal     84.69% <ø> (ø)
6.1-m6a.metal     84.00% <ø> (-0.01%) ⬇️
6.1-m6g.metal     80.69% <ø> (-0.01%) ⬇️
6.1-m6i.metal     84.68% <ø> (-0.01%) ⬇️
6.1-m7g.metal     80.69% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Review thread on tests/host_tools/network.py (outdated, resolved)
Review thread on tests/host_tools/network.py (outdated, resolved)
Review thread on tests/framework/utils_vsock.py (resolved)
Print microvm logs and stack backtraces whenever running a command via
SSH inside the guest fails (instead of only doing this if
`wait_for_ssh_up` fails).

Do this by adding an `on_error` hook that gets called from
`SSHConnection.check_output` if running the SSH command fails.
Potentially, this could later be extended to automatically snapshot the
microvm inside the hook, but for now, let's only print some data.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
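The hook mechanism described above can be sketched as follows. This is a minimal, hypothetical stand-in (`SSHConnectionSketch`, `print_debug_info`), not the framework's actual `SSHConnection`: to keep it runnable, commands go through a local `sh -c` instead of ssh, and the hook just prints the failed command's details where the real one would dump microVM logs and backtraces.

```python
import subprocess
from typing import Callable, Optional

ErrorHook = Callable[[subprocess.CalledProcessError], None]


class SSHConnectionSketch:
    """Hypothetical stand-in for the test framework's SSHConnection.

    Only the hook mechanism is modelled: instead of building an ssh command
    line, commands run through a local `sh -c` so the sketch is runnable.
    """

    def __init__(self, on_error: Optional[ErrorHook] = None):
        # Called whenever a command fails, before the exception propagates.
        self.on_error = on_error

    def check_output(self, cmd: str) -> subprocess.CompletedProcess:
        try:
            return subprocess.run(
                ["sh", "-c", cmd], capture_output=True, text=True, check=True
            )
        except subprocess.CalledProcessError as exc:
            if self.on_error is not None:
                # In the test framework, this is where microVM logs and
                # stack backtraces would be printed.
                self.on_error(exc)
            raise  # the caller still sees the original failure


def print_debug_info(exc: subprocess.CalledProcessError) -> None:
    """Example hook: dump what is known about the failed command."""
    print(f"command {exc.cmd!r} exited with {exc.returncode}")
    print(f"stderr: {exc.stderr!r}")


if __name__ == "__main__":
    conn = SSHConnectionSketch(on_error=print_debug_info)
    print(conn.check_output("echo hello").stdout, end="")
    try:
        conn.check_output("echo oops >&2; exit 3")
    except subprocess.CalledProcessError:
        pass  # the hook has already printed the diagnostics
```

Re-raising after the hook keeps the hook purely diagnostic: callers that expect failures can still catch the exception as before.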

If the exception ends up being fatal, it will be printed in full anyway,
and since it already contains the entire log message, logging it as well
just produces duplicate messages. If it's not fatal (e.g. it is caught),
then it's arguably up to the caller to decide whether it should be logged.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
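The "raise, don't log-and-raise" pattern argued for above might look like this. `GuestCommandError`, `run_guest_step` and `tolerant_caller` are hypothetical names used only for illustration; the point is that all diagnostics travel inside the exception, and only a caller that swallows it chooses whether (and at what level) to log.

```python
import logging

logger = logging.getLogger(__name__)


class GuestCommandError(Exception):
    """Hypothetical exception that carries all failure diagnostics."""


def run_guest_step(succeed: bool) -> str:
    if not succeed:
        # No logger.error(...) before raising: the message already holds the
        # full diagnostics, so logging it here would only duplicate them if
        # the exception turns out to be fatal.
        raise GuestCommandError("guest step failed\n<full log output here>")
    return "done"


def tolerant_caller() -> str:
    """A caller that expects failures and decides for itself how to log them."""
    try:
        return run_guest_step(succeed=False)
    except GuestCommandError as exc:
        logger.debug("expected failure, continuing: %s", exc)
        return "skipped"
```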
@roypat roypat force-pushed the test-debug-improvements branch from a63bf07 to e41a36e October 21, 2024 10:22
@roypat roypat requested a review from pb8o October 21, 2024 10:26
@roypat roypat force-pushed the test-debug-improvements branch from e41a36e to 9e6593d October 21, 2024 11:24
Only keep a single "snapshot" variable outside the loop

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>

Exit code 255 is already used by the SSH client to communicate
connection errors [1], so if the remote command also exits with 255
on failure, that failure is indistinguishable from a connection
error. Use exit code 1 instead in `check_guest_connections`.

[1]: https://man.openbsd.org/ssh#EXIT_STATUS

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
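To make the ambiguity concrete, here is a small sketch of how a caller would interpret the exit status of `ssh <host> <command>`, following the ssh(1) rule cited in [1] (the remote command's status is passed through, 255 means ssh itself failed). `classify_ssh_exit` is a hypothetical helper, not part of the test framework.

```python
# Exit status reserved by the ssh client for its own errors (see [1]).
SSH_CLIENT_ERROR = 255


def classify_ssh_exit(returncode: int) -> str:
    """Interpret the exit status of an `ssh <host> <command>` invocation.

    ssh passes through the remote command's exit status, except that 255
    signals an error in ssh itself (e.g. the connection failed).
    """
    if returncode == 0:
        return "success"
    if returncode == SSH_CLIENT_ERROR:
        # Only unambiguous if remote commands never exit with 255 themselves.
        return "connection error"
    return "remote command failed"


# A guest-side check that fails with `exit 1` stays distinguishable...
assert classify_ssh_exit(1) == "remote command failed"
# ...whereas one failing with `exit 255` would be misread as a connection error.
assert classify_ssh_exit(255) == "connection error"
```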

Even if one child process fails, still call `wait` on all other child
vsock_helper processes so that they also exit, ensuring clean socket
teardown and other cleanup.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
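A minimal sketch of the "wait for every child, then report" pattern described above. `run_helpers` is a hypothetical helper, and the children are stand-in Python one-liners rather than the real vsock_helper binary; the key property is that no error is raised until every process has been reaped.

```python
import subprocess
import sys


def run_helpers(commands: list) -> list:
    """Start all helper processes, then wait for every one of them.

    A loop that raised as soon as one child failed would leave the others
    running (and their sockets open); collecting every exit status first
    guarantees all children have exited before any error is reported.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # Popen.wait() returns the exit status instead of raising on failure.
    returncodes = [proc.wait() for proc in procs]
    if any(rc != 0 for rc in returncodes):
        raise RuntimeError(f"helper processes exited with {returncodes}")
    return returncodes


if __name__ == "__main__":
    # Stand-ins for the vsock_helper children: one fails immediately, one is slow.
    helpers = [
        [sys.executable, "-c", "import sys; sys.exit(1)"],
        [sys.executable, "-c", "import time; time.sleep(0.5)"],
    ]
    try:
        run_helpers(helpers)
    except RuntimeError as err:
        print(err)  # only reached once *both* children have exited
```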

After we're done with a socket, close it to avoid leaking resources.

We do not need to rebuild the rootfs for this: this specific binary is
compiled on the fly and SCP'd into the microVM.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
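The helper binary itself is compiled separately, so the following is not its code; it only illustrates the same "close the socket when done" principle in Python, using a local `socket.socketpair()` as an assumed stand-in for a vsock connection. `send_and_receive` is a hypothetical name.

```python
import socket


def send_and_receive(payload: bytes) -> bytes:
    """Round-trip `payload` over a connected socket pair.

    The `with` statement closes both sockets when the block exits, including
    on exceptions, instead of leaving teardown to garbage collection.
    """
    left, right = socket.socketpair()
    with left, right:
        left.sendall(payload)
        return right.recv(len(payload))


if __name__ == "__main__":
    print(send_and_receive(b"ping"))
```

Closing explicitly makes teardown deterministic, which matters in a test suite that opens many sockets in quick succession.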

The main reason we have timeouts is so that we can get some useful
debugging information in case they do trigger. This means the timeouts
only need to be strict enough to fire before the pytest timeout does.
10s/1s, however, still gets us spurious failures, since in the
non-performance tests, contention on host networking resources (we use a
TON of network namespaces) is pretty bad. So increase the timeouts to
something that still fires before the pytest timeout, while hopefully
eliminating spurious failures once and for all.

Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
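The constraint above ("generous, but still firing before pytest's own timeout") is simple arithmetic. The sketch below assumes a retry loop with a per-attempt timeout and a fixed delay between attempts; that structure, the function names and the numbers are all hypothetical and are not the values chosen in this PR.

```python
def worst_case_seconds(timeout_s: float, attempts: int, delay_s: float) -> float:
    """Wall time of a retry loop in which every attempt hits its timeout."""
    return attempts * timeout_s + (attempts - 1) * delay_s


def fires_before_pytest(timeout_s: float, attempts: int, delay_s: float,
                        pytest_timeout_s: float) -> bool:
    """True if the test's own timeout trips before pytest kills the test,
    so the test still gets the chance to print its debugging information."""
    return worst_case_seconds(timeout_s, attempts, delay_s) < pytest_timeout_s


# Hypothetical numbers, not the values chosen in this PR:
print(worst_case_seconds(30, 3, 5))        # 100
print(fires_before_pytest(30, 3, 5, 300))  # True
```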
@roypat roypat force-pushed the test-debug-improvements branch from 7c54859 to 8cfed0d October 21, 2024 11:26
@roypat roypat merged commit c97418a into firecracker-microvm:main Oct 22, 2024
6 of 7 checks passed

3 participants