fix: intermittent vsock timeout in snapshot tests
Instead of creating a new socat process on every guest resume,
create the socat process in the first guest so that the snapshotted
guests reuse the same process throughout.
Stop socat (SIGSTOP) before taking a snapshot and continue it
(SIGCONT) after resuming from the snapshot, so that the process is
in the right state when the snapshot is taken.
Firecracker resets the vsock socket connection while creating a
snapshot, which seems to leave socat in an inconsistent state. Without
stopping socat before the snapshot and continuing it afterwards, the
host times out when it tries to connect to the guest in
_vsock_connect_to_guest().

Signed-off-by: Sudan Landge <sudanl@amazon.com>
Sudan Landge authored and pb8o committed Nov 29, 2023
1 parent 6be6cae commit 3d18abc
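The fix relies on standard POSIX job-control signals: `pkill -SIGSTOP socat` freezes the process (SIGSTOP cannot be caught or ignored) and `pkill -SIGCONT socat` lets it run again. The sketch below shows that effect on a local child process rather than on socat inside a guest. It assumes a Linux host with `/proc` and a `sleep` binary, and the helper names `process_state` and `wait_for_state` are hypothetical. It only illustrates the signal semantics and does not reproduce the Firecracker vsock reset, whose effect on socat the commit message reports as an observation ("it seems").

```python
import os
import signal
import subprocess
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only, hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')' and take the next field.
    return stat.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process reaches one of the `wanted` states or the timeout expires."""
    deadline = time.monotonic() + timeout
    state = None
    while time.monotonic() < deadline:
        state = process_state(pid)
        if state in wanted:
            return state
        time.sleep(0.01)
    raise TimeoutError(f"pid {pid} never reached {wanted}, last state {state!r}")


proc = subprocess.Popen(["sleep", "30"])
try:
    # Equivalent in effect to `pkill -SIGSTOP socat`, but for one known PID.
    os.kill(proc.pid, signal.SIGSTOP)
    stopped = wait_for_state(proc.pid, {"T"})  # "T" = stopped by a signal
    # Equivalent in effect to `pkill -SIGCONT socat`.
    os.kill(proc.pid, signal.SIGCONT)
    resumed = wait_for_state(proc.pid, {"S", "R"})  # sleeping or running again
finally:
    # Clean up the child; SIGKILL also works on a stopped process.
    proc.kill()
    proc.wait()

print(stopped, resumed)
```

Polling is used because signal delivery is asynchronous: the state in `/proc` may not change in the instant after `os.kill` returns.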
Showing 1 changed file with 8 additions and 1 deletion.
tests/integration_tests/functional/test_snapshot_basic.py
@@ -112,6 +112,7 @@ def test_5_snapshots(
 
 logger.info("Create %s #0.", snapshot_type)
 # Create a snapshot from a microvm.
+start_guest_echo_server(vm)
 snapshot = vm.make_snapshot(snapshot_type)
 base_snapshot = snapshot
 
@@ -120,17 +121,23 @@ def test_5_snapshots(
 microvm = microvm_factory.build()
 microvm.spawn()
 microvm.restore_from_snapshot(snapshot, resume=True)
+# TODO: SIGCONT here and SIGSTOP later before creating snapshot
+# is a temporary fix to avoid vsock timeout in
+# _vsock_connect_to_guest(). This will be removed once we
+# find the right solution for the timeout.
+vm.ssh.run("pkill -SIGCONT socat")
 # Test vsock guest-initiated connections.
 path = os.path.join(
 microvm.path, make_host_port_path(VSOCK_UDS_PATH, ECHO_SERVER_PORT)
 )
 check_guest_connections(microvm, path, vm_blob_path, blob_hash)
 # Test vsock host-initiated connections.
-path = start_guest_echo_server(microvm)
+path = os.path.join(microvm.jailer.chroot_path(), VSOCK_UDS_PATH)
 check_host_connections(path, blob_path, blob_hash)
 
 # Check that the root device is not corrupted.
 check_filesystem(microvm.ssh, "squashfs", "/dev/vda")
+vm.ssh.run("pkill -SIGSTOP socat")
 
 logger.info("Create snapshot %s #%d.", snapshot_type, i + 1)
 snapshot = microvm.make_snapshot(snapshot_type)
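The SIGCONT/SIGSTOP bracketing in the second hunk (continue socat after restoring, run the vsock and filesystem checks, stop socat before the next snapshot) could be expressed as a context manager. This is a minimal sketch, assuming only that the SSH handle exposes a `run(cmd)` method as `vm.ssh.run(...)` does in the diff; `socat_running` and `RecordingSSH` are hypothetical names, not part of the Firecracker test framework.

```python
from contextlib import contextmanager


@contextmanager
def socat_running(ssh):
    """Let socat run for the duration of the block, then freeze it again.

    `ssh` is assumed to be any object with a `run(cmd)` method,
    like the `vm.ssh` handle used in the test.
    """
    # Continue the socat process that was stopped before the last snapshot.
    ssh.run("pkill -SIGCONT socat")
    try:
        yield
    finally:
        # Stop socat again so the next snapshot captures it stopped,
        # even if a check inside the block raised.
        ssh.run("pkill -SIGSTOP socat")


class RecordingSSH:
    """Hypothetical stand-in for an SSH handle: records commands instead of running them."""

    def __init__(self):
        self.commands = []

    def run(self, cmd):
        self.commands.append(cmd)


ssh = RecordingSSH()
with socat_running(ssh):
    ssh.run("check vsock connections")  # placeholder for the real connection checks
print(ssh.commands)
# ['pkill -SIGCONT socat', 'check vsock connections', 'pkill -SIGSTOP socat']
```

Using `try`/`finally` is a design choice of this sketch: the committed test issues the two `pkill` calls inline, so a failing check there skips the SIGSTOP, which does not matter because the test fails at that point anyway.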