Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow snapshot tap changes #4731

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

andrewla
Copy link

@andrewla andrewla commented Aug 15, 2024

Changes

Allow renaming of tap devices on snapshot restore

Reason

In some scenarios it is not possible to use the jailer, especially in limited privilege environments where the security is external to firecracker itself. But in these cases a snapshot may have to use a different tap device than the one that it was using when it was snapshotted.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

@andrewla andrewla force-pushed the allow-snapshot-tap-changes branch from 7991d9f to 8d1a0a9 Compare August 15, 2024 21:24
@pb8o
Copy link
Contributor

pb8o commented Aug 19, 2024

Hi @andrewla thank you for your contribution! We would like to understand the use case better in case it can be resolved through other means first. We recommend using a network namespace where you can create TAP devices with the same name, but that probably requires CAP_SYS_ADMIN, which I understand is what you mean with "limited privilege environments".

Could you elaborate on your use case? Is there a way you could create the namespace in a privileged setting and then use something like nsenter firecracker ...?

@andrewla
Copy link
Author

That assessment is correct -- basically to run the jailer in a network namespace you need the setns syscall which requires CAP_SYS_ADMIN. So nsenter is not an option.

Our particular case is running in a containerized environment where our privileges are limited by the nature of the general environment. Once we're in our particular container we have lost all relevant privileges.

@andrewla andrewla force-pushed the allow-snapshot-tap-changes branch 5 times, most recently from 3415816 to 03c3be9 Compare August 26, 2024 20:43
@andrewla andrewla force-pushed the allow-snapshot-tap-changes branch from 03c3be9 to 265ea94 Compare August 27, 2024 13:24
@andrewla andrewla marked this pull request as ready for review August 27, 2024 13:25
@pb8o
Copy link
Contributor

pb8o commented Aug 29, 2024

Hi again @andrewla, we have been talking internally about this PR and we may need to spend some time to decide on the API aspects of it to make sure it doesn't conflict with other efforts.

In the meantime, we thought of another workaround. The snapshot-editor could be enhanced to rename the tap devices in an snapshot file. That would be an easier decision for us, but we want to make sure it would handle your use case.

For example we imagine the tool would work like this:

snapshot-editor edit-vmstate rename-network eth0 tap1

Would this work within your environment?

@andrewla
Copy link
Author

andrewla commented Sep 3, 2024

This was our initial approach as it required minimal changes. But we found that the performance cost of making the copy (as opposed to hardlinking) during the operation (plus serde costs) were more expensive than we were willing to tolerate in our environment.

@andrewla
Copy link
Author

Hi @pb8o -- is there anything we can do to help move this forward?

@pb8o
Copy link
Contributor

pb8o commented Oct 16, 2024

Hi @andrewla I haven't had time to look at this, but this is next on my list now. Thanks for your patience!

@kanpov
Copy link
Contributor

kanpov commented Nov 1, 2024

On a related note, another reason why renaming the tap device is a better approach than namespaced NAT from the "Network for Clones" guide is that the namespaced NAT imposes measurable overhead onto the host kernel due to the addition of about 5 more iptables/nft rules, plus an RTNETLINK route for forwarding the guest IP out of the netns.

Even though I made an effort to support namespaced NAT in fcnet, it increased complexity by a factor of 4-5x in comparison to regular NAT only to support one usecase: two simultaneous microVM clones. So I'd be in favor of this change, or a snapshot-editor equivalent.

@pb8o
Copy link
Contributor

pb8o commented Jan 8, 2025

Hello @andrewla ! I apologize for the long time between updates, but some other stuff came up. So we have decided to go ahead with this. I gave a first initial review and I only have some minor comments, but mostly looks good to me. I just have a question if the network_overrides field also works when starting from a JSON config file.

tests/framework/microvm.py Outdated Show resolved Hide resolved
tests/framework/microvm.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
docs/snapshotting/network-for-clones.md Outdated Show resolved Hide resolved
@andrewla andrewla requested a review from Manciukic as a code owner January 8, 2025 19:28
@andrewla
Copy link
Author

andrewla commented Jan 9, 2025

Re: config -- currently there is no config support for snapshots (https://github.com/firecracker-microvm/firecracker/blob/main/src/vmm/src/resources.rs) -- the snapshot configuration and restore has to be done with a running firecracker instance

@andrewla andrewla force-pushed the allow-snapshot-tap-changes branch 2 times, most recently from d8c5a44 to ea62e9a Compare January 9, 2025 19:00
Copy link
Contributor

@bchalios bchalios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It generally looks good and thanks for the contribution Andrew.

A few comments/questions from me.

Also, I've commented this for the documentation changes, but could you please squash as well the commits for the test changes into a single commit?

src/firecracker/swagger/firecracker.yaml Outdated Show resolved Hide resolved
src/vmm/src/builder.rs Outdated Show resolved Hide resolved
tests/framework/microvm.py Outdated Show resolved Hide resolved
tests/framework/microvm.py Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
docs/snapshotting/network-for-clones.md Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
@andrewla andrewla force-pushed the allow-snapshot-tap-changes branch from ea62e9a to abbb1af Compare January 10, 2025 14:40
In some scenarios it is not possible to use the jailer, especially in
limited privilege environments where the security is external to
firecracker itself. But in these cases a snapshot may have to use a
different tap device than the one that it was using when it was
snapshotted.

Signed-off-by: Andrew Laucius <andrewla@gmail.com>
@andrewla andrewla force-pushed the allow-snapshot-tap-changes branch from abbb1af to c20229c Compare January 10, 2025 14:46
Test that we can correctly parse configuration and API calls in a
backwards compatible way.

Signed-off-by: Andrew Laucius <andrewla@gmail.com>
@andrewla andrewla force-pushed the allow-snapshot-tap-changes branch from c20229c to 0be3ff7 Compare January 10, 2025 14:51
Documenting the ability to rename network interfaces on snapshot
restore.

Signed-off-by: Andrew Laucius <andrewla@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants