Implementing a read-only projection of the conserve archive for Windows #240

Open · wants to merge 30 commits into base: main
Conversation

@WolverinDEV (Contributor) commented Dec 29, 2023

Hey,
following up on my latest comment in discussion #237, I'm opening this PR.

Motivation

Right now it is not possible to view files which have been backed up with conserve without restoring a partial or complete version of that particular backup. Fully restoring a specific version is mandatory in case of complete data loss (e.g. a corrupted hard drive), but it is inconvenient when only a few files need to be restored, especially if those files are spread across different subtrees. Therefore a feature which lets the user easily access, view, and copy only specific folders or files from specific versions is needed.

Implementation

As discussed in the discussion linked above, Windows ships an optional feature called Windows Projected File System (ProjFS).

> The Windows Projected File System (ProjFS) allows a user-mode application called a "provider" to project hierarchical data from a backing data store into the file system, making it appear as files and directories in the file system.

Leveraging ProjFS, we can give the user a visual (and explorable) representation of all folders and files contained within a conserve backup. Because the projection is a virtual file system, the user can also easily browse different versions of the backed-up tree.

In the past few days I've implemented a Rust wrapper around the Windows ProjFS API.
More details can be found here: https://github.com/WolverinDEV/windows-projfs
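
To illustrate the general shape of such a provider (the trait and type names below are illustrative only, not the actual windows-projfs API), the callbacks boil down to enumerating directories and streaming file contents on demand:

```rust
use std::io;
use std::path::Path;

/// Minimal, illustrative directory entry; the real API exposes richer metadata.
struct DirectoryEntry {
    name: String,
    is_directory: bool,
    size: u64,
}

/// Hypothetical provider interface: ProjFS calls back into the provider for
/// directory listings and file contents, so nothing is materialised on disk
/// until the user actually opens a projected file.
trait ProjectionSource {
    /// List the entries of a virtual directory (here, backed by a band's index).
    fn list_directory(&self, path: &Path) -> Vec<DirectoryEntry>;
    /// Read a slice of a projected file's contents when it is opened.
    fn read_file(&self, path: &Path, offset: u64, length: usize) -> io::Result<Vec<u8>>;
}
```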

Pending Tasks

This PR is still in draft as the following tasks need to be completed before this feature can be shipped:

  • Improve band entry iteration for directory listings
  • Implement correct file metadata & attributes
  • Write tests for mounting & utilities

@@ -64,6 +64,9 @@ indoc = "2.0"
uzers = "0.11"
nix = { version = "0.27", features = ["fs", "user"] }

[target.'cfg(windows)'.dependencies]
windows-projfs = { version = "0.1.6", features = ["dynamic-import"] }
sourcefrog (Owner):

Does this pull in a lot of stuff? Maybe it should be optional even on Windows?

WolverinDEV (Contributor):

What do you mean by "pull in a lot of stuff"?
Do you mean memory-wise, dependency-wise, file-wise, etc.?

sourcefrog (Owner):

I meant mostly a lot of new dependencies and therefore build time; hopefully it's not having any runtime effect unless it's used. Perhaps it's fine by default.
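
For reference, if it ever did need to be optional, gating it behind a Cargo feature might look roughly like this (the feature name is hypothetical, and this is only a sketch alongside the existing manifest):

```toml
[features]
# Hypothetical feature; on by default so Windows builds keep mount support.
projfs = ["dep:windows-projfs"]
default = ["projfs"]

[target.'cfg(windows)'.dependencies]
windows-projfs = { version = "0.1.6", features = ["dynamic-import"], optional = true }
```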

/// Make an iterator that returns hunks of entries from this index.
pub fn iter_available_hunks(self) -> IndexHunkIter {
let _span = debug_span!("iter_hunks", ?self.transport).entered();
let hunks = self.hunks_available().expect("hunks available"); // TODO: Don't panic
sourcefrog (Owner):

Just make this return a Result?

WolverinDEV (Contributor):

This comment & behaviour originate from the previous code, written by you :)
https://github.com/sourcefrog/conserve/blob/main/src/index.rs#L310

BTW, thanks for the review.
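
For what it's worth, a non-panicking variant would just propagate the error; a sketch against the existing types (the constructor call at the end stands in for whatever the current body does):

```rust
/// Make an iterator that returns hunks of entries from this index.
pub fn iter_available_hunks(self) -> Result<IndexHunkIter> {
    let _span = debug_span!("iter_hunks", ?self.transport).entered();
    // Propagate the error to the caller instead of panicking.
    let hunks = self.hunks_available()?;
    Ok(IndexHunkIter::new(self, hunks)) // hypothetical constructor
}
```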

}

/// Make an iterator that returns hunks of entries from this index.
pub fn iter_available_hunks(self) -> IndexHunkIter {
sourcefrog (Owner):

Is the change from iter_hunks to iter_available_hunks really semantic? They seem fairly similar..?

WolverinDEV (Contributor):

Ah, this is actually a relic of previous versions where I needed to pass a custom hunk iterator to the iter_hunks method. Even though this is not functionally required any more, I think the overall API semantics of IndexRead with iter_hunks and iter_available_hunks are improved.

src/mount.rs Outdated
sync::{Arc, Mutex},
time::Duration,
};
use tracing::{error, info, trace};
sourcefrog (Owner):

nit: please order imports as paragraphs of

  1. std
  2. dependencies
  3. this crate
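
For example (grouping the imports already present in this file):

```rust
// 1. std
use std::sync::{Arc, Mutex};
use std::time::Duration;

// 2. external dependencies
use tracing::{error, info, trace};

// 3. this crate
use crate::{Apath, Archive, Result};
```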

src/mount.rs Outdated
};
}

struct VoidMonitor;
sourcefrog (Owner):

Let's move this under src/monitor/: it doesn't seem specific to projfs.

src/mount.rs Outdated
Apath, Archive, BandId, BandSelectionPolicy, IndexEntry, IndexRead, Kind, Result, StoredTree,
};

macro_rules! static_dir {
sourcefrog (Owner):

This is (currently) used only once, so perhaps it's not worth having as a macro? It could just be a function, or even be inlined?

src/mount.rs Outdated
.map(move |hunk_index| {
let mut index = index.duplicate();
let entries = index.read_hunk(hunk_index)?;
let meta_info = if let Some(entries) = entries {
sourcefrog (Owner):

This seems to read and then discard every hunk, and I wonder if it would be better to cache them?

If the tree is very large that might use a lot of memory, but it seems like you at least might as well put them in an LRU?

WolverinDEV (Contributor):

Caching the hunk contents in the current implementation of the HunkHelper itself does not make sense, as it never needs any of the hunk contents.
The moment we actually access the hunk contents later on via load_hunk_contents, the results are cached.

Another issue with using a cache at this stage is that the HunkHelper loads every hunk exactly once, so we gain no knowledge about which hunk is most likely to be loaded again.
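
For reference, if random access later becomes hot, a minimal LRU layer over hunk loads could look like the sketch below (using the lru crate; the entry type and the load closure are stand-ins for the real code):

```rust
use std::num::NonZeroUsize;

use lru::LruCache;

/// Stand-in for whatever a decoded index hunk holds.
type HunkEntries = Vec<String>;

struct HunkCache {
    cache: LruCache<u32, HunkEntries>,
}

impl HunkCache {
    fn new(capacity: usize) -> Self {
        Self {
            cache: LruCache::new(NonZeroUsize::new(capacity).expect("non-zero capacity")),
        }
    }

    /// Return the hunk from the cache, loading and remembering it on a miss.
    fn get_or_load(
        &mut self,
        hunk_index: u32,
        load_hunk: impl FnOnce(u32) -> std::io::Result<HunkEntries>,
    ) -> std::io::Result<&HunkEntries> {
        if !self.cache.contains(&hunk_index) {
            let entries = load_hunk(hunk_index)?;
            self.cache.put(hunk_index, entries);
        }
        Ok(self.cache.get(&hunk_index).expect("entry was just inserted"))
    }
}
```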

src/mount.rs Outdated
}
});

let hunk_index = match hunk_index {
sourcefrog (Owner):

maybe this could do with a comment about the interpretation of Err from binary_search

WolverinDEV (Contributor):

That's actually a relic from when I only compared the start path and not both the start and end paths, so an exact match was never encountered.
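
For anyone reading along, a short illustration of the Err case: it is not a failure but the insertion point, so the hunk covering a probe path (if any) is the one just before it:

```rust
fn main() {
    // Sorted hunk start paths; purely illustrative data.
    let starts = ["/a", "/f", "/m", "/t"];

    match starts.binary_search(&"/g") {
        // Ok(i): the probe is exactly a hunk start.
        Ok(i) => println!("exact match at hunk {i}"),
        // Err(i): the probe would be inserted at i, so it falls inside hunk i - 1.
        Err(i) => println!("falls inside hunk {}", i.saturating_sub(1)),
    }
}
```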

/// This has several side effects:
/// - Depending on the implementation of the decompressor, duplicate might not be a cheap option.
/// - Every read index has its own unique read stats, therefore the clone does not inherit the read stats.
pub(crate) fn duplicate(&self) -> Self {
sourcefrog (Owner):

I don't think the decompressor would ever be expensive: it's basically a buffer?

Maybe this could just be impl Clone?

WolverinDEV (Contributor):

I agree that the complexity of cloning the decoder wouldn't, by itself, be a strong enough argument against implementing Clone. But the two side effects mentioned in the doc comment above seem severe enough that I would argue this is not actually creating a clone.
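
To make the distinction concrete, the shape is roughly the following (field and type names are hypothetical, for illustration only):

```rust
// Sketch of why this is `duplicate` rather than `Clone`: the new reader gets a
// fresh decompressor and fresh, empty read stats instead of copies.
pub(crate) fn duplicate(&self) -> Self {
    Self {
        transport: self.transport.clone(),
        decompressor: Decompressor::new(),
        stats: IndexReadStats::default(),
    }
}
```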

src/mount.rs Outdated
}

impl HunkHelper {
pub fn from_index(index: &IndexRead) -> Result<Self> {
sourcefrog (Owner):

Conserve handles the edge case where an index is incomplete due to an interrupted backup, by resuming at the same apath in the previous index. This can happen repeatedly if the previous index is also incomplete, until we either find a complete index or run out of indexes. As a result, just reading the index hunks for a single index is not enough to read incomplete backups correctly.

This is handled today by stitch.rs so this code should probably either build on or merge into that.

The way this is handled in the API today is admittedly not really ideal... but I do feel like this is important for the goal of recovering data even from interrupted or damaged backups...

(This is probably the most important non-stylistic comment in this review.)

WolverinDEV (Contributor):

Yes, I was aware of that, and it's a good catch!

Since you can select which backup you want to inspect, I'd argue you should only see files which were actually processed in that backup; falling back to the next best band would violate that expectation.
Under that premise, unfinished backups should simply appear partial.

Let me know your thoughts :)

@sourcefrog (Owner) commented Jan 1, 2024 via email

@WolverinDEV (Contributor):

I get what you mean.
windows-projfs only has five dependencies (log, parking_lot, thiserror, windows, and libloading).
As far as I know, none of them are known for causing build breakage.
The windows crate is statically generated before being deployed to crates.io, so there are no hidden build requirements.

The only other change is that I use functionality from the latest Rust stable release, hence the CI pipeline bump to 1.75.

@sourcefrog (Owner):

I'll come back and go line by line, but just as a general comment I would be delighted to merge this, but I would want to have some more tests, including hopefully an integration test that mounts an archive on Windows and checks that you can read from it.

@WolverinDEV (Contributor):

> I'll come back and go line by line, but just as a general comment I would be delighted to merge this, but I would want to have some more tests, including hopefully an integration test that mounts an archive on Windows and checks that you can read from it.

Hey,
yes, adding tests is one of the remaining bullet points before this leaves draft.
I've already added tests to windows-projfs, but as the exam phase approaches I have less and less time...

@WolverinDEV (Contributor):

Hey ho,
I've successfully implemented tests :)
You're free to review and merge now.

sourcefrog marked this pull request as ready for review on August 7, 2024, 20:25.
@sourcefrog (Owner):

Ack, I'll look.

@sourcefrog (Owner):

Thanks for adding tests! It does look like they are currently failing in CI: https://github.com/sourcefrog/conserve/actions/runs/10197526482/job/28210539695?pr=240

@WolverinDEV (Contributor):

> Thanks for adding tests! It does look like they are currently failing in CI: https://github.com/sourcefrog/conserve/actions/runs/10197526482/job/28210539695?pr=240

Yes, I had the same issue with the projfs tests.
I narrowed the issue down to a resource limitation.
The tests work flawlessly when reducing the number of Rust test threads (e.g., https://github.com/WolverinDEV/windows-projfs/blob/master/.github/workflows/rust.yml#L54).

I'm not quite sure what implications this might have for all the other conserve tests.
Maybe execute them separately?

@sourcefrog (Owner) left a review:

I think this is probably a good idea to merge. I am hesitating a bit because I don't regularly use Windows and so I won't give it any ad-hoc manual testing myself. However, I can see how it would be a useful feature there, and since there are at least some CI tests there is some protection against any major breakage.

This PR is also pointing to how the core APIs could be refactored to support caching or faster random access, but I wouldn't want to make this PR enormous to do all of that in here.


(Resolved review threads on src/bin/conserve.rs, src/hunk_index.rs, and src/mount/projfs.rs.)
@sourcefrog (Owner):

It looks like projfs adds minimal dependencies, so I don't think there's any need to make it optional at build time.

I've changed the option name and help text to be (IMO) easier to understand and to make clear that this is probably projfs-specific behavior.

I wonder how the caching behavior will interact with the contents of "latest" changing over time. I'll look.

I tried this out a bit on Windows. It seems like if I run it without cleanup, exit, and then run it again, I get an error: "failed to mark projection root: The object manager encountered a reparse point while retrieving an object (0x8007112B)". I wonder if this means we should always clean up, or perhaps there's some way to take over ownership of an existing reparse point.

@sourcefrog (Owner) commented Oct 23, 2024:

Possibly the error on restarting is connected to this:

> The provider only needs to create the virtualization root once for each virtualization instance. Once a root has been created, its associated instance can be repeatedly started and stopped without recreating the root.

I'm not sure though if the windows-projfs Rust wrapper exposes that functionality yet.

@sourcefrog (Owner):

Oh, I just remembered this is your crate.

From the docs (not yet from experimenting), maybe it needs a way to resume virtualizing by calling PrjMarkDirectoryAsPlaceholder on the first run, and then just PrjStartVirtualizing on later runs. Possibly that could be automatic if the first call fails with ERROR_REPARSE_POINT_ENCOUNTERED?
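
Sketching that idea in terms of control flow (the wrapper function names are hypothetical; the underlying calls are PrjMarkDirectoryAsPlaceholder and PrjStartVirtualizing as described in the docs):

```rust
use std::io;
use std::path::Path;

// Hypothetical wrappers around the raw ProjFS calls; only the control flow matters here.
fn mark_directory_as_placeholder(_root: &Path) -> io::Result<()> { unimplemented!() }
fn start_virtualizing(_root: &Path) -> io::Result<()> { unimplemented!() }

fn is_reparse_point_encountered(err: &io::Error) -> bool {
    // HRESULT 0x8007112B from the error above wraps Win32 error 0x112B (4395),
    // i.e. ERROR_REPARSE_POINT_ENCOUNTERED.
    err.raw_os_error() == Some(4395)
}

fn start_projection(root: &Path) -> io::Result<()> {
    match mark_directory_as_placeholder(root) {
        Ok(()) => {}
        // The root was already marked by a previous run; reuse it instead of failing.
        Err(e) if is_reparse_point_encountered(&e) => {}
        Err(e) => return Err(e),
    }
    start_virtualizing(root)
}
```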
