Implementing a read-only projection of the conserve archive for Windows #240
base: main
Conversation
…nused imports compile warnings on Windows
@@ -64,6 +64,9 @@ indoc = "2.0"
uzers = "0.11"
nix = { version = "0.27", features = ["fs", "user"] }

[target.'cfg(windows)'.dependencies]
windows-projfs = { version = "0.1.6", features = ["dynamic-import"] }
Does this pull in a lot of stuff? Maybe it should be optional even on Windows?
What do you mean by "pull in a lot of stuff"?
Do you mean memory-wise, dependency-wise, file-wise, etc.?
I meant mostly a lot of new dependencies and therefore build time; hopefully it's not having any runtime effect unless it's used. Perhaps it's fine by default.
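A hedged sketch of how the ProjFS-backed code could be kept out of builds that don't want it, assuming a Cargo feature named `projfs` (the feature name is illustrative and not something this PR defines): the dependency would be marked optional in Cargo.toml and the mount entry point gated on both the target and the feature, roughly like this:

```rust
// Illustrative only: the "projfs" feature name is an assumption, not part of
// this PR. With the dependency optional, the mount entry point compiles only
// where it can actually work, and other configurations get a cheap stub.
#[cfg(all(windows, feature = "projfs"))]
pub fn mount_available() -> bool {
    true
}

#[cfg(not(all(windows, feature = "projfs")))]
pub fn mount_available() -> bool {
    false
}

fn main() {
    if mount_available() {
        println!("ProjFS-backed mount is compiled in");
    } else {
        println!("mount support not compiled in (requires Windows plus the `projfs` feature)");
    }
}
```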
/// Make an iterator that returns hunks of entries from this index.
pub fn iter_available_hunks(self) -> IndexHunkIter {
    let _span = debug_span!("iter_hunks", ?self.transport).entered();
    let hunks = self.hunks_available().expect("hunks available"); // TODO: Don't panic
Just make this return a Result?
This comment & behaviour originate from the previous code, written by you :)
https://github.com/sourcefrog/conserve/blob/main/src/index.rs#L310
BTW, thanks for the review.
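A minimal sketch of the suggested change, with simplified stand-in types (the real IndexRead and IndexHunkIter carry more state such as the transport, decompressor, and read stats):

```rust
// Stand-in error and types for illustration only.
#[derive(Debug)]
struct Error(String);

type Result<T> = std::result::Result<T, Error>;

struct IndexRead;

struct IndexHunkIter {
    hunks: std::vec::IntoIter<u32>,
}

impl IndexRead {
    fn hunks_available(&self) -> Result<Vec<u32>> {
        Ok(vec![0, 1, 2]) // placeholder: would list the hunks actually present
    }

    // Instead of `.expect("hunks available")`, propagate the error and let
    // the caller decide how to report a missing or unreadable index.
    fn iter_available_hunks(self) -> Result<IndexHunkIter> {
        let hunks = self.hunks_available()?;
        Ok(IndexHunkIter { hunks: hunks.into_iter() })
    }
}

fn main() -> Result<()> {
    let iter = IndexRead.iter_available_hunks()?;
    println!("{} hunks available", iter.hunks.len());
    Ok(())
}
```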
}

/// Make an iterator that returns hunks of entries from this index.
pub fn iter_available_hunks(self) -> IndexHunkIter {
Is the change from iter_hunks to iter_available_hunks really semantic? They seem fairly similar...?
Ah, this is actually a relic of previous versions where I needed to pass a custom hunk iterator to the iter_hunks method. Even though this is no longer functionally required, I think the overall API semantics of IndexRead with iter_hunks and iter_available_hunks are improved.
src/mount.rs
Outdated
    sync::{Arc, Mutex},
    time::Duration,
};
use tracing::{error, info, trace};
nit: please order imports as paragraphs of
- std
- dependencies
- this crate
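For illustration, the imports quoted from src/mount.rs in this PR, grouped as requested (the exact list may differ in the final code):

```rust
// std
use std::{
    sync::{Arc, Mutex},
    time::Duration,
};

// external dependencies
use tracing::{error, info, trace};

// this crate
use crate::{
    Apath, Archive, BandId, BandSelectionPolicy, IndexEntry, IndexRead, Kind, Result, StoredTree,
};
```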
src/mount.rs
Outdated
    };
}

struct VoidMonitor;
Let's move this under src/monitor/: it doesn't seem specific to projfs.
src/mount.rs
Outdated
    Apath, Archive, BandId, BandSelectionPolicy, IndexEntry, IndexRead, Kind, Result, StoredTree,
};

macro_rules! static_dir {
This is (currently) used only once, so perhaps it's not worth having as a macro? It could just be a function, or even just inline?
src/mount.rs
Outdated
.map(move |hunk_index| {
    let mut index = index.duplicate();
    let entries = index.read_hunk(hunk_index)?;
    let meta_info = if let Some(entries) = entries {
This seems to read and then discard every hunk, and I wonder if it would be better to cache them?
If the tree is very large that might use a lot of memory, but it seems like you at least might as well put them in an LRU?
Caching the hunk contents for the current implementation of the HunkHelper itself does not make sense, as it never needs any of the hunk contents.
The moment we actually access the hunk contents later on via load_hunk_contents, the results are cached.
Another issue with using a cache at this stage is that the HunkHelper loads every hunk exactly once, so we gain no knowledge about which hunk is most likely to be loaded again.
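For completeness, a bounded LRU along the lines of the suggestion above could look roughly like this; as discussed, it only pays off where hunk contents really are re-read. This is only a sketch, using the lru crate (not currently a conserve dependency) and stand-in types rather than the real IndexEntry results of load_hunk_contents:

```rust
use std::num::NonZeroUsize;

use lru::LruCache;

struct HunkContentCache {
    // hunk index -> decoded entries (stand-in value type for this sketch)
    entries: LruCache<u32, Vec<String>>,
}

impl HunkContentCache {
    fn new(capacity: usize) -> Self {
        HunkContentCache {
            entries: LruCache::new(NonZeroUsize::new(capacity).expect("capacity must be > 0")),
        }
    }

    // Load through the cache: only hunks that are actually re-read benefit,
    // and memory use stays bounded by `capacity`.
    fn get_or_load<F>(&mut self, hunk_index: u32, load: F) -> &Vec<String>
    where
        F: FnOnce(u32) -> Vec<String>,
    {
        if self.entries.get(&hunk_index).is_none() {
            let loaded = load(hunk_index);
            self.entries.put(hunk_index, loaded);
        }
        self.entries.get(&hunk_index).expect("just inserted")
    }
}

fn main() {
    let mut cache = HunkContentCache::new(16);
    let first = cache.get_or_load(3, |i| vec![format!("entry from hunk {i}")]).clone();
    // Second access hits the cache; the loader closure is not called again.
    let second = cache.get_or_load(3, |_| unreachable!("should be cached")).clone();
    assert_eq!(first, second);
    println!("{second:?}");
}
```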
src/mount.rs
Outdated
    }
});

let hunk_index = match hunk_index {
maybe this could do with a comment about the interpretation of Err from binary_search
That's actually a relic from when I only compared the start path rather than both the start and end paths, and therefore never encountered an actual match.
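For reference, a small self-contained illustration of what such a comment could explain: for hunks sorted by start path, Err(i) from binary_search is the insertion point, so the candidate hunk is the one just before it. The paths and ranges below are invented for the example:

```rust
fn main() {
    // (start, end) apath-like keys of consecutive hunks, sorted by start path.
    let hunks = [("/a", "/f"), ("/g", "/m"), ("/n", "/z")];
    let wanted = "/h";

    let hunk_index = match hunks.binary_search_by(|probe| probe.0.cmp(&wanted)) {
        // Exact match on a hunk's start path.
        Ok(i) => i,
        // Err(i) is the insertion point for `wanted`, so the candidate hunk is
        // the one just before it; its (start, end) range may cover `wanted`.
        Err(0) => 0,
        Err(i) => i - 1,
    };

    assert_eq!(hunk_index, 1);
    println!("{wanted:?} falls into hunk {hunk_index}: {:?}", hunks[hunk_index]);
}
```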
/// This has several side effects:
/// - Depending on the implementation of the decompressor, duplicate might not be a cheap option.
/// - Every read index has its own unique read stats, therefore the clone does not inherit the read stats.
pub(crate) fn duplicate(&self) -> Self {
I don't think the decompressor would ever be expensive: it's basically a buffer?
Maybe this could just be impl Clone?
I agree that the complexity of cloning the decoder wouldn't, on its own, be a strong enough argument against Clone. But the two side effects mentioned in the doc comment above seem significant enough that I'd argue this isn't really creating a clone.
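A short sketch of the distinction being argued here, with invented field names rather than conserve's real IndexRead: a derived Clone would copy the read stats, while duplicate deliberately resets them:

```rust
#[derive(Debug, Default)]
struct ReadStats {
    hunks_read: usize,
}

struct IndexRead {
    band: String,
    stats: ReadStats,
}

impl IndexRead {
    // A derived Clone would copy `stats` as well; duplicate deliberately
    // starts the copy with fresh stats, which is why calling it a clone
    // would be misleading.
    fn duplicate(&self) -> Self {
        IndexRead {
            band: self.band.clone(),
            stats: ReadStats::default(),
        }
    }
}

fn main() {
    let mut original = IndexRead { band: "b0000".into(), stats: ReadStats::default() };
    original.stats.hunks_read = 7;

    let copy = original.duplicate();
    assert_eq!(copy.stats.hunks_read, 0); // fresh stats, unlike a real Clone
    println!("original: {:?}, copy: {:?}", original.stats, copy.stats);
}
```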
src/mount.rs
Outdated
}

impl HunkHelper {
    pub fn from_index(index: &IndexRead) -> Result<Self> {
Conserve handles the edge case where an index is incomplete due to an interrupted backup by resuming at the same apath in the previous index. This can happen repeatedly if the previous index is also incomplete, until we either find a complete index or run out of indexes. As a result, just reading the index hunks for a single index is not enough to read incomplete backups correctly.
This is handled today by stitch.rs, so this code should probably either build on or merge into that.
The way this is handled in the API today is admittedly not really ideal... but I do feel like this is important for the goal of recovering data even from interrupted or damaged backups...
(This is probably the most important non-stylistic comment in this review.)
Yes, I was aware of that, and it's a good catch!
Since you have the ability to select which backup you want to inspect, I'd argue that you should only see files which were actually processed in that backup; falling back to the next best band would violate that expectation.
Under that premise, unfinished backups should simply appear partial.
Let me know your thoughts :)
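For reference, a rough, self-contained sketch of the stitching behaviour described in the review comment above; the real logic lives in stitch.rs, and the types, names, and plain-string apath ordering here are invented for illustration:

```rust
// Invented, simplified types: a Band here is just "complete?" plus a sorted
// list of (apath, content) entries. Real stitching deals with hunks,
// transports, apath ordering, and error handling.
struct Band {
    complete: bool,
    entries: Vec<(String, String)>, // sorted by apath; incomplete bands stop early
}

// Walk the entries of bands[band_index]; whenever a band is incomplete,
// continue in the previous band, starting after the last apath emitted so
// far, until a complete band is found or we run out of bands.
fn stitched_entries(bands: &[Band], band_index: usize) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = Vec::new();
    let mut current = Some(band_index);

    while let Some(i) = current {
        let resume_after = out.last().map(|(apath, _)| apath.clone());
        for (apath, content) in &bands[i].entries {
            if resume_after.as_deref().map_or(true, |last| apath.as_str() > last) {
                out.push((apath.clone(), content.clone()));
            }
        }
        if bands[i].complete {
            break;
        }
        current = i.checked_sub(1); // fall back to the previous band, if any
    }
    out
}

fn main() {
    let bands = vec![
        Band {
            complete: true,
            entries: vec![
                ("/a".into(), "old".into()),
                ("/b".into(), "old".into()),
                ("/c".into(), "old".into()),
            ],
        },
        Band {
            complete: false, // interrupted backup: only /a was written
            entries: vec![("/a".into(), "new".into())],
        },
    ];
    let stitched = stitched_entries(&bands, 1);
    // /a comes from the newer (incomplete) band, /b and /c from the older one.
    assert_eq!(stitched.len(), 3);
    println!("{stitched:?}");
}
```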
I meant a lot of dependencies, or anything that might be particularly likely to cause build breakage. For example, libssh depends on a native library and a C compiler, from memory.
I get what you mean. The only change is that I use functionality from the latest stable Rust release, hence the CI pipeline edit to 1.75.
I'll come back and go line by line, but just as a general comment: I would be delighted to merge this, though I would want some more tests, including hopefully an integration test that mounts an archive on Windows and checks that you can read from it.
caf1dc0 to 3266178
Ack, I'll look.
Thanks for adding tests! It does look like they are currently failing in CI: https://github.com/sourcefrog/conserve/actions/runs/10197526482/job/28210539695?pr=240
Yes, I had the same issue with the projfs tests. I'm not quite sure what implications this might have for all the other conserve tests.
…t threads for the mount tests
I think this is probably a good idea to merge. I am hesitating a bit because I don't regularly use Windows and so I won't give it any ad-hoc manual testing myself. However, I can see how it would be a useful feature there, and since there are at least some CI tests there is some protection against any major breakage.
This PR is also pointing to how the core APIs could be refactored to support caching or faster random access, but I wouldn't want to make this PR enormous to do all of that in here.
It looks like I've changed the option name and help to be IMO easier to understand and to be clear this is probably projfs-specific behavior.
I wonder how the caching behavior will interact with the contents of
I tried this out a bit on Windows. It seems like if I run it without cleanup, exit, and then run it again, I get an error that
Possibly the error on restarting is connected to this:
I'm not sure though if the
Oh, I just remembered this is your crate. From the docs, but not yet experimenting, maybe it needs a way to resume virtualizing by calling PrjMarkDirectoryAsPlaceholder on the first run, and then just PrjStartVirtualizing on later runs. Possibly that could just be automatic if the first one calls with
Hey,
concluding my latest comment on discussion #237, I'm opening this PR.
Motivation
Right now it is not possible to view files which have been backed up with conserve without restoring a partial or complete version of that particular backup. Fully restoring a specific version is necessary in case of complete data loss (e.g. a corrupted hard drive), but it is inconvenient when only a few files need to be restored, especially if those files are spread across different subtrees. Therefore a feature which allows the user to easily access, view, and copy only specific folders or files from specific versions is needed.
Implementation
As discussed in the linked discussion above, Windows ships an optional feature called Windows Projected File System (ProjFS).
Using ProjFS we're able to give the user a visual (and explorable) representation of all folders and files contained within a conserve backup. Since it is a virtual file system, the user can also easily view different versions of the backed-up file tree.
In the past few days I've implemented a Rust wrapper around the Windows ProjFS API.
More details can be found here: https://github.com/WolverinDEV/windows-projfs
Pending Tasks
This PR is still in draft as the following tasks need to be completed before this feature can be shipped: