-
-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
File deduplication #6332
base: main
Are you sure you want to change the base?
File deduplication #6332
Conversation
abbaa2e
to
3cb9a66
Compare
0469519
to
aa9dc2b
Compare
I'm not sure whether we still need it and the tests pass without it, but also I don't want to introduce a new bug by changing stuff, and it's just 8 lines, anyway.
f492186
to
ab4a882
Compare
… correct when deduplicating
src/blob.rs
Outdated
@@ -74,7 +75,8 @@ impl<'a> BlobObject<'a> { | |||
Ok(blob) | |||
} | |||
|
|||
// Creates a new file, returning a tuple of the name and the handle. | |||
/// Creates a new file, returning a tuple of the name and the handle. | |||
/// This avoids race conditions when creating multiple files with the same name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "This" here? The function? "Returning ... the handle"? Something else?
Probably "this function" because of the comment added inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but actually, probably the comment added inside is enough, I'll just remove this one.
// from an async context thanks to `block_in_place()`. | ||
// Tokio's "async" I/O functions are also just thin wrappers around the blocking I/O syscalls, | ||
// so we are doing essentially the same here. | ||
task::block_in_place(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this happen on the caller side? It's strange for a blocking function to depend on tokio. E.g. if we call this from another blocking function that is already running on a dedicated thread, this block_in_place
is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still fine not to care about this in tests, they are running in separate processes by nextest anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, it's easy to forget it at the caller site since it's not visible from the function signature that it needs to be wrapped in block_in_place()
. And if we call this from another blocking function, block_in_place
is a no-op:
calling the function outside a runtime is allowed. In this case, block_in_place just calls the provided closure normally.
OTOH, I do agree that it doesn't "look" nice that with the newest changes, block_in_place()
is called twice when you call create_and_deduplicate_from_bytes()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not what other functions do, so you need to think about it when calling sync functions from async functions anyway.
context: &'a Context, | ||
data: &[u8], | ||
) -> Result<BlobObject<'a>> { | ||
task::block_in_place(|| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as create_and_deduplicate
, I think this should be on the caller side.
Co-authored-by: l <link2xt@testrun.org>
Co-authored-by: l <link2xt@testrun.org>
Co-authored-by: l <link2xt@testrun.org>
…_and_deduplicate()
When receiving messages, blobs will be deduplicated with the new function
create_and_deduplicate_from_bytes()
. For sending files, this adds a new functionset_file_and_deduplicate()
instead of deduplicating by default.This is for #6265; read the issue description there for more details.
TODO:
accounts/2ff9fc096d2f46b6832b24a1ed99c0d6/dc.db-blobs
(53 chars), plus 64 chars for the filename would be 117).For later PRs:
BlobObject::create(…)
withBlobObject::create_and_deduplicate(…)
in order to deduplicate everytime core creates a filename
usually means "user-visible-name", not "name of the file on disk").Footnotes
Calculated with both https://printfn.github.io/fend/ and https://www.geogebra.org/calculator, both of which came to the same result (1,
2) ↩