-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for PAX Format, Version 1.0 #298
base: main
Are you sure you want to change the base?
Conversation
Windows CI is failing as #299. |
@ncihnegn Thank you very much for this PR! |
@alexcrichton Can you review this PR please? |
src/archive.rs
Outdated
let off = block.offset()?; | ||
let len = block.length()?; | ||
if len != 0 && (size - remaining) % 512 != 0 { | ||
let mut add_block = |block: &SparseEntry| -> io::Result<_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok to avoid a new SparseEntry
type here and just take two parameters for offset/size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Some(gnu) => gnu, | ||
None => return Err(other("sparse entry type listed but not GNU header")), | ||
}; | ||
let mut sparse_map = Vec::<SparseEntry>::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the main goals I tried to keep for the tar
crate is to minimize internal allocations. Instead of having a temporary list here could this be refactored to avoid the intermediate allocation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Here we need to convert strings to numbers and the number of pairs is not fixed.
src/archive.rs
Outdated
@@ -256,6 +257,7 @@ impl<'a, R: Read> Iterator for Entries<'a, R> { | |||
} | |||
} | |||
|
|||
#[allow(unused_assignments)] // https://github.com/rust-lang/rust/issues/22630 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best to remove this or move the comment to where the warning is printed instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/archive.rs
Outdated
if is_recognized_header && fields.is_pax_sparse() { | ||
gnu_longname = fields.pax_sparse_name(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels different than the current organization. Instead of pretending that the gnu_longname field was present if a pax-specified field is present could the accessor which looks at long_pathname
be updated to consult the pax extensions if they're present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/archive.rs
Outdated
// Not an entry | ||
// Keep pax_extensions for the next ustar header | ||
processed -= 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this is doing? An entry was consumed here so I don't know why this value would be decremented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In PAX format, each entry has two headers: pax and ustar.
let mut reader = io::BufReader::with_capacity(BLOCK_SIZE, &self.archive.inner); | ||
let mut read_decimal_line = || -> io::Result<u64> { | ||
let mut str = String::new(); | ||
num_bytes_read += reader.read_line(&mut str)?; | ||
str.strip_suffix("\n") | ||
.and_then(|s| s.parse::<u64>().ok()) | ||
.ok_or_else(|| other("failed to read a decimal line")) | ||
}; | ||
|
||
let num_entries = read_decimal_line()?; | ||
for _ in 0..num_entries { | ||
let offset = read_decimal_line()?; | ||
let size = read_decimal_line()?; | ||
sparse_map.push(SparseEntry { offset, size }); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand how this could work and pass tests. This is creating a temporary buffer to read from the inner underlying data stream but then the buffer is discarded outside of this scope. That means that more data than necessary could be consumed from the inner data stream and accidentally discarded.
I don't think that this should create a temporary buffer here but instead, if necessary, use a stack-local buffer and then do byte-searching within since presumably the entry here is typically small enough for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will consume exactly a block of 512 bytes. After reading the necessary data, the remaining filler will be discarded.
None => return Err(other("sparse entry type listed but not GNU header")), | ||
}; | ||
let mut sparse_map = Vec::<SparseEntry>::new(); | ||
let mut real_size = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something about this doesn't feel quite right because real_size
isn't set in all branches of the if
statement below whereas prior it was always set to a particular value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean. It is set in both branches, line 428 and 452.
Did this go stale? |
Updated. |
@alexcrichton, could you please re-review this? The Julia language's version multiplexer/installer uses this crate but we're blocked on porting the installer to all of Julia's supported platforms until this is merged. |
This is quite an old PR and I've unfortunately lost context on this. I'd also understand if @ncihnegn here wouldn't want to sheperd thing along after it's been 2 years since it's been opened. Despite that though this is somewhat tricky code and I'm hesitant to merge as-is. Merging this would require me to get a better understanding of PAX and how this crate works again (it's been awhile) so that's a fair bit of context to boot back up on. The test coverage also looks relatively light here and additionally there's not a ton of documentation internally about what's going on either. There's also pieces I don't fully understand like creating intermediate Overall I unfortunately do not have the time to myself personally help push this across the finish line, but I can try to outline what would make this easier to land if others are interested in helping to contribute. |
As I understand things, #375 added support for GNU sparse, but this would be the PAX version? |
Yes. See |
To fix #286 #295.