Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for PAX Format, Version 1.0 #298

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ncihnegn
Copy link

@ncihnegn ncihnegn commented Aug 12, 2022

To fix #286 #295.

@ncihnegn
Copy link
Author

ncihnegn commented Aug 13, 2022

Windows CI is failing as #299.

@NobodyXu
Copy link

@ncihnegn Thank you very much for this PR!

@NobodyXu
Copy link

NobodyXu commented Sep 6, 2022

@alexcrichton Can you review this PR please?
It is really important for cargo-binstall as users of it encounters PAX format in practice and lack of PAX format means cargo-binstall gives errors even if a pre-built binary is available.

src/archive.rs Outdated
let off = block.offset()?;
let len = block.length()?;
if len != 0 && (size - remaining) % 512 != 0 {
let mut add_block = |block: &SparseEntry| -> io::Result<_> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to avoid a new SparseEntry type here and just take two parameters for offset/size

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Some(gnu) => gnu,
None => return Err(other("sparse entry type listed but not GNU header")),
};
let mut sparse_map = Vec::<SparseEntry>::new();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the main goals I tried to keep for the tar crate is to minimize internal allocations. Instead of having a temporary list here could this be refactored to avoid the intermediate allocation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. Here we need to convert strings to numbers and the number of pairs is not fixed.

src/archive.rs Outdated
@@ -256,6 +257,7 @@ impl<'a, R: Read> Iterator for Entries<'a, R> {
}
}

#[allow(unused_assignments)] // https://github.com/rust-lang/rust/issues/22630
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to remove this or move the comment to where the warning is printed instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/archive.rs Outdated
Comment on lines 408 to 410
if is_recognized_header && fields.is_pax_sparse() {
gnu_longname = fields.pax_sparse_name();
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels different than the current organization. Instead of pretending that the gnu_longname field was present if a pax-specified field is present could the accessor which looks at long_pathname be updated to consult the pax extensions if they're present?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/archive.rs Outdated
Comment on lines 399 to 401
// Not an entry
// Keep pax_extensions for the next ustar header
processed -= 1;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is doing? An entry was consumed here so I don't know why this value would be decremented?

Copy link
Author

@ncihnegn ncihnegn Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PAX format, each entry has two headers: pax and ustar.

Comment on lines +427 to +441
let mut reader = io::BufReader::with_capacity(BLOCK_SIZE, &self.archive.inner);
let mut read_decimal_line = || -> io::Result<u64> {
let mut str = String::new();
num_bytes_read += reader.read_line(&mut str)?;
str.strip_suffix("\n")
.and_then(|s| s.parse::<u64>().ok())
.ok_or_else(|| other("failed to read a decimal line"))
};

let num_entries = read_decimal_line()?;
for _ in 0..num_entries {
let offset = read_decimal_line()?;
let size = read_decimal_line()?;
sparse_map.push(SparseEntry { offset, size });
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand how this could work and pass tests. This is creating a temporary buffer to read from the inner underlying data stream but then the buffer is discarded outside of this scope. That means that more data than necessary could be consumed from the inner data stream and accidentally discarded.

I don't think that this should create a temporary buffer here but instead, if necessary, use a stack-local buffer and then do byte-searching within since presumably the entry here is typically small enough for that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will consume exactly a block of 512 bytes. After reading the necessary data, the remaining filler will be discarded.

None => return Err(other("sparse entry type listed but not GNU header")),
};
let mut sparse_map = Vec::<SparseEntry>::new();
let mut real_size = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something about this doesn't feel quite right because real_size isn't set in all branches of the if statement below whereas prior it was always set to a particular value.

Copy link
Author

@ncihnegn ncihnegn Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. It is set in both branches, line 428 and 452.

@akesson
Copy link

akesson commented Dec 16, 2022

Did this go stale?

@davidanthoff
Copy link

Is there any chance that this PR could be revived? Or maybe it is ready to be merged, given that it looks like @ncihnegn responded to all the change requests?

Presumably the first step would be to resolve the merge conflicts, @ncihnegn is that something you could do?

@ncihnegn
Copy link
Author

ncihnegn commented Jun 9, 2024

Updated.

@ararslan
Copy link

@alexcrichton, could you please re-review this? The Julia language's version multiplexer/installer uses this crate but we're blocked on porting the installer to all of Julia's supported platforms until this is merged.

@alexcrichton
Copy link
Owner

This is quite an old PR and I've unfortunately lost context on this. I'd also understand if @ncihnegn here wouldn't want to sheperd thing along after it's been 2 years since it's been opened. Despite that though this is somewhat tricky code and I'm hesitant to merge as-is. Merging this would require me to get a better understanding of PAX and how this crate works again (it's been awhile) so that's a fair bit of context to boot back up on. The test coverage also looks relatively light here and additionally there's not a ton of documentation internally about what's going on either.

There's also pieces I don't fully understand like creating intermediate BufReader values which feel like they can easily go wrong but I haven't fully investigated them yet.

Overall I unfortunately do not have the time to myself personally help push this across the finish line, but I can try to outline what would make this easier to land if others are interested in helping to contribute.

@cgwalters
Copy link
Collaborator

As I understand things, #375 added support for GNU sparse, but this would be the PAX version?

@ncihnegn
Copy link
Author

ncihnegn commented Oct 25, 2024

As I understand things, #375 added support for GNU sparse, but this would be the PAX version?

Yes. See
https://www.gnu.org/software/tar/manual/html_section/Sparse-Formats.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unpacking file size is incorrect (with reproduced repo)
7 participants