Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --progress-bytes SIZE #313

Merged
merged 5 commits into from
Oct 16, 2019
Merged

Add --progress-bytes SIZE #313

merged 5 commits into from
Oct 16, 2019

Conversation

gperciva
Copy link
Member

@gperciva gperciva commented Jul 3, 2018

No description provided.

@gperciva
Copy link
Member Author

gperciva commented Jul 3, 2018

This is split off from #305. It adds --progress-bytes and --no-progress-bytes which prints the number of files and total bytes, but does not give info about new bytes. It also adds this feature to the changelog, since the announcement doesn't specify exactly what info is printed (so there's no harm in adding "new bytes" info later on).

NEWS.md Outdated Show resolved Hide resolved
tar/siginfo.c Outdated Show resolved Hide resolved
@gperciva

This comment has been minimized.

@cperciva

This comment has been minimized.

@gperciva gperciva force-pushed the print-progress-no-newbytes branch from 5d308a4 to 2326a1f Compare August 4, 2018 16:44
@gperciva

This comment has been minimized.

@gperciva gperciva force-pushed the print-progress-no-newbytes branch from 2326a1f to adf3446 Compare August 20, 2018 21:53
tar/tarsnap.1-mdoc.in Outdated Show resolved Hide resolved
@gperciva gperciva mentioned this pull request Aug 20, 2018
tar/siginfo.c Outdated Show resolved Hide resolved
@cperciva
Copy link
Member

What does it mean for a byte to be "processed" here? Testing this on my laptop, it looks to me like we're not including data which is being supplied via the chunkification cache -- probably because that bypasses almost everything in libarchive. I suspect this wasn't intentional?

@gperciva
Copy link
Member Author

In terms of the user-visible behaviour, I'd say that not including data from the cache is intentional. Or at least, it wasn't unintended?

Leaving aside the name of --progress-bytes, the fundamental idea is to let people get some reassurance that something is happening. If we assume that data from the cache is retrieved much much much faster than new data is uploaded, then merely examining the bytes uploaded will give a relatively constant set of messages.

Granted, there could theoretically be cases where only a tiny amount of data is uploaded so the limiting factor is the chunkification cache. I'd tentatively assume that anybody in that case will be sufficiently technically skilled that they can understand why we're only notifying about uploaded bytes. Provided that the command-line name and docs are clear about what's happening.

On a related note, --checkpoint-bytes only examines bytes of uploaded data. I deliberately modeled the code for --progress-bytes on that code.

@gperciva
Copy link
Member Author

FWIW, the first version of this work used --upload-progress-bytes #298.

I think I changed it purely because very few other tarsnap args contain two hyphens (the only ones I saw from skimming are --keep-newer-files, --maxbw-rate-down, --maxbw-rate-up, --newer-mtime-than, other than the --no- options), and those two-hyphen args are clearly just modifying an existing one-hyphen args. But in the interest of clarity, I'm happy to go back to --upload-progress-bytes.

@cperciva
Copy link
Member

The reason I was surprised by data handled via the chunkification cache not being counted -- once I figured out what's what was happening, which took some time since all I saw at first was "this says it's handled 2 GB of data, but I swear there's more data than that going into this archive" -- is that the chunkification cache is... well, a cache. It's a performance optimization, but it's not supposed to have user-visible effects. So I'd say that reporting progress differently based on whether something was cached or not qualifies as "astonishing behaviour".

@cperciva
Copy link
Member

Just to be clear, I'm talking about the total_uncompressed value here, which is distinct from the amount of data which has been uploaded.

@gperciva
Copy link
Member Author

Woah! I didn't realize you were talking about total_uncompressed. Yeah, that's totally a mistake and those bytes should be counted! I'll revise it.

@gperciva gperciva added the depends PR depends on other PR(s) label Sep 3, 2018
This was referenced Sep 3, 2018
@gperciva gperciva force-pushed the print-progress-no-newbytes branch from 95c4473 to 300bd34 Compare January 3, 2019 01:51
@gperciva gperciva removed the depends PR depends on other PR(s) label Jan 3, 2019
@gperciva gperciva changed the title Print progress no newbytes Add --progress-bytes SIZE Jan 3, 2019
@gperciva
Copy link
Member Author

gperciva commented Jan 3, 2019

Updated version, rebased on master. This lets the user specify something like --progress-bytes 1M to get progress messages (as if she sent a SIGINFO every so often).

Note:

  • this PR doesn't add an extra siginfo_printinfo() check after finishing writing a file. As a result, we can end up in situations like this:
$ ./tarsnap --dry-run -c --progress-bytes 1k zeros-300k zeros-300k 
adding zeros-300k (0 / 300000 bytes)
Processed 1 files, 300544 bytes
$

this seems odd, since we're definitely adding 2 files!

I do have a separate branch which addresses that to produce:

$ ./tarsnap --dry-run -c --progress-bytes 1k zeros-300k zeros-300k 
added zeros-300k (300000 / 300000 bytes)
Processed 1 files, 300544 bytes
added zeros-300k (300000 / 300000 bytes)
Processed 2 files, 601088 bytes

but note the changed adding to added. Also, it's no guarantee that we'll see an accurate final message, depending on the filesizes involved:

$ ./tarsnap --dry-run -c --progress-bytes 250k zeros-300k zeros-1001
added zeros-300k (300000 / 300000 bytes)
Processed 1 files, 300544 bytes

I was going to start adding a "definitely print a final progress message at the end" thing, but at that point I wondered if it was really worth it. I mean, this is really only intended for people who want the reassurance of seeing some progress updates, so does it really matter what the final message says before it ends? I don't have any firm intuitions about this.

@gperciva gperciva force-pushed the print-progress-no-newbytes branch 2 times, most recently from 40923e5 to 4b89fcc Compare May 30, 2019 05:10
@gperciva
Copy link
Member Author

Updated version, rebased on master. This lets the user specify something like --progress-bytes 1M to get progress messages (as if she sent a SIGINFO every so often).

Note:

  • this PR doesn't add an extra siginfo_printinfo() check after finishing writing a file. As a result, we can end up in situations like this:

    $ ./tarsnap --dry-run -c --progress-bytes 1k zeros-300k zeros-300k 
    adding zeros-300k (0 / 300000 bytes)
    Processed 1 files, 300544 bytes
    $
    

    this seems odd, since we're definitely adding 2 files!

    Granted, this is really only intended for people who want the reassurance of seeing some progress
    updates, so does it really matter what the final message says before it ends? I don't have any firm
    intuitions about this.

If/when this PR is merged, I'll tackle the matter of displaying "new bytes" with these messages (which is #305).

@gperciva
Copy link
Member Author

gperciva commented Oct 3, 2019

@cperciva: I have a new version working, but I have two options for the code.

  1. in write.c,
raise(SIGUSR1);
siginfo_printinfo(bsdtar, 0);

where the siginfo_printinfo relies on the interrupt handler from SIGUSR1 having been processed. It seems weird to use a signal like that, but it has the benefit of keeping existing code the same.

  1. in write.c
siginfo_printinfo(bsdtar, -1);

and in siginfo.c, change these lines:

        /* Sanity check. */
        assert(progress >= 0);

        /* Quit if there's no signal to handle. */
        if (!siginfo_received)
                return;

into:

        /* Quit if there's no signal to handle (unless we're forcing a printinfo). */
        if ((!siginfo_received) && (progress != -1))
                return;

@cperciva
Copy link
Member

cperciva commented Oct 3, 2019

POSIX says that "If a signal handler is called, the raise() function shall not return until after the signal handler does." so I'd go with the first option.

@gperciva gperciva added the depends PR depends on other PR(s) label Oct 8, 2019
@gperciva gperciva mentioned this pull request Oct 8, 2019
@gperciva gperciva force-pushed the print-progress-no-newbytes branch from 4b89fcc to 36b5e85 Compare October 15, 2019 06:07
@gperciva gperciva removed the depends PR depends on other PR(s) label Oct 15, 2019
@gperciva gperciva force-pushed the print-progress-no-newbytes branch from 36b5e85 to 08a4050 Compare October 15, 2019 17:08
@gperciva gperciva added the depends PR depends on other PR(s) label Oct 15, 2019
@gperciva gperciva removed the depends PR depends on other PR(s) label Oct 15, 2019
@gperciva gperciva force-pushed the print-progress-no-newbytes branch from 08a4050 to 803a5fd Compare October 15, 2019 23:32
@gperciva
Copy link
Member Author

I think this is finally ready.

Examples of -c

$ ./tarsnap --dry-run -c --progress-bytes 1k 100k-1 100k-2 
Processed 1 files, 100864 bytes
adding 100k-2 (0 / 100000 bytes)
Processed 2 files, 201728 bytes
$ ./tarsnap --dry-run -c --progress-bytes 60k 100k-1 100k-2 
Processed 1 files, 100864 bytes
adding 100k-2 (0 / 100000 bytes)
Processed 2 files, 201728 bytes
$ ./tarsnap --dry-run -c --progress-bytes 10000k 100k-1 100k-2 
Processed 2 files, 201728 bytes

examples of -x

$ ./tarsnap -xf 100k1-100k2 --progress-bytes 1k
Processed 1 files, 101376 bytes
extracting 100k-2
Processed 2 files, 202752 bytes
$ ./tarsnap -xf 100k1-100k2 --progress-bytes 100k
Processed 1 files, 101376 bytes
extracting 100k-2
Processed 2 files, 202752 bytes
$ ./tarsnap -xf 100k1-100k2 --progress-bytes 10000k
Processed 2 files, 202752 bytes

@cperciva cperciva merged commit 9142e72 into master Oct 16, 2019
@gperciva gperciva deleted the print-progress-no-newbytes branch October 16, 2019 01:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants