Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

progress-bytes #305

Closed
wants to merge 9 commits into from
Closed

progress-bytes #305

wants to merge 9 commits into from

Conversation

gperciva
Copy link
Member

No description provided.

@gperciva gperciva added the draft label Feb 19, 2018
@gperciva
Copy link
Member Author

New version of printing progress (formely #298). No man page or autocompletion yet.

This version is cleaner, and applies to -x and --dry-run -c as well:

td@mac: ~/src/tarsnap/build/foo (print-progress)
$ ../tarsnap --dry-run -c --progress-bytes 2M ~
tarsnap: Removing leading '/' from member names
adding home/td/backup-tests/backup-backup/zeros-900k-b.txt (0 B / 900 kB bytes) [processed 2.1 MB bytes on disk]
adding home/td/backup-tests/backup/zeros-900k-b.txt (0 B / 900 kB bytes) [processed 4.8 MB bytes on disk]
adding home/td/backup-tests/backup-tiny/lorem-1000.txt (0 B / 1.0 kB bytes) [processed 7.6 MB bytes on disk]
^C
td@mac: ~/src/tarsnap/build/foo (print-progress)
$ ../tarsnap -x -f a3 --progress-bytes 100k
extracting tar/tarsnap-read.o [processed 115 kB bytes on disk]
extracting tar/tarsnap-matching.o [processed 291 kB bytes on disk]
extracting tar/chunks/ [processed 404 kB bytes on disk]
extracting tar/tarsnap-siginfo.o [processed 568 kB bytes on disk]

I anticipate some nitpicking over the processed XXX bytes on disk] message. My thoughts:

  • keeping a single progress message on a single line is good, even if that line doesn't fit into 80 chars. (otherwise scripting becomes more complicated)
  • I wanted to be clear that the [] messages referred to the local disk, not the encoded network traffic.

@gperciva
Copy link
Member Author

gperciva commented Apr 3, 2018

IIRC you suggested off-line that the [processed XXX bytes on disk] messages should go on a second line. Is that still the case?

There was a suggestion to use isatty() #112 (comment). I'd rather not, for precisely the reason you suggested caution: it would be too easy for users to get confused between what they see on the console, and what they might see if they redirect to a file. Since --progress-bytes is an optional command, which sends info to stderr, I don't think we need to worry about trying to "hide" that info. Users can choose if they want it or not.

@cperciva
Copy link
Member

cperciva commented Apr 7, 2018

IIRC you suggested off-line that the [processed XXX bytes on disk] messages should go on a second line. Is that still the case?

Yes. And maybe that text could be a bit clearer -- to say that we've processed XXX bytes of archive data, not just XXX bytes from the file we mentioned on the previous line...

There was a suggestion to use isatty() #112 (comment). I'd rather not, for precisely the reason you suggested caution: it would be too easy for users to get confused between what they see on the console, and what they might see if they redirect to a file. Since --progress-bytes is an optional command, which sends info to stderr, I don't think we need to worry about trying to "hide" that info. Users can choose if they want it or not.

Yeah. Using isatty to quiet output automatically is generally an odd thing to do, but it's especially silly when users have explicitly asked for output...

@gperciva
Copy link
Member Author

gperciva commented Apr 8, 2018

Do you mean compressed archive data, or uncompressed archive data? The XXX bytes from disk referred to uncompressed archive data -- IMO, "archive data" isn't a clear term. I thought that "bytes on disk" was easier to understand.

@cperciva
Copy link
Member

cperciva commented Apr 9, 2018

I think we decided offline that we want these messages to end up being something like

Processed 16097 files, 809574400 bytes; Uploaded: 259782144 bytes

(with numbers of bytes humanized depending on that option).

@gperciva
Copy link
Member Author

Updated with another DRAFT.

  • adding the global variable feels really icky.
  • I thought it would be more useful to track the "new data", instead of "uploaded", so that it works well with --dry-run.

Sample:

$ ./tarsnap --dry-run -c --progress-bytes 300k lib
adding lib/scryptenc
Processed 3 files, 953 kB; New data: 273 kB 
adding lib/network/lib_libtarsnap_a-tsnetwork_select.o (0 B / 5.0 kB bytes)
Processed 46 files, 1.2 MB; New data: 358 kB 
adding lib/keyfile/.deps
Processed 86 files, 1.5 MB; New data: 439 kB 
adding lib/crypto/.deps/lib_libtarsnap_a-crypto_keys.Po (0 B / 7.3 kB bytes)
Processed 116 files, 1.8 MB; New data: 528 kB 
                                       Total size  Compressed size
All archives                               4.2 MB           1.1 MB
  (unique data)                            3.9 MB           1.1 MB
This archive                               1.9 MB           545 kB
New data                                   1.9 MB           545 kB

Extracting:

$ ./tarsnap -x -f a3 --progress-bytes 300k 
extracting tar/tarsnap-cmdline.o
Processed 11 files, 319 kB
extracting tar/.deps/tarsnap-bsdtar.Po
Processed 23 files, 632 kB
extracting tar/storage/tarsnap-storage_delete.o
Processed 45 files, 957 kB
^C

@cperciva
Copy link
Member

$ ./tarsnap --dry-run -c --progress-bytes 300k lib
adding lib/scryptenc
Processed 3 files, 953 kB; New data: 273 kB 
adding lib/network/lib_libtarsnap_a-tsnetwork_select.o (0 B / 5.0 kB bytes)
Processed 46 files, 1.2 MB; New data: 358 kB 
adding lib/keyfile/.deps
Processed 86 files, 1.5 MB; New data: 439 kB 
adding lib/crypto/.deps/lib_libtarsnap_a-crypto_keys.Po (0 B / 7.3 kB bytes)

Is there a reason the first line is at 953 kB?

@@ -1770,6 +1774,20 @@ dooption(struct bsdtar *bsdtar, const char * conf_opt,

bsdtar->option_print_stats = 1;
bsdtar->option_print_stats_set = 1;
} else if (strcmp(conf_opt, "progress-bytes") == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And handle --no-progress-bytes?

#include "rwhashtab.h"
#include "warnp.h"

#include "chunks.h"

uint64_t tarsnap_glob_newbytes = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having a global variable, can we pass the chunks cookie to the siginfo code so that it can pull the data out of the chunks layer statistics which we're already keeping?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is how to link siginfo code and the chunks layer.

  • siginfo is all part of bsdtar.h.
  • CHUNKS_W is owned by the multitape layer.

I've sketched a version of what I think you're suggesting, but it has a few WTF moments. I kind-of hope that I misunderstood something.

  • chunks_write_newzbytes() refers to chunkstats .s_zlen directly (where previously that was only used in chunks_stats_internal.c and chunks_directory.c. [I could fix this by adding a chunks_stats_calcznew() function, but that seemed like overkill at this stage.]

  • multitape_write.c now includes bsdtar.h, for access to the new siginfo_setchunks().

  • siginfo.c now includes chunks.h.

  • siginfo.c now contains a static CHUNKS_W * chunks_write_cookie.

It might be cleaner to put CHUNKS_W in struct bsdtar directly?

@gperciva
Copy link
Member Author

gperciva commented Apr 11, 2018 via email

@gperciva
Copy link
Member Author

Updated PR with a really questionable (IMO) design. Not for merging.

tar/siginfo.c Outdated
* and trigger a siginfo_printinfo() if it exceeds tarsnap_opt_processbytes.
*/
static void
siginfo_raisetotal(int64_t total)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not take the bsdtar structure here (or possibly bsdtar->siginfo)? And put lastprogress into said structure?

@cperciva
Copy link
Member

Hmm. Maybe it would be cleaner to pass struct siginfo_data * from tarsnap_mode_c to archive_write_open_multitape to writetape_open to chunks_write_start, and then have chunks_write_chunk make a call to siginfo_blahblahblah to update its statistics?

It's probably a fairly large diff given how many layers the siginfo_data cookie needs to get passed through, but it would avoid the global variable and layering violation.

@gperciva
Copy link
Member Author

gperciva commented May 3, 2018

This gets way uglier than we realized, because of recrypt.

If we want chunks_write_chunk to call siginfo_blahblahblah, then it needs to #include "bsdtar.h". If we do that, then recrypt will implicitly #include "bsdtar.h", so we need to do something with the global tarsnap_opt.h values which are at the top of recrypt/recrypt.c.

Looking at tarsnap as an independent program, I think the best bet would be to split off siginfo stuff from bsdtar:

  • put the function headers into a new siginfo.h
  • return an opaque cookie
  • etc

FWIW, siginfo.c was removed before libarchive 3.0 and merged with bsdtar.c, read.c, and write.c directly: libarchive/libarchive@c609f08
So I don't think this has an impact on how messy it will be to upgrade to a newer libarchive. (it'll be a mess no matter what we do with siginfo now)

@gperciva
Copy link
Member Author

gperciva commented May 4, 2018

Ignore the previous comment. (I still think that bsdtar.h is a mess, but it's not a serious problem here.)

@gperciva
Copy link
Member Author

gperciva commented May 8, 2018

I'm happier with this design.

The one part I'm not sold on is passing struct siginfo_data * as (void **). I think that we need the extra indirection because when write.c calls archive_write_open_multitape(), bsdtar->siginfo is still NULL, and I didn't think it was worth trying to play weird games in write.c.

@gperciva gperciva removed the draft label May 8, 2018
tar/siginfo.c Outdated
lastprogress = total;

/* Fake a SIGINFO (no need for an actual signal). */
siginfo_received = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a dumb question, but is there a reason to not call siginfo_printinfo here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we don't have the struct bsdtar here. But that just raises another question, why isn't this just part of siginfo_setinfo?

And if we do that, lastprogress can be part of the siginfo structure rather than being a random static variable here...

Copy link
Member Author

@gperciva gperciva May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used static uint64_t lastprogress because that's what happened in raisesigs() from tar/storage/storage_write.c (with lastcheckpoint). I thought it was a nice solution, actually -- it keeps the variable scope isolated to the code that actually needs it. Are you sure that you'd rather have it inside struct siginfo_data?

Good point about moving it into siginfo_setinfo().

tar/siginfo.c Outdated
@@ -51,6 +51,15 @@ struct siginfo_data {
/* How large is the archive entry? */
int64_t size;

/* How many files in total? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/files in total/files have we handled in total/ ?

Or is there a reason this comment is different from the other two?

* Start a write transaction using the cache directory ${cachepath} and the
* storage layer cookie ${S} which will involve chunks of maximum size
* ${maxchunksize}.
* ${maxchunksize}. If ${siginfo_newbytes} is not NULL, call it with
* ${siginfo_cookie} and the number of new bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That last sentence feels like it ended too soon. "... after writing each chunk"?

@gperciva
Copy link
Member Author

gperciva commented Jul 1, 2018

Should I rebase these fixes?

tar/siginfo.c Outdated
@@ -115,6 +114,7 @@ void
siginfo_setinfo(struct bsdtar *bsdtar, const char * oper, const char * path,
int64_t size, int file_count, uint64_t total_uncompressed)
{
static uint64_t lastprogress = 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used static uint64_t lastprogress because that's what happened in raisesigs() from tar/storage/storage_write.c (with lastcheckpoint). I thought it was a nice solution, actually -- it keeps the variable scope isolated to the code that actually needs it. Are you sure that you'd rather have it inside struct siginfo_data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice for scoping reasons; but it's bad because aside from scope it's a global variable. Singletons should be avoided except when there's a very good reason for them...

@cperciva
Copy link
Member

cperciva commented Jul 1, 2018

It has been too long since I looked at this for me to have any cached context, so please go ahead and tidy up the commit history -- that way it will be clearer when I look at it ab initio.

@gperciva
Copy link
Member Author

gperciva commented Jul 2, 2018

Rebased PR.

The handling of "new bytes" still feels overly complicated, but given that we're passing info between radically different layers of the program, it might be unavoidable. If you'd like to narrow the scope, I could make a PR that adds a --progress-bytes which prints the number of files and total data processed; printing the number of new bytes could be handled in a separate PR.

kientzle and others added 9 commits August 4, 2018 09:35
No change of functionality; just splitting up the fprintf()s so that the code
is easier to understand when there's more items to print.
The public interface siginfo_setinfo() takes
    int64_t total_uncompressed
for consistency with libarchive, but stores that as an (uint64_t) internally
because that simplifies the handling inside siginfo and the upcoming
tarsnap_opt_progressbytes (next commit).
This is not used anywhere yet, but will be part of printing "new bytes".
This isn't hooked up to the chunks layer yet.

We need the extra indirection of
    void ** siginfo_data
because when write.c calls archive_write_open_multitape(), bsdtar->siginfo is
still NULL.
@gperciva gperciva added draft depends PR depends on other PR(s) and removed draft labels Aug 20, 2018
@gperciva
Copy link
Member Author

This is waiting until #313 is accepted first.

@gperciva gperciva closed this May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
depends PR depends on other PR(s)
Development

Successfully merging this pull request may close these issues.

3 participants