
Revise Auto- and Smart-Remove feature #1945

Open · 1 of 4 tasks
buhtz opened this issue Nov 27, 2024 · 13 comments
Labels: Code Quality, Meta

@buhtz (Member) commented Nov 27, 2024

Summary

This is a meta issue covering a complex topic that touches several issues which cannot be solved in isolation. The behavior of the auto- and smart-remove feature is "unclear": it is not documented, and users have different expectations based on incorrect wording in the GUI and several bugs. The code related to auto- and smart-remove is not sufficiently covered by tests, and its structure makes it untestable.

Setting aside the details, my approach would be to create a mockup of clear documentation and a GUI showing how the auto-/smart-remove feature should work. That documentation and GUI might then be a good basis for further discussion that does not focus too much on code details.

What needs to be done

  • Refactor the related code base to make it testable.
  • Write unit tests for that code.
  • Document the current behavior (PR doc: Investigated smart- and auto-remove behavior #1944).
  • Provide an overall solution:
    • Fix and modify the behavior where appropriate.
    • Use clear wording in the GUI and the documentation about the behavior.


Notes and ideas

Related issues

@buhtz buhtz self-assigned this Nov 27, 2024
@buhtz buhtz added the Meta label Nov 27, 2024
@buhtz buhtz added this to the 2nd release from now milestone Nov 27, 2024
@buhtz buhtz added the Code Quality About code quality, refactoring, (unit) testing, linting, ... label Nov 28, 2024
@buhtz (Member, Author) commented Dec 18, 2024

Mockup

This is my current mockup of how the new auto-remove tab/dialog could look. The wording is improved to reflect the slightly modified (and fixed) behavior. Suggestions, corrections, and improvements are welcome.

  • The yellow boxes are tooltips.
  • The term "smart" is removed.
  • The "smart remove" checkbox that enabled/disabled all "smart" rules at once is removed.
  • The term "snapshot" is replaced with "backup". (Rephrase "snapshot" to "backup" #1929)
  • The term "preceding" is used to make clear that the previous completed day/week/month/year is used for counting. The current (still running and incomplete) day/week/month/year is ignored.
  • In all rules the "most recent" (latest, youngest) backup is kept. If present for all units, that would be 23:59 (days), Sunday (weeks), the 31st (months), and December 31st (years). Currently this is not consistent in BIT; e.g. for months the oldest backup (1st day of the month) is kept. A sketch of this counting rule follows this list.
  • The group boxes might not be a good idea.
  • Keep the goal in mind: making the behavior consistent and clearer. The goal is not to change it completely, e.g. by changing the order of the rules.
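The following is a minimal sketch of this counting rule, using months as the example; it is an illustration only, not BIT's actual code, and the function name is invented.

```python
from datetime import date

def keep_monthly(backups: list[date], today: date, n_months: int) -> set[date]:
    """Keep the most recent backup of each of the n_months preceding
    completed months; the current (still running) month is ignored."""
    kept = set()
    year, month = today.year, today.month
    for _ in range(n_months):
        month -= 1                      # step back to the preceding month
        if month == 0:
            year, month = year - 1, 12
        in_month = [b for b in backups if (b.year, b.month) == (year, month)]
        if in_month:
            kept.add(max(in_month))     # the latest backup of that month
    return kept
```

For example, with today = 2024-12-18 and n_months = 2, the latest backups from November and October 2024 are kept, while December (still incomplete) is not counted.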

[Mockup screenshot]

Besides this GUI and its tooltips, the user manual will be improved on this topic, explaining more details and giving examples.

EDIT:
I am also wondering whether counting only the "preceding" days/weeks/months/years is a good choice. It might be more "natural" to consider the current, still running day/week/month/year as relevant too. And it might be easier to explain and express in the GUI.

@virtuallyunknown commented
I think the sentences in the tooltips are a bit confusing, or at least not super easy to understand:

Complete weeks are counted, starting with the previous week (date from-to).
Complete months are counted, starting with the previous month (monthname).

  1. Shouldn't it say "completed" (past tense) instead of "complete"? I understand what a completed week is: a week that is already over. A "complete" week, on the other hand, refers to a period of time but doesn't necessarily imply something in the past.
  2. As for the terms between the parentheses, I am not entirely sure what their meaning or significance is.

Apologies if this is not the greatest feedback; I'm not a UI/UX expert by any stretch of the imagination, nor a native English speaker for that matter. Just trying to be helpful.

@noyannus commented Dec 18, 2024

"Preceding" is a bit unclear. Preceding to ... what? Now, probably, but it's not stated. Better be very explicit (KISS principle), use something like
"keep all backups...
...from the last NN XX"
...not older than NN XX" (or "younger than")
...created since NN XX ago" (or ...created NN XX ago and later")
These ideas are not good enough, but you get the idea.

In the mockup, space and inodes need to go on top. They are technical limitations that set the constraints for everything else.

You can omit "older than NN years", because all backup ages are already capped by their respective number of periods (at least the longest period, years, must have a default value). Also, make sure that the highest allowed backup age is always obeyed. Age precedence instead of rule-application precedence removes the need for the user to think about the latter, except for space and inodes, and is easier to understand: more KISS. (I don't know whether that is the case already.) If someone wants to keep 1000 weekly backups (I can't think of a reason why, but let's accept it for the sake of argument), a "15 years" rule applied first would delete the earliest ~220 backups they wanted to keep (1000 weeks is about 19 years, so roughly 4 years' worth of weekly backups exceed the 15-year cap).

Let the user set the maximum number of yearly backups, for consistency, and because "older than NN years" is gone. Consistent wording would also be better here: not "per year" but "yearly" (all the other periods use "-ly"), so that inconsistent wording doesn't make the user wary of a different treatment of that number.

Also, add an option that is time-independent. IIRC, grsync has something like "No matter what, keep the NN last backups". That has its use: imagine the user goes on holiday or sick leave for several weeks, then comes back and finds that a big mistake was made late on the last working day (they were already thinking of the holiday, say, or too ill to concentrate). After an absence of several weeks, with the number of retained dailies set too low, they would have to go back up to a week before the mistake and lose the good work done until right before it. If, say, the last 20 hourly backups were still around, they would lose 59 minutes at most.

For users who want it less fine-grained, this could be done with absolute numbers for every period: "Keep one X-ly backup for NN, but at least the last MM."
Example: "Keep one monthly backup for 12 months" plus "Always keep the last 10 daily backups" would allow picking up work at a point within the last 10 days of work, even after an extended absence from Christmas to Easter. (A sketch of such a combined rule follows.)

I hope that's not too confused... hard day.

@buhtz (Member, Author) commented Dec 18, 2024

Thank you for your detailed answer. I appreciate that you took the time to dive into it.
I have to ask back because I did not understand all the details.

In the mockup, space and inodes need to go on top. They are technical limitations that set the constraints for everything else.

But these rules are executed last. That is what the GUI tries to reflect.
I don't plan to modify the execution order for the upcoming release; I don't want to break too much in the beginning. And I need to refactor the existing code to even be able to create unit tests for it. I don't want to modify too much behavior without having test coverage of the current behavior. But we can think about working on this in a later step.

Do I understand you correctly that the space/inode rules should be executed first, before every other rule?

You can omit "older than NN years", because

I do agree here. But that is also something I wouldn't change at the moment; I would save it for a later version.

Also, make sure that the highest allowed backup age is always obeyed.

Yes, also a good point.

Let the user set the maximum number of yearly backups, for consistency, and because "older than NN years" is gone.

Agreed.

Also, add an option that is time-independent. IIRC, grsync has something like "No matter what, keep the NN last backups".

I like that.

For users who want it less fine-grained, this could be done with absolute numbers for every period: "Keep one X-ly backup for NN, but at least the last MM." Example: "Keep one monthly backup for 12 months" plus "Always keep the last 10 daily backups" would allow picking up work at a point within the last 10 days of work, even after an extended absence from Christmas to Easter.

Got it.

I hope that's not too confused... hard day.

Thank you very much for your input. I will collect your ideas for "new" behavior in separate Issues and schedule them for later releases. And for the current issue I will also consider your input.

@noyannus commented Dec 18, 2024

Thank you for the kind words.

In the mockup, space and inodes need to go on top. They are technical limitations that set the constraints for everything else.

But these rules are executed last. That is what the GUI tries to reflect.
...
Do I understand you correctly that the space/inode rules should be executed first, before every other rule?

If they are applied last, this could cause removal of backup files (whether fresh or oldest doesn't matter here). First the backup would be written, then (if the total backup size were over the limits) the backups would be cut back to the allowed limits.
It makes me feel queasy that user-set limits are overridden automatically, and that files are removed after they have seemingly been written successfully. Possibly even without user notification...
Imagine a machine crash after the backup had been completely written, exceeding the allowed space/inodes, but before the total backup size could be reduced again as needed and the corresponding error could be raised/notified/logged. The excess space/inode use would not become known, nor would the reduction fully take place.
If, OTOH, BiT determined the available space/inodes first, it could stop during the backup writing with an "out of allowed space or inodes" error when it reached that limit. It has to monitor its space/inode use for that, of course.
So the procedure would be (a sketch follows below):
- Apply the space/inode rule first, that is, BiT determines how much space / how many inodes are available within the set limits,
- then write the backup while continuously checking that the next file to write will not cause the total backup size to exceed the limit.
- If it will, stop writing, raise the above error, notify/log, mark the backup as incomplete, and give the user feedback about what caused the error and that backups need to be pruned or moved to a bigger volume before operations can resume.
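A minimal sketch of the pre-flight check described in this procedure, assuming a POSIX system; the parameter names and thresholds are invented for illustration:

```python
import os

def enough_room(backup_dir: str, next_file_size: int,
                min_free_bytes: int, min_free_inodes: int) -> bool:
    st = os.statvfs(backup_dir)
    free_bytes = st.f_bavail * st.f_frsize  # space available to unprivileged users
    free_inodes = st.f_favail               # inodes available to unprivileged users
    return (free_bytes - next_file_size >= min_free_bytes
            and free_inodes - 1 >= min_free_inodes)
```

Before copying each file, the tool would call something like enough_room(); if it returns False, it stops writing, marks the backup as incomplete, and notifies the user.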


Another wording point: "Applied in descending order" uses a word that in IT circles applies to values ("sort descending"). I would make it clearer that the sequence is meant, with something like "Back In Time goes through the rules below from top to bottom. Should rules conflict, the longer retention time will be applied".

@DerekVeit (Contributor) commented
I like the idea of handling the space and inode limits differently.

As it is, it's unclear what gets deleted if you are out of space or inodes. I have always kept those unchecked, because in such a circumstance, I would prefer that it just fails, and then I will look at what's going on, maybe reducing the retention if appropriate.

For example, I've had it happen that I added a mount to (huge) external storage without remembering to exclude it before the backup ran. If Back In Time were just deleting backups as needed to back that up, it could wipe out all of my backups just to create some monster backup that I don't want.

@noyannus commented
As it is, it's unclear what gets deleted if you are out of space or inodes.

U-oh... And that is renowned, recommended, and widely deployed... And nobody noticed and acted!?! <shakes head sadly, then starts banging head against wall>

You guys are doing a d*mn needed job here!

To find out what actually happens:

  1. Use the shell to create a tree of folders, each holding one or a few small, equal-size test files.
    Some nested for ... in ... loops to assemble path and filename, with mkdir -p <path> and dd if=/dev/urandom of=<path/and/filename> bs=1M or so inside; you get the idea. (A Python equivalent is sketched after this list.)
    It would be helpful if they were named so that their position can be deduced from the name alone, like 3.4.6.7 = "folder '7' in folder '6' in folder '4' in folder '3'"; that would make it easier to evaluate long file/folder lists later.
    Also good for a vgrep[*]: use a fixed number of digits for all name-numbers, or limit the files/folders to one-digit numbers.

  2. Create a small partition somewhere, larger than the space needed for the above tree, but not too much larger. Smaller than 2x the total backup size would be right if you want to see BiT's behaviour when it hits a space limitation without a space/inode limit set and don't want to write several backups first.

  3. (Un)set limits according to what you want to test. The space/inode settings should not allow a full backup, and of course be lower than the partition capacity.

  4. Back up the file tree to the new partition.

  5. Use the shell to overwrite all the files in the tree. (Maybe touch is enough to trigger the need for a fresh backup of the files? Then it's find <partition> -xdev -exec touch '{}' \; <= no search criteria!)

  6. Back up the tree again.

=> Observe how BiT behaves when it reaches the limit. You might want to use a tool that logs and/or continuously monitors used inodes and total backup size. Logging would be great for seeing whether things were written and later removed.
=> Check which files were added to the backup, i.e. what is not a hardlink AND is younger than the first backup.

[*] "visual grep", i.e. scanning through long lists with your eyes.

@buhtz (Member, Author) commented Dec 19, 2024

I am pretty sure, after checking the code, that the oldest backups are removed until enough free space/inodes are available again. And yes, this happens without respecting the other rules.

Not cool, I know. But it is a rare case, and I won't change it now, for the next release.

But I will definitely change that behavior in the future. I like the idea of removing those rules and transforming them into a clear warning/error message: simply notify the user when the user-defined threshold is reached.

We have some other issues related to out-of-space situations. Besides using a user-defined threshold, BIT could also estimate the expected size (and maybe inodes) that a new backup might use, and warn before starting the backup that there might not be enough space/inodes anymore. But the estimate would be very blurry and might cause more confusion than benefit. (A naive sketch of such an estimate follows.)
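A deliberately naive sketch of such an estimate, summing the sizes of files changed since the last backup; hardlink deduplication and sparse files are ignored here, which is part of why any estimate would be blurry:

```python
import os

def estimate_backup_size(source: str, last_backup_time: float) -> int:
    total = 0
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable
            if st.st_mtime > last_backup_time:
                total += st.st_size  # file changed since the last backup
    return total
```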

@noyannus commented
With the current out-of-space removal, data loss may occur. If it's not there already (I have not checked), add a line warning users to the README and to the known issues, just to save the one or two rare cases from the looming doom and despair. :-) A small GUI label "This overrides all other settings here" would be a good precaution, too.

Pre-backup guesstimates would be too inaccurate, IMO, even when derived from recent backups. E.g., someone switches from mail and text work to working with virtual machines (with their large disk files, especially if these are not snapshotted themselves), or vice versa, and any estimate will be misleading at best.

In BiT, you could implement an early warning, something like "Your backups are approaching the [maximum available space of the backup volume | [space|inode] limit you have set]." Any default of sufficient size will be useful. (Just guessing for space: 10% or 50 GB left; I have no idea what a good inode value would be here.)

This could also be customized by the user in an additional child setting below the limit-setting lines: "...and alert me when <space|inode warning setting> is reached."

The warning could also be independent of limits actually being set, and positioned in the GUI on the same level: "Warn me when only XXX free space-units/inodes are left" (absolute or as a percentage). A sketch of such a threshold check follows.
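A minimal sketch of such a threshold check, assuming a POSIX system; the setting name and percentage semantics are invented for illustration:

```python
import os

def low_space_warning(backup_dir: str, min_free_percent: float) -> bool:
    st = os.statvfs(backup_dir)
    percent_free = 100.0 * st.f_bavail / st.f_blocks
    return percent_free < min_free_percent  # True means: warn the user
```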

The customizable in-app warnings are definitely nice to have, but should be postponed for now, IMO (unless they are rather quick to implement; I have no idea whether they are).

@DerekVeit (Contributor) commented
Deleting the oldest backups first in that case is what I would have guessed. And those are not the ones I would consider most disposable if I were choosing manually.

But someone with a very different retention strategy might find it useful. For example, someone who only keeps daily backups for the last N days might be happy with that N being reduced rather than missing a backup.

Looking ahead to check for space or inode availability becomes a lot of work trying to predict something that can't really be made reliable anyway. And if you are out of space, failing before trying is only marginally better than failing after trying.

@ceperman commented
Sorry to be late to this party!

I like the redesign in the mockup, especially matching the order of actions in the GUI with the execution order and separating the "Keep" rules. This makes it easier to understand how the options work together, i.e. sequentially.

IMO there is no reason to remove the first option (remove if older than N years) or the last (remove using free-space or inode criteria). Changing the order in the GUI has already made things clearer, and with suitable documentation these options can be useful. Users can choose to use them or not. I think the concerns about these options (e.g. which backups get removed to free up space or inodes) mostly disappear if their operation is properly documented. The most obvious method is to remove the oldest, and I think this is what most users would expect. Ultimately, if users don't like how it works, they just don't use it. They can remove backups manually.

As "one snapshot per year for all years" is not an option I don't see why it needs to be here, esp. in the "Keep" box. Either move it to the bottom as a comment, or even remove it. This feature of BiT should be documented anyway.

"Week" needs to be defined, i.e. Sunday to Saturday or Monday to Sunday, and not just any 7 consecutive days. It should also be stated that "month" means a calendar month. (A sketch of one possible week definition follows.)
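One unambiguous definition would be ISO weeks (Monday to Sunday), sketched here; this is only an illustration, not a statement about BIT's current definition:

```python
from datetime import date

def week_key(d: date) -> tuple[int, int]:
    iso = d.isocalendar()       # (ISO year, ISO week number, ISO weekday)
    return (iso[0], iso[1])

# 2024-12-23 (Monday) and 2024-12-29 (Sunday) fall in the same ISO week...
assert week_key(date(2024, 12, 23)) == week_key(date(2024, 12, 29))
# ...while 2024-12-30 (Monday) starts week 1 of ISO year 2025.
assert week_key(date(2024, 12, 30)) == (2025, 1)
```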

Also, add an option that is time-independent. IIRC, grsync has something like "No matter what, keep the NN last backups".

I'm not sure this is necessary; just don't use auto-remove. The remove/keep options are already complex without adding more.

The documentation should include a few example configurations with clear explanations of how backups are retained and deleted as time progresses.

I'm not sure that all the "hand-holding" that's been suggested is needed. Good documentation (I keep repeating myself!) solves most issues. As a native English speaker I offer to help if wanted. IMO the most useful extra would be a trial-run option (like rsync's -n/--dry-run option): show what would happen without actually doing anything.

buhtz added a commit that referenced this issue Dec 23, 2024
Documentation of research into how the smart- and auto-remove of BIT works.

- A maintain-docu file is added.
- Docstrings improved.
- Fix minor bug: `smartRemoveKeepAll()` now checks dates and not snapshot IDs. That caused inconsistent behavior with the keep rule on day level.
- Refactor: increment and decrement months
- New unit tests.

Related to #1945
@buhtz (Member, Author) commented Dec 23, 2024

Thank you for all your input.

@buhtz (Member, Author) commented Dec 26, 2024

Hello,

again, I am interested in your opinions about my approach to the auto-remove feature for the upcoming release.
For a new mockup and details please see #1976.

Regards,
Christian
