Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weekly snapshots not being retained #1094

Open
daveTheOldCoder opened this issue May 27, 2020 · 39 comments · May be fixed by #1819
Open

Weekly snapshots not being retained #1094

daveTheOldCoder opened this issue May 27, 2020 · 39 comments · May be fixed by #1819
Assignees
Labels
Bug Discussion decision or consensus needed Medium

Comments

@daveTheOldCoder
Copy link

daveTheOldCoder commented May 27, 2020

I'm using the default values for Smart Remove. Hourly and daily snapshots are kept or deleted as expected, but no weekly snapshots are kept.

This seems to be the relevant code in /usr/share/backintime/common/snapshots.py:

        #keep one per week for the last keep_one_per_week weeks
        if keep_one_per_week > 0:
            d = now - datetime.timedelta( days = now.weekday() + 1 )
            for i in range( 0, keep_one_per_week ):
                keep_snapshots = self._smart_remove_keep_first_( snapshots, keep_snapshots, d, d + datetime.timedelta(days=8) )
                d = d - datetime.timedelta(days=7)

I'm trying to figure it out, but I thought I'd ask first if there's a known issue.

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented May 31, 2020

I found a potential fix for this problem.

I added some "logger.debug" calls, and changed the cron table entry to create log files so that I could see what's happening.

The function "smart_remove_keep_first" keeps the newest snapshot in the specified date/time range, since the snapshots are sorted in descending order of date/time. I added a function "smart_remove_keep_last" that reverses the order of the snapshots and thus keeps the oldest snapshot.

In the code in the previous post, I replaced the call to "smart_remove_keep_first" with a call to "smart_remove_keep_last". So far it seems to be working. But it needs further testing and analysis.

If this is actually a bug, I don't understand why it hasn't been reported before. Can someone else review the code/algorithm and provide an opinion?

@buhtz

This comment was marked as outdated.

@daveTheOldCoder

This comment was marked as outdated.

@daveTheOldCoder

This comment was marked as outdated.

@daveTheOldCoder

This comment was marked as outdated.

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented Sep 25, 2022

This can be fixed with a one-line change in common/snapshots.py:

In def smartRemoveKeepFirst, change:
for sid in snapshots:
to:
for sid in reversed(snapshots):

But it would probably be "cleaner" to rename smartRemoveKeepFirst to smartRemoveKeepLast or simply smartRemove, and then change the calls to it.

@buhtz buhtz added this to the 1.3.5 / 1.4.0 milestone Mar 19, 2023
@buhtz

This comment was marked as off-topic.

@buhtz buhtz added Discussion decision or consensus needed Feedback needs user response, may be closed after timeout without a response labels Mar 20, 2024
@buhtz buhtz self-assigned this Mar 20, 2024
@daveTheOldCoder

This comment was marked as outdated.

@buhtz

This comment was marked as outdated.

@daveTheOldCoder
Copy link
Author

screenshot

@buhtz

This comment was marked as outdated.

@buhtz

This comment was marked as outdated.

@daveTheOldCoder

This comment was marked as outdated.

@emtiu

This comment was marked as outdated.

@emtiu emtiu self-assigned this Jul 17, 2024
@emtiu emtiu removed Feedback needs user response, may be closed after timeout without a response Close after cooling-off period labels Jul 28, 2024
@emtiu
Copy link
Member

emtiu commented Jul 28, 2024

I have created a testbed with the following shell code in a VM:
for N in {1..60}; do sudo date -s '+7 days'; echo date > backupfile; backintime backup; done

In extreme cases, it appears to work as expected: Keeping "one snapshot per week for the last 52 weeks" really does leave 1 years' worth of weekly snapshots. However, I'm not so sure about shorter timespans. Calculating dates is hard!

My current guess is that we may be dealing with an off-by-1 problem here, where a setting of "one snapshot per week for the last n weeks" only actually keeps one for the last n–1 weeks. This would be especially significant for the default configuration of:

  • "one snapshot per day for the last 7 days"
  • "one snapshot per week for the last 4 weeks"
  • "one snapshot per month for the last 24 months"

@buhtz
Copy link
Member

buhtz commented Jul 28, 2024

I once had a tool called faketime I used for things like this. Worked well.

@emtiu
Copy link
Member

emtiu commented Jul 28, 2024

I think I'm very close to cracking this case. In this function:

backintime/common/snapshots.py

Lines 1686 to 1696 in 22f468c

# keep one per week for the last keep_one_per_week weeks
if keep_one_per_week > 0:
d = now - datetime.timedelta(days=now.weekday() + 1)
for i in range(0, keep_one_per_week):
keep |= self.smartRemoveKeepFirst(
snapshots,
d,
d + datetime.timedelta(days=8),
keep_healthy=True)
d -= datetime.timedelta(days=7)

smartRemoveKeepFirst is called with a max_date of d + datetime.timedelta(days=8) — not days=7!

Therefore, in a routine that says "keep only the youngest snapshot of these 8 days to consider", it will also delete the youngest snapshot of the previous week — but only if it's caught by the last in a series of "keep one per week" calls.

In effect, of "one snapshot per week for the last n weeks", the n-th week has its youngest snapshot thrown away, even though it shouldn't.

I'll need to investigate if changing the call of smartRemoveKeepFirst to a max_date of d + datetime.timedelta(days=7) fixes the problem. There must have been some reason to set an 8 there.

P.S.: The official soundtrack for this bug is Only For The Week.

@buhtz
Copy link
Member

buhtz commented Jul 29, 2024

There are some unit tests in test_snapshot.py about that smart remove thing.
I touched some once long ago but then gave up. A full refactoring of that tests would be the best. But a first step would be to see if this use case is covered by that tests and add such a test if not.

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented Jul 29, 2024

It occurred to me that what makes the "smart remove" behavior confusing is that the snapshots are named using only the timestamp. The names don't contain "hourly", "daily", "weekly", etc.

I wonder if adding an index file that identifies daily, weekly, monthly and yearly snapshots would help.

Or maybe I'm wrong. :)

@emtiu
Copy link
Member

emtiu commented Jul 29, 2024

There are some unit tests in test_snapshot.py about that smart remove thing. I touched some once long ago but then gave up. A full refactoring of that tests would be the best. But a first step would be to see if this use case is covered by that tests and add such a test if not.

I'm confident that I can fix this bug, and to make sure I don't introduce any new ones, I'm doing some manual testing in a VM.

I'm totally unfamiliar with automatic testing, though. I'll take a look, but I'd rather not put this bugfix on the "waiting list" until I've cracked the automatic testing of Smart Remove, because that would probably take a very long time ;)

@emtiu
Copy link
Member

emtiu commented Jul 29, 2024

It occurred to me that what makes the "smart remove" behavior confusing is that the snapshots are named using only the timestamp. The names don't contain "hourly", "daily", "weekly", etc.

I wonder if adding an index file that identifies daily, weekly, monthly and yearly snapshots would help.

Or maybe I'm wrong. :)

I see what you mean, but BiT doesn't work like that. Snapshots are not created as "weekly" or "monthly" snapshots. In fact, they are only characterized by their date+time of saving (plus a random three-digit number, which is called the "tag"). Together, this makes a "Snapshot ID" (SID for short): 20241117-115358-562

The only time when a snapshot might be characterized as "weekly" or "monthly" is when the Smart Remove routine determines: "this snapshot should be kept, because it's the only one from that week/month, and it should be kept according to the removal rules".

Therefore, any single snapshot can start out as an "hourly" snapshot, then become a "weekly" one, and maybe even a "monthly" one later.

@daveTheOldCoder
Copy link
Author

I see what you mean, but BiT doesn't work like that.

I know that. To figure out the workaround I posted above, I added logging code in the smart-remove processing to log the "kept" and "removed" snapshots, so I could see what was happening.

@emtiu
Copy link
Member

emtiu commented Jul 29, 2024

I know that. To figure out the workaround I posted above, I added logging code in the smart-remove processing to log the "kept" and "removed" snapshots, so I could see what was happening.

I see, and I've also understood why your workaround is successful. It's hard to put into words ;)

What happens is that the code considers snapshots from 8 days (instead of 7) when determining which to keep for a specific week. But in a one-snapshot-per-week environment, 8 days will likely contain two snapshots. Since smartRemoveKeepFirst keeps only the newer one, the older one is lost.

This has no consequences as long as the previous week is also considered before deletion. But it fails for the last of the weeks to be considered (resulting in the n–1 situation I described above).

Your workaround reverses the list of snapshots, so in the "8-day-week" that is the last to be considered, the older of the two snapshots is kept, and your problem disappears.

However, it is my understanding that the problem disappears completely when days=8 is changed into days=7 :)

@daveTheOldCoder
Copy link
Author

After verifying that the bug was still present in version 1.4.3 (the latest version available for Ubuntu 24.04), I'm testing your fix (days=8 -> days=7) by making that one-line change in that version.

It will take a few more weeks. I want to verify that the monthly snapshots are retained too.

@buhtz
Copy link
Member

buhtz commented Oct 10, 2024

It will take a few more weeks. I want to verify that the monthly snapshots are retained too.

Cool! Real time testing is the best. 😄

@daveTheOldCoder
Copy link
Author

Ideally, I'd like to write a simulator for the code in snapshots.py that could do the testing in seconds.

But I'm too lazy and senile to accomplish that now. :)

@buhtz
Copy link
Member

buhtz commented Nov 25, 2024

Hello Guys,
After working on PR #1944 (some unit tests for this topic) and investigating emtiu's PR #1819 I am not sure what the problem is.
Dave, your initial post did not describe the problem in details. What is the problem? What is the actual behavior and what is the expected behavior?

I was not able to produce a problem in my testings.

The actual behavior I have observed in the context of "keep one snapshot per week for the last N weeks".

  • The current week is ignored. The calculation starts with the previous week and goes back in time from there.
  • It keeps always the last/newest backup in a week. If there is one backup for each day in the week it will keep always the backup from Sunday.

@daveTheOldCoder
Copy link
Author

It's been 4-1/2 years since I posted the original problem, and I don't remember the exact details.

The fix I posted above works for me.

I'm going to use the released version of BackInTime with my fix applied.

@buhtz
Copy link
Member

buhtz commented Nov 25, 2024

Thank you for the reply.
Will wait for response from Michael. He might bring light into the topic.

Would be nice to have example Dates to reproduce.

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented Nov 26, 2024

Ideally, the relevant code should be simple and clear enough that its correctness, or lack of correctness, is obvious.

If code is too complex to understand easily, then it's too complex.

But I'm not volunteering to rewrite it. ;)

@buhtz
Copy link
Member

buhtz commented Nov 26, 2024

Hello Dave,
Please allow me to add another question to you.

Out of your guts, does the original "keep one per N weeks" settings include the current running and not complete week or not? From your perspective as a user what would you expect?

The actual behavior I have observed in the context of "keep one snapshot per week for the last N weeks".

  • The current week is ignored. The calculation starts with the previous week and goes back in time from there.
  • It keeps always the last/newest backup in a week. If there is one backup for each day in the week it will keep always the backup from Sunday.

It seems my assumption was not correct. The algorithm try to take the current week also into account. But that does not work in all situations. I am on it...

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented Nov 26, 2024

Since Oct 10, I've been using BackInTime 1.4.3 with the one-line change in this PR manually applied:
a207fc2

BackInTime was run hourly, with the same Auto-remove settings as in the screenshot I posted above on Mar 20.

Monthly and hourly snapshots are being retained, but I think that the number of weekly backups retained is one less than it should be. For that reason, I recently changed the weekly setting to: "Keep one snapshot per week for the last 5 weeks".

If there is ambiguity in the meaning of the setting, I think that a user would prefer to retain one more, rather than one less, backup. You can remove an extra backup; you can't restore a deleted one.

Here is the current list of snapshots:
20240930-230001-367
20241031-230002-477
20241102-220001-447
20241109-230002-184
20241116-230001-201
20241120-230002-530
20241121-230002-290
20241122-220001-856
20241123-230001-801
20241124-230001-181
(+ hourly snapshots for 20241125 and 20241126)

The same issue may apply to the daily snapshots. It seems like the number retained is smaller than it should be.

When I was using BackInTime 1.2.1, with the patch I posted above on Sep 25, 2022, I didn't observe any problems.

@buhtz
Copy link
Member

buhtz commented Nov 27, 2024

Thank you for providing your data and setup.

I agree that there are "problems" with that algorithm. According to my current understanding of it you do have 5 weekly backups in your list. Some of the backups in your list do hit multiple of of the keep-rules.

  • monthly:
    • 30 Sept.
    • 31 Oct.
  • weekly:
    • 2 Nov
    • 9
    • 16
    • 23
    • 26 (vanilla BIT) or 25 (in your patched BIT because of reversed())
  • daily:
    • 20 to 26 Nov

You see there are 5 weekly backups. The 25/26 is hit twice by the weekly- and the daily-rule.

Just for my own documentation this might illustrate the situation:
Gescanntes Dokument

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented Nov 29, 2024

Out of your guts, does the original "keep one per N weeks" settings include the current running and not complete week or not? From your perspective as a user what would you expect?

On reflection, I consider "keep one for N weeks" to mean:
Last week
Two weeks ago
Three weeks ago
...
N weeks ago

@buhtz
Copy link
Member

buhtz commented Nov 29, 2024

Is 18. to 24. the "Last week"?

@daveTheOldCoder
Copy link
Author

daveTheOldCoder commented Nov 29, 2024

In my opinion, "last week's backup" is the snapshot taken approximately seven days ago. It's hard to define precisely.

It's like art. It's hard to define, but you know when you like it.

@buhtz
Copy link
Member

buhtz commented Nov 29, 2024

I see, you define "a week" as a timedelta and not as a calendar element starting on Monday.

@daveTheOldCoder
Copy link
Author

In this context, I suppose so. I haven't given the matter much thought.

@buhtz
Copy link
Member

buhtz commented Dec 17, 2024

Might not be directly relevant to this issue. But I investigated the current auto-/smart-remove behavior and documented it. I found inconsistent behavior and minor bugs. There is a lot potential to improve and refactoring.

Before starting with this I would like to request your opinions and suggestions about the next steps. Please see the issue #1945 for details and a mockup for the new auto-remove dialog-tab.

Feedback and suggestions welcome. We appreciate your feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Discussion decision or consensus needed Medium
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants