separable repository format #370
Comments
I am interested in similar use cases, so I'll answer based on what I have found, though I have not worked on the internals, so I can't answer definitively. You would probably like to look at the internals documentation in the borgbackup repo. You would probably also be interested in borgbackup/borg#102 about working with Amazon S3. Working with S3 is similar to what you want to do, since ideally you would want to send new chunks to S3 without needing to read previous chunks (S3 has additional issues due to its eventual-consistency behavior).

Regarding your first question: a cache of file metadata is maintained to allow unchanged files to be skipped. This cache can be deleted, at the cost of re-reading the files for the next archive, but the actual data segments do not appear to be used for this check, so deleting them should be fine at least as far as the file metadata checking is concerned.

Regarding the second question: as explained in the internals document, attic/borg maintains a manifest of all the archives, which it stores in the final data segment file. Every time a new archive is created, the segment file(s) containing the manifest are deleted and the manifest is written out at the end of the new last segment file. Otherwise, it seems that existing segment files are not touched when creating an archive.

Regarding implementing support for deleting data locally: I think it would not be too hard, but it would require at least changing how the manifest is stored (i.e. not storing it in the same objects as the backup data). Of course, not many commands other than creating a new archive would still work with the data segments absent locally.
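To make the file-metadata cache idea concrete, here is a minimal sketch of the general technique (not attic/borg's actual cache format; the cache file name and the size/mtime/inode key are my assumptions). The point is that deciding which files to re-read relies only on a small local cache, never on the stored data segments:

```python
import json
import os

CACHE_PATH = "files_cache.json"  # hypothetical cache file, not borg's real cache format


def load_cache(path=CACHE_PATH):
    """Load the metadata cache from the previous run, or start empty."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def unchanged(path, cache):
    """A file counts as unchanged if size, mtime and inode match the cached values."""
    st = os.stat(path)
    key = [st.st_size, st.st_mtime_ns, st.st_ino]
    hit = cache.get(path) == key
    cache[path] = key  # refresh the entry for the next run
    return hit


cache = load_cache()
for root, _dirs, files in os.walk("/data"):
    for name in files:
        full = os.path.join(root, name)
        if not unchanged(full, cache):
            pass  # only now read the file and feed it to the chunker/deduplicator

with open(CACHE_PATH, "w") as f:
    json.dump(cache, f)
```

Deleting this cache only costs re-reading the source files on the next run; it never requires the data segments to be present.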
i originally intended to ask this on the mailing list, but i can't post or subscribe, so i'm just going to go ahead and post it here as a feature request.
the following paragraphs are part of what i intended to post to the mailing list (skip below them if you don't care about the motivation):
hello!
i'm in need of a new backup solution, and i've been playing with zbackup for a while. i like that it delegates in a true unix fashion.
what's nice about this is that zbackup does not implement transfer of the archive off-site itself. rather, its repository format cleanly separates data chunks from indexes. data chunks are never modified, only new ones added, and they are only read for restoring an archive. so you can use whatever you want to transfer the data chunks, and then just delete them. only the index files remain on the system, so new backups can still be created efficiently, even if the data chunks are now missing from the original system.
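for what it's worth, that workflow is easy to script around zbackup. a rough sketch in python (the directory names `bundles/`, `index/`, `backups/` and all paths here are my assumptions about the layout — check them against your own repository before deleting anything):

```python
import os
import subprocess

REPO = "/var/backups/zbackup-repo"        # local zbackup repository (assumed path)
REMOTE = "offsite:/backups/zbackup-repo"  # any rsync/ssh destination works


def ship_and_drop_bundles(repo=REPO, remote=REMOTE):
    """Copy data bundles off-site, then delete the local copies.

    Index and backup metadata stay local, so new backups remain cheap;
    only the large, immutable bundle files are removed.
    """
    bundles = os.path.join(repo, "bundles")
    # existing bundles are never modified, so rsync only ever uploads new files
    subprocess.run(["rsync", "-a", bundles + "/", remote + "/bundles/"], check=True)
    for root, _dirs, files in os.walk(bundles):
        for name in files:
            os.remove(os.path.join(root, name))


ship_and_drop_bundles()
```

restoring of course requires copying the bundles back first; only backup creation stays cheap with the data gone.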
this is nice because in my setup, it makes a lot of sense to create a backup archive at one time and transfer it off-site slightly time-delayed. also, i do not have full control over the off-site storage (i can't install arbitrary software). so being able to use whatever software/script i want to transfer the data is a huge benefit.
the downside is that zbackup doesn't even implement reading the input files itself. rather, it takes an input stream (from tar, or any other archive program) and deduplicates it. this is where it gets inefficient, because even though the data is deduplicated, it generates a lot of disk i/o because tar still needs to fully read each file on every run of the backup.
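for reference, this is roughly the pipeline i mean (the zbackup invocation is from memory and the paths are placeholders, so double-check against the zbackup docs; an encrypted repo would also need a password option):

```python
import datetime
import subprocess

REPO = "/var/backups/zbackup-repo"  # assumed repository path
SOURCE = "/data"                    # the directory being backed up
name = datetime.date.today().isoformat()

# tar re-reads every file under SOURCE on every run; zbackup only deduplicates
# the resulting stream, so the full read i/o is paid each time.
tar = subprocess.Popen(["tar", "-c", SOURCE], stdout=subprocess.PIPE)
subprocess.run(
    ["zbackup", "backup", f"{REPO}/backups/{name}.tar"],
    stdin=tar.stdout,
    check=True,
)
tar.stdout.close()
tar.wait()
```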
....
these are the two features i'm curious about:

1. attic never needs to read the existing data segment files when creating a new archive (only the local metadata/cache). it seems that this is the case, at least judging from some experiments i did. can you confirm that?
2. existing data segment files are never modified once written, so they could be transferred off-site and deleted locally while new backups are still created. this seems not to be the case. how easy would it be to implement this?
thanks