Merge pull request #62 from ThomasWaldmann/chunker-params
Chunker params, fixes #16
Showing 8 changed files with 169 additions and 33 deletions.
About borg create --chunker-params
==================================

--chunker-params CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE

CHUNK_MIN_EXP and CHUNK_MAX_EXP give the exponent N of the 2^N minimum and
maximum chunk size. Required: CHUNK_MIN_EXP < CHUNK_MAX_EXP.

Defaults: 10 (2^10 == 1KiB) minimum, 23 (2^23 == 8MiB) maximum.

HASH_MASK_BITS is the number of least-significant bits of the rolling hash
that need to be zero to trigger a chunk cut.
Recommended: CHUNK_MIN_EXP + X <= HASH_MASK_BITS <= CHUNK_MAX_EXP - X, X >= 2
(this gives the rolling hash some freedom to make its cut at a place
determined by the window's contents rather than by the min/max chunk size).

Default: 16 (statistically, chunks will be about 2^16 == 64KiB in size)

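The cut rule can be illustrated with a toy content-defined chunker. This is a sketch under stated assumptions, not borg's own (C) chunker: it uses a buzhash-style (cyclic polynomial) rolling hash whose byte table is generated from a fixed seed, and the names `chunkify`, `rolling_hashes` and `make_table` are hypothetical. A cut is allowed at a position only if the low HASH_MASK_BITS bits of the hash over the preceding HASH_WINDOW_SIZE bytes are zero and the chunk is already at least 2^CHUNK_MIN_EXP bytes long; if no such position shows up, the chunk is cut at 2^CHUNK_MAX_EXP bytes (or at the end of the data).

```python
import random

MASK32 = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def make_table(seed: int = 0) -> list[int]:
    """Hypothetical table: one pseudo-random 32-bit value per byte value."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def rolling_hashes(data: bytes, window: int, table: list[int]):
    """Yield (pos, h), where h is the buzhash of data[pos - window:pos]."""
    if len(data) < window:
        return
    h = 0
    for b in data[:window]:
        h = _rotl(h, 1) ^ table[b]
    yield window, h
    for pos in range(window, len(data)):
        # Remove the outgoing byte's (rotated) contribution, add the incoming byte.
        h = _rotl(h, 1) ^ _rotl(table[data[pos - window]], window) ^ table[data[pos]]
        yield pos + 1, h


def chunkify(data: bytes, min_exp: int, max_exp: int, mask_bits: int,
             window: int, seed: int = 0) -> list[bytes]:
    """Split data into content-defined chunks (toy version, not borg's chunker)."""
    min_size, max_size = 1 << min_exp, 1 << max_exp
    mask = (1 << mask_bits) - 1
    # Positions where the low mask_bits bits of the rolling hash are all zero.
    candidates = [pos for pos, h in rolling_hashes(data, window, make_table(seed))
                  if h & mask == 0]
    chunks, start, i = [], 0, 0
    while start < len(data):
        limit = min(start + max_size, len(data))  # forced cut: max size or end of data
        while i < len(candidates) and candidates[i] < start + min_size:
            i += 1  # cutting here would make the chunk shorter than the min size
        cut = candidates[i] if i < len(candidates) and candidates[i] <= limit else limit
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

For example, `chunkify(random.Random(1).randbytes(100_000), 10, 16, 12, 64)` splits 100kB of random data; the small parameters only keep the pure-Python demo fast. Because cut points depend only on the bytes inside the window, prepending a few bytes changes the first chunk or two but leaves the later chunks identical, which is what lets deduplication survive insertions.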
HASH_WINDOW_SIZE: the size of the window used for the rolling hash computation.
Default: 4095B

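The constraints above can be checked mechanically. A minimal sketch: `parse_chunker_params`, `describe` and `ChunkerParams` are hypothetical helpers, not borg's argument parser. The required condition raises an error, while a HASH_MASK_BITS value outside the recommended range (with X = 2, the smallest value the recommendation allows) only produces a warning.

```python
from typing import NamedTuple


class ChunkerParams(NamedTuple):
    min_exp: int      # CHUNK_MIN_EXP
    max_exp: int      # CHUNK_MAX_EXP
    mask_bits: int    # HASH_MASK_BITS
    window_size: int  # HASH_WINDOW_SIZE


def parse_chunker_params(text: str, x: int = 2) -> tuple[ChunkerParams, list[str]]:
    """Parse "CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE".

    Raises ValueError if CHUNK_MIN_EXP < CHUNK_MAX_EXP does not hold; returns
    warnings if HASH_MASK_BITS lies outside the recommended range.
    """
    parts = text.split(",")
    if len(parts) != 4:
        raise ValueError("expected 4 comma-separated integers")
    p = ChunkerParams(*(int(part) for part in parts))
    if not p.min_exp < p.max_exp:
        raise ValueError("required: CHUNK_MIN_EXP < CHUNK_MAX_EXP")
    lo, hi = p.min_exp + x, p.max_exp - x
    warnings = []
    if not lo <= p.mask_bits <= hi:
        warnings.append(f"HASH_MASK_BITS={p.mask_bits} is outside the recommended range {lo}..{hi}")
    return p, warnings


def describe(p: ChunkerParams) -> str:
    """Byte sizes implied by the exponents (the target is only a statistical average)."""
    return (f"min {1 << p.min_exp} B, max {1 << p.max_exp} B, "
            f"target ~{1 << p.mask_bits} B, window {p.window_size} B")
```

With X = 2, the defaults 10,23,16,4095 and the "lg" setting 16,23,20,4095 pass without warnings, while the "xl" setting 16,23,31,4095 from the experiments is deliberately outside the recommended range.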
Trying it out
=============

I backed up a VM directory to demonstrate how different chunker parameters
influence repo size, index size / chunk count, compression and deduplication.

repo-sm: ~64KiB chunks (16-bit chunk mask), min chunk size 1KiB (2^10B)
(these are the attic / borg 0.23 internal defaults)

repo-lg: ~1MiB chunks (20-bit chunk mask), min chunk size 64KiB (2^16B)

repo-xl: 8MiB chunks (2^23B max chunk size), min chunk size 64KiB (2^16B).
The chunk mask was set to 31 bits, so it (almost) never triggers.
This degrades the rolling hash based dedup to a fixed-offset dedup,
as the cutting point is now (almost) always the end of the buffer
(at 2^23B == 8MiB).

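How rarely does a 31-bit mask trigger? Assuming each eligible position independently satisfies the mask with probability 2^-HASH_MASK_BITS (an idealization of a well-mixed rolling hash), the chance that a chunk gets a hash cut before reaching the maximum size is 1 - (1 - 2^-HASH_MASK_BITS)^(2^CHUNK_MAX_EXP - 2^CHUNK_MIN_EXP). The helper name `p_hash_cut` is hypothetical.

```python
import math


def p_hash_cut(min_exp: int, max_exp: int, mask_bits: int) -> float:
    """Probability that the rolling hash cuts a chunk before the max size.

    Idealized model: every position between the min and max chunk size
    independently satisfies the mask with probability 2**-mask_bits.
    """
    positions = (1 << max_exp) - (1 << min_exp)
    p = 2.0 ** -mask_bits
    # 1 - (1 - p)**positions, computed without losing precision for tiny p
    return -math.expm1(positions * math.log1p(-p))


for name, params in {"sm": (10, 23, 16), "lg": (16, 23, 20), "xl": (16, 23, 31)}.items():
    print(f"{name}: {p_hash_cut(*params):.4f}")
```

This prints about 1.0000 for sm, 0.9996 for lg and 0.0039 for xl: under this model only roughly 0.4% of the xl chunks are cut by the hash, and the rest end at 8MiB.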
||
The repo index size is an indicator for the RAM needs of Borg. | ||
In this special case, the total RAM needs are about 2.1x the repo index size. | ||
You see index size of repo-sm is 16x larger than of repo-lg, which corresponds | ||
to the ratio of the different target chunk sizes. | ||
|
||
Note: RAM needs were not a problem in this specific case (37GB data size). | ||
But just imagine, you have 37TB of such data and much less than 42GB RAM, | ||
then you'ld definitely want the "lg" chunker params so you only need | ||
2.6GB RAM. Or even bigger chunks than shown for "lg" (see "xl"). | ||
|
||
You also see compression works better for larger chunks, as expected. | ||
Duplication works worse for larger chunks, also as expected. | ||
|
||
small chunks | ||
============ | ||
|
||
$ borg info /extra/repo-sm::1 | ||
|
||
Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 10,23,16,4095 /extra/repo-sm::1 /home/tw/win | ||
Number of files: 3 | ||
|
||
Original size Compressed size Deduplicated size | ||
This archive: 37.12 GB 14.81 GB 12.18 GB | ||
All archives: 37.12 GB 14.81 GB 12.18 GB | ||
|
||
Unique chunks Total chunks | ||
Chunk index: 378374 487316 | ||
|
||
$ ls -l /extra/repo-sm/index* | ||
|
||
-rw-rw-r-- 1 tw tw 20971538 Jun 20 23:39 index.2308 | ||
|
||
$ du -sk /extra/repo-sm | ||
11930840 /extra/repo-sm | ||
|
||
large chunks
============

$ borg info /extra/repo-lg::1

Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 16,23,20,4095 /extra/repo-lg::1 /home/tw/win
Number of files: 3

                     Original size      Compressed size    Deduplicated size
This archive:             37.10 GB             14.60 GB             13.38 GB
All archives:             37.10 GB             14.60 GB             13.38 GB

                     Unique chunks         Total chunks
Chunk index:                 25889                29349

$ ls -l /extra/repo-lg/index*

-rw-rw-r-- 1 tw tw 1310738 Jun 20 23:10 index.2264

$ du -sk /extra/repo-lg
13073928 /extra/repo-lg

xl chunks
=========

$ borg info /extra/repo-xl::1

Command line: /home/tw/w/borg-env/bin/borg create --chunker-params 16,23,31,4095 /extra/repo-xl::1 /home/tw/win
Number of files: 3

                     Original size      Compressed size    Deduplicated size
This archive:             37.10 GB             14.59 GB             14.59 GB
All archives:             37.10 GB             14.59 GB             14.59 GB

                     Unique chunks         Total chunks
Chunk index:                  4319                 4434

$ ls -l /extra/repo-xl/index*
-rw-rw-r-- 1 tw tw 327698 Jun 21 00:52 index.2011

$ du -sk /extra/repo-xl/
14253464 /extra/repo-xl/
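Dividing each archive's original size by its total chunk count gives the average chunk size the chunker actually produced. A sketch, assuming the "GB" in the borg info output means 10^9 bytes; the averages use uncompressed sizes and total (not unique) chunks, and `avg_chunk_bytes` is a hypothetical helper.

```python
GB = 10**9            # assumption: borg info's "GB" is decimal
MAX_CHUNK = 1 << 23   # CHUNK_MAX_EXP == 23 in all three runs

# (original size in GB, total chunks) for each repository
runs = {
    "sm": (37.12, 487316),
    "lg": (37.10, 29349),
    "xl": (37.10, 4434),
}


def avg_chunk_bytes(original_gb: float, total_chunks: int) -> float:
    """Average uncompressed chunk size over all chunk references."""
    return original_gb * GB / total_chunks


for name, (size_gb, chunks) in runs.items():
    avg = avg_chunk_bytes(size_gb, chunks)
    print(f"{name}: ~{avg / 1024:,.0f} KiB average, {avg / MAX_CHUNK:.3f} of max chunk size")
```

This gives roughly 74KiB for sm, 1.2MiB for lg and just under 8MiB for xl. The xl average sits at about 99.7% of 2^23 bytes, consistent with nearly all xl chunks being cut at the maximum size.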
a487e16
FWIW, this is a great new feature over Attic. I hope you can get it backported, but if not, it at least makes a very good line-item for a "Why Borg?" article.
a487e16
If Attic were properly accepting contributions, Borg wouldn't exist to begin with.
a487e16
Good idea with that "why borg" article. :)
Maybe it could be a section in the docs, giving a high-level overview with the changes compared to attic.
a487e16
Yeah, I think that would be a good idea. Being a newcomer to the project, the only concrete information I could find at first glance was that you wanted to be able to break backcompat more often. This is not the most enticing thing for someone looking for backup software.
After further reading I understand your motivations, and #5 really shows how far Borg has come. An article like this could take focus off of the clickbaity "he wants to break my backups!" message that has been circulating around and onto the message that Attic is mostly unmaintained and this is taking over development.
a487e16
See #224.