Backup job configuration tips
The backup configuration has the following options.
Option | Type | Meaning |
---|---|---|
backup_type | Enum | Defines whether the job should always produce full backups or should perform incremental backups after the initial full backup. Possible values: FULL, INCREMENTAL |
hash_algorithm | Enum | Defines which hash algorithm should be used for ensuring file integrity and finding duplicated files. Possible values: NONE, MD5, SHA1, SHA256, SHA512 |
compression_algorithm | Enum | Selects the compression algorithm which should be used for the archive. Possible values: NONE, BZIP2, GZIP |
encryption_key | Base64 String (nullable) | The KEK (RSA public key) that should be used for key encryption. |
duplicate_strategy | Enum | Instructs the backup algorithm either to keep each duplicated file as is, or to store only one copy of the duplicated content per backup, eliminating duplications even across multiple backup increments. Using KEEP_EACH is expected to produce the biggest archives, but the benefit of deduplication depends on the number of duplicated files in the source set. Possible values: KEEP_EACH, KEEP_ONE_PER_BACKUP |
chunk_size_mebibyte | int (larger than 0) | Sets an upper threshold for the size of each archive chunk. Can be useful in case the archive will be transferred to a file system or online storage service where the file size is limited. Also, partial restores can benefit from only reading the relevant chunks if there are multiple of them. |
file_name_prefix | String | The prefix of each backup archive file. Must only use characters which are allowed by the file system and OS. For simplicity, it is a good idea to stick to alphanumeric characters, dashes and underscores. |
destination_directory | String (file:// URI) | The absolute path of a directory where we want to store the backup archives. |
sources | Set (Backup source) | Defines the source folders/files and the relevant match criteria. The backup sources must be mutually exclusive, to guarantee that each file can only be included by matching a single source. A backup source has a path component defining the root of the source, an include_patterns list containing the glob patterns that identify the matching files under the root path, and an optional exclude_patterns list that can be used to exclude some of the matching files using the same glob pattern syntax. |
A simple example configuration can be seen below:
```json
{
  "backup_type" : "FULL",
  "hash_algorithm" : "SHA256",
  "compression_algorithm" : "GZIP",
  "encryption_key" : null,
  "duplicate_strategy" : "KEEP_EACH",
  "chunk_size_mebibyte" : 500,
  "file_name_prefix" : "home-backup-gzip-unencrypted",
  "destination_directory" : "file:///tmp/backup-destination/",
  "sources" : [
    {
      "path" : "file:///home/user/",
      "include_patterns": [ "**" ],
      "exclude_patterns": [ ".m2", ".m2/**" ]
    }
  ]
}
```
If security is important for you, make sure to always provide a KEK (a 4096-bit RSA public key) in the encryption_key property. This will automatically turn on AES DEK generation, as well as the encryption of each archived entry and each piece of metadata.
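As a sketch, an encrypted variant of the example above only needs the encryption_key property populated (the value below is a placeholder standing in for a real Base64-encoded public key, and the file name prefix is adjusted accordingly):

```json
{
  "backup_type" : "FULL",
  "hash_algorithm" : "SHA256",
  "compression_algorithm" : "GZIP",
  "encryption_key" : "<base64-encoded 4096-bit RSA public key>",
  "duplicate_strategy" : "KEEP_EACH",
  "chunk_size_mebibyte" : 500,
  "file_name_prefix" : "home-backup-gzip-encrypted",
  "destination_directory" : "file:///tmp/backup-destination/",
  "sources" : [
    {
      "path" : "file:///home/user/",
      "include_patterns": [ "**" ],
      "exclude_patterns": [ ".m2", ".m2/**" ]
    }
  ]
}
```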
As backups can take a lot of space, it can be useful to reduce the size of the backup archives. Multiple features can be used for this (see the example configuration after this list), such as:
- Using incremental backups can make sure that only the changed files are stored
- Selecting the KEEP_ONE_PER_BACKUP duplication strategy can make sure each file is only stored once across all versions. It is recommended to enable hash calculation as well by selecting a hash algorithm other than NONE.
- Using GZIP or BZIP2 compression can reduce the size of each backup entry
- Regularly merging the increments when we are sure that we no longer need to restore to a particular point in time can eliminate unimportant states of files which change frequently
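Combining these options, a space-optimized job could look like the sketch below (only the properties from the table above are used; the file name prefix and paths are illustrative):

```json
{
  "backup_type" : "INCREMENTAL",
  "hash_algorithm" : "SHA256",
  "compression_algorithm" : "BZIP2",
  "encryption_key" : null,
  "duplicate_strategy" : "KEEP_ONE_PER_BACKUP",
  "chunk_size_mebibyte" : 500,
  "file_name_prefix" : "home-backup-space-optimized",
  "destination_directory" : "file:///tmp/backup-destination/",
  "sources" : [
    {
      "path" : "file:///home/user/",
      "include_patterns": [ "**" ],
      "exclude_patterns": [ ".m2", ".m2/**" ]
    }
  ]
}
```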
The KEEP_EACH setting ignores duplications, meaning that every copy of a file will be stored without considering the fact that we have already stored the same content previously, as illustrated in the picture below. As you can see, the content of the A file is saved in multiple copies.
With the KEEP_ONE_PER_BACKUP strategy, we can try to eliminate duplications across each backup increment globally, making sure that we are not adding the same file twice even in the case of later increments. This is why its illustration shows even more links than the previous one.
Using encryption and compression can slow down backup creation. If performance is more important than security or the size of the archive, you can opt to disable these features as a trade-off. At the same time, the implementation supports multi-threaded backup and restore functionality, which can help you mitigate the overhead caused by these expensive options.
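For instance, a performance-focused job could turn these expensive features off, trading integrity checks, deduplication and smaller archives for speed (again, the file name prefix and paths are only illustrative):

```json
{
  "backup_type" : "FULL",
  "hash_algorithm" : "NONE",
  "compression_algorithm" : "NONE",
  "encryption_key" : null,
  "duplicate_strategy" : "KEEP_EACH",
  "chunk_size_mebibyte" : 500,
  "file_name_prefix" : "home-backup-fast",
  "destination_directory" : "file:///tmp/backup-destination/",
  "sources" : [
    {
      "path" : "file:///home/user/",
      "include_patterns": [ "**" ]
    }
  ]
}
```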
Tip
Since the multi-threaded backup is using temp files in the backup directory, it can make sense to allow only 1 thread for the backup process when the backup destination is a slow disk or is accessible over the network. This is because the single-threaded implementation writes the data only once (without temp files), allowing better efficiency in these I/O-bound scenarios.