Stream Sync is a tool for synchronizing two folder structures. It copies or updates files between the structures until their content is equivalent. The folder structures can reside on a local disk or on various cloud storages. Several synchronization modes are supported, addressing different use cases; refer to Operation modes for further details.
- Building
- Operation modes
- Usage
- Structures
- Option syntax
- Filtering options
- Filter expressions
- Log files
- Adjust granularity of timestamps
- Encryption
- Structure types
- Authentication
- Throttling sync streams
- Timeouts
- Reading passwords from the console
- Dry-run mode
- Options specific to Mirror mode
- Options specific to Sync mode
- Examples and use cases
- Sync a local directory to an external USB hard disk
- Do not remove archived data
- Interrupt and resume long-running sync processes
- Sync from a local directory to a WebDav directory
- Sync from a local directory to a WebDav server with encryption
- Sync from a local directory to Microsoft OneDrive
- Sync from a local directory to Google Drive
- Setting up a sync process for existing data
- Architecture
- License
Stream Sync uses [sbt](https://www.scala-sbt.org/) as its build tool. After the project has been checked out, you can cd into the project directory and enter the sbt console by typing sbt. Then the following commands are of interest:
- test compiles everything and executes the unit tests.
- ITest / test compiles everything and executes the integration tests.
- assembly builds a so-called fat jar via the sbt-assembly plugin. The jar is executable and contains all the dependencies of Stream Sync, which makes it easy to run the tool from the command line without having to bother with a large class path.
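For instance, a typical build session started from the project directory could look like this (just the commands listed above, entered in the sbt console):
$ sbt
> test
> ITest / test
> assembly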
When run, Stream Sync compares two folder structures and updates them to achieve a desired target state. Depending on the concrete use case, different target states are possible. The exact way in which Stream Sync updates the structures it processes is controlled by its operation mode. The following subsections describe the operation modes supported by the tool.
In mirror mode, a destination directory structure becomes an exact copy of a source structure. Both structures are compared. Files or directories existing in the source structure, but not in the destination structure, are created there. Files that have been modified in the source structure overwrite their counterparts in the destination structure. Files or directories that exist in the destination structure, but not in the source structure, are removed. So, changes are applied only to the destination structure; the source structure is not modified.
This mode is suitable for creating an exact (non-incremental) copy of data stored in a folder structure. For instance, you have a local working state and want to back up this state to an external storage medium or upload it to a cloud storage. After modifying the local working state, you can run the tool again, and it will adjust the destination structure according to your latest changes.
Warning: Such a "mirror" is typically not a replacement for a full backup, since files deleted in the source structure are deleted in the destination structure as well and cannot be restored later.
The typical use case for sync mode is a shared set of data that is used from multiple devices. A central copy of the data is available on a storage accessible from all devices, and each device has a local copy. When data on one device is changed, the changes are synced to the central copy. From there, they can be downloaded to all other devices.
In contrast to mirror mode, there is not a source and a destination structure, but a local and a remote structure. During a run of Stream Sync, both structures may be modified: changes done locally are applied to the remote copy, changes on the remote data are synced to the local files. This mode is more complex than mirror mode. Single sync operations can fail if a conflict is detected; for instance if a file was changed both locally and on the remote side. In this case, the conflicting operation is skipped, an error is reported, and the user has to resolve the conflict manually.
To be able to detect changes on the local copy and potential conflicts in a reliable manner, Stream Sync stores information about the managed data in a local file. Each sync run compares the current state of the local data with the information stored in the file and can thus determine what has actually changed. The file is then updated with the new local state resulting from the current sync run.
Caution: The current implementation assumes that the data is modified on different devices by the same user at different times. It cannot handle parallel sync processes run at the same time from multiple devices against the central copy of the data.
The tool offers a command line interface (CLI) that is invoked using the following general syntax:
Sync <sourceStructure> <destinationStructure> [--mirror] [options]
for Mirror mode, where sourceStructure points to the structure serving as the source of the mirror process, and destinationStructure refers to the destination structure; or
Sync <localStructure> <remoteStructure> --sync [options]
to enable Sync mode, where localStructure references the local copy of the data, and remoteStructure is a URL pointing to the central copy. The --mirror switch is optional; mirror mode is the default operation mode.
The following subsections provide further details about the command line options supported by the tool. Most of these options work in all operation modes; so no explicit distinction is needed. Therefore, this documentation is a bit lax with the terms it uses:
- The term sync process is used to refer to a run of the Stream Sync tool in all operation modes; it can especially mean a run in mirror mode as well.
- The structures processed by the tool are usually referred to as source and destination structure. This wording also covers the local and remote structure of runs in sync mode.
Options that are operation mode-specific are marked as such in their description.
Note: Being written in Scala, Stream Sync requires a Java Virtual Machine to run. So the full command to be executed has to launch Java and specify the full class path and the main class (which is com.github.sync.cli.Sync). When a fat jar has been built as described in the Building section, the command can be abbreviated to
java -jar stream-sync-assembly-<version>.jar [options] <source> <destination>
In all examples in this document the short form Sync is used as a placeholder for the complete command.
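For reference, the unabbreviated form that specifies the class path and the main class explicitly looks similar to the following sketch (the same fat jar is used here):
java -cp stream-sync-assembly-<version>.jar com.github.sync.cli.Sync [options] <source> <destination>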
The generic term structure has been used to refer to the source and the destination of a sync process. The reason for this is that Stream Sync can handle different types of structures. In the most basic case, the structures are paths on a local file system (or a network share that can be accessed in the same way as a local directory). In this case, the paths can be specified directly.
To reference a different type of structure, specific URIs need to be used. These URIs typically start with a prefix followed by a part specific to a dedicated structure type. To give a concrete example, one prefix that is currently supported is dav:. This prefix indicates that the structure is hosted on a WebDav server. The root URL to the directory on the server to be synced must be specified after the prefix. The following snippet shows how to sync a path on the local file system with a directory on a WebDav server:
Sync [options] /data/local/music dav:https://my.cloud-space.com/data/music
Some structures need additional parameters to be accessed correctly. For instance, a WebDav server typically requires correct user credentials. Such parameters are passed as additional options on the command line; they are allowed only if a corresponding structure takes part in the sync process. A structure requiring additional parameters can be both the source and the destination of the sync process; therefore, when providing additional options it must be clear to which structure they apply. This is achieved by using special prefixes: src- for options to be applied to the source structure, and dst- for options referring to the destination structure. In the example above the WebDav structure is the destination; therefore, the username and password options must be specified using the dst- prefix:
Sync --dst-user myWebDavUserName --dst-password myWebDavPwd \
/data/local/music dav:https://my.cloud-space.com/data/music
If both structures were WebDav directories, one would also have to specify the corresponding options with the src- prefix, as in
Sync dav:https://server1.online.com/source \
--src-user usrIDSrc --src-password pwdSrc \
dav:https://server2.online.com/dest \
--dst-user usrIDDst --dst-password pwdDst
This convention makes it clear which option applies to which structure. The structure types supported are described in more detail in the Structure types section later in this document. That section also lists, for each structure type, which additional options it supports.
A number of options are supported to customize a sync process. Options are distinguished from the source and destination URIs by the fact that they have to start with the prefix --. Most options have a value that is obtained from the parameter that follows the option key. So a sequence of command line options looks like
--option1 option1_value --option2 option2_value
There are also a few options acting like switches: these options do not have a value, but their presence or absence on the command line determines their value - true or false.
For some options the application defines short alias names consisting of only a single letter. Such aliases use a single - as prefix. So, for instance, the following parameter lists are equivalent:
Sync --log path/to/log
and:
Sync -l path/to/log
The order of options typically does not matter. It also makes no difference if
options are placed before or after the URIs for the structures to be synced.
Unrecognized option keys cause the program to fail with a corresponding error message. In case of an error, the application shows a help screen describing all the parameters it supports. The user can also request help explicitly by specifying the --help flag or its short alias -h, such as:
Sync srcUri destUri --help
Note that the help printed by the application is partly context-sensitive; it depends on the parameters already provided on the command line. If the help switch is passed without other arguments, such as
Sync -h
the application shows a generic help screen listing the top-level options available. If the command line already contains URIs for the structures to be processed, e.g.
Sync /data/local/music dav:https://my.cloud-space.com/data/music --help
the help screen would include descriptions of options supported by the structure types in use - the local file system and WebDav in this example. This makes it possible to complete the command line step by step, by requesting help for the parts that are currently defined.
The options supported are described in detail below. There is one special option, --file, that expects as its value a path to a local file. This file is read line by line, and the single lines are added to the sequence of command line arguments as if they had been provided by the user on program execution. For instance, given a file sync_params.txt with the following content:
--actions
actionCreate,actionOverride
--filter-create
exclude:*.tmp
Then an invocation of
Sync --file sync_params.txt /path/source /path/dest
would be equivalent to the following call
Sync --actions actionCreate,actionOverride --filter-create exclude:*.tmp /path/source /path/dest
An arbitrary number of command line files can be specified, and they can be nested to an arbitrary depth. Note, however, that the order in which such files are processed is not defined. This is normally irrelevant, but can be an issue if the source and destination URIs are specified in different files. It could then be the case that the URIs swap their position, and the sync process is done in the opposite direction!
Option keys are not case-sensitive; so --actions has the same meaning as --ACTIONS or --Actions. For short alias names, however, case matters.
With this group of options specific files or directories can be included or excluded from a sync process. It is possible to define such filters globally, and also for different sync actions. A sync process is basically a sequence of the following actions, where each action is associated with a file or folder:
- Action Create: An element is created in the destination structure.
- Action Override: An element from the source structure replaces a corresponding element in the destination structure.
- Action Remove: An element is removed from the destination structure.
To define such action filters, a special option keyword is used whose value is a filter expression. As option keywords can be repeated, an arbitrary number of expressions can be set for each action. A specific action on an element is executed only if the element is matched by all filter expressions defined for this action. The following option keywords exist (filter expressions are discussed a bit later):
Option | Description |
---|---|
--filter-create | Defines a filter expression for actions of type Create. |
--filter-override | Defines a filter expression for actions of type Override. |
--filter-remove | Defines a filter expression for actions of type Remove. |
--filter | Defines a filter expression that is applied for all action types. |
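As an illustration, the following sketch (with hypothetical paths) keeps temporary files from being created in the destination and protects top-level elements against removal; the min-level criterion used here is one of the filter criteria described below:
Sync /path/source /path/dest --filter-create exclude:*.tmp --filter-remove min-level:1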
In addition, it is possible to enable or disable specific action types for the whole sync process. Per default, all action types are active. With the --actions option the action types to enable can be specified. The option accepts a comma-separated list of action names; alternatively, the option can be repeated to enable multiple action types. Valid names for action types are actionCreate, actionOverride, and actionRemove (case is again ignored). So the following option enables only create and override actions:
--actions actionCreate,actionOverride
With the following command line only create and remove actions are enabled:
--actions actionCreate --actions actionRemove
During a sync process, each action is first checked to determine whether its type is enabled. If this is the case, the filter expressions (if any) assigned to this action type are evaluated on the element that is subject to this action. Only if all expressions accept the element is the action actually performed on it.
Thus, filter expressions refer to attributes of elements. The general syntax of an expression is as follows:
<criterion>:<value>
Here criterion is one of the predefined filter criteria for attributes of elements to be synced. The value is compared to a specific attribute of the element to find out whether the criterion is fulfilled.
The following table gives an overview of the filter criteria supported:
Criterion | Data type | Description | Example |
---|---|---|---|
min-level | Int | Each element (file or folder) is assigned a level, which is the distance to the root folder of the source structure. Files or folders located in the source folder have level 0, the ones in direct sub folders have level 1, and so on. With this filter the minimum level can be defined; only elements with a level greater than or equal to this value are taken into account. | min-level:1 |
max-level | Int | Analogous to min-level, but defines the maximum level; only elements with a level less than or equal to this value are processed. | max-level:5 |
exclude | Glob | Defines a file glob expression for files or folders to be excluded from the sync process. File paths can be specified that may contain the well-known wildcard characters '?' (matching a single character) and '*' (matching an arbitrary number of characters). | exclude:*.tmp |
include | Glob | Analogous to exclude, but defines a pattern for files to be included. | include:*.jpg |
date-after | date or date-time | Selects only files whose last-modified date is equal to or after a given reference date. The reference date is specified in ISO format with an optional time portion. If no time is defined, it is replaced by 00:00:00. | date-after:2023-01-01T10:00:00 |
date-before | date or date-time | Analogous to date-after, but selects only files whose last-modified time is before a given reference date. | date-before:2023-06-30 |
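Combining several criteria, a sketch of a run that only processes files modified since the beginning of 2023 and ignores backup files (both the reference date and the glob pattern are arbitrary examples) could look as follows:
Sync /path/source /path/dest --filter date-after:2023-01-01 --filter exclude:*.bak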
The sync operations executed during a sync process can also be written in a textual representation to a log file. This is achieved by adding the --log option whose value is the path to the log file to be written. With this option, a protocol of the operations that have been executed can be generated.
If only failed operations are of interest, the error log file is the right choice. This file contains all sync operations that could not be applied due to some exception, followed by this exception. This gives an overview of what went wrong and which files may not be up-to-date. To enable this error log, use the --error-log option and provide the path to the error log file.
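A sketch of an invocation that writes both a full protocol and a separate error log (the log paths are arbitrary examples):
Sync /path/source /path/dest --log /var/log/sync.log --error-log /var/log/sync-errors.log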
In order to decide whether a file needs to be copied to the destination structure, StreamSync compares the last-modified timestamps of the files involved. After a file has been copied, the timestamp in the destination structure is updated to match the one in the source structure; so if there are no changes on the file in the source structure, another sync process will ignore this file - at least in theory.
In practice there can be some surprises when syncing between different types of file systems or structures. The differences can also impact the comparison of last-modified timestamps. For instance, some structures may store such timestamps with a granularity of nanoseconds, while others only use seconds. This may lead to false positives when StreamSync decides which files to copy.
To deal with problems like that, the --ignore-time-delta option can be specified. The option expects a numeric value which is interpreted as a threshold in seconds for an acceptable time difference. If the difference between the timestamps of two files is below this threshold, the timestamps are considered equal. Setting this option to a value of 1 or 2 should solve all issues related to the granularity of file timestamps. An example using this option can be found in the Examples and use cases section.
One use case for StreamSync is creating a backup of a local folder structure on a cloud server; the data is then duplicated to another machine that is reachable from everywhere. However, if your data is sensitive, you probably do not want it lying around on a public server without additional protection.
StreamSync offers such protection by supporting multiple options for encrypting the data that is synced:
- The content of files can be encrypted.
- The names of files and folders can be encrypted.
Whether encryption is used, and what is encrypted, is controlled by the so-called encryption mode. This is an enumeration that can have the following values:
- none: No encryption is used.
- files: The content of files is encrypted.
- filesAndNames: Both the content of files and their names are encrypted. (This includes directories as well.)
In all cases, encryption is based on AES using key sizes of 128 bits. The keys are derived from password strings that are transformed accordingly (password strings shorter than 128 bits are padded, longer strings are cut). In addition, a random initialization vector is used; so an encrypted text will always be different, even if the same input is passed.
The source and the destination of a sync process can be encrypted independently. If an encryption mode other than none is set for the destination, but not for the source, files transferred to the destination are encrypted. If such an encryption mode is set for the source, but not for the destination, files are decrypted. If active encryption modes are specified for both sides, files are decrypted first and then encrypted again with the destination password.
The following table lists the command line options that affect encryption (all of them are optional):
Option | Description | Default |
---|---|---|
src-crypt-mode | The encryption mode for the source structure (see above); controls whether encryption is applied to files in the source structure. | none |
dst-crypt-mode | The encryption mode for the destination structure; controls how encryption is applied to the destination structure. | none |
src-encrypt-password | Defines a password for the encryption of files in the source structure. This password is needed when the source crypt mode indicates that encryption should be used. | Undefined |
dst-encrypt-password | Analogous to src-encrypt-password, but defines the encryption password for the destination structure. | Undefined |
crypt-cache-size | During a sync operation with encrypted file names, it may be necessary to encrypt or decrypt file names multiple times; for instance, if parent folders are accessed multiple times to process their sub folders. As an optimization, a cache is maintained storing the names that have already been encrypted or decrypted; that way the number of crypt operations can be reduced. For sync operations on very complex structures (with deeply nested folders), it can make sense to set a higher cache size. Note that the minimum allowed size is 32. | 128 |
Note that folder structures that are only partly encrypted are not supported; when specifying an encryption password, the password is applied to all files.
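As a sketch (with a placeholder password and authentication options omitted), a mirror run that encrypts both file content and file names on the destination could look like this:
Sync /data/local/music dav:https://my.cloud-space.com/data/music \
    --dst-crypt-mode filesAndNames --dst-encrypt-password <myCryptSecret>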
This section lists the different types of structures that are supported for sync processes. If not mentioned otherwise, all types can act as source and as destination structure of a sync process. The additional parameters supported by a structure type are described as well.
This is the most basic and "natural" structure type. It can be used for instance to mirror a directory structure on the local hard disk to an external hard disk or a network share.
To specify such a structure, just pass the (OS-specific) path to the root directory without any prefix. The table below lists the additional options that are supported. (Remember that these options need to be prefixed with either src- or dst- to assign them to the source or destination structure.)
Option | Description | Mandatory |
---|---|---|
time-zone | There are file systems that store last-modified timestamps for files in the system's local time without proper time zone information. This causes the last-modified time to change together with the local time zone, e.g. when daylight saving time starts or ends. In such cases, Stream Sync would consider the files on this file system as changed because their last-modified time is now different. One prominent example of such a file system is FAT32, which is still frequently used, for instance on external hard disks, because of its broad support by different operating systems. To work around this problem, the time-zone option makes it possible to define a time zone in which the timestamps of files in a specific structure have to be interpreted. The last-modified time reported by the file system is then converted according to this time zone before comparison. Analogously, when setting the last-modified time of a synced file, the timestamp is adjusted. As the value of the option, any string can be provided that is accepted by the ZoneId.of() method of the ZoneId JDK class. | No |
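For example, if the destination is a FAT32 drive whose timestamps are written in a fixed-offset zone, the option can be set as follows (the zone ID is just an example; see also the Examples section):
Sync C:\data\work D:\backup\work --dst-time-zone UTC+02:00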
It is possible to sync from or to a directory hosted on a WebDav server. To do this, the full URL to the root directory on the server has to be specified with the prefix dav: defining the structure type. The following table lists the additional options supported for WebDav structures. (Remember that these options need to be prefixed with either src- or dst- to assign them to the source or destination structure.)
Option | Description | Mandatory |
---|---|---|
modified-property | The name of the property that holds the last-modified time of files on the server (see below). | No |
modified-namespace | Defines a namespace to be used together with the last-modified property (see below). | No |
delete-before-override | Determines whether a file to be overridden on the WebDav server is deleted first. Experiments have shown that for some WebDav servers override operations are not reliable; in some cases, the old file stays on the server although a success status is returned. For such servers this property can be set to true. Stream Sync will then send a DELETE request for this file before it is uploaded again. All other values disable this mode. | No |
In addition to these options, the mechanism to authenticate with the server has to be defined. Refer to the Authentication section for more information.
Notes
Using WebDav in sync operations can be problematic as the standard does not define an official way to update a file’s last-modified time. Files have a getlastmodified property, but this is typically set by the server to the time when the file has been uploaded. For sync processes it is, however, crucial to have a correct modification time; otherwise, the file on the server would be considered as changed in the next sync process because its timestamp does not match the one of the file it is compared against.
Concrete WebDav servers provide different options to work around this problem. Stream Sync supports servers that store the modification time of files in a custom property that can be updated. The name of this property can be defined using the modified-property option. As WebDav requests and responses are based on XML, the custom property may use a different namespace than the one used for the core WebDav properties. In this case, the modified-namespace option can be set.
When using a WebDav directory as source structure, Stream Sync reads the modification times of files from the configured modified-property property; if this is undefined, the standard property getlastmodified is used instead.
When a WebDav directory acts as destination structure, after each file upload another request is sent to update the file’s modification time to match the one of the source structure. Here again the configured property (with the optional namespace) is used or the standard property if unspecified.
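A sketch showing these options for a destination server (the server URL is hypothetical; the property name and namespace match the example in the Examples section):
Sync C:\data\work dav:https://my.cloud-space.com/work \
    --dst-modified-property Win32LastModifiedTime \
    --dst-modified-namespace urn:schemas-microsoft-com: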
Most Windows users will have a Microsoft account and thus access to a free cloud storage area referred to as OneDrive. For Windows there is an integrated OneDrive client that automatically syncs this storage area to the local machine. For Linux, however, no official client exists.
Stream Sync supports a OneDrive storage as both source and destination structure of a sync process. The storage is identified by a URL of the form onedrive:<driveID>, where driveID is a string referencing a specific Microsoft OneDrive account. In addition, the following special command line options are supported:
Option | Description | Mandatory |
---|---|---|
path | Defines the relative sub path of the storage which should be synced. | Yes |
upload-chunk-size | File uploads to the OneDrive server have to be split into multiple chunks if the file size exceeds a certain limit (about 60 MB). With this parameter the chunk size in MB to be used by Stream Sync can be configured. | No, defaults to 10 MB. |
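A minimal sketch of a sync to a OneDrive destination (the drive ID is a placeholder, the sub path is an arbitrary example, and the mandatory OAuth setup described below is omitted):
Sync /data/documents onedrive:<driveID> --dst-path /backup/documents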
OneDrive uses OAuth 2 as its authentication mechanism, with a special identity provider from Microsoft. Therefore, the corresponding credentials have to be set up (refer to the OAuth 2 section for further information). This requires a number of preparation steps before sync processes can be run successfully. The example Sync from a local directory to Microsoft OneDrive contains a full description of the necessary steps.
Another popular cloud storage offering is available from Google: on a Google Drive account, users can store data up to a certain limit. Most users of Android will have such an account. As is true for Microsoft OneDrive, official sync clients are not available for all operating systems.
Stream Sync can handle a Google Drive account as both source and destination of a sync process. To access such an account, use a URL of the form googledrive:<path>, where path is the optional root path of the sync process. If it is missing, the special root folder of the Google Drive account is used; otherwise, only the path specified here is taken into account by sync operations. Note that there is no such thing as an account ID in the URL; the account to be accessed is encoded in the OAuth 2 access token, which is used for authentication (the OAuth 2 section contains more information about this topic).
One speciality of Google Drive is that this file system is not strictly hierarchical. A single file or folder can have multiple parents, and a folder can have multiple children with the same name. Thus, a path like documents/private/MyText.doc does not necessarily identify a single element uniquely. Even cycles in folder structures are possible. Stream Sync does not handle such scenarios. It treats Google Drive like any other folder structure and assumes the same properties. So when using Stream Sync together with Google Drive, you should make sure that at least the sub path to be synced follows the conventions of a strictly hierarchical file system.
Other than the root path to be synced in the target Google Drive account - which is part of the structure URL - you typically do not have to specify any further configuration options.
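A minimal sketch for mirroring a local directory to a Google Drive folder (the path is an arbitrary example; the OAuth options described under Authentication still have to be provided):
Sync /data/documents googledrive:/backup/documents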
Note: There is one additional command line option, --server-url, which can be used to specify an alternative server URL; but this is only needed for very special scenarios, e.g. for testing. Per default, the standard Google Drive API endpoint is addressed.
You can find a complete example of how to set up Stream Sync for accessing a Google Drive account in the section Sync from a local directory to Google Drive.
Structure types that involve a server typically require an authentication mechanism. Stream Sync supports multiple ways to authenticate with the server.
The easiest authentication mechanism is Basic Auth, which requires that a user name and password are provided. This information is then passed to the server in the Authorization header. (Therefore, this mechanism only makes sense when HTTPS is used for the server communication.)
To make use of Basic Auth, just define the command line options user and password. Note that these options have to be prefixed with src- or dst- to assign them to either the source or destination structure. Examples of how to use these options can be found in the Examples section, for instance under Sync from a local directory to a WebDav directory.
OAuth 2 is another popular way for authentication. Stream Sync supports the Authorization code flow. In this flow the authentication is done by an external server, a so-called identity provider (IDP). In a first step, an authorization code is retrieved. In this step, the user basically grants Stream Sync the permission to access her account with a set of pre-defined rights. This is done by opening a Web page at a URL specific to the IDP in the user’s Web browser. The user then authenticates against the IDP, e.g. by filling out a login form or using another means. If login is successful, the IDP invokes a so-called redirect URL and passes the authorization code as a query parameter.
In a second step, the authorization code has to be exchanged for an access token. This is done by calling another endpoint provided by the IDP and passing the authorization code as a form parameter. If everything goes well, the IDP replies with a document that contains both an access token and a refresh token. The access token must be passed in the Authorization header of all requests sent to the target server. Its validity period is limited; when it expires, the refresh token can be used to obtain a new access token. The refresh token is typically valid for a longer time; so the user has to do the login (i.e. the first step) only once, and then Stream Sync can access the target server as long as the refresh token stays valid.
The authorization code flow is interactive; it requires that the user executes some actions in a Web browser. This is not a great fit for a command line tool like Stream Sync. To close this gap, in addition to the main class of Stream Sync, there is a second CLI class responsible for the configuration and management of OAuth identity providers: com.github.sync.cli.oauth.OAuth.
What this class basically does is update a store with information about known IDPs. First, an IDP has to be added to the system. In this step a number of properties for this IDP have to be provided, such as the URLs of specific endpoints or the client ID and secret to be used for the interaction with the IDP. For this purpose, the init command is used. An example invocation could look as follows:
$ java -cp stream-sync-assembly-<version>.jar com.github.sync.cli.oauth.OAuth init \
--idp-storage-path ~/tokens/ \
--idp-name microsoft \
--auth-url https://login.live.com/oauth20_authorize.srf \
--token-url https://login.live.com/oauth20_token.srf \
--scope "files.readwrite offline_access" \
--redirect-url http://localhost:8080 \
--client-id <client-id> \
--client-secret <secret>
The command supports the following options:
Option | Description | Mandatory |
---|---|---|
idp-name | Assigns a logical name to the IDP. This name is then used by other commands or within Stream Sync to reference this IDP. An arbitrary name can be chosen. | Yes |
idp-storage-path | Defines a path on the local file system where information about the IDP affected is stored. In this path a couple of files are created whose names are derived from the name of the IDP. | Yes |
auth-url | The URL of the authorization endpoint of the IDP. This URL is needed to obtain an authorization code; a GET request is sent to it with some specific properties added as query parameters. | Yes |
token-url | The URL of the token endpoint of the IDP. This URL is used to obtain an access and refresh token pair for the authorization code, and later also for refresh token requests. | Yes |
scope | This parameter defines a list of values that are passed in the scope parameter to the IDP. The values are specific to a concrete IDP; they determine the access rights that are granted to a client that has a valid access token. | Yes |
redirect-url | Defines the redirect URL, which plays an important role in the authorization code flow. This URL is invoked by the IDP after a successful login of the user. The URLs to be used depend on the concrete use case; URLs referencing localhost receive special treatment, as described below for the login command. | Yes |
client-id | An ID identifying the client. This ID is provided by the IDP as part of some kind of on-boarding process. | Yes |
client-secret | A secret assigned to the client. Like the client ID, the secret is provided by the IDP. | No; if missing, the secret is read from the console. |
store-unencrypted | This is a switch that determines whether sensitive information related to the IDP should be stored without encryption. Affected are the client secret and the token information obtained from the IDP. With an access token - as long as it is valid - an attacker can access the target server on behalf of the user; therefore, it makes sense to protect this data, and encryption is active per default. It can be explicitly disabled by specifying this switch. | No; per default encryption is active. |
idp-password | The password to be used to encrypt sensitive information related to the IDP. This property is relevant if encryption of IDP data is enabled. | No; it is read from the console if necessary. |
After the execution of this command, the IDP-related information is stored under the path specified, but no access token is retrieved yet. This is done using the login command as follows:
$ java -cp stream-sync-assembly-<version>.jar com.github.sync.cli.oauth.OAuth login \
--idp-storage-path ~/tokens/ \
--idp-name microsoft
The parameters correspond to the ones of the init command; encryption is supported in the same way. (If an encryption password has been specified for the init command, the same password must be entered here as well.)
The login command does the actual interaction with the IDP as required by the authorization code flow. It tries to open the standard Web browser at the authorization URL configured for the IDP in question. If this fails for some reason, a message is printed asking the user to open the browser manually and navigate to this URL. The Web page served at this URL is under the control of the IDP; it should give the relevant instructions for a successful authentication, e.g. by filling out a login form. If this is the first login attempt, the user is typically asked whether she wants to grant the access rights defined by the scope parameter to this client application. If authentication is successful, the IDP then redirects the user's browser to the redirect URL. Depending on the configured redirect URL, there are two options:
- If the redirect URL is of the form http://localhost:<port>, the command opens a small HTTP server at the configured port and waits for the redirect. It can then obtain the authorization code automatically, without any further user interaction.
- For other types of redirect URLs, the user is responsible for extracting the code, for instance from the URL displayed in the browser's address bar. The command opens a prompt on the console where the code can be entered.
If everything goes well, the command creates a new file in the specified storage path with the access and refresh tokens obtained from the IDP; the file is optionally encrypted.
With this information in place, Stream Sync can now be directed to use this IDP for authentication. To do this, the user and password options used for basic auth have to be replaced by ones pointing to the desired IDP:
Sync C:\data\work dav:https://target.dav.io/backup/work \
--log C:\Temp\sync.log \
--dst-idp-storage-path /home/hacker/temp/tokens --dst-idp-name microsoft
Note how, analogous to the OAuth commands, the IDP is referenced by its name and the path where its data is stored; the encrypt-idp-data and idp-password options are supported as well.
With one final OAuth command the data of a specific IDP can be removed again:
$ java -cp stream-sync-assembly-<version>.jar com.github.sync.cli.oauth.OAuth remove \
--idp-storage-path ~/tokens/ \
--idp-name microsoft
This command deletes all files for the selected IDP in the path specified. As the files are just deleted, no encryption password is required here.
As is true for the main Sync application, the OAuth application offers the --help switch (or its short form -h) to explicitly request usage information. To get a general help screen, just enter:
$ java -cp stream-sync-assembly-<version>.jar com.github.sync.cli.oauth.OAuth --help
To request help information specific to a concrete command, also provide this command, for instance:
$ java -cp stream-sync-assembly-<version>.jar com.github.sync.cli.oauth.OAuth init --help
In some situations it may be necessary to restrict the number of sync operations that are executed in a given time unit. For instance, there are public servers that react with an error status of 429 Too many requests when many small files are uploaded over a fast internet connection.
StreamSync supports two command line options to deal with such cases:
Option | Description | Default |
---|---|---|
throttle | The option is passed a numeric value that limits the number of sync operations (file uploads, deletion of files, creation of folders, etc.) executed per time unit. | None |
throttle-unit | This option defines the time unit in which the value of the throttle option is interpreted, for instance second or minute (see the examples below). | Second |
For instance, using a command like
Sync --throttle 1 ...
only a single operation per second is executed. This is a good solution for the problem with overloaded servers because it mainly impacts small files and operations that complete very fast. The upload of larger files that takes significantly longer than a second will not be delayed by this option. By specifying greater time units, throttling can even be configured on a finer level, e.g.:
Sync --throttle 45 --throttle-unit minute ...
would limit the throughput of the sync stream to 45 operations per minute.
Another option to influence the speed of sync processes that have an HTTP server as source or destination is to override certain configuration settings. StreamSync uses the Akka HTTP library for the communication via the HTTP protocol. The library can be configured in many ways, and system properties can be used to override its default settings. Options you may want to modify in order to customize sync streams are the size of the pool for HTTP connections (which determines the parallelism possible and is set to 4 per default) or the number of requests that can be open concurrently (32 by default). To achieve this, pass the following arguments to the Java VM that executes StreamSync:
-Dakka.http.host-connection-pool.max-connections=1 -Dakka.http.host-connection-pool.max-open-requests=2
As you can see in this example, the names of the system properties are derived from the hierarchical structure of the configuration options for Akka HTTP, as described in the Akka HTTP documentation.
To prevent sync processes from hanging when servers involved respond very slowly, a timeout is applied to all operations. The timeout in seconds can be configured via the --timeout command line option; the default value is one minute.
If a sync process needs to upload large files to a server via a not so fast internet connection, the timeout probably has to be increased; otherwise, operations will fail because they take too long. The following example shows how to set the timeout to 10 minutes to deal with larger uploads:
Sync C:\data\work dav:https://sd2dav.1und1.de/backup/work --timeout 600
For some use cases, e.g. connecting to a WebDav server or encrypting files, StreamSync needs passwords. Per default, such passwords can be specified as command line arguments, like any other arguments processed by the program. This can, however, be problematic when it comes to secret data: If the program is invoked from a command shell, the passwords are directly visible. They are typically stored in the command line history as well. So they can be easily compromised.
To reduce this risk, passwords can also be read from the console. This happens automatically, without any additional action required by the caller. If a password is required for a concrete sync scenario, but the corresponding command line argument is missing, the user is prompted to enter it. The name of the command line argument representing the password is used as the prompt. While the password is typed in, no echo is displayed.
It is well possible that multiple passwords are needed for a single sync process. An example could be a process that syncs from the local file system to an encrypted WebDav server. Then a password is needed to connect to the server, and another one for the encryption. Either of them can be omitted from the command line; the user is prompted for all missing passwords.
Before actually modifying data on the destination structure, it is sometimes useful to check which actions will be performed, so that unexpected manipulations or even data loss can be avoided. This is possible by adding the --dry-run switch (or its short alias -d) to the command line. The sync process then still determines the differences between the source and the destination structure, and a sync log file can be specified in which the sync operations are written. It will, however, not apply any actual changes to the destination structure.
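For instance, the following run (which also appears in the Examples section) only writes the operations that would be executed to a log file, without touching the destination structure:
Sync /path/to/source /path/to/dest --dry-run --log /data/sync.log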
This section describes the options that are only allowed in Mirror mode.
Section Log files described the usage of the --log option to produce a protocol of the operations executed during a sync/mirror process. While such a log file is meaningful on its own, for mirror processes it can serve an additional purpose:
It is possible to use such a log file as input for another mirror process. Then the sync operations to be executed are not calculated as the delta between two structures, but are directly read from the log file. This is achieved by specifying the --sync-log option whose value is the path to the log file to be read. Note that in this mode the URIs for both the source and destination structure still need to be specified; log files contain only relative URIs, and in order to resolve them correctly the root URIs of the original structures must be provided.
If the structures to be synced are pretty complex and/or large files need to be transferred over a slow network connection, sync processes can take a while. With the support for log files this problem can be dealt with by running multiple incremental mirror processes. This works as follows:
- An initial mirror process is run for the structures in question that has the --log option set and enables Dry-run mode. This does not execute any actions, but creates a log file with the operations that need to be done.
- Now further mirror processes can be started to process the sync log written in the first step. For such runs the following options must be set:
  - --sync-log is set to the path of the log file written in the first step.
  - --log is set to a file keeping track of the progress of the overall operation. This file is continuously updated with the sync operations that have been executed.
- The mirror processes can now be interrupted at any time and resumed again later. When restarted with these options, the process ignores all sync operations listed in the progress log and only executes those that are still pending. This is further outlined in the Examples and use cases section.
In the incremental mode, as described above, the error log file has no further function than reporting errors. Sync operations that appear in the error log are not written to the normal log and are not considered to be completed. So when running another mirror process from the sync log, these operations are retried (and if they fail again, they are written anew to the error log).
The typical use case for Stream Sync in mirror mode is transferring data from one system - the leading system - to another data structure; the destination structure gets modified to become a clone of the original system. From time to time you may need to run a mirror process in the inverse direction.
Consider for example that you use Stream Sync as a backup tool. If you mess up your original data, you will probably want to restore it from the backup storage. This is of course easily possible: you just have to rewrite the sync command you use for your backup to work in the opposite direction. This can be done rather mechanically; the source and destination URIs have to be exchanged, as well as the src- and dst- prefixes of the parameters that configure your data structures.
Sync commands tend to become complex; you often need a bunch of parameters to configure authentication and fine-tune the transfer process. Maybe you have therefore written shell scripts that contain your sync commands. In the backup scenario, you would have a shell script that triggers your backup. To restore your data from the backup structure, you could create a restore script using the replacements outlined above. This solution is not ideal, however, because you now have to maintain two scripts that need to be kept in sync.
For such use cases, Stream Sync offers an easier solution: it supports the --switch parameter, which swaps the source and destination structures, effectively reversing the sync direction. This means you do not have to duplicate your commands or scripts, but simply add a parameter to switch the sync direction.
If you use shell scripts to store your sync commands, you should write them in a way that they support additional parameters. For instance, if your backup script looks as follows:
#!/bin/sh
./stream-sync.sh /data/documents dav:https://webdav.my-storage.com/backup/ \
--dst-user backup-user --timeout 600 --dst-crypt-mode filesAndNames \
--log ~/logs/backup.log
Add the special parameter "$@" at the end, which represents all the parameters entered by the user:
#!/bin/sh
./stream-sync.sh /data/documents dav:https://webdav.my-storage.com/backup/ \
--dst-user backup-user --timeout 600 --dst-crypt-mode filesAndNames \
--log ~/logs/backup.log "$@"
You can now transform your backup script into a restore script by simply adding the --switch parameter:
./backup.sh --switch
Note: The --switch option is available only in mirror mode, since in sync mode the structures have different semantics attached to them. The local structure is the one that is backed by a local state file. Therefore, it is not easily possible to switch the direction of the process.
As is the case for mirror mode, a number of command line options are available only if Sync mode is active. In most cases, these are related to the local state managed by Stream Sync for sync processes. The following subsections deal with these options.
As briefly mentioned in the Sync mode section, Stream Sync manages a file with information about the local state of each sync stream. Based on this file, it can detect local updates and compute the changes to be applied to the remote structure (or recognize conflicting changes). A sync stream is identified by its local and remote sources; for each combination of a local and a remote source, a separate state file is created.
Per default, Stream Sync manages these state files transparently, without user interaction. They are stored in a subfolder named .stream-sync in the current user's home directory and have a (non-readable) name derived from the URIs of the local and remote structures. (Actually, the name is computed by concatenating the local and the remote URI, calculating a SHA-1 hash of this string, and applying a Base64 encoding to the result; but this is merely an implementation detail.)
While these defaults should work well in most cases, they can be overridden with some command line options:
- --state-path allows specifying the path in which the state file is created. Here the user can provide an arbitrary directory. The path will be created if it does not exist.
- --stream-name can be used to set a name for the sync stream. The state file is then given this name instead of the cryptic auto-generated one.
The following fragment shows a usage example of these options:
Sync /data/documents dav:https://webdav.my-storage.com/backup/ --sync \
    --state-path /data/sync/state \
    --stream-name 'documents-backup' \
    --dst-user backup-user
Every run of a sync process updates the local state file associated with the stream. For the initial execution of the stream, a state file does not exist yet. This is no problem if one of the structures taking part in the sync process is empty and will be initialized from the other side. Then the initial sync run actually becomes a mirror: Stream Sync copies all the files found in the existing structure to the empty one and writes an up-to-date local state file automatically.
If there is already data on both sides, however, a valid local state file should exist before running a first sync process. Otherwise, Stream Sync considers all local files as newly created and will treat changes on remote files as conflicts. To avoid this, you should create a clean local state file that reflects the current state of the local structure. This is achieved by adding the --import-state switch to the command line. The switch enables a special mode in which only the local structure is iterated over, and all files encountered are recorded in the state file. Afterwards, a fresh and up-to-date state file exists. A (re-)import of the local state can also be done if the state file got corrupted for whatever reason.
For the example sync stream from the previous section, an import command could look as follows:
Sync /data/documents dav:https://webdav.my-storage.com/backup/ --sync --import-state \
    --state-path /data/sync/state \
    --stream-name 'documents-backup' \
    --dst-user backup-user
Note: You could of course drop the options that configure the local state file. Then the file would be created and initialized at its default location in the user's home directory.
Note: The remote side of the sync process must be fully specified, even if it will not be accessed by this sync run. This is because the default name of the state file is derived from the URIs of the local and remote structures; so it must be present.
This should be a frequent use case, in which some local work is saved on an external hard disk. The command line is pretty straightforward, as the target drive can be accessed like a local drive; e.g. under Windows it is assigned a drive letter. The only problem is that if the file system on the external drive is FAT32, it may be necessary to explicitly specify a time zone in which last-modified timestamps are interpreted (refer to the description of local directories for more information). For this purpose, the time-zone option needs to be provided. In addition, the ignore-time-delta option is set to a value of 2 seconds to make sure that small differences in timestamps with a granularity below seconds do not cause unnecessary copy operations.
Sync C:\data\work D:\backup\work --dst-time-zone UTC+02:00 --ignore-time-delta 2
Consider the case that a directory structure stores the data of different projects: the top-level folder contains a sub folder for each project; all files of a project are then stored in this sub folder and in further nested sub folders.
On your local hard-disk you only have a subset of all existing projects, the ones you are currently working on. On a backup medium all project folders should be saved.
Default sync processes are not suitable for this scenario because they would remove all project folders from the backup medium that are not present in the source structure. This can be avoided by using the min-level filter as follows:
Sync /path/to/projects /path/to/backup --filter-remove min-level:1
This filter statement says that on the top-level of the destination structure no remove operations are executed. For the example at hand the effect is that folders for projects not available in the source structure will not be removed. In the existing folders, however, (which are on level 1 and greater) full sync operations are applied; so all changes done on a specific project folder are transferred to the backup medium.
As described under Sync log files, with the correct options mirror processes can be stopped at any time and resumed at a later point in time. The first step is to generate a so-called sync log, i.e. a file containing the operations to be executed to sync the structures in question:
Sync /path/to/source /path/to/dest --dry-run --log /data/sync.log
This command does not change anything in the destination structure, but only creates a file /data/sync.log with a textual description of the operations to execute. (Such files have a pretty straightforward structure: each line represents an operation, including an action and the element affected.)
Now another mirror process can be started that takes this log file as input. To keep track of the progress that is made, a second log file has to be written - the progress log:
Sync /path/to/source /path/to/dest --sync-log /data/sync.log --log /data/progress.log
This process can be interrupted and later started again with the same command line. It will execute the operations listed in the sync log, but ignore the ones contained in the progress log. Therefore, the whole sync process can be split into a number of incremental sync processes.
The following command can be used to mirror a local directory structure to an online storage:
Sync C:\data\work dav:https://sd2dav.1und1.de/backup/work \
--log C:\Temp\sync.log \
--dst-user my.account --dst-password s3cr3t_PASsword \
--dst-modified-property Win32LastModifiedTime \
--dst-modified-namespace urn:schemas-microsoft-com: \
--filter exclude:*.bak
Here all options supported by the WebDav structure type are configured. The server (which really exists) does not allow modifications of the standard WebDav getlastmodified property, but uses a custom property named Win32LastModifiedTime with the namespace urn:schemas-microsoft-com: to hold a modified time different from the upload time. This property will be set correctly for each file that is uploaded during a sync process.
Note that the --dst-password parameter could have been omitted. Then the user would have been prompted for the password.
Building upon the previous example, with some additional options it is possible to protect the data on the WebDav server using encryption:
Sync C:\data\work dav:https://sd2dav.1und1.de/backup/work \
--log C:\Temp\sync.log \
--dst-user my.account --dst-password s3cr3t_PASsword \
--dst-modified-property Win32LastModifiedTime \
--dst-modified-namespace urn:schemas-microsoft-com: \
--filter exclude:*.bak \
--dst-encrypt-password s3cr3t \
--dst-crypt-mode filesAndNames \
--crypt-cache-size 1024 \
--ops-per-second 2 \
--timeout 600
This command specifies that both the content and the names of files are encrypted using the password "s3cr3t" when copied onto the WebDav server. With the crypt mode files, only the files' content would be encrypted, while the file names would remain in plain text. The size of the cache for encrypted names is increased to avoid unnecessary crypt operations. In the example, the number of sync operations per second is limited to 2, so that the server does not reject requests because of too high load. Also, a larger timeout has been set (600 seconds = 10 minutes), so that uploads of larger files do not cause operations to fail.
As described in the Microsoft OneDrive section, some preparations are necessary before OneDrive can be used as source or destination structure of a sync process. These are mainly related to authentication because an OAuth client for the Microsoft Identity Provider (IDP) has to be registered and integrated with Stream Sync.
As a first step, the OAuth client application has to be created in the Azure Portal. The application is assigned a client ID and a client secret and is then able to interact with the Microsoft IDP to obtain valid access tokens. Note that if Stream Sync were a closed-source application, it could have been registered as a client application and shipped with its client secret. But because the full source is available in a public repository, such a registration cannot be done; the client secret would not be very secret, would it?
The steps necessary to create a client application are described in detail in the official Microsoft documentation under OneDrive authentication and sign-in. Here we will give a short outline.
Log into the Microsoft Azure Portal and navigate to the page for App registrations. Here you can create a new application. You are then presented with a form where you can enter some data about the new application. Choose a name and select the type of accounts to be supported. You also have to enter a redirect URI, which will be invoked by the Microsoft IDP as part of the authorization code flow. It is up to you which redirect URI you choose; if you intend to run sync processes on your personal machine, it is recommended to use a URI pointing to localhost with a port number that is not in use on your computer, such as http://localhost:8080. This simplifies the integration with Stream Sync as described below.
After all information has been entered, the app can be registered. It is then assigned an ID that is displayed on the overview page. On the certificates and secrets page, you can request a new client secret. Copy this secret; it is required later on.
Next you have to add the information about your OAuth client application to Stream Sync. This is done with some command line operations. For the following steps we assume that you have defined some environment variables that are referenced in the commands below:
| Variable | Description |
| --- | --- |
| SYNC_JAR | Points to the assembly jar of Stream Sync; this is used to set the classpath for Java invocations. |
| CLIENT_ID | Contains the client ID of the app you have just registered at the Azure Portal. |
| CLIENT_SECRET | Contains the secret of this app. |
| TOKEN_STORE | Points to the directory where Stream Sync should store information about OAuth client applications. |
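For illustration, the variables could be defined in a Unix shell as follows; all values shown are placeholders that have to be replaced with the data from your own registration:

$ export SYNC_JAR=/path/to/stream-sync-assembly.jar
$ export CLIENT_ID="<client ID from the Azure Portal>"
$ export CLIENT_SECRET="<client secret from the Azure Portal>"
$ export TOKEN_STORE=$HOME/.stream-sync-idp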
With a first command, basic properties of the client application are specified:
$ java -cp $SYNC_JAR com.github.sync.cli.oauth.OAuth init \
--idp-storage-path $TOKEN_STORE \
--idp-name microsoft \
--auth-url https://login.live.com/oauth20_authorize.srf \
--token-url https://login.live.com/oauth20_token.srf \
--scope "files.readwrite offline_access" \
--redirect-url http://localhost:8080 \
--client-id $CLIENT_ID \
--client-secret $CLIENT_SECRET
Here we use the name microsoft to reference this IDP and a localhost redirect URI. The other options, the URLs and the scope values, are defined by the OneDrive API and must have exactly these values. This command will prompt you for a password for the IDP; sensitive data in the token directory is encrypted with this password. (If you do not want the files to be encrypted, add the option --encrypt-idp-data false.)
Now we can log in to the Microsoft IDP and obtain an initial pair of access and refresh tokens:
$ java -cp $SYNC_JAR com.github.sync.cli.oauth.OAuth login \
--idp-storage-path $TOKEN_STORE \
--idp-name microsoft
This command will open your standard web browser and point it to the authorization URL of the Microsoft IDP. You are presented with a form to enter the credentials of your Microsoft account. You are then asked whether you want to grant access to your client application. Confirm this.
Because we have used a redirect URI of the form http://localhost:<port>, the authorization code can be obtained automatically, and the command should finish with a message that the login was successful. (For other redirect URIs you have to determine the code yourself and enter it at the prompt in the console.)
After completion of these steps, Stream Sync has all the information needed to authenticate against your OneDrive account, so you can run a sync process. One piece of information you still need is the ID of your OneDrive account. This can be obtained by signing in to the OneDrive Web application. The browser’s address bar then shows a URL of the form https://onedrive.live.com/?id=root&cid=xxxxxx. The ID in question is the alphanumeric string after the cid parameter. We assume that you create an environment variable DRIVE_ID with this value.
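For example (the value shown is just a placeholder):

$ export DRIVE_ID=xxxxxx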
The following command shows how the local work directory can be synced against the data folder of your OneDrive account:
Sync ~/work onedrive:$DRIVE_ID \
--dst-path /data \
--dst-idp-storage-path $TOKEN_STORE \
--dst-idp-name microsoft
Of course, you can use other standard options as well, for instance for setting timeouts, configuring encryption, or defining filters. The following example uses the same options as the one in the section about WebDav and encryption:
Sync ~/work onedrive:$DRIVE_ID \
--dst-path /data \
--dst-idp-storage-path $TOKEN_STORE \
--dst-idp-name microsoft \
--log /tmp/sync.log \
--filter exclude:*.bak \
--dst-encrypt-password s3cr3t \
--dst-crypt-mode filesAndNames \
--crypt-cache-size 1024 \
--ops-per-second 2 \
--timeout 600
The steps to set up Stream Sync for an integration with Google Drive are very similar to the ones described in the OneDrive example. Specifically, an application needs to be created in the Google Cloud Platform Console, in order to obtain the credentials (the OAuth client ID and secret) required for the authentication with Google’s OAuth identity provider. As the OneDrive example covers the basics in detail, this section will focus mainly on the differences between these cloud storage providers.
Documentation about the process can be found in the official Google documentation. Here is a short summary:
First, a new project has to be created in the Google Cloud Platform Console. With this new project selected, under Credentials click CREATE CREDENTIALS and select OAuth client ID. Set the Application type to Desktop app and enter a name for the new client. After the OAuth client has been created successfully, the console presents its client ID and secret. In contrast to Microsoft’s OAuth implementation, no redirect URL needs to be specified when selecting Desktop app as the client type; we can use a local redirect URL later when interacting with the identity provider.
With the OAuth client ID and secret available, Stream Sync can now be configured with the details of this client. This can be done using the following command:
$ java -cp $SYNC_JAR com.github.sync.cli.oauth.OAuth init \
--idp-storage-path $TOKEN_STORE \
--idp-name google \
--auth-url https://accounts.google.com/o/oauth2/v2/auth \
--token-url https://oauth2.googleapis.com/token \
--scope "https://www.googleapis.com/auth/drive https://www.googleapis.com/auth/drive.file https://www.googleapis.com/auth/drive.metadata" \
--redirect-url http://localhost:8080 \
--client-id $CLIENT_ID \
--client-secret $CLIENT_SECRET
Note
|
Here again some environment variables are referenced that are expected to have been initialized with the corresponding information. They are explained in the OneDrive example. Of course, you can use a different name for this configuration than google. |
The next step is to log in to the Google identity provider. This can be triggered with the command below:
$ java -cp $SYNC_JAR com.github.sync.cli.oauth.OAuth login \
--idp-storage-path $TOKEN_STORE \
--idp-name google
The command opens a web browser and navigates to a login page served by the Google OAuth identity provider. The account you select for the login will be the one that is later accessed by Stream Sync. You have to confirm that you grant access to the application you have created before. After a successful login, Stream Sync should be able to obtain the OAuth tokens and store them locally in the configured path.
You can now run sync processes using your Google Drive account as source or destination structure. For instance, the following command syncs the folder /data/google against the full content stored in your Google Drive:
Sync /data/google googledrive: \
--dst-idp-storage-path $TOKEN_STORE \
--dst-idp-name google
The destination URI googledrive: refers to the root folder of your Google Drive. It is possible to specify a path after the googledrive: prefix; so you could sync only the subfolder music as follows:
Sync /data/google/music googledrive:music \
--dst-idp-storage-path $TOKEN_STORE \
--dst-idp-name google
Of course, all other options provided by Stream Sync, like encryption or filters, are available as well.
This section discusses the initialization of a sync process over already existing data using a concrete example. It assumes that the mirror mode of Stream Sync has already been used to keep a backup of a local folder with music files on a Google Drive account. (For simplicity, we reuse the example from the previous section about Google Drive.) Now another device comes into play that should have read and write access to the music collection. The challenge here lies in the correct setup of the local state file.
The first step is to make sure that the local folder contains the most recent data and is in sync with the content of the Google Drive folder. The straightforward way to achieve this is to run a mirror process again that applies all local changes to the Cloud folder:
Sync /data/google/music googledrive:music \
--dst-idp-storage-path $TOKEN_STORE \
--dst-idp-name google
This assumes that modifications were made only locally, since all changes on the Google Drive folder are overridden. If this is not the case, you have to manually ensure that both structures contain the same, up-to-date data.
After the local folder has the correct content, the local state can now be imported using the command below. We use the standard name and location for the local state file:
Sync /data/google/music googledrive:music \
--sync \
--import-state \
--dst-idp-storage-path $TOKEN_STORE \
--dst-idp-name google
This should finish rather quickly, since only the local file system is processed. The command yields a file with local state information in the .stream-sync subfolder of the user’s home directory.
The second device that should have access to the music collection can be initialized in a similar way. We probably want to run a mirror process first, but this time using the Google Drive folder as source and the local folder as destination structure - after the steps performed on the first computer, the Google Drive should contain the most recent data. After this is done, the local state can be imported as described before: execute an equivalent command, adapting the path to the local folder if necessary.
From now on, the data can be modified on both devices. Start a sync process when appropriate using a command like this:
Sync /data/google/music googledrive:music \
--sync \
--dst-idp-storage-path $TOKEN_STORE \
--dst-idp-name google
(This is basically the same command as for importing the local state, just without the --import-state flag.) Stream Sync will sync the changes from both devices or issue warnings if it detects conflicting changes.
The Stream Sync tool makes use of Reactive Streams, as implemented by [Akka](https://akka.io/), to perform sync operations. Both the source and the destination structure are represented by a stream source emitting objects that represent the contents of the structure (files and folders). A special graph stage implementation contains the actual sync algorithm. It compares two elements from the sources (which are expected to arrive in a defined order) and decides which action needs to be performed (if any) to keep the structures in sync. This stage produces a stream of SyncOperation objects.
At this point, only a description of the actions to be performed has been created. In a second step, the SyncOperation objects are interpreted and applied to the destination structure.
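To make this two-step layout more tangible, here is a heavily simplified Scala sketch built with plain Akka Streams operators instead of the custom graph stage. The element model, the comparison logic, and all names except the SyncOperation concept are made up for illustration and do not correspond to the actual Stream Sync classes:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object SyncStreamSketch extends App {
  // Simplified element model: a path plus a last-modified timestamp.
  final case class Element(path: String, lastModified: Long)

  // Descriptions of the actions needed to keep the structures in sync.
  sealed trait SyncOperation
  final case class Create(element: Element) extends SyncOperation
  final case class Override(element: Element) extends SyncOperation
  final case class Remove(element: Element) extends SyncOperation

  implicit val system: ActorSystem = ActorSystem("sync-sketch")
  import system.dispatcher

  // In the real tool, sources emit the contents of the two folder
  // structures in a defined order; here they are simply hard-coded.
  val sourceElements = List(Element("/a.txt", 2L), Element("/b.txt", 2L))
  val destElements   = List(Element("/b.txt", 1L), Element("/c.txt", 1L))

  // Naive stand-in for the sync algorithm: compare the two element
  // collections and derive the operations to apply to the destination.
  def syncOperations(src: List[Element], dst: List[Element]): List[SyncOperation] = {
    val dstByPath = dst.map(e => e.path -> e).toMap
    val createsAndOverrides = src.collect {
      case e if !dstByPath.contains(e.path)                     => Create(e)
      case e if dstByPath(e.path).lastModified < e.lastModified => Override(e)
    }
    val removes = dst.filterNot(e => src.exists(_.path == e.path)).map(Remove(_))
    createsAndOverrides ++ removes
  }

  // Step 2: run a stream over the derived operations and interpret
  // each one; here interpretation just means printing it.
  Source(syncOperations(sourceElements, destElements))
    .runWith(Sink.foreach(op => println(s"Applying $op")))
    .onComplete(_ => system.terminate())
}
```

Running the sketch prints one line per derived operation, mirroring the two steps described above: first the operations are derived, then they are interpreted and applied.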
Stream Sync is available under the [Apache 2.0 License](http://www.apache.org/licenses/LICENSE-2.0.html).