The producer is a standalone process that reads repository URLs (from RabbitMQ or a file) and schedules each repository for fetching.
The consumer is a standalone process that takes URLs from RabbitMQ, clones the remote repository and pushes it to the appropriate Rooted Repository in the storage (local filesystem or HDFS). Downloaded repositories are packed into siva files, so you don't need to run the Borges packer (described below) on them.
The packer is a standalone process that takes repository paths (or URLs) from a file and packs them into siva files (as a Rooted Repository) in the given output directory.
A rooted repository is a bare Git repository that stores all objects from all repositories that share a common history, that is, they have the same initial commit. It is stored using the Siva file format.
Rooted repositories have a few particularities that you should know to work with them effectively:
- They have no `HEAD` reference.
- All references are of the following form: `{REFERENCE_NAME}/{REMOTE_NAME}`. For example, the reference `refs/heads/master` of the remote `foo` would be `refs/heads/master/foo`. The remote name in rooted repositories generated by borges is always the `id` (in `UUID` form) of the repository in the PostgreSQL database.
- Each remote represents a repository that shares the common history of the rooted repository. A remote can have multiple endpoints.
- In the repository config, inside each remote section you will find an `isfork` setting, which can be either `true` or `false`. This indicates whether the repository is a fork or the real one. Note: this does not work with Packer, and the results may contain false positives and false negatives due to missing information until all available repositories are fetched, so use it with caution.
- A rooted repository is simply a repository with all the git objects that are reachable from a root commit. That means a repository with multiple roots may be split across several rooted repositories instead of being in just one.
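As a minimal sketch of the `{REFERENCE_NAME}/{REMOTE_NAME}` naming scheme described in the list above, the following Go snippet builds and splits such names. The helper names `RootedRefName` and `SplitRootedRefName` are hypothetical and not part of the borges API, and the split assumes the remote name contains no `/` (which holds for a UUID).

```go
package main

import (
	"fmt"
	"strings"
)

// RootedRefName returns the name a reference takes inside a rooted
// repository: the original reference name followed by the remote name.
// Hypothetical helper for illustration only.
func RootedRefName(refName, remote string) string {
	return refName + "/" + remote
}

// SplitRootedRefName reverses RootedRefName by splitting at the last slash.
// It assumes the remote name itself contains no slash.
func SplitRootedRefName(rooted string) (refName, remote string, ok bool) {
	i := strings.LastIndex(rooted, "/")
	if i <= 0 || i == len(rooted)-1 {
		return "", "", false
	}
	return rooted[:i], rooted[i+1:], true
}

func main() {
	rooted := RootedRefName("refs/heads/master", "foo")
	fmt.Println(rooted) // refs/heads/master/foo

	ref, remote, ok := SplitRootedRefName(rooted)
	fmt.Println(ref, remote, ok) // refs/heads/master foo true
}
```

In a rooted repository generated by borges, the remote part would be the repository's UUID rather than `foo`.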
Consumer and Producer run independently, communicating through a RabbitMQ instance and storing repository metadata in PostgreSQL.
Packer does not need a RabbitMQ or a PostgreSQL instance and is not meant to be used as a pipeline; that is what Consumer and Producer are for.
Read the borges package godoc for further details on how borges archives repositories.