
Dataflow

Terry Brady edited this page Jun 22, 2021 · 4 revisions

Ingest and identification

Merritt's dataflow (see the diagram below) is identical whether content arrives through the UI (1) or via the SWORD endpoint (1a). The Ingest service contacts the Local ID service (2) to match up any local IDs submitted with the content against existing ARKs, and/or create new ARK-to-local-ID mappings; these mappings are stored in the Inventory database (3).
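As a rough illustration of the matching step (2), the following Python sketch models the ARK-to-local-ID table as an in-memory dictionary. Everything in it is hypothetical: the `LocalIdStore` class, its `resolve` method, and the placeholder ARK format (test NAAN `99999`, shoulder `fk4`) are assumptions made for illustration. They are not the Local ID service's actual API or minting scheme.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class LocalIdStore:
    """Hypothetical stand-in for the ARK-to-local-ID mappings in the Inventory database."""
    mappings: dict[str, str] = field(default_factory=dict)  # local ID -> ARK
    _counter: count = field(default_factory=lambda: count(1))

    def mint_ark(self) -> str:
        # Placeholder minting scheme (assumption); real ARKs come from a minter.
        return f"ark:/99999/fk4{next(self._counter):06d}"

    def resolve(self, local_ids: list[str]) -> str:
        """Return the existing ARK for the submitted local IDs, or mint a new one."""
        known = {self.mappings[lid] for lid in local_ids if lid in self.mappings}
        if len(known) > 1:
            # One possible conflict policy: refuse IDs that point at different ARKs.
            raise ValueError(f"local IDs map to different ARKs: {sorted(known)}")
        ark = known.pop() if known else self.mint_ark()
        # Create ARK-to-local-ID mappings for any local IDs not seen before.
        for lid in local_ids:
            self.mappings.setdefault(lid, ark)
        return ark
```

A resubmission that reuses a known local ID resolves to the same ARK, and any new local IDs sent with it are mapped to that ARK. Raising an error when the submitted IDs already point at two different ARKs is only one possible policy; the real service may handle that case differently.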

Note:

For a high-level overview of the various services, see the Architecture page. For the complete ingest, storage, and replication process, see the Ingest Process page.

Storage and inventory

The Ingest service then pushes a manifest (4) of staged content to the Storage service, which pulls the staged content (5) to its own local storage and pushes it (6) to its primary storage node. When this process completes, the Ingest service pushes the storage URL for the object manifest to the Inventory queue (7), creating an inventory job.
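The hand-off between Ingest, Storage, and the Inventory queue (steps 4 to 7) can be sketched with in-memory stand-ins. The `StorageService` class, the `ingest_object` function, the incoming manifest shape (producer path mapped to staging URL), the stored manifest layout, and the `{ark}/manifest` storage key are all hypothetical. A Python `queue.Queue` plays the role of the Inventory queue.

```python
import json
import queue
from dataclasses import dataclass, field


@dataclass
class StorageService:
    """Hypothetical Storage service backed by an in-memory primary node."""
    primary_node: dict[str, bytes] = field(default_factory=dict)

    def store(self, ark: str, manifest: dict[str, str], staging: dict[str, bytes]) -> str:
        # (5) Pull every staged file listed in the manifest into local storage.
        local = {path: staging[url] for path, url in manifest.items()}
        # (6) Push the files to the primary storage node.
        for path, data in local.items():
            self.primary_node[f"{ark}/{path}"] = data
        # Also store an object manifest that points at the stored copies.
        stored_manifest = {"ark": ark, "files": {p: f"{ark}/{p}" for p in local}}
        manifest_key = f"{ark}/manifest"
        self.primary_node[manifest_key] = json.dumps(stored_manifest).encode()
        # Return the storage "URL" of the object manifest (here, just its key).
        return manifest_key


def ingest_object(ark: str, manifest: dict[str, str], staging: dict[str, bytes],
                  storage: StorageService, inventory_queue: queue.Queue) -> None:
    # (4) Push the manifest of staged content to the Storage service.
    manifest_url = storage.store(ark, manifest, staging)
    # (7) Once storage completes, enqueue the manifest URL as an inventory job.
    inventory_queue.put(manifest_url)
```

Enqueuing only after `store` returns mirrors the ordering described above: an inventory job exists only for content that has already reached the primary node.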

To process the job, the Inventory service pulls the manifest storage URL from the queue (8) and uses it first to pull the manifest (9) and then, based on the manifest, the object's system metadata: the files in the object's system directory, plus any files in the producer directory whose names begin with the mrt- prefix. The information in these files is used to populate the Inventory database (10).
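A minimal sketch of one inventory job (steps 8 to 10) follows, assuming the storage node is a dictionary from storage URL to bytes and the manifest is JSON of the form `{"ark": ..., "files": {path: storage_url}}`. The function names, the manifest format, and the dictionary used as the Inventory database are hypothetical; only the file-selection rule (the `system/` directory plus `mrt-` files under `producer/`) comes from the description above.

```python
import json
import queue


def is_system_metadata(path: str) -> bool:
    """True for files under system/ and for mrt-* files directly under producer/."""
    directory, _, filename = path.partition("/")
    return directory == "system" or (directory == "producer" and filename.startswith("mrt-"))


def process_inventory_job(job_queue: queue.Queue, storage: dict[str, bytes],
                          inventory_db: dict[str, dict]) -> str:
    # (8) Pull the manifest storage URL from the Inventory queue.
    manifest_url = job_queue.get_nowait()
    # (9) Pull the manifest itself (assumed JSON layout, see lead-in).
    manifest = json.loads(storage[manifest_url])
    ark, files = manifest["ark"], manifest["files"]
    # Pull only the system metadata files named in the manifest.
    metadata = {p: storage[url] for p, url in files.items() if is_system_metadata(p)}
    # (10) Populate the (hypothetical) Inventory database record for the object.
    inventory_db[ark] = {"files": sorted(files), "system_metadata": metadata}
    return ark
```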

Replication and Audit

The Replication service scans the Inventory database (11) for objects that need to be replicated, pulls the content from each object's primary storage node (11a), pushes it to the object's secondary storage node, and updates the object's replication status in the database.
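The replication pass can be sketched as a scan over inventory records. In this sketch each object has a single secondary node, nodes are dictionaries keyed by `{ark}/{path}`, and replication status is a boolean flag. The `ObjectRecord` type, the `replicate_pending` function, and that status model are assumptions made for illustration. They do not reflect the Inventory database schema.

```python
from dataclasses import dataclass


@dataclass
class ObjectRecord:
    """Hypothetical Inventory row: an object, its files, and a replication flag."""
    ark: str
    files: list[str]
    replicated: bool = False


def replicate_pending(inventory: list[ObjectRecord], primary: dict[str, bytes],
                      secondary: dict[str, bytes]) -> list[str]:
    replicated_now = []
    # (11) Scan the inventory for objects that still need replication.
    for obj in inventory:
        if obj.replicated:
            continue
        # (11a) Pull each file from the primary node and push it to the secondary.
        for path in obj.files:
            key = f"{obj.ark}/{path}"
            secondary[key] = primary[key]
        # Update the status only after every file has been copied.
        obj.replicated = True
        replicated_now.append(obj.ark)
    return replicated_now
```

Setting the flag last means an object whose copy fails partway (for example, a missing key raising `KeyError`) stays unreplicated and is picked up again on the next scan.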

The Audit service, similarly, continually scans the Inventory database (12) for files that have never been audited, or for the least-recently-audited files, which it pulls (12a) from their storage nodes (both primary and secondary). After recalculating the hash of each file, it writes the updated audit status to the database.
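The audit selection and fixity check can be sketched as follows. In this sketch each stored copy of a file (primary or secondary) is its own record carrying an expected digest, SHA-256 is the hash algorithm, and a fixed batch size limits each pass. The `FileCopy` type, the `audit_batch` function, the status strings, and the choice of SHA-256 are assumptions, not the Audit service's actual schema or configuration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileCopy:
    """Hypothetical audit record for one copy of a file on one storage node."""
    node: str                 # e.g. "primary" or "secondary"
    key: str                  # location of the file on that node
    expected_sha256: str      # digest recorded earlier (assumed)
    last_audited: Optional[float] = None
    status: str = "unverified"


def audit_batch(records: list[FileCopy], nodes: dict[str, dict[str, bytes]],
                batch_size: int, now: float) -> list[FileCopy]:
    # (12) Never-audited copies sort first (False < True), then the oldest audits.
    ordered = sorted(records, key=lambda r: (r.last_audited is not None, r.last_audited or 0.0))
    batch = ordered[:batch_size]
    for rec in batch:
        # (12a) Pull the file from its storage node and recalculate its hash.
        digest = hashlib.sha256(nodes[rec.node][rec.key]).hexdigest()
        # Write the updated audit status back to the (in-memory) record.
        rec.status = "verified" if digest == rec.expected_sha256 else "mismatch"
        rec.last_audited = now
    return batch
```

Because audited copies receive the current timestamp, repeated calls cycle through the whole inventory, always revisiting the copies whose last check is oldest.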

Diagram


Merritt Dataflow