Overview
Justin Littman edited this page Jan 9, 2023
- Maintains a catalog of all objects on all known storage roots
- A storage root contains a druid tree. Each full druid path contains a Moab. Example storage roots are available in spec/fixtures/.
- Moab is our preservation format. It uses forward-delta versioning to minimize archive size. Each version directory has metadata about its contents, including checksums that can be used for fixity checking. More information on the Moab preservation format is available here: http://journal.code4lib.org/articles/8482
- Regularly and continuously checks the integrity of the Moabs it manages, including validation of Moab directory structure and fixity checking of content.
- Replicates Moabs to S3 compatible cloud buckets.
- Our robot infrastructure messages Preservation Catalog when an object has landed on a storage root after ingest or versioning. Pres Cat then replicates the object.
- If needed, prior versions of an object will be backfilled when the current version is replicated.
- If needed, missing parts will be backfilled by a replication audit.
- Much of the replication happens automatically via ActiveRecord hooks -- e.g. when a CompleteMoab is created, the associated records for the ZippedMoabVersions are automatically created. Upon creation of those records, a job is queued for each so that the zip file for each version eventually gets created. After that, each worker calls the next worker in the chain upon success (ZipmakerJob -> PlexerJob -> [S3WestDeliveryJob, IbmSouthDeliveryJob] -> ResultsRecorderJob).
- The worker that makes zips and the worker that does delivery are intentionally designed not to need the DB. Though they are currently Rails workers in the Pres Cat app, we could later rewrite them in another language or codebase if we felt that would be more performant. Plexer and ResultsRecorder must query and update the DB, and so have to know about it.
- Regularly and continuously audits whether each Moab it knows about is also replicated to the endpoints specified by the applicable preservation policy.
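
The worker chain described in the list above (ZipmakerJob -> PlexerJob -> delivery jobs -> ResultsRecorderJob) can be pictured with a small plain-Ruby sketch. This is not the Pres Cat implementation: the class `ReplicationChainSketch`, its methods, and the sample druid are hypothetical, the steps run synchronously instead of being queued, and it assumes the results recorder runs once per delivery. Only the job names come from the text.

```ruby
# Plain-Ruby sketch of the worker chain described above.
# Job names come from the text; every method body here is a hypothetical
# stand-in (no Rails, Sidekiq, database, or cloud calls).
class ReplicationChainSketch
  DELIVERY_JOBS = %w[S3WestDeliveryJob IbmSouthDeliveryJob].freeze

  attr_reader :log

  def initialize
    @log = []                                  # ordered record of which step ran
    @recorded = Hash.new { |h, k| h[k] = [] }  # deliveries reported per [druid, version]
  end

  # First link: stands in for building the zip, then hands off to the plexer.
  def zipmaker(druid, version)
    @log << "ZipmakerJob #{druid} v#{version}"
    plexer(druid, version)
  end

  # Second link: fans out one delivery per endpoint.
  def plexer(druid, version)
    @log << "PlexerJob #{druid} v#{version}"
    DELIVERY_JOBS.each { |job| deliver(job, druid, version) }
  end

  # Third link: stands in for the upload, then reports the result.
  def deliver(job, druid, version)
    @log << "#{job} #{druid} v#{version}"
    results_recorder(job, druid, version)
  end

  # Last link: notes which delivery job has reported for this version.
  def results_recorder(job, druid, version)
    @recorded[[druid, version]] << job
    @log << "ResultsRecorderJob #{druid} v#{version} via #{job}"
  end

  # True once every delivery job has reported for this version.
  def replicated_everywhere?(druid, version)
    @recorded[[druid, version]].sort == DELIVERY_JOBS.sort
  end
end

chain = ReplicationChainSketch.new
chain.zipmaker('bj102hs9687', 1) # sample druid, used only for illustration
puts chain.log
puts chain.replicated_everywhere?('bj102hs9687', 1)
```

In this sketch only the bookkeeping in `results_recorder` and `replicated_everywhere?` touches shared state, which loosely mirrors the point that the zip-making and delivery workers do not need the DB while Plexer and ResultsRecorder do.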
- Postgres: The database of record. Has metadata about storage locations (Moab Storage Roots) and replication endpoints (Zip Endpoints), as well as the policy info for which Moabs should get replicated to which archive endpoints. Also tracks the state of all the Moabs on all the known Moab Storage Roots, as well as their replicated copies on Zip Endpoints.
- Rails: We use ActiveRecord heavily for interacting with Postgres, ActiveJob heavily for implementing workers, and a small part of the web layer for receiving info from workers.
- Redis, Sidekiq: For managing workers.
The robots notify Pres Cat when an object is ingested or versioned so that it can be added to the catalog and eventually replicated to the appropriate cloud providers.
- The replication results queue.
- Workflows
- Solely for the purpose of exposing audit results. Pres Cat does not act directly on workflow states (though it may act on input from robots that do).
- View the dashboard
- Query the catalog to see what state a particular object or set of objects is in, how many objects are in a given state for a given storage root, how many objects are in a given state for a given replication endpoint, etc.
- Integrity check individual objects, lists of objects, objects on a given storage root, etc.
- Replicate objects. This should only be a manual process when backfilling versions for objects that were added to the catalog before an applicable Zip Endpoint was added.
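
As a rough illustration of what an integrity check over a list of files involves, the sketch below recomputes each file's checksum, compares it with a recorded value, and tallies the outcomes by status. It is a minimal example under stated assumptions, not Pres Cat's audit code: the helper names `fixity_status` and `audit`, the status symbols, and the choice of SHA-256 are all hypothetical, and real Moab manifests are not parsed here.

```ruby
require 'digest'
require 'tempfile'

# Compare a file's current SHA-256 digest with a previously recorded one.
# Returns :ok, :mismatch, or :missing (hypothetical status names).
def fixity_status(path, expected_sha256)
  return :missing unless File.file?(path)

  actual = Digest::SHA256.file(path).hexdigest
  actual == expected_sha256.downcase ? :ok : :mismatch
end

# Check every path => checksum pair and count how many files are in each state.
def audit(recorded_checksums)
  recorded_checksums.each_with_object(Hash.new(0)) do |(path, sha256), counts|
    counts[fixity_status(path, sha256)] += 1
  end
end

Tempfile.create('content') do |file|
  file.write('preserved bytes')
  file.flush
  recorded = Digest::SHA256.hexdigest('preserved bytes')

  puts fixity_status(file.path, recorded)   # file still matches its recorded checksum

  File.write(file.path, 'corrupted bytes')  # simulate bit rot
  puts fixity_status(file.path, recorded)

  p audit(file.path => recorded, '/no/such/file' => recorded)
end
```

The tally returned by `audit` is the same kind of answer as "how many objects are in a given state", just computed over local files instead of catalog records.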