ActiveRecord and Replication intro
As an ORM for the Preservation Catalog's underlying PostgreSQL database, ActiveRecord combined with the Rails console gives us a way to interact with Preservation Catalog (to do things like replicate Moabs to cloud archives).
https://guides.rubyonrails.org/active_record_basics.html
After SSHing into the production VM, launch the console:
cd preservation_catalog/current/
bundle exec rails console production
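As a quick sanity check that the console is talking to the catalog database, you can count rows for a couple of the models used throughout this page (the counts themselves will obviously vary):
PreservedObject.count
CompleteMoab.count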
To re-run checksum validation for a single druid:
CompleteMoab.by_druid(druid).each(&:validate_checksums!)
To update the catalog from what's on storage, run a MoabToCatalogJob. You might want to do this if you find a specific druid that you know has an outdated version in the catalog.
root = MoabStorageRoot.find_by(…)
MoabToCatalogJob.perform_now(root, druid, path) # or perform_later
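For illustration only, a filled-in invocation might look something like the following; the storage root name, druid, and path are hypothetical placeholders, not real values (check the job and your storage layout for the exact arguments it expects):
root = MoabStorageRoot.find_by(name: 'example_storage_root') # hypothetical storage root name
druid = 'bj102hs9687'                                        # hypothetical druid
path = '/path/to/storage_root/druid_tree/bj102hs9687'        # hypothetical path to the Moab on that root
MoabToCatalogJob.perform_now(root, druid, path)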
For the simplest case of backfilling as-yet unreplicated versions of a single Moab, you can kick off replication by pasting code similar to the following into the Rails console:
# by_druid scope could theoretically return multiple rows, though there
# is only one CompleteMoab per druid as of this writing
CompleteMoab.by_druid('jf301dx7536').first.create_zipped_moab_versions!
If you have a list of druids you want to replicate, you could load them into an array of strings (e.g. by reading the lines out of a text file; see the example after the snippet below), and do the following:
# druid_list is an array of strings representing the druids to replicate. providing
# a list to a parameter in the where clause of an ActiveRecord query will automatically
# generate SQL with an IN clause query instead of a single-value equality comparison.
CompleteMoab.by_druid(druid_list).find_each(&:create_zipped_moab_versions!)
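For example, druid_list could be built from a plain text file containing one druid per line (the file path here is just a hypothetical example):
druid_list = File.readlines('/tmp/druids_to_replicate.txt', chomp: true)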
From @julianmorley's request in Slack about how to do the following:
sending zipmaker only druids that prescat doesn't think are replicated to us-west-2, limited to druids < 10G, limit 10K druids
The following Ruby code can be pasted and run in the Rails console:
moab_size_limit = 10_000_000_000 # in bytes
result_limit = 10_000
endpoint = ZipEndpoint.find_by!(endpoint_name: 'aws_s3_west_2')
CompleteMoab.where.not(
  id: endpoint.zipped_moab_versions.select(:complete_moab_id)
).where("size < ?", moab_size_limit).limit(result_limit).find_each(&:create_zipped_moab_versions!)
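Before kicking off the actual replication, you may want to sanity check how many CompleteMoabs the query would pick up; the same scopes with count in place of find_each do that without queueing any work:
CompleteMoab.where.not(
  id: endpoint.zipped_moab_versions.select(:complete_moab_id)
).where("size < ?", moab_size_limit).limit(result_limit).count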
Alternatively, you could split the above into two separately run queries, since it's fine to fetch the list of CompleteMoab IDs to exclude once, before the outer query is run. Generally it's better practice to get everything into one query if you can, but Julian ran into some performance trouble with the all-in-one version, and a little kludging is fine for one-off manual scripting:
moab_size_limit = 10_000_000_000 # in bytes
result_limit = 10_000
endpoint = ZipEndpoint.find_by!(endpoint_name: 'aws_s3_west_2')
# id_list could come from anywhere, e.g. the result of endpoint.zipped_moab_versions.pluck(:complete_moab_id).
# note the `pluck` on the separate query, which returns an array -- we don't want to pass an AR relation into
# the outer query (that's what the all-in-one version above does); we want to pass an already retrieved list of ids instead.
id_list = endpoint.zipped_moab_versions.pluck(:complete_moab_id).uniq
CompleteMoab.where.not(
  id: id_list
).where("size < ?", moab_size_limit).limit(result_limit).find_each(&:create_zipped_moab_versions!)
Using ActiveRecord, the create_zipped_moab_versions! call in the Ruby code above will create the ZippedMoabVersion records in the DB that represent the things we're shipping off to the cloud. A hook on the ZippedMoabVersion model will then initiate the replication process by invoking the zip maker job.
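If you want to spot-check that the replication records were created, something like the following should work; it assumes the associations implied above (a CompleteMoab has many ZippedMoabVersions, each belonging to a ZipEndpoint), and reuses the example druid and endpoint name from earlier on this page:
# count the ZippedMoabVersion rows for this druid that target the us-west-2 endpoint
cm = CompleteMoab.by_druid('jf301dx7536').first
cm.zipped_moab_versions.joins(:zip_endpoint).where(zip_endpoints: { endpoint_name: 'aws_s3_west_2' }).count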
The following SQL for selecting the druids that meet these conditions was automatically generated by ActiveRecord (from the all-in-one query with the AR select subquery):
-- just for illustration/context, the Rails console must be used to initiate replication
SELECT "complete_moabs".*
FROM "complete_moabs"
WHERE ("complete_moabs"."id" NOT IN (
SELECT "zipped_moab_versions"."complete_moab_id"
FROM "zipped_moab_versions"
WHERE "zipped_moab_versions"."zip_endpoint_id" = 1))
AND (size < 10000000000)
LIMIT 10000
To look up which storage root holds the Moab for a given druid:
PreservedObject.find_by(druid: 'th060bv0250').complete_moabs.first.moab_storage_root.name