-
Notifications
You must be signed in to change notification settings - Fork 2
DB best practices
Justin Littman edited this page Jan 9, 2023
·
1 revision
Preservation Catalog uses PostgreSQL to store metadata about the last known state of all known copies of preserved SDR objects. It relies heavily on DB-level constraints for keeping data consistent.
The Rails API docs: http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html
- If possible, avoid declaring a transaction explicitly, e.g. by updating a parent object and its children, declaring
autosave: true
on the parent for the child relationships, and callingsave
orsave!
on the parent object. ActiveRecord will implicitly perform this multi-level save in a transaction. - Please use
ApplicationRecord.transaction
for clarity and consistency.- Functionally, it doesn't matter whether
.transaction
is called on a specific ActiveRecord class or object instance, because the transaction applies to the database connection, and all updates in a given thread of operation will be going over the same database connection (it is possible to configure ActiveRecord to do otherwise, but like most applications, we don't). To reduce confusion, it seems best to just always invoke it via the super-class, so that it's clear that the transaction applies to all object types being grouped under it.
- Functionally, it doesn't matter whether
- If two or more things should fail or succeed together atomically, they should be wrapped in a transaction. E.g. if you're creating a PreservedObject so that there's a master record for the MoabRecord that you'd like to create, those two things should probably be grouped as a transaction so that if the creation of the MoabRecord fails, the creation of the PreservedObject gets rolled back, and we don't have a stray lying around.
- Don't wrap more things than needed in a transaction. If multiple operations can succeed or fail independently, it's both semantically incorrect and needlessly inefficient to group them in a transaction.
- Likewise, try not to do any unnecessary processing in the
Application.transaction
block. It's fine to wrap nested chains of method calls in a transaction, as it might be a pain to decompose your code such that the transaction block literally only contained ActiveRecord operations. At the same time, the longer a transaction is open, the higher the chances that two different updates will try to update the same thing, possibly causing one of the updates to fail. If it's easy to keep something unnecessary out of the transaction block, it'd be wise to do so.
- Likewise, try not to do any unnecessary processing in the
- In the unlikely event that you're tempted to pro-actively do row-locking, e.g. due to concern about multiple processes updating a shared resource (e.g. if multiple processes were crawling the same storage root and doing moab validation), the Postgres docs seems to advise against that. Instead, specifying transaction isolation level seems to be recommended as the more robust and performant approach.
- Isolation level can be passed as a param to the transaction block, e.g.
ApplicationRecord.transaction(isolation: :serializable) { ... }
-
serializable
is the strictest isolation level: https://www.postgresql.org/docs/current/static/transaction-iso.html- relevant Rails API doc: http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/DatabaseStatements.html#method-i-transaction
- more background and advice from the PG wiki: https://wiki.postgresql.org/wiki/Serializable
- If there is actually little contention in practice, the PG docs seem to indicate that specifying isolation level, even something as strict as
serializable
, should have little to no overhead (though it could increase chances of failed updates if there is actual resource contention, though code should already be prepared to handle failed DB updates gracefully). As such, there seems to be little risk to erring on the side of a strong isolation level when in doubt. - E.g. you probably shouldn't do it this way: http://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html
- Isolation level can be passed as a param to the transaction block, e.g.
- You should likely be catching ActiveRecord exceptions outside of the transaction block, as you likely want to abort the transaction after the first ActiveRecord exception anyway. In other words, wrap transactions in exception handling, and not vice versa. See "Exception handling and rolling back": http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html
- Use of ActiveRecord enums (http://edgeapi.rubyonrails.org/classes/ActiveRecord/Enum.html):
- In the enum definition, explicitly map each enum string to its underlying integer value, and leave a note admonishing future developers not to change enums that are already in use in production (or to think about the needed migration). This is to prevent unintentional re-mapping of existing enum values.
- Example: https://github.com/sul-dlss/preservation_catalog/blob/main/app/models/moab_record.rb
- Replication errors
- Validate moab step fails during preservationIngestWF
- ZipmakerJob failures
- Moab Audit Failures
- Ceph Errors
- Job queues
- Deposit bag was missing
- ActiveRecord and Replication intro
- 2018 Work Cycle Documentation
- Fixing a stuck Moab
- Adding a new cloud provider
- Audits (how to run as needed)
- Extracting segmented zipfiles
- AWS credentials, S3 configuration
- Zip Creation
- Storage Migration Additional Information
- Useful ActiveRecord queries
- IO against Ceph backed preservation storage is hanging indefinitely (steps to address IO problems, and follow on cleanup)