Carnegie Hall Archives maintains a series of small, portable scripts to expedite batch processes for quality control on our Digital Collections.
These scripts have benefitted immensely from a wide community of archiving, preservation, and programming experts who share their code and troubleshooting techniques online. We are excited by the opportunity to participate in this community and have our methods improve through open collaboration and mutual exchange.
Script Name | Purpose |
---|---|
checksumValidation.py | Compares two given checksum hashes and outputs a log of pass/fail/missing results |
qa_cksum.sh | Creates formatted output of md5, md5 create timestamp, filename, mime type, last modified timestamp |
copyFilesFromList.py | Copies files to a target directory based on filename identified in a CSV |
matchvaluesfromlists.py | Compares two lists based on shared value and outputs information about that value |
md5Scrape.py | Scrapes all .md5 sidecar files in a given directory and outputs information into a formatted CSV |
reconcileList.py | Compares two lists of files and outputs a CSV of non-matching values |
embedCopyrightMetadata.sh | Embeds hardcoded Creator and Copyright Notice metadata using ExifTool |
mediaconch-xmlreport-summary.py | Prints pass/fail counts when given a MediaConch XML report |
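
To illustrate the kind of check that a script such as checksumValidation.py performs, here is a minimal sketch of a pass/fail/missing comparison. It is not the code in this repository: the function names `md5_of_file` and `compare_checksums`, and the dictionary inputs mapping filenames to MD5 hashes, are assumptions made for illustration only.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large image or audio files
        # are never loaded into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_checksums(expected, actual):
    """Classify every expected filename as 'pass', 'fail', or 'missing'.

    Both arguments are hypothetical dicts of filename -> MD5 hex string.
    """
    results = {}
    for name, expected_hash in expected.items():
        actual_hash = actual.get(name)
        if actual_hash is None:
            results[name] = "missing"
        elif actual_hash.lower() == expected_hash.lower():
            # Hex digests are compared case-insensitively.
            results[name] = "pass"
        else:
            results[name] = "fail"
    return results
```

In this sketch, `expected` might be built from vendor-supplied hashes and `actual` from hashes computed locally with `md5_of_file`. Files that appear only in `actual` are not reported.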
- Digitization specs provided to our vendors for reformatting.
- A working draft of our post-digitization quality control workflow.
- Technical notes, or brief descriptions of snippets of code used to satisfy various small-scale use cases in our quality control workflow.
This code is provided “as is” for you to use at your own risk. The information in this repository is not necessarily complete, and Carnegie Hall makes no representations or warranties of any kind.
The MIT License (MIT)
Copyright (c) 2016 Carnegie Hall
All contents are released under the terms described in the MIT License included in this repository.
We plan to update the scripts regularly. CH Archives welcomes your thoughts, questions, and recommendations on our evolving quality control strategies.
Anyone is welcome to start a new topic ("issue") by selecting the Issues tab in GitHub and clicking the green New Issue button in the upper right.
All existing issues, open and closed, may be reviewed or commented upon in the Issues section.
Email your thoughts to the Carnegie Hall Archives at archives@carnegiehall.org with the subject line Digital Collections: Quality Control.