A simple CLI for consuming the Northwestern University Libraries Digital Collections API. It also includes a set of Python helpers for rolling your own scripts.
pip install nuldc
pip install --upgrade nuldc
❯ nuldc --help
nuldc works <id> [--as=<format>]
nuldc collections <id> [--as=<format> --all]
nuldc search <query> [--model=<model>] [--as=<format>] [--all]
nuldc csv <query> [--fields=<fields>] [--all] <outfile>
nuldc xml <query> [--all] <outfile>
nuldc --version
--as=<format> get results as [default: opensearch]
--model=<model> search model (works,collections,filesets) [default: works]
--all get all records from search
--fields=<fields> optional set of fields,e.g id,ark,test defaults to all
-h --help Show this screen
as: opensearch
Let's get a work's metadata
nuldc works c1960aac-74f0-4ce8-a795-f713b2e3cc22
Maybe we should grab that work as a IIIF manifest.
nuldc works c1960aac-74f0-4ce8-a795-f713b2e3cc22 --as iiif
nuldc collections ecacd539-fe38-40ec-bbc0-590acee3d4f2
Or get the collection metadata as IIIF
nuldc collections ecacd539-fe38-40ec-bbc0-590acee3d4f2 --as iiif
Get the whole collection as IIIF, stitching together all the pages
nuldc collections ecacd539-fe38-40ec-bbc0-590acee3d4f2 --as iiif --all
Simple search
nuldc search "berkeley AND guitars"
Page through all the results and return one big list of items (limit 200 pages)
nuldc search "trains AND chicago" --all
The same search, returned as IIIF
nuldc search "trains AND chicago" --as iiif --all
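As a rough illustration of what paging with --all involves, here is a minimal Python sketch of following next-page links and merging every page into one list, with the 200-page cap mentioned above. It is not nuldc's own code: collect_all and fetch_page are hypothetical names, and the pagination.next_url key is an assumption about the response shape. Only the data key is taken from the jq example in this README.

```python
from typing import Callable, Optional

MAX_PAGES = 200  # the page limit mentioned above


def collect_all(first_url: str,
                fetch_page: Callable[[str], dict],
                max_pages: int = MAX_PAGES) -> list:
    """Follow next-page links and merge every page's items into one list.

    fetch_page is any callable that takes a URL and returns the decoded
    JSON response as a dict (hypothetical; supply your own HTTP call).
    """
    items: list = []
    url: Optional[str] = first_url
    pages = 0
    while url and pages < max_pages:
        page = fetch_page(url)
        items.extend(page.get("data", []))
        # ASSUMPTION: the link to the next page lives at pagination.next_url
        url = (page.get("pagination") or {}).get("next_url")
        pages += 1
    return items


# In-memory stand-in for the API, so the sketch runs without a network call
fake_pages = {
    "page-1": {"data": [{"id": "a"}], "pagination": {"next_url": "page-2"}},
    "page-2": {"data": [{"id": "b"}], "pagination": {"next_url": None}},
}
all_items = collect_all("page-1", fake_pages.__getitem__)
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP library.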
Dumping to CSV is simple. By default it dumps all the fields that are "label". If you need to dig into specific fields you can do that as well.
nuldc csv "trains AND chicago" --all example.csv
Let's grab just a few fields.
nuldc csv "trains AND chicago" --all --fields id,title,ark example.csv
It also supports "dot" notation for getting into nested, special purpose fields.
nuldc csv "trains AND chicago" --all --fields id,title,ark,subject.id example.csv
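To make the "dot" notation concrete, the sketch below shows one way a path such as subject.id could be resolved against a nested record. It assumes subject is a list of objects that each carry an id; the resolve function and the sample record are made up for illustration and are not taken from nuldc itself.

```python
def resolve(record, dotted_path):
    """Return the value(s) found at a dotted path such as 'subject.id'."""
    current = record
    for key in dotted_path.split("."):
        if isinstance(current, list):
            # Fan out: take this key from every nested object in the list
            current = [item.get(key) for item in current if isinstance(item, dict)]
        elif isinstance(current, dict):
            current = current.get(key)
        else:
            # Path runs past a scalar or a missing value
            return None
    return current


# Hypothetical record; real field values will differ
record = {
    "id": "work-1",
    "title": "Example title",
    "subject": [
        {"id": "subject-1", "label": "Chicago"},
        {"id": "subject-2", "label": "Trains"},
    ],
}
subject_ids = resolve(record, "subject.id")
```

A plain field such as id resolves to a single value, while a path through a list yields one value per nested object.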
You can also export search results to an XML serialization of the data structure.
nuldc xml "trains AND chicago" out.xml
Or get all the records
nuldc xml "trains AND chicago" --all all.xml
All of this is pipe-able too, so if you want to do further analysis with jq or pipe data through some other processing pipeline, go for it! For instance, let's grab just a couple of fields from the JSON and reformat it into a simplified shape.
nuldc search "berkeley AND guitars" --all | jq -r '.data[] | [.title,.id]'
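If you would rather do the same reshaping in Python, a minimal sketch follows. It relies only on what the jq example implies: the output is JSON with a data list whose items have title and id. The simplify function and the sample payload are illustrative names and values, not part of nuldc.

```python
import json


def simplify(payload):
    """Reduce each item under 'data' to a [title, id] pair."""
    return [[item.get("title"), item.get("id")]
            for item in payload.get("data", [])]


# Hypothetical stand-in for the JSON that the search command writes to stdout
sample = json.loads(
    '{"data": [{"id": "id-1", "title": "First", "extra": 1},'
    ' {"id": "id-2", "title": "Second"}]}'
)
rows = simplify(sample)
```

In a real pipeline you could replace the sample with json.load(sys.stdin) and pipe the search output into the script.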
You can search within specific fields and perform complex searches using the OpenSearch/Elasticsearch query string query syntax. The query syntax is valid for all "search" operations: search, csv, xml.
Get a CSV file of all works that have a fileset label starting with "Recto"
nuldc csv "file_sets.label:Recto*" ~/Desktop/rectos.csv
Look at results that have a subject that includes "Chicago"
nuldc search "subject.label:*Chicago*"
Get results that have a subject that includes "Chicago" AND a title that includes "bus"
nuldc search "subject.label:*Chicago* AND title:bus"
Get results from a known collection that were modified before a certain date:
nuldc search "modified_date:<2022-10-01 AND collection.title:Berkeley*"
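When queries like the ones above are assembled in a script, a small helper can join field:value clauses for you. The sketch below is a hypothetical convenience (field_query is not part of nuldc), and it covers only the simple case of clauses joined by a single operator, with values containing spaces wrapped in double quotes as phrases.

```python
def field_query(clauses, operator="AND"):
    """Join field/value pairs into a query string such as 'a:1 AND b:2'."""
    parts = []
    for field, value in clauses.items():
        if " " in value:
            # Quote multi-word values so they are treated as one phrase
            value = f'"{value}"'
        parts.append(f"{field}:{value}")
    return f" {operator} ".join(parts)


# Rebuilds two of the example queries shown above
chicago_bus = field_query({"subject.label": "*Chicago*", "title": "bus"})
older_berkeley = field_query(
    {"modified_date": "<2022-10-01", "collection.title": "Berkeley*"}
)
```

The resulting string can be passed as the <query> argument to search, csv, or xml.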
This project is built using Poetry. Follow the latest Poetry install instructions, clone the repository, and run poetry install.
This project uses pytest and has a very small set of tests to ensure things are running as expected.
From a poetry shell
run pytest