General:
- Count total number of documents
- Count total number of pages
- Count total number of words
- Get measure of OCR quality for each page and group by year
- Normalize
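
The general counts above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the `documents` structure (a list of dicts with `year` and `pages` keys), the sample data, and the function name `count_by_year` are all assumptions made for the example, and words are approximated by whitespace-separated tokens.

```python
from collections import defaultdict


def count_by_year(documents):
    """Return {year: {"documents": n, "pages": n, "words": n}}."""
    totals = defaultdict(lambda: {"documents": 0, "pages": 0, "words": 0})
    for doc in documents:
        entry = totals[doc["year"]]
        entry["documents"] += 1
        entry["pages"] += len(doc["pages"])
        # Words are approximated by whitespace-separated tokens.
        entry["words"] += sum(len(page.split()) for page in doc["pages"])
    return dict(totals)


# Hypothetical sample data: each document has a year and a list of page texts.
sample_documents = [
    {"year": 1771, "pages": ["the first page", "a second page here"]},
    {"year": 1771, "pages": ["another book"]},
    {"year": 1810, "pages": ["one page only"]},
]
yearly_totals = count_by_year(sample_documents)
```

Per-year totals of this kind can serve as denominators when comparing keyword frequencies across years.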
Key searches:
- Count number of occurrences of keywords/keysentences (pages) and group by year
- Count number of occurrences of keywords/keysentences (pages) and group by word
- Count number of occurrences of pages with keywords/keysentences and group by year
- Count number of occurrences of pages with keywords/keysentences and group by book
- Count number of occurrences of keywords/keysentences and group by year
- Count number of occurrences of keywords/keysentences and group by book
- Get concordance (details) - window of words - for keywords/keysentences and group by year
- Get concordance (details) - full page - for keywords/keysentences and group by year
Note: keysearch_by_year.md and keysearch_by_year_page_count.md perform the same action.
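
The difference between counting keyword occurrences and counting pages that contain a keyword can be sketched as follows. This is a simplified illustration under stated assumptions: the `(year, page_text)` input shape, the `tokenize` helper, and the function name `keysearch_by_year` are hypothetical, and only single-word keywords are handled (keysentences would need phrase matching).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keysearch_by_year(pages, keywords):
    """Count keyword occurrences and matching pages, grouped by year.

    `pages` is an iterable of (year, page_text) pairs.
    Returns two Counters keyed by (year, keyword).
    """
    occurrences = Counter()
    pages_with_keyword = Counter()
    for year, text in pages:
        tokens = Counter(tokenize(text))
        for keyword in keywords:
            hits = tokens[keyword]
            if hits:
                # Every hit counts towards the occurrence total ...
                occurrences[(year, keyword)] += hits
                # ... but the page itself is counted only once.
                pages_with_keyword[(year, keyword)] += 1
    return occurrences, pages_with_keyword


# Hypothetical sample pages.
sample_pages = [
    (1771, "The heart of the matter is the heart."),
    (1771, "No match on this page."),
    (1810, "A heart and a hospital."),
]
occurrences, pages_with_keyword = keysearch_by_year(
    sample_pages, ["heart", "hospital"]
)
```

In this sample, "heart" occurs twice in 1771 but on a single page, so the two counters differ for that year.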
Store preprocessed pages using different storage solutions:
- Ingest NLS pages, clean them, preprocess them, and store them using ElasticSearch
- Ingest NLS pages, clean them, preprocess them, and store them using HDFS
- Ingest NLS pages, clean them, preprocess them, and store them using a PSQL database
- Ingest NLS pages, clean them, preprocess them, and store them using a YML file
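
The ingest, clean, preprocess, and store flow can be sketched with the standard library alone. Note the substitution: this example uses an in-memory SQLite database as a stand-in for the PSQL database, and the table layout, the `clean` and `preprocess` helpers, and the sample page are assumptions for illustration rather than the project's actual schema or cleaning rules.

```python
import re
import sqlite3


def clean(text):
    """Collapse whitespace runs (including line breaks) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    """Lowercase and keep alphabetic tokens only (a simple normalization)."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def store_pages(connection, pages):
    """Store (title, year, page_number, raw_text) tuples with derived text columns."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "title TEXT, year INTEGER, page_number INTEGER, "
        "clean_text TEXT, preprocessed_text TEXT)"
    )
    rows = [
        (title, year, number, clean(raw), preprocess(clean(raw)))
        for title, year, number, raw in pages
    ]
    connection.executemany("INSERT INTO pages VALUES (?, ?, ?, ?, ?)", rows)
    connection.commit()


# SQLite in memory stands in for a real database server here.
connection = sqlite3.connect(":memory:")
store_pages(connection, [("Sample Book", 1771, 1, "The  Heart,\n of Scotland.")])
stored = connection.execute(
    "SELECT clean_text, preprocessed_text FROM pages"
).fetchall()
```

Storing both the cleaned and the preprocessed text lets later queries choose which form to search without repeating the preprocessing step.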
Geoparser queries:
- Geoparse NLS pages using the original Edinburgh Geoparser
- Geoparse NLS pages using spaCy and the Edinburgh Georesolver
Note: Additional information about how to install and run these queries, including how to download and install the Edinburgh Geoparser, can be found here.
Colocated word searches:
Old queries:
- Get concordance for keywords and group by word
- Get concordance (details) - full page - for keywords and group by year
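
A "window of words" concordance, as opposed to returning the full page, can be sketched as below. This is an illustrative example only: the function name `concordance`, the `window` parameter, the tokenization rule, and the sample text are assumptions, not the behaviour of the queries listed above.

```python
import re


def concordance(text, keyword, window=2):
    """Return a list of token windows centred on each occurrence of keyword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    matches = []
    for index, token in enumerate(tokens):
        if token == keyword:
            # Clamp the left edge at 0; slicing clamps the right edge itself.
            start = max(0, index - window)
            matches.append(tokens[start:index + window + 1])
    return matches


# Hypothetical page text.
sample_text = "In the heart of the city lies the old heart hospital."
windows = concordance(sample_text, "heart", window=2)
```

Windows near the start or end of a page are shorter than `2 * window + 1` tokens because there is no further context to include.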
Important: We also recommend reading the documentation for running NLS individual queries and NLS aggregated queries.