CLI application for generating the Enhanced Networked Monographs (ENM) website and the Solr index which powers the search application.
Information about the ENM project can be found on the project website.
enm is a CLI application for performing various backend ENM functions:
- Create static pages
  - About
  - Home
- Create browse topics lists: e.g. featured topics
- Create topic pages: e.g. culture -- popular
- Load enm-pages Solr index
- Automatically create cached data files which can be used in place of the Postgres database in subsequent jobs: nyudlts/enm-cache
The enm program is feature complete with good test coverage, but it has not been
fully productionized yet: see Future Improvements.
Enough development was done to create an initial stable and correct demo site.
ENM data is currently frozen and there are no active plans to add or change data at this time.
- AWS CLI v2 - for deployment scripts.
- Go - version 1.16 or higher
The Postgres database from TCT should already be set up on devdb1.dlib.nyu.edu with the correct user credentials. See https://jira.nyu.edu/jira/browse/NYUP-437 for details.
git clone git@github.com:NYULibraries/dlts-enm.git
cd dlts-enm/
go build
mv dlts-enm enm
Set environment variables for database access:
export ENM_POSTGRES_DATABASE=enm
export ENM_POSTGRES_DATABASE_HOSTNAME=127.0.0.1
export ENM_POSTGRES_DATABASE_USERNAME=enm_readonly
export ENM_POSTGRES_DATABASE_PASSWORD=[password for devdb1.dlib.nyu.edu:enm database for user enm_readonly]
Note that we use 127.0.0.1 even though the database is remote, because the remote Postgres server is reached through an SSH tunnel via the bastion host.
These environment variables must be set before running any enm command that
requires database access. Failure to do so will cause a panic.
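Because a missing variable surfaces only as a panic, it can help to check the environment before invoking enm. The following is a minimal sketch of such a pre-flight check; the function name check_enm_db_env is hypothetical and not part of this repo, and only the four variable names come from this README.

```shell
#!/bin/sh
# Hypothetical helper (not part of the enm repo): report which of the
# database variables named in this README are unset or empty.
check_enm_db_env() {
    missing=""
    for name in ENM_POSTGRES_DATABASE \
                ENM_POSTGRES_DATABASE_HOSTNAME \
                ENM_POSTGRES_DATABASE_USERNAME \
                ENM_POSTGRES_DATABASE_PASSWORD; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing environment variables:$missing" >&2
        return 1
    fi
    return 0
}
```

One possible use is to chain it in front of a database-backed command, for example: check_enm_db_env && ./enm sitegen topicpages --source=database --destination=[DESTINATION]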
Set location of the cache using ENM_CACHE:
export ENM_CACHE=$HOME/enm-cache/
This environment variable is optional. If it is set and the path that is pointed
to does not already exist, enm will create it, along with any needed intermediate
directories.
If ENM_CACHE is not set, the location of the cache defaults to /tmp/enm-cache/.
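The cache location rule described above can be mirrored in a wrapper script, for instance to find out where enm will read and write cache files. This is only a sketch of the documented behaviour, not the actual Go implementation; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the cache rules described in this README.

# Print the cache location: $ENM_CACHE if set, else the documented default.
# Note: ${VAR:-default} also treats an empty value as unset, which is an
# assumption; the README only describes the "not set" case.
resolve_enm_cache() {
    printf '%s\n' "${ENM_CACHE:-/tmp/enm-cache/}"
}

# Create the directory and any intermediate directories if they are absent,
# then print the resulting path.
ensure_enm_cache() {
    dir=$(resolve_enm_cache)
    mkdir -p "$dir" && printf '%s\n' "$dir"
}
```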
Set up SSH tunneling to port 5432 on the Postgres host devdb1.dlib.nyu.edu through the bastion host by running this command in a separate terminal:
ssh -N -L 5432:devdb1.dlib.nyu.edu:5432 [USERNAME]@b.dlib.nyu.edu
This will allow remote database access via local port 5432.
A less verbose command can be run if the following is set up in .ssh/config:
Host devdb1
Hostname devdb1.dlib.nyu.edu
ProxyCommand ssh bastion -W %h:%p
User [USERNAME]
...
Host bastion
Hostname b.dlib.nyu.edu
User [USERNAME]
In a separate terminal, run this command:
ssh -N -L 5432:devdb1:5432 bastion
There is a deploy script bin/deploy-site.sh that generates the full website,
syncs it with the S3 bucket (without touching the /search/ path that contains
the search application built by dlts-enm-search-application),
and invalidates the website paths in CloudFront so that the latest files are fetched
from S3.
There is also a convenience wrapper script bin/deploy-site_interactive.sh which
will call bin/deploy-site.sh with options set according to the user's responses to
interactive prompts:
bin/deploy-site_interactive.sh dev
bin/deploy-site_interactive.sh stage
bin/deploy-site_interactive.sh prod
See examples for demonstrations of deployment for other use cases.
The deploy script runs all the sitegen commands (detailed below) with destination
set to directories in dist/.
Note that the deploy script does not perform Solr indexing.
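To make the three deployment stages concrete (generate, sync, invalidate), here is a rough sketch of what such a sequence could look like. It is not the contents of bin/deploy-site.sh: the function names, the destination layout under dist/, the --exclude pattern, and the use of --source=cache are all assumptions. The bucket name and invalidation paths are taken from the example output in this README, and the aws subcommands and flags used are standard AWS CLI v2 ones.

```shell
#!/bin/sh
# Sketch only: NOT the contents of bin/deploy-site.sh.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: deploy_sketch BUCKET DISTRIBUTION_ID
deploy_sketch() {
    bucket=$1          # e.g. dlts-enm-dev, as seen in the example output
    distribution=$2    # CloudFront distribution ID, assumed to be known

    # 1. Generate the site into dist/ (the destination layout is assumed).
    run ./enm sitegen sitepages --destination=dist/ &&
    run ./enm sitegen browsetopicslists --source=cache \
        --destination=dist/browse-topics-lists/ &&
    run ./enm sitegen topicpages --source=cache \
        --destination=dist/topic-pages/ &&

    # 2. Sync to S3, skipping the search application's path.
    run aws s3 sync dist/ "s3://$bucket/" --exclude "search/*" &&

    # 3. Invalidate the website paths so CloudFront refetches from S3.
    run aws cloudfront create-invalidation \
        --distribution-id "$distribution" \
        --paths "/about.html" "/index.html" \
                "/browse-topics-lists*" "/shared*" "/topic-pages*"
}
```

Running it with DRY_RUN=1 exported prints the five commands without touching S3 or CloudFront, which is a cheap way to review the sequence before a real deployment.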
./enm help
./enm help sitegen
./enm help sitegen browsetopicslists
./enm sitegen browsetopicslists --destination=[DESTINATION]
./enm sitegen browsetopicslists --destination=[DESTINATION] --source=cache
./enm sitegen sitepages --destination=[DESTINATION]
Generating topic pages from the database automatically creates cache files in /tmp/enm-cache/sitegen-topicpages/ that can be used as the data source for subsequent topic pages generation runs:
./enm sitegen topicpages --source=database --destination=[DESTINATION]
./enm sitegen topicpages --source=cache --destination=[DESTINATION]
./enm sitegen topicpages --source=database --destination=[DESTINATION] [TOPIC ID 1] [TOPIC ID 2]
./enm sitegen topicpages --source=cache --destination=[DESTINATION] [TOPIC ID 1] [TOPIC ID 2]
./enm solr load --server=[SOLR SERVER] --port 8983
./enm solr load --server=[SOLR SERVER] --source=cache --port 8983
In the example below, it is assumed that the dlts-enm repo is located at
$GOPATH/src/github.com/nyulibraries/dlts-enm/, and the https://github.com/nyudlts/enm-cache
repo has already been cloned to $HOME.
$ export ENM_CACHE=$HOME/enm-cache/
$ bin/deploy-site_interactive.sh dev
Do complete regeneration of the site before copying to server? [y/n] y
Use the cache for regenerating the site? [y/n] y
Generating site pages...
Generating browse topics lists...
Generating topic pages...
upload: dist/about.html to s3://dlts-enm-dev/about.html
upload: dist/browse-topics-lists/0-9.html to s3://dlts-enm-dev/browse-topics-lists/0-9.html
upload: dist/browse-topics-lists/enm-picks.html to s3://dlts-enm-dev/browse-topics-lists/enm-picks.html
upload: dist/browse-topics-lists/g.html to s3://dlts-enm-dev/browse-topics-lists/g.html
upload: dist/browse-topics-lists/d.html to s3://dlts-enm-dev/browse-topics-lists/d.html
upload: dist/browse-topics-lists/f.html to s3://dlts-enm-dev/browse-topics-lists/f.html
upload: dist/browse-topics-lists/c.html to s3://dlts-enm-dev/browse-topics-lists/c.html
upload: dist/browse-topics-lists/j.html to s3://dlts-enm-dev/browse-topics-lists/j.html
upload: dist/browse-topics-lists/i.html to s3://dlts-enm-dev/browse-topics-lists/i.html
[...SNIPPED...]
upload: dist/topic-pages/00/00/04/74/0000047490.html to s3://dlts-enm-dev/topic-pages/00/00/04/74/0000047490.html
upload: dist/topic-pages/00/00/04/74/0000047476.html to s3://dlts-enm-dev/topic-pages/00/00/04/74/0000047476.html
upload: dist/topic-pages/00/00/04/74/0000047485.html to s3://dlts-enm-dev/topic-pages/00/00/04/74/0000047485.html
upload: dist/topic-pages/00/00/04/74/0000047491.html to s3://dlts-enm-dev/topic-pages/00/00/04/74/0000047491.html
upload: dist/topic-pages/00/00/04/74/0000047492.html to s3://dlts-enm-dev/topic-pages/00/00/04/74/0000047492.html
{
"Location": "https://cloudfront.amazonaws.com/2020-05-31/distribution/E2DL5S1BQ4HW26/invalidation/I3IVXF6N96CJXR",
"Invalidation": {
"Id": "I3IVXF6N96CJXR",
"Status": "InProgress",
"CreateTime": "2021-07-27T22:57:49.269000+00:00",
"InvalidationBatch": {
"Paths": {
"Quantity": 5,
"Items": [
"/about.html",
"/index.html",
"/browse-topics-lists*",
"/shared*",
"/topic-pages*"
]
},
"CallerReference": "cli-1627426668-935235"
}
}
}
$
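The upload lines in the output above suggest how topic page paths relate to topic IDs: the ID appears to be zero-padded to ten digits, and the first eight digits become four two-digit directory levels. The sketch below reproduces that pattern. The rule is inferred from the sample output only, not from the enm source, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive a topic page path from a numeric topic ID,
# following the pattern visible in the example upload output.
# Pass the ID without leading zeros (printf would read "047" as octal).
topic_page_path() {
    id=$(printf '%010d' "$1")               # e.g. 47490 -> 0000047490
    d1=$(printf '%s' "$id" | cut -c1-2)     # digits 1-2
    d2=$(printf '%s' "$id" | cut -c3-4)     # digits 3-4
    d3=$(printf '%s' "$id" | cut -c5-6)     # digits 5-6
    d4=$(printf '%s' "$id" | cut -c7-8)     # digits 7-8
    printf 'topic-pages/%s/%s/%s/%s/%s.html\n' "$d1" "$d2" "$d3" "$d4" "$id"
}
```

For topic ID 47490 this yields topic-pages/00/00/04/74/0000047490.html, matching the first topic page upload line in the example.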
./enm solr load --server=discovery1.dlib.nyu.edu --port 8983
Load prod Solr index from cache files at $ENM_CACHE if set, or default cache location /tmp/enm-cache/
./enm solr load --server=discovery1.dlib.nyu.edu --source=cache --port 8983
Make sure to set up access to the Postgres database before running the tests. See Set environment variables.
go test ./...
The Go code in db/postgres/models was generated automatically by xo
using custom xo templates. If changes are made to the Postgres database,
the models can be updated by running go generate at the root of the project.
Configuration is done through command/subcommand options and
environment variables.
Environment variables starting with ENM_POSTGRES_DATABASE must be set before running any enm command that
requires database access. Failure to do so will cause a panic.
See Set environment variables.
- Real error handling/recovery/messaging/logging instead of liberal use of panic calls
- Embed sitegen templates into the enm binary using go embed. See comment in sitegen/sitegen.go for the motivation for doing this.
- Write more tests, and stub out Postgres in the solr and sitegen package test suites (and in all future tests).
- dlts-enm
- dlts-enm-search-application
- dlts-enm-tct-backend
- dlts-enm-tct-developer
- dlts-enm-tct-frontend
- dlts-enm-verifier
- dlts-enm-web
This project is licensed under the Apache License Version 2.0 - see the LICENSE file for details.