A Next.js UI application for the NUL Data Science Project.
Public preview branch: https://main.d3nyatpv9uoqqk.amplifyapp.com/
Features visualizations of vector clustering. The 3-D scatterplots are rendered with Plotly.js, and the 2-D scatterplot is rendered with D3.js.
The goal is to visualize the clusters of the embeddings and the metadata in a way that is intuitive and informative.
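As a rough illustration of how clustered points could be handed to Plotly.js, the sketch below groups points by cluster and builds one `scatter3d` trace per cluster, so each cluster gets its own color and legend entry. The `Point3D` shape (`x`, `y`, `z`, `cluster`, `label`) and the helper name `toScatter3dTraces` are assumptions made for this example; the project's actual data fields and components may differ.

```typescript
// Hypothetical record shape; the real field names in this project may differ.
interface Point3D {
  x: number;
  y: number;
  z: number;
  cluster: number; // assumed cluster label
  label: string; // assumed hover text, e.g. a title from the metadata
}

// Minimal subset of the attributes of a Plotly.js scatter3d trace.
interface Scatter3dTrace {
  type: "scatter3d";
  mode: "markers";
  name: string;
  x: number[];
  y: number[];
  z: number[];
  text: string[];
  marker: { size: number };
}

// Group points by cluster so that each cluster becomes its own trace.
function toScatter3dTraces(points: Point3D[]): Scatter3dTrace[] {
  const byCluster: Record<string, Scatter3dTrace> = {};
  for (const p of points) {
    const key = String(p.cluster);
    if (!byCluster[key]) {
      byCluster[key] = {
        type: "scatter3d",
        mode: "markers",
        name: `Cluster ${p.cluster}`,
        x: [],
        y: [],
        z: [],
        text: [],
        marker: { size: 3 },
      };
    }
    const trace = byCluster[key];
    trace.x.push(p.x);
    trace.y.push(p.y);
    trace.z.push(p.z);
    trace.text.push(p.label);
  }
  // Integer-like object keys are enumerated in ascending numeric order.
  return Object.keys(byCluster).map((key) => byCluster[key]);
}
```

The resulting array could then be passed as the data argument of `Plotly.newPlot`, along with a target element and a layout.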
The dataset is a combination of metadata and vector embeddings of that metadata from NUL Digital Collections. (Where does the data come from eventually?)
For initial testing, NDJSON data is generated by the src/lib/notebooks/convert.ipynb notebook (which can be run inside VS Code in your dev environment). The number of records created and the number of dimensions can be adjusted in the notebook. The default values are:
output_row_count = 50
number_of_dimensions = 3
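For readers unfamiliar with the format, NDJSON stores one JSON object per line. The sketch below shows one way such a file's contents could be parsed in TypeScript, with an optional row limit that plays a role similar to `output_row_count`. It is not the notebook's logic; the helper name `parseNdjson` and the `id`/`vector` fields in the usage comment are assumptions for illustration only.

```typescript
// Parse NDJSON text (one JSON object per line) into an array of records.
// `maxRows` caps how many records are returned, similar in spirit to the
// notebook's output_row_count setting.
function parseNdjson<T>(text: string, maxRows: number = Infinity): T[] {
  const rows: T[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim(); // also removes a trailing "\r"
    if (trimmed === "") continue; // skip blank lines
    if (rows.length >= maxRows) break;
    rows.push(JSON.parse(trimmed) as T);
  }
  return rows;
}

// Example usage with a hypothetical record shape:
// const rows = parseNdjson<{ id: string; vector: number[] }>(fileContents, 50);
```

This version reads the whole input as a single string, which suits small samples. For a file as large as the full source vectors file, a line-by-line streaming reader would likely be a better fit.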
To access the source vectors file src/lib/data/vectors_full.ndjson (and run the notebook), please contact Brendan or a member of the RDC team for a download link (~700MB).
Some sample output data is included in the src/lib/data directory to feed the chart components:
src/lib/data/vectors_tsne_2d_reduced.json
src/lib/data/vectors_tsne_3d_reduced.json
...
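When reduced 2-D points like these are drawn with D3.js, the axis scales typically need the minimum and maximum of each coordinate. The sketch below computes those extents in plain TypeScript. The `Point2D` shape and the helper name `extent2d` are assumptions for this example; the actual structure of the sample JSON files is not documented here and may differ.

```typescript
// Hypothetical shape of one reduced 2-D point.
interface Point2D {
  x: number;
  y: number;
}

interface Extent2D {
  x: [number, number]; // [min, max] of the x coordinates
  y: [number, number]; // [min, max] of the y coordinates
}

// Compute the bounding extents of a set of points.
// Returns undefined for an empty array, since no extent exists.
function extent2d(points: Point2D[]): Extent2D | undefined {
  if (points.length === 0) return undefined;
  let minX = points[0].x;
  let maxX = points[0].x;
  let minY = points[0].y;
  let maxY = points[0].y;
  for (const p of points) {
    if (p.x < minX) minX = p.x;
    if (p.x > maxX) maxX = p.x;
    if (p.y < minY) minY = p.y;
    if (p.y > maxY) maxY = p.y;
  }
  return { x: [minX, maxX], y: [minY, maxY] };
}
```

Each pair could be used as the domain of a `d3.scaleLinear()` scale, with the range set to the chart's pixel width or height. D3's own `d3.extent` offers similar functionality per coordinate.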
pnpm install
pnpm dev
Tests are configured to run with Playwright.
Headless tests (quick)
pnpm test:e2e
Tests with UI (slower, better visuals)
pnpm test:e2e-ui