Thanks for being interested in contributing to this project!

Clone this repository to your local machine and make sure Poetry is installed. Then install the dependencies:

`poetry install`

Feel free to enhance the existing functions and documentation.

Some notes on making changes to the dataset:

- A change to the dataset typically requires commits to multiple repositories in the oldvis project.
- Before you start working, please open an issue to discuss the change first.

The following lists some common scenarios and the steps for making the corresponding changes to the dataset.

To add a new data source:

- Support data querying in libquery: Add a `querier` to query metadata and images from the new data source.
  - Where to place the `querier`: If the data source is a widely used digital library with an official API, place the `querier` in libquery. Otherwise, place it in libquery_extensions.
  - Example: the `querier` for the David Rumsey Map Collection.
- Support data processing in libprocess: Add a `processor` (which extends the `querier`) to process metadata.
  - Where to place the `processor`: If the `querier` has been placed in libquery (or libquery_extensions), place the `processor` in libprocess (or libprocess_extensions, respectively).
  - Example: the `processor` for the David Rumsey Map Collection.
- Add data querying and processing scripts to this repository.
  - Where to place the scripts: In a new subdirectory of `./data-sources/` of this repository.
  - Example: the scripts for the David Rumsey Map Collection.
  - Scripts to be added: Please refer to the README of `./data-sources/`.
- Query metadata: Execute `fetch_metadata.py` in `./data-sources/{data-source-name}/`.
- Query images: Execute `fetch_images.py` in `./data-sources/{data-source-name}/`.
- Process metadata of unlabeled entries: Execute `process_unlabeled_metadata.py` in `./data-sources/{data-source-name}/`.
- Label unlabeled data: Label the unlabeled processed metadata entries in `./data-sources/{data-source-name}/output/metadata-processed/unlabeled.json`.
  - Which labeling tool to use: You may use the image classification labeler in the oldvis project. Other image labeling tools also work, as long as the annotations are stored in the `Annotation` data structure.
  - Where to store the annotations: Store the annotations in `./data-sources/{data-source-name}/output/annotations/annotations.json` of this repository. For reproducibility, also store the annotations and the process used to obtain them in `./image-classification/{data-source-name}/` of oldvis/annotations.
- Process metadata of visualizations: Execute the data processing scripts.
  - Scripts to be executed: First, execute `process_vis_metadata.py` in `./data-sources/{data-source-name}/`. Then, execute `build_dataset.py` in `./dataset/`.
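
The placement rules for the `querier` and `processor`, and the fact that a `processor` extends its `querier`, can be pictured with the sketch below. It is a hypothetical illustration only. `choose_placement`, `ExampleQuerier`, `ExampleProcessor` and their methods are made-up names. The real base classes and interfaces live in libquery and libprocess and are not described in this guide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Packages that should host a data source's querier and processor."""

    querier_package: str
    processor_package: str


def choose_placement(widely_used_with_official_api: bool) -> Placement:
    """Apply the placement rule: the processor follows its querier."""
    if widely_used_with_official_api:
        return Placement("libquery", "libprocess")
    return Placement("libquery_extensions", "libprocess_extensions")


class ExampleQuerier:
    """Hypothetical querier for a single data source."""

    def fetch_metadata(self, query: str) -> list[dict]:
        # A real querier would call the data source's API here; this stub
        # returns one fake entry so the sketch runs offline.
        return [{"id": "1", "title": f"  Result for {query}  "}]


class ExampleProcessor(ExampleQuerier):
    """Hypothetical processor that extends the querier to process metadata."""

    def process_metadata(self, entries: list[dict]) -> list[dict]:
        # Illustrative processing step: keep two fields and trim whitespace.
        return [{"id": e["id"], "title": e["title"].strip()} for e in entries]


if __name__ == "__main__":
    print(choose_placement(widely_used_with_official_api=False))
    processor = ExampleProcessor()
    print(processor.process_metadata(processor.fetch_metadata("world map")))
```

Because the processor subclasses the querier, it can call anything the querier defines. That is one plausible reason for the "extends" relationship. The David Rumsey Map Collection `querier` and `processor` show how it is actually done.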

To add a new query to an existing data source:

- Add a query: Add the query to `_queries.py` in `./data-sources/{data-source-name}/`.
- Query metadata: The same as for "Add a new data source".
- Query images: The same as for "Add a new data source".
- Process metadata of unlabeled entries: The same as for "Add a new data source".
- Label unlabeled data: Almost the same as for "Add a new data source", except that the newly obtained annotations need to be merged with the existing annotations.
- Process metadata of visualizations: The same as for "Add a new data source".
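
For the "Label unlabeled data" step, merging the new annotations into the existing ones could look like the sketch below. It makes three hypothetical assumptions:

- `annotations.json` holds a JSON list of objects.
- Each object identifies its entry through a `uuid` field. The actual field names come from the `Annotation` data structure, so adjust `key` accordingly.
- A new annotation replaces an old one for the same entry.

```python
from __future__ import annotations

import json
from pathlib import Path


def merge_annotations(old: list[dict], new: list[dict], key: str = "uuid") -> list[dict]:
    """Merge annotation lists, letting entries in `new` override `old` on `key`.

    `key` is a hypothetical field name identifying the annotated entry.
    Old entries keep their position; entries only in `new` are appended.
    """
    merged: dict[str, dict] = {item[key]: item for item in old}
    for item in new:
        # Reassigning an existing key keeps its original position in the dict.
        merged[item[key]] = item
    return list(merged.values())


def merge_annotation_files(old_path: Path, new_path: Path, out_path: Path, key: str = "uuid") -> None:
    """Read two annotation files, merge them, and write the result as JSON."""
    old = json.loads(old_path.read_text(encoding="utf-8"))
    new = json.loads(new_path.read_text(encoding="utf-8"))
    merged = merge_annotations(old, new, key)
    out_path.write_text(json.dumps(merged, indent=2, ensure_ascii=False), encoding="utf-8")
```

Here is an example, where `new-annotations.json` is a hypothetical file name:

```python
merge_annotation_files(Path("annotations.json"), Path("new-annotations.json"), Path("annotations.json"))
```

This folds newly labeled entries into the existing file. Both inputs are read before anything is written, so overwriting in place is possible. Writing to a separate file first and reviewing the diff is the safer habit.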

To update data processing:

- Update data processing in libprocess: Similar to "Support data processing in libprocess" for "Add a new data source", except that the existing `processor` should be edited instead of created from scratch.
- Process metadata of visualizations: The same as for "Add a new data source".

To edit annotations:

- Edit annotations: Similar to "Label unlabeled data" for "Add a new data source", except that the existing annotations should be edited instead of created from scratch.
- Process metadata of visualizations: The same as for "Add a new data source".
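
Several of the scenarios above repeat the same script steps, and a small driver could chain them. This is a sketch under two assumptions:

- Each script is run with its own directory (`./data-sources/{data-source-name}/` or `./dataset/`) as the working directory.
- `run_pre_labeling`, `run_vis_processing`, `source_dir` and the check for `annotations.json` are hypothetical helpers, not part of this repository.

Labeling stays a manual step between the two functions.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Script names come from the contributing guide; grouping them into a
# "before labeling" phase is an assumption of this sketch.
PRE_LABELING_SCRIPTS = [
    "fetch_metadata.py",
    "fetch_images.py",
    "process_unlabeled_metadata.py",
]


def run_script(script: str, workdir: Path) -> None:
    """Run `script` with the current interpreter, using `workdir` as the working directory."""
    if not (workdir / script).is_file():
        raise FileNotFoundError(f"missing script: {workdir / script}")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run([sys.executable, script], cwd=workdir, check=True)


def source_dir(repo_root: Path, data_source: str) -> Path:
    """Hypothetical helper: the per-source directory under ./data-sources/."""
    return repo_root.resolve() / "data-sources" / data_source


def run_pre_labeling(repo_root: Path, data_source: str) -> None:
    """Query metadata, query images, and process metadata of unlabeled entries."""
    for script in PRE_LABELING_SCRIPTS:
        run_script(script, source_dir(repo_root, data_source))


def run_vis_processing(repo_root: Path, data_source: str) -> None:
    """Process metadata of visualizations, then rebuild the dataset."""
    src = source_dir(repo_root, data_source)
    annotations = src / "output" / "annotations" / "annotations.json"
    # Guard against running this step before the data has been labeled.
    if not annotations.is_file():
        raise FileNotFoundError(f"annotations not found: {annotations}")
    run_script("process_vis_metadata.py", src)
    run_script("build_dataset.py", repo_root.resolve() / "dataset")
```

Take a new data source named, say, `my-source` (a hypothetical name). From the repository root, the workflow would be:

1. Call `run_pre_labeling(Path("."), "my-source")`.
2. Label the data.
3. Call `run_vis_processing(Path("."), "my-source")`.

The "update data processing" and "edit annotations" scenarios need only the last call, after their respective edits.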

Before committing, use Black to detect and fix code style issues.

Thank you again for being interested in this project! You are awesome!