Contributing

Thanks for being interested in contributing to this project!

Development

Setup

Clone this repository to your local machine and make sure Poetry is installed.

Install the dependencies.

poetry install

Contributing

An enhancement that does not change the dataset

Feel free to enhance the existing functions and documentation.

A change to the dataset

Keep the following in mind when changing the dataset:

  • A change to the dataset typically requires commits to multiple repositories in the oldvis project.
  • Before you start working, please open an issue to discuss the change.

The following sections list common scenarios and the steps for changing the dataset in each.

Add a new data source

  1. Support data querying in libquery: Add a querier to query metadata and images from the new data source.
  2. Support data processing in libprocess: Add a processor (that extends the querier) to process metadata.
  3. Add data querying and processing scripts to this repository.
  4. Query metadata: Execute fetch_metadata.py in ./data-sources/{data-source-name}/ to query metadata.
  5. Query images: Execute fetch_images.py in ./data-sources/{data-source-name}/ to query images.
  6. Process metadata of unlabeled entries: Execute process_unlabeled_metadata.py in ./data-sources/{data-source-name}/.
  7. Label unlabeled data: Label the unlabeled processed metadata entries in ./data-sources/{data-source-name}/output/metadata-processed/unlabeled.json.
    • Which labeling tool to use: You may use the image classification labeler in the oldvis project. Other image labeling tools may also serve the purpose, as long as the annotations are stored in the Annotation data structure.
    • Where to store the annotations: The annotations should be stored in ./data-sources/{data-source-name}/output/annotations/annotations.json of this repository. The annotations and the process for obtaining the annotations should also be stored in ./image-classification/{data-source-name}/ of oldvis/annotations for reproducibility.
  8. Process metadata of visualizations: Execute the data processing scripts.
    • Scripts to be executed: First, execute process_vis_metadata.py in ./data-sources/{data-source-name}/. Then, execute build_dataset.py in ./dataset/.
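
The script order in the steps above can be summarized in a short sketch. It only builds the list of commands and does not run anything. It assumes each script is run from its own folder through `poetry run python`, which the steps above do not state explicitly. `pipeline_commands` and `example-source` are hypothetical names used for illustration.

```python
from pathlib import Path
from typing import List, Tuple

# Scripts named in the steps above, in the order they are executed.
# Each entry is (folder template relative to the repository root, script name).
PIPELINE = [
    ("data-sources/{name}", "fetch_metadata.py"),
    ("data-sources/{name}", "fetch_images.py"),
    ("data-sources/{name}", "process_unlabeled_metadata.py"),
    # Manual labeling (step 7) happens here, before the remaining scripts.
    ("data-sources/{name}", "process_vis_metadata.py"),
    ("dataset", "build_dataset.py"),
]


def pipeline_commands(name: str, repo_root: str = ".") -> List[Tuple[Path, List[str]]]:
    """Return (working directory, command) pairs for one data source.

    ASSUMPTION: scripts are invoked as `poetry run python <script>` from
    the folder that contains them.
    """
    return [
        (Path(repo_root) / folder.format(name=name), ["poetry", "run", "python", script])
        for folder, script in PIPELINE
    ]


if __name__ == "__main__":
    # "example-source" is a placeholder, not a real data source name.
    for cwd, command in pipeline_commands("example-source"):
        print(f"(cd {cwd} && {' '.join(command)})")
```

Printing the commands rather than executing them keeps the manual labeling step visible: the last two scripts should only be run once annotations.json is in place.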

Add a new query to an existing data source

  1. Add a query: Add the query to _queries.py in ./data-sources/{data-source-name}/.
  2. Query metadata: The same as for "Add a new data source".
  3. Process metadata of unlabeled entries: The same as for "Add a new data source".
  4. Label unlabeled data: Almost the same as for "Add a new data source", except that the obtained annotations need to be merged with the old annotations.
  5. Process metadata of visualizations: The same as for "Add a new data source".
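
Merging newly obtained annotations with the old ones, as step 4 requires, might look like the sketch below. The layout of the Annotation data structure is not described here, so the sketch assumes each annotation is a JSON object carrying a unique identifier. The key name `uuid` and the function `merge_annotations` are hypothetical; adapt them to the actual structure.

```python
from typing import Dict, List


def merge_annotations(old: List[dict], new: List[dict], key: str = "uuid") -> List[dict]:
    """Merge two annotation lists into one.

    ASSUMPTION: every annotation is a dict with a unique identifier stored
    under `key`. The default key name "uuid" is hypothetical.
    A new annotation replaces an old one that has the same identifier;
    all other annotations are kept.
    """
    merged: Dict[str, dict] = {entry[key]: entry for entry in old}
    for entry in new:
        merged[entry[key]] = entry
    # Dicts preserve insertion order, so old entries keep their positions
    # and unseen new entries are appended at the end.
    return list(merged.values())


if __name__ == "__main__":
    old = [{"uuid": "a", "label": "x"}, {"uuid": "b", "label": "y"}]
    new = [{"uuid": "b", "label": "z"}, {"uuid": "c", "label": "w"}]
    print(merge_annotations(old, new))
```

Letting new annotations override old ones is a design choice of this sketch. If old labels should take precedence instead, swap the two arguments.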

Re-run old queries to fetch data source updates

  1. Query metadata: The same as for "Add a new data source".
  2. Query images: The same as for "Add a new data source".
  3. Process metadata of unlabeled entries: The same as for "Add a new data source".
  4. Label unlabeled data: Almost the same as for "Add a new data source", except that the obtained annotations need to be merged with the old annotations.
  5. Process metadata of visualizations: The same as for "Add a new data source".

Update metadata processing

  1. Update data processing in libprocess: Similar to "Support data processing in libprocess" for "Add a new data source", except that the processor should be edited instead of created from scratch.
  2. Process metadata of visualizations: The same as for "Add a new data source".

Update annotations

  1. Edit annotations: Similar to "Label unlabeled data" for "Add a new data source", except that the annotations should be edited instead of created from scratch.
  2. Process metadata of visualizations: The same as for "Add a new data source".

Code Style

Use Black to detect code style issues, and fix them before committing.
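
One way to run this check before committing is a small helper like the one below. It assumes Black is installed in the active environment (for example through `poetry install`). `black_command` and `run_black` are hypothetical helper names, not part of this repository.

```python
import subprocess
import sys
from typing import List


def black_command(paths: List[str], fix: bool = False) -> List[str]:
    """Build a Black invocation; without `fix`, Black only reports issues."""
    command = [sys.executable, "-m", "black"]
    if not fix:
        # --check makes Black exit non-zero if any file would be reformatted;
        # --diff prints the would-be changes instead of writing them.
        command += ["--check", "--diff"]
    return command + paths


def run_black(paths: List[str], fix: bool = False) -> int:
    """Run Black on the given paths and return its exit code."""
    return subprocess.run(black_command(paths, fix)).returncode
```

Calling `run_black(["."])` reports issues without touching any file, and `run_black(["."], fix=True)` rewrites the files in place.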

Thanks

Thank you again for being interested in this project! You are awesome!