The purpose of this repo is to provide code and support for working with BenthicNet - a large dataset of benthic (seafloor) imagery. The code here may be used for hierarchical multi-label (HML) learning or for one-hot multi-class tasks. This framework also provides support for testing on frozen (non-trainable) encoders or for fine-tuning/unfrozen encoders.
To demonstrate the use of the code presented to load and train a model (pre-trained self-supervised on BenthicNet, supervised on ImageNet, or any other of supported architectures) in a typical one-hot multi-class classification task, we supply a tutorial notebook "one_hot.ipynb". The code in this notebook also reflects the main_one_hot.py Python script and can be with some minor alterations be modified to work in a hierarchical multi-label setting.
To use this repo, we recommend setting up a virtual Python environment and then installing the packages listed in requirements.txt.
pip install -r requirements.txt
Additionally, we recommend using the provided notebook "one_hot.ipynb" as a demonstration of how to prepare a model, prepare dataloaders, and train the model on a one-hot classification task. This notebook may also be easily adapted for frozen encoders and hierarchical multi-label learning.
Before running, please ensure that all paths to datasets, models, dataset CSVs, etc are correct.
- benthicnet_norms - stats for average/standard deviation on BenthicNet imagery partitions;
- cfgs - contains JSON configuration files for training models of differing architecture, probes vs fine-tuning/learning, etc;'
- graph_info - contains CSV for all BenthicNet annotations, which is used to build hierarchical graphs for HML classification;
- scripts - scripts prepared for use with SLURM scheduling on high performance clusters (HPCs);
- slurm - auxiliary scripts used for moving and extracting imagery from tarballs during HPC training/limited multi-node support; and
- utils - contains auxiliary code and inner workings for running the required learning tasks.
- ap_score-to_threshold_or_not.pdf - exerpt discussing the implications of thresholded outputs vs using sigmoid outputs for calculating average precision score.
- check_for_problem_imgs.py - a Python script for checking if there are any issues with accessing required images based on the dataset CSV;
- main.py - main Python code which is called when running hierarchical training tasks;
- main_one_hot.py - main Python code called when running one-hot multi-class classification tasks;
- one_hot.ipynb - a demonstration notebook showcasing parameters, the parts of configuration files, and how to load and train a model in a one hot multi-label classification setting;
- test_depth.ipynb - a notebook used for calculating evaluation metrics on test data (may require seaborn package as well); and
- test_depth_group.ipynb - adaptation of test_depth.ipynb to facilitate testing multiple models at once.
Note: Our models are available on FRDR and HuggingFace.
By default, if a path for an encoder checkpoint (file containing model weights) is provided, the model construction function (or its one hot variant) will automatically get the weights from the checkpoint and attempt to load them via the "load_model_state" function. If no path is provided, the model construction randomly initializes weights.
In the tutorial notebook, the contruction function is called with the path provided as an argument and details on how the model is loaded may not be a concern. However, while attempts have been made to ensure the state loading function is flexible, there are ultimately a limited number of cases that can be considered. Most notably, the function assumes that the key to accessing the state dictionary is either "state_dict" (for our provided models) or in the case of models from Facebook/Meta AI, "model". Additionally, the key within the state dictionary indicating the encoder weights is assumed to be either "encoder" or "backbone". If the keys for the state dictionary or the encoder do not fall under our assumptions, some minor editing of the "load_model_state" function may be required.
- Multi-node training is currently not fully supported. Specifically, the copy and extract shell script to move data to each node is not implemented. If this is not an issue, then by default, the DDP strategy in Lightning should support multi-node computation.
For any questions, concerns, or issues in using this repo, please contact isaac.xu@dal.ca.