Benchmarking Representation Learning for Natural World Collections

This directory contains all the resources needed to reproduce the figures and tables that are found in the CVPR 2021 paper "Benchmarking Representation Learning for Natural World Collections."

TODO: Data loaders for the NeWT tasks need to be updated for the public release of the data.

Python requirements

The required Python modules, along with the exact versions we used, are listed in the requirements.txt file; install them with pip install -r requirements.txt.

Dataset Preparation

You need to download the following datasets: NeWT, CUB-200-2011, NABirds, Oxford Flowers, Stanford Dogs, and Stanford Cars (these are the datasets referenced in user_configs.py below).

We suggest storing all the datasets in a common directory (e.g. /data/datasets).

Pretrained Model Preparation

You can download the pretrained models from the paper here (5.7GB).

Available Model Configurations

| Format | Backbone | Train Dataset | Train Objective | Pretrained Weights | Identifier |
|---|---|---|---|---|---|
| pytorch | ResNet50 | | | random | random |
| pytorch | ResNet50 | ImageNet | Supervised | | imagenet_supervised |
| tensorflow | ResNet50 | ImageNet | Supervised | | imagenet_supervised_tf |
| tensorflow | ResNet50 | ImageNet | SimCLR | | imagenet_simclr |
| tensorflow | ResNet50 x4 | ImageNet | SimCLR | | imagenet_simclr_x4 |
| tensorflow | ResNet50 | ImageNet | SimCLR v2 | | imagenet_simclr_v2 |
| pytorch | ResNet50 | ImageNet | SwAV | | imagenet_swav |
| pytorch | ResNet50 | ImageNet | MOCO v2 | | imagenet_moco_v2 |
| pytorch | ResNet50 | iNat2021 | Supervised | ImageNet | inat2021_supervised |
| pytorch | ResNet50 | iNat2021 | Supervised | | inat2021_supervised_from_scratch |
| tensorflow | ResNet50 | iNat2021 | SimCLR | | inat2021_simclr |
| pytorch | ResNet50 | iNat2021 Mini | Supervised | ImageNet | inat2021_mini_supervised |
| pytorch | ResNet50 | iNat2021 Mini | Supervised | | inat2021_mini_supervised_from_scratch |
| tensorflow | ResNet50 | iNat2021 Mini | SimCLR | | inat2021_mini_simclr |
| tensorflow | ResNet50 x4 | iNat2021 Mini | SimCLR | | inat2021_mini_simclr_x4 |
| tensorflow | ResNet50 | iNat2021 Mini | SimCLR v2 | | inat2021_mini_simclr_v2 |
| pytorch | ResNet50 | iNat2021 Mini | SwAV | | inat2021_mini_swav |
| pytorch | ResNet50 | iNat2021 Mini | SwAV | | inat2021_mini_swav_1k |
| pytorch | ResNet50 | iNat2021 Mini | MOCO v2 | | inat2021_mini_moco_v2 |
| pytorch | ResNet50 | iNat2018 | Supervised | ImageNet | inat2018_supervised |
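
To use one of the pytorch-format checkpoints, load it into the corresponding backbone. The sketch below is a hedged example: the "<identifier>.pt" filename pattern and the checkpoint layout are assumptions, so check the contents of the downloaded archive before relying on them.

# Hedged sketch: load a pytorch-format checkpoint into a torchvision ResNet-50
# and expose its 2048-d pooled features. The filename pattern is assumed.
import torch
import torchvision

def load_backbone(identifier, models_dir='/data/models/cvpr21_newt_pretrained_models/pt/'):
    model = torchvision.models.resnet50()
    state = torch.load(models_dir + identifier + '.pt', map_location='cpu')
    state = state.get('state_dict', state)      # some checkpoints nest the weights
    model.load_state_dict(state, strict=False)  # strict=False: classifier heads may differ
    model.fc = torch.nn.Identity()              # keep the global-average-pooled features
    return model.eval()

backbone = load_backbone('inat2021_supervised')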

Create user_configs.py

You need to create a user_configs.py file in the benchmark/ directory that specifies the paths to your dataset and pretrained model directories:

################
# Adjust the following paths for your local setup

# Datasets
NEWT_DATASET_DIR = '/data/datasets/newt/'
FG_DATASETS = {
    'CUB' : '/data/datasets/CUB_200_2011/CUB_200_2011/',
    'CUBExpert' : '/data/datasets/CUB_200_2011/CUB_200_2011/',
    'NABirds' : '/data/datasets/nabirds/',
    'OxfordFlowers' : '/data/datasets/oxford_flowers/',
    'StanfordDogs' : '/data/datasets/stanford_dogs/',
    'StanfordCars' : '/data/datasets/stanford_cars/',
}

# Pretrained Model Directories
PYTORCH_PRETRAINED_MODELS_DIR = '/data/models/cvpr21_newt_pretrained_models/pt/'
TENSORFLOW_PRETRAINED_MODELS_DIR = '/data/models/cvpr21_newt_pretrained_models/tf/'
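
Before launching the long-running jobs below, it can be worth sanity-checking these paths. The following small script is not part of the repo, just a convenience:

# Optional helper (not part of the repo): verify every configured path exists.
import os
import user_configs as cfg

paths = [cfg.NEWT_DATASET_DIR,
         cfg.PYTORCH_PRETRAINED_MODELS_DIR,
         cfg.TENSORFLOW_PRETRAINED_MODELS_DIR] + list(cfg.FG_DATASETS.values())
for path in paths:
    print(('ok      ' if os.path.isdir(path) else 'MISSING ') + path)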

Reproduce the experiments in the paper

Run the following scripts from within the benchmark/ directory.

Extract features from the datasets using the tensorflow-format models (estimated time ~10 hours):

$ CUDA_VISIBLE_DEVICES=0 TF_CPP_MIN_LOG_LEVEL=3 python tf_extract_features.py \
--newt_feature_dir newt_features \
--fg_feature_dir fg_features \
--batch_size 64 \
--x4_batch_size 16 

Extract features from the datasets using the pytorch-format models (estimated time ~3 hours):

$ CUDA_VISIBLE_DEVICES=0 python pt_extract_features.py \
--newt_feature_dir newt_features \
--fg_feature_dir fg_features \
--batch_size 64
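
Conceptually, both extraction scripts do the same thing: push every image through a frozen backbone and store the pooled activations for later reuse. The sketch below (PyTorch, with an illustrative dataset and output layout that are not the repo's) shows the core loop:

# Illustrative feature-extraction loop; the dataset and file layout are assumptions.
import numpy as np
import torch
from torch.utils.data import DataLoader

@torch.no_grad()
def extract_features(backbone, dataset, batch_size=64, device='cuda'):
    backbone = backbone.to(device).eval()
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=4)
    feats = [backbone(images.to(device)).cpu() for images, _ in loader]
    return torch.cat(feats).numpy()  # (num_images, 2048) with the fc head removed

# np.save('newt_features/inat2021_supervised.npy',
#         extract_features(backbone, newt_task_dataset))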

Evaluate linear models on the pre-extracted NeWT features (estimated time ~3 hours; measured 02:48:31):

$ python evaluate_linear_models.py \
--feature_dir newt_features \
--result_dir newt_results_linearsvc_1000_standardize_normalize_grid_search \
--model linearsvc \
--max_iter 1000 \
--standardize \
--normalize \
--grid_search
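
What this run does, roughly: standardize the features, L2-normalize them, then grid-search the LinearSVC regularization strength for each task. Here is a hedged scikit-learn sketch of one task's evaluation; the C grid and the dummy data are placeholders, not the script's actual settings:

# Sketch of one task's linear evaluation; grid values and data are placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 2048)), rng.integers(0, 2, size=200)  # stand-in features/labels

pipe = Pipeline([('standardize', StandardScaler()),
                 ('normalize', Normalizer(norm='l2')),
                 ('clf', LinearSVC(max_iter=1000))])
search = GridSearchCV(pipe, {'clf__C': [0.01, 0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)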

Evaluate linear models on the pre-extracted FG dataset features (estimated time ~10 hours; measured 09:36:28):

$ python evaluate_linear_models.py \
--feature_dir fg_features \
--result_dir fg_results_sgd_3000_standardize_normalize_grid_search \
--model sgd \
--max_iter 3000 \
--standardize \
--normalize \
--grid_search
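
The FG run uses the same preprocessing but swaps the classifier for a linear model trained with SGD, which scales better to the larger multi-class datasets. In scikit-learn terms the swap would look like the line below; the exact loss and schedule used by evaluate_linear_models.py may differ, so treat this as an assumption:

# Assumed stand-in for the --model sgd option: a hinge-loss linear classifier
# fit with stochastic gradient descent.
from sklearn.linear_model import SGDClassifier
clf = SGDClassifier(loss='hinge', max_iter=3000)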

Create the FG datasets stem plot and LaTeX table:

$ python make_fg_plots.py \
--result_dir fg_results_sgd_3000_standardize_normalize_grid_search \
--output_dir figures_v2

Create the NeWT tasks stem plot and LaTeX table:

$ python make_newt_plots.py \
--result_dir newt_results_linearsvc_1000_standardize_normalize_grid_search \
--output_dir figures_v2
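
Both plotting scripts produce the same kind of figure: one stem per pretrained model, summarizing its accuracy across tasks or datasets. If you want to eyeball results outside the provided scripts, a minimal matplotlib sketch (with made-up numbers) looks like this:

# Minimal stem-plot sketch in the spirit of the plotting scripts; values are fake.
import matplotlib.pyplot as plt

identifiers = ['random', 'imagenet_supervised', 'inat2021_supervised']
mean_acc = [0.62, 0.81, 0.87]  # placeholder accuracies
plt.stem(range(len(identifiers)), mean_acc)
plt.xticks(range(len(identifiers)), identifiers, rotation=45, ha='right')
plt.ylabel('Mean accuracy')
plt.tight_layout()
plt.savefig('figures_v2/example_stem.png')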

Cite the Paper

@inproceedings{vanhorn2021newt,
  title={Benchmarking Representation Learning for Natural World Collections},
  author={Van Horn, Grant and Cole, Elijah and Beery, Sara and Wilber, Kimberly and Belongie, Serge and Mac Aodha, Oisin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}