Official implementation of the ECCV 2022 Oral paper: Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments
[Project Page] [Paper]
This project is modified from the VLN-CE repository starting from this commit.
- Initialize the project
```bash
git clone --recurse-submodules git@github.com:jacobkrantz/Sim2Sim-VLNCE.git
cd Sim2Sim-VLNCE
conda env create -f environment.yml
conda activate sim2sim
```
- Install the latest version of Matterport3DSimulator
If you do not want to run experiments with known subgoal candidates, you can skip this install and remove code references to MatterSim.
- Download the Matterport3D scene meshes
```bash
# run with Python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/
# Extract to: ./data/scene_datasets/mp3d/{scene}/{scene}.glb
```

`download_mp.py` must be obtained from the Matterport3D project webpage.
- Download the Room-to-Room episodes in VLN-CE format (link)
```bash
gdown https://drive.google.com/uc?id=1T9SjqZWyR2PCLSXYkFckfDeIs6Un0Rjm
# Extract to: ./data/datasets/R2R_VLNCE_v1-3/{split}/{split}.json.gz
```
- Download the ResNet image encoder
```bash
./scripts/download_caffe_models.sh
# this populates ./data/caffe_models/
```
- Download the MP3D connectivity graphs
```bash
./scripts/download_connectivity.sh
# this populates ./connectivity/
```
We evaluate a discrete VLN agent at various points of transfer to continuous environments. The two model components that enable this are the subgoal generation module and the navigation module, illustrated below:
This repository supports the following evaluations of Recurrent-VLN-BERT. The checkpoint to evaluate can be specified by appending `EVAL_CKPT_PATH_DIR path/to/checkpoint.pth` to the run command.
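For example, evaluating the graph-teleport configuration with pretrained weights might look like the following (the checkpoint path is illustrative and assumes the released VLN weights have been extracted to `./data/models/`):

```shell
python run.py \
  --exp-config sim2sim_vlnce/config/graph-teleport.yaml \
  EVAL_CKPT_PATH_DIR data/models/RecVLNBERT.pth
```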
Known subgoal candidates come from the MP3D-Sim navigation graph, just as in discrete VLN. The following experiments consider different policies for navigating to selected subgoals.
Teleportation: the discrete VLN task in Habitat

```bash
python run.py --exp-config sim2sim_vlnce/config/graph-teleport.yaml
```
Oracle policy: an A\*-based navigator

```bash
python run.py --exp-config sim2sim_vlnce/config/graph-oracle_policy.yaml
```
Local policy: a realistic map-and-plan navigator

```bash
python run.py --exp-config sim2sim_vlnce/config/graph-local_policy.yaml
```
Predicted subgoals from the subgoal generation module (SGM)

```bash
python run.py --exp-config sim2sim_vlnce/config/sgm-local_policy.yaml
```
Inference for leaderboard submissions

```bash
python run.py \
  --run-type inference \
  --exp-config sim2sim_vlnce/config/sgm-local_policy-inference.yaml
```
All experiment configs are set for a GPU with 32GB of memory. For smaller cards, consider reducing the fields `RL.POLICY.OBS_TRANSFORMS.RESNET_CANDIDATE_ENCODER.max_batch_size` and `IL.batch_size`.
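As a sketch, the overrides for a smaller GPU could look like the following in a config file (the values below are illustrative, not recommended defaults; check the shipped configs for the actual baseline values):

```yaml
RL:
  POLICY:
    OBS_TRANSFORMS:
      RESNET_CANDIDATE_ENCODER:
        max_batch_size: 32  # illustrative value; lower to fit GPU memory
IL:
  batch_size: 1  # illustrative value
```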
Training Recurrent-VLN-BERT should be done in that repository. Other panorama-based VLN agents could also be transferred with this Sim2Sim method but are not currently supported.
To train with 3D reconstruction image features, either download them from here (`habitat-ResNet-152-places365.tsv`) or generate them yourself:
```bash
# ~4.5 hours on a 32GB Tesla V100 GPU.
python scripts/precompute_features.py
    [-h]
    [--caffe-prototxt CAFFE_PROTOTXT]
    [--caffe-model CAFFE_MODEL]
    [--save-to SAVE_TO]
    [--connectivity CONNECTIVITY]
    [--scenes-dir SCENES_DIR]
    [--batch-size BATCH_SIZE]
    [--gpu-id GPU_ID]
```
By default, the exact same Caffe ResNet as used in MP3D-Sim is used. We use these features to train both the VLN agent and the SGM. They are a drop-in replacement for the image features captured in MP3D-Sim under the name `ResNet-152-places365.tsv`, as described in that README.
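A possible invocation, wiring the flags to the artifacts downloaded by the scripts above, is sketched below. The Caffe model file names are placeholders (the actual names depend on what `download_caffe_models.sh` fetches), and the defaults may already be correct; check `--help` before running:

```shell
# Paths below are assumptions based on the download steps in this README;
# the caffe file names in particular are placeholders.
python scripts/precompute_features.py \
    --caffe-prototxt data/caffe_models/deploy.prototxt \
    --caffe-model data/caffe_models/resnet152_places365.caffemodel \
    --connectivity connectivity/ \
    --scenes-dir data/scene_datasets/mp3d/ \
    --save-to data/habitat-ResNet-152-places365.tsv \
    --batch-size 8 \
    --gpu-id 0
```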
- Collect trajectories of optimal SGM selections
```bash
python run.py \
  --run-type collect \
  --exp-config sim2sim_vlnce/config/collect_ftune_data.yaml
```
- Fine-tune the VLN agent
```bash
python run.py \
  --run-type train \
  --exp-config sim2sim_vlnce/config/train_vln_ftune.yaml
```
We use the vln-sim2real-envs repository (specifically the `/actions/` folder) to train the SGM. We use the 3D reconstruction image features described above and train with 360° vision.
VLN weights [zip]. Extracted format: ./data/models/{Model-Name}
| VLN Model | Model Name | Description |
|---|---|---|
| 1 | `RecVLNBERT.pth` | Published weights from Recurrent-VLN-BERT |
| 2 | `RecVLNBERT_retrained.pth` | Weights from our own retraining |
| 3 | `RecVLNBERT-ce_vision.pth` | Trained with 3D reconstruction image features |
| 4 | `RecVLNBERT-ce_vision-tuned.pth` | Fine-tunes row 3 in VLN-CE (leaderboard model) |
SGM weights [zip]. Extracted format: ./data/sgm_models/{Model-Name}
| SGM Model | Model Name | Description |
|---|---|---|
| 1 | `sgm_sim2real.pth` | Published weights from VLN Sim2Real |
| 2 | `sgm_sim2sim.pth` | 360° vision and 3D reconstruction image features |
Our code is MIT licensed. Trained models are considered data derived from the Matterport3D scene dataset and are distributed according to the Matterport3D Terms of Use.
- *1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition*. Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao. arXiv 2022.
- *Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation*. Yicong Hong, Zun Wang, Qi Wu, Stephen Gould. CVPR 2022.
- *Waypoint Models for Instruction-guided Navigation in Continuous Environments*. Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets. ICCV 2021.
- *Sim-to-Real Transfer for Vision-and-Language Navigation*. Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee. CoRL 2021.
```bibtex
@inproceedings{krantz2022sim2sim,
  title={Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments},
  author={Krantz, Jacob and Lee, Stefan},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}
```