The official repository of the NeurIPS 2022 paper Towards Versatile Embodied Navigation. VXN stands for Vision-$X$ Navigation, a large-scale testbed for multi-task embodied navigation.
Hanqing Wang | Wei Liang | Luc Van Gool | Wenguan Wang
Create a Python environment and install the required packages with the following commands:
conda create -n vxn --file requirements.txt
conda activate vxn
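As a quick sanity check (a minimal sketch; the PyTorch import is an assumption about what requirements.txt installs, so adjust it to the actual package list), you can confirm the environment resolves correctly:

```bash
# Confirm the Python interpreter comes from the vxn environment.
which python

# Optional: check GPU visibility, assuming PyTorch is among the requirements
# (an assumption; replace with whichever framework requirements.txt provides).
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```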
Create the folder for datasets with the following command:
mkdir data
Download the Matterport3D dataset following the instructions here.
Download the datasets for image-goal navigation, audio-goal navigation, object-goal navigation, and vision-language navigation tasks, and uncompress them under `data/datasets`.
- Download the rendered BRIRs (binaural room impulse responses) (887G) for Matterport3D scenes here. Put `data.mdb` under the path `data/binaural_rirs_lmdb/`.
- Download the alignment data (505G) for discrete BRIRs here. Put `data.mdb` under the path `data/align_lmdb/`.
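After these steps, the data folder should look roughly like the sketch below (the contents of `data/datasets` depend on the task archives you downloaded; only the two LMDB paths are fixed by the instructions above):

```
data/
├── datasets/                # uncompressed image-goal / audio-goal / object-goal / VLN task datasets
├── binaural_rirs_lmdb/
│   └── data.mdb             # rendered BRIRs for Matterport3D scenes
└── align_lmdb/
    └── data.mdb             # alignment data for discrete BRIRs
```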
For a multi-node cluster, run the following script to start the training.
bash sbatch_scripts/sub_job.sh
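The sbatch_scripts directory suggests the jobs are submitted through SLURM; if so, they can be monitored with standard SLURM commands (a generic sketch, not part of the repository's scripts):

```bash
# List your queued and running jobs.
squeue -u $USER

# Inspect the scheduler's view of a specific job (replace <jobid> with the ID printed at submission).
scontrol show job <jobid>
```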
Run the following script to evaluate the trained model for each task.
bash sbatch_scripts/eval_mt.sh
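Once the evaluation jobs finish, their exit state can be checked via SLURM accounting, assuming accounting is enabled on the cluster (again a generic sketch, not a repository script):

```bash
# Summarize a completed job (replace <jobid> with the evaluation job's ID).
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,MaxRSS
```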
- Release the full dataset.
- Release the checkpoints.
If you find our project useful, please consider citing us:
@inproceedings{wang2022towards,
title = {Towards Versatile Embodied Navigation},
author = {Wang, Hanqing and Liang, Wei and Van Gool, Luc and Wang, Wenguan},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2022}
}
The VXN codebase is MIT licensed. Trained models and task datasets are considered data derived from the Matterport3D (MP3D) scene dataset. Matterport3D-based task datasets and trained models are distributed under the Matterport3D Terms of Use and the CC BY-NC-SA 3.0 US license.
This repository is built upon the following publicly released projects:
Thanks to the authors who created these great prior works.