Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation

Francesco Taioli, Stefano Rosa, Alberto Castellini, Lorenzo Natale, Alessio Del Bue, Alessandro Farinelli, Marco Cristani, Yiming Wang

Accepted to IROS 24

Project Page (Paper, Code and Dataset)

contact: francesco.taioli@polito.it

Important

Consider citing our paper:

  @article{taioli2024mind,
  title={{Mind the Error! Detection and Localization of Instruction Errors in Vision-and-Language Navigation}},
  author={Taioli, Francesco and Rosa, Stefano and Castellini, Alberto and Natale, Lorenzo and Del Bue, Alessio and Farinelli, Alessandro and Cristani, Marco and Wang, Yiming},
  journal={arXiv preprint arXiv:2403.10700},
  year={2024},
  url={https://arxiv.org/abs/2403.10700}
  }

Abstract

Vision-and-Language Navigation in Continuous Environments (VLN-CE) is one of the most intuitive yet challenging embodied AI tasks. Agents are tasked to navigate towards a target goal by executing a set of low-level actions, following a series of natural language instructions. All VLN-CE methods in the literature assume that language instructions are exact. However, in practice, instructions given by humans can contain errors when describing a spatial environment due to inaccurate memory or confusion. Current VLN-CE benchmarks do not address this scenario, making the state-of-the-art methods in VLN-CE fragile in the presence of erroneous instructions from human users. For the first time, we propose a novel benchmark dataset that introduces various types of instruction errors considering potential human causes. This benchmark provides valuable insight into the robustness of VLN systems in continuous environments. We observe a noticeable performance drop (up to -25%) in Success Rate when evaluating the state-of-the-art VLN-CE methods on our benchmark. Moreover, we formally define the task of Instruction Error Detection and Localization, and establish an evaluation protocol on top of our benchmark dataset. We also propose an effective method, based on a cross-modal transformer architecture, that achieves the best performance in error detection and localization, compared to baselines. Surprisingly, our proposed method has revealed errors in the validation set of the two commonly used datasets for VLN-CE, i.e., R2R-CE and RxR-CE, demonstrating the utility of our technique in other tasks.

Download models and task dataset

Download the BEVBert weights ckpt.iter9600.pth [link] into the ckpt folder. This is the best BEVBert checkpoint and is needed only if you want to train IEDL from scratch; otherwise, skip this step and download IEDL instead. The download can also be done with gdown (install it with pip install gdown):
```
gdown --fuzzy [link]
```
Download IEDL (TODO)
```
gdown --fuzzy [link]
```
Download the waypoint predictor check_cwp_bestdist_hfov90 [link] for CE (continuous environment) and place it in data/wp_pred
```
gdown --fuzzy [link]
```

Download the task dataset, R2RIE-CE, from Google Drive and place it under data/datasets/:

cd data/datasets
gdown --fuzzy "https://drive.google.com/file/d/1GbypzvkiQ-e8M2I77UU5YDIZXi1sHkC3/view?usp=sharing"
unzip R2RIE_CE_1_3_v1.zip && rm -f R2RIE_CE_1_3_v1.zip

Download gibson-2plus-resnet50.pth [link] and place it in a folder of your choice.
```
wget [link]
```

Then set MODEL.DEPTH_ENCODER.ddppo_checkpoint to the path of this .pth file in both the eval and train scripts.
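Before moving on, it can help to confirm that each download ended up where the steps above put it. The following is a minimal sketch, not part of the repository: `check_assets` is a hypothetical helper, the fixed paths are the ones named in this section, and the depth encoder is checked only if you pass its location, since it lives in a folder of your choice. Note that ckpt/ckpt.iter9600.pth will be reported as missing if you skipped it because you are not training from scratch.

```shell
#!/usr/bin/env bash
# Sketch: report which downloaded assets are missing.
# check_assets is a hypothetical helper; paths come from the steps above.

check_assets() {
  local root="${1:-.}"       # repository root
  local depth_ckpt="${2:-}"  # optional: where you saved gibson-2plus-resnet50.pth
  local missing=0
  local rel

  # Files that this README places at fixed locations.
  for rel in "ckpt/ckpt.iter9600.pth" "data/wp_pred/check_cwp_bestdist_hfov90"; do
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $rel"
      missing=$((missing + 1))
    fi
  done

  # The dataset archive is unzipped under data/datasets/.
  if [ ! -d "$root/data/datasets" ]; then
    echo "missing: data/datasets/"
    missing=$((missing + 1))
  fi

  # The depth encoder is checked only when a path is given.
  if [ -n "$depth_ckpt" ] && [ ! -f "$depth_ckpt" ]; then
    echo "missing: $depth_ckpt"
    missing=$((missing + 1))
  fi

  # Succeed only if nothing was reported missing.
  [ "$missing" -eq 0 ]
}
```

From the repository root, `check_assets . /path/to/gibson-2plus-resnet50.pth` prints one line per missing item and returns a non-zero status if anything is absent. To locate where the depth encoder path is set, `grep -n ddppo_checkpoint run_R2RIE-CE/train.bash run_R2RIE-CE/eval.bash` should list the relevant lines.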

How to run

For training: open run_R2RIE-CE/train.bash and set the variable WANDB_RUN_NAME to the name of the folder where your checkpoints will be saved. Then copy the original BEVBert checkpoint, ckpt/ckpt.iter9600.pth, into that folder and run the following command:

CUDA_VISIBLE_DEVICES="0,1" bash run_R2RIE-CE/train.bash 2333
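The copy step that precedes the training command can be sketched as a small shell helper. This is an illustration under assumptions: `prepare_run_dir` is a hypothetical name, and because this README does not state where train.bash creates the folder named by WANDB_RUN_NAME, the helper takes the folder path as an argument rather than guessing it.

```shell
#!/usr/bin/env bash
# Sketch: seed a run's checkpoint folder with the BEVBert weights.
# prepare_run_dir is a hypothetical helper, not part of the repository.

prepare_run_dir() {
  local run_dir="$1"                        # folder that train.bash saves checkpoints to
  local src="${2:-ckpt/ckpt.iter9600.pth}"  # original BEVBert checkpoint
  local dest

  # Stop early if the source checkpoint was never downloaded.
  if [ ! -f "$src" ]; then
    echo "checkpoint not found: $src" >&2
    return 1
  fi

  mkdir -p "$run_dir" || return 1
  dest="$run_dir/$(basename "$src")"

  # Leave an existing checkpoint in the run folder untouched.
  if [ -e "$dest" ]; then
    echo "already present: $dest"
    return 0
  fi

  cp "$src" "$dest"
}
```

Run it from the repository root with the folder your train.bash is configured to use, then start training with the command above. Refusing to overwrite an existing file is a deliberate choice here, so that rerunning the helper does not replace a checkpoint already in the run folder.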

For evaluation:

CUDA_VISIBLE_DEVICES="0,1" bash run_R2RIE-CE/eval.bash 2333

Docs

See the docs folder for documentation on how to use the dataset (changing sensors, updating the task definition, etc.).

Acknowledgements

Our implementation is inspired by BEVBert.

Thanks for open sourcing this great work!
