
Getting Started


Before you run any V-IRL agents or benchmarks, please make sure you have followed INSTALL.md to prepare the environments and models.

1. Launch UI backend

Launch the UI backend so that images can be displayed during agent runs:

python -m virl.ui.server
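A typical workflow is to keep the UI backend running in one terminal and launch agents from another; this sketch only reuses commands from this guide:

# Terminal 1: start the UI backend and keep it running
python -m virl.ui.server

# Terminal 2: launch an agent (Peng as an example, detailed below)
cd tools
python launcher.py --cfg_file cfgs/peng/peng.yaml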

2. Run V-IRL Agents

  • Follow the provided commands to play with various V-IRL agents.
  • Important Note: Since V-IRL relies on online data sources (i.e., the Google Maps Platform), the retrieved data and images change over time. In addition, as GPT evolves, agent behavior may differ from that reported in our paper.

Peng

  • Follow the commands below to run Peng.
cd tools
python launcher.py --cfg_file cfgs/peng/peng.yaml
  • If you want to modify the waypoints, edit WAY_POINT_LANGUAGE in the config (a hypothetical sketch follows below).
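A purely hypothetical sketch of what the waypoint entry in cfgs/peng/peng.yaml might look like; check the shipped config for the actual key nesting and value format:

# Hypothetical example only; see cfgs/peng/peng.yaml for the real structure
WAY_POINT_LANGUAGE: ["Washington Square Park", "Times Square", "Central Park"]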

Aria

  • Follow the commands below to run Aria, which recommends places using Google Place reviews.
cd tools
python launcher.py --cfg_file cfgs/aria/aria.yaml
  • Alternatively, you can run Aria with recommendations based on web search.
python launcher.py --cfg_file cfgs/aria/aria_websearch.yaml
  • Note that Aria might recommend different restaurants, since the Google Maps Platform returns different nearby-search results over time.

Vivek

  • Follow the commands below to run Vivek's real estate recommendation.
cd tools
python launcher.py --cfg_file cfgs/vivek/vivek.yaml

RX399

  • Follow the commands below to run RX-399 trash bin counting in New York City,
cd tools
python launcher.py --cfg_file cfgs/rx399/rx399_ny.yaml
  • or Hong Kong
python launcher.py --cfg_file cfgs/rx399/rx399_hk.yaml
  • Note: We find that the Google Maps Platform now returns different street view images. Please refer to these images to check the results of our previous attempts.

Imani

  • Follow the commands below to run Imani's trash bin, hydrant, and bench distribution statistics for Central Park, New York City.
cd tools
python launcher.py --cfg_file cfgs/imani/imani.yaml
  • Note: Imani consumes a large amount of Google Maps Platform credits, around $300-$400.

  • We provide our collected heatmap data here.

Hiro

  • Follow the commands below to run Hiro's intentional exploration on the streets of Hong Kong.
cd tools
python launcher.py --cfg_file cfgs/hiro/hiro.yaml

Local

  • In our paper, Ling and Local are collaborative agents, but they can also run individually.
  • Follow the commands below to let Local generate navigation instructions in response to a question.
cd tools
# Example of MoMA design store in NYC
python launcher.py --cfg_file cfgs/local/local_nyc_case1.yaml
# Example of Apple store in San Francisco
python launcher.py --cfg_file cfgs/local/local_sf_iphone.yaml 

Ling

  • Follow the commands below to run Ling's instruction-following navigation on city streets.
cd tools
# Example of MoMA design store in NYC
python launcher.py --cfg_file cfgs/ling/ling_nyc_moma.yaml
# Example of Apple store in San Francisco
python launcher.py --cfg_file cfgs/ling/ling_sf_applestore.yaml
  • Note: To better reproduce the cases in our paper, we provide previously generated routes.

Diego

  • Follow the commands below to have Diego generate an itinerary for you.
cd tools
python launcher.py --cfg_file cfgs/diego/diego.yaml
  • By default, taking user input and user status for plan adjustment is disabled. To enable these, set the following keys in the config to True (see the example below):
USER_INPUT: False # take user input for revising plan or not
USER_STATUS: False # take user status for revising plan or not
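For example, to enable both adjustments:

USER_INPUT: True # take user input for revising plan
USER_STATUS: True # take user status for revising plan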

3. V-IRL Benchmark

3.1. Download our pre-collected data

  • We pre-collect place-centric image data for the V-IRL Place Recognition & VQA benchmarks, and routes for the V-IRL Vision-Language Navigation benchmark.

3.1.1 V-IRL Place Recognition & VQA Benchmark data

  • Please download our collected data for the V-IRL Place Recognition & VQA benchmark here.

  • Move the downloaded .zip file to /YOUR_PATH_TO_VIRL/VIRL/data/benchmark/ and then

mv virl_place_recognition_vqa_data.zip /YOUR_PATH_TO_VIRL/VIRL/data/benchmark/
cd /YOUR_PATH_TO_VIRL/VIRL/data/benchmark/
unzip virl_place_recognition_vqa_data.zip
  • After preparation, the folder structure should be:
.
├── benchmark_localization_polygon_area
├── benchmark_polygon_area
├── place_centric_data  # obtained by unzip virl_place_recognition_vqa_data.zip
├── place_types_20.txt
└── place_types.txt

3.1.2 V-IRL Vision-Language Navigation Benchmark data

  • Please download our collected data for the V-IRL Vision-Language Navigation benchmark: the full set and the mini set.

  • Move the downloaded .zip file to /YOUR_PATH_TO_VIRL/VIRL/data/benchmark/ and then

mv virl_benchmark_vln_full.zip /YOUR_PATH_TO_VIRL/VIRL/data/benchmark/
mv virl_benchmark_vln_mini.zip /YOUR_PATH_TO_VIRL/VIRL/data/benchmark/
cd /YOUR_PATH_TO_VIRL/VIRL/data/benchmark/
unzip virl_benchmark_vln_full.zip
unzip virl_benchmark_vln_mini.zip
  • After preparation, the folder structure should be:
.
├── benchmark_localization_polygon_area
├── benchmark_polygon_area
├── collect_vln_routes  # obtained by unzip virl_benchmark_vln_full.zip
├── collect_vln_routes_subset9  # obtained by unzip virl_benchmark_vln_mini.zip
├── place_centric_data  # obtained by unzip virl_place_recognition_vqa_data.zip
├── place_types_20.txt
└── place_types.txt

3.2. Collect your own data (optional)

  • As mentioned in our paper, we create an automatic data curation and annotation pipeline in V-IRL. If you only want to test models on the V-IRL benchmark, you can skip this part. However, if you want to collect your own data for the V-IRL benchmark, refer to this section.

3.2.1 V-IRL Place Recognition benchmark

  • Here, we first need to collect place-centric images and related place information as follows:
cd tools
python launcher.py --cfg_file cfgs/collect_data/collect_place_centric_data.yaml
  • The place-centric images and place information will also be used in the V-IRL Place VQA benchmark.

3.2.2 V-IRL Place VQA benchmark

  • First, please make sure you have already collected place-centric images following Section 3.2.1. Then, run the following command to generate VQA pairs based on the place-centric images and information:
python launcher.py --cfg_file cfgs/collect_data/generate_place_vqa_data.yaml

3.2.3 V-IRL Place Localization benchmark

  • For this benchmark, we do not pre-collect any data; instead, images are fetched and evaluated online, so the detector can use multiple FoV and heading angles at each GPS location (see the sketch below).
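A minimal, hypothetical Python sketch of the idea only (the real logic lives in the benchmark pipeline and virl/perception/detector/): each GPS location is evaluated over several heading angles and fields of view.

# Hypothetical illustration of enumerating multi-view street view queries
from itertools import product

def views_for_location(lat, lng):
    # List (lat, lng, heading, fov) queries for one GPS location
    headings = [0, 90, 180, 270]  # compass headings in degrees
    fovs = [60, 90]               # fields of view in degrees
    return [(lat, lng, heading, fov) for heading, fov in product(headings, fovs)]

print(views_for_location(40.7812, -73.9665))  # e.g., a point in Central Park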

3.2.4 V-IRL Vision-Language Navigation benchmark

  • To collect VLN routes, please run the following script:
python scripts/batch_collect_vln_routes.py

3.3. Install & Run benchmarks

  • Important Note: For each method/model, you should refer to its corresponding repo to prepare the environment.

3.3.1 V-IRL Place Recognition benchmark

  • The benchmarked methods of this benchmark are mainly implemented in virl/perception/recognizer/.

  • Take CLIP as an example; run the following script to benchmark it:

python launcher.py --cfg_file cfgs/benchmark/recognition/clip.yaml
  • Configs for other models are located in tools/cfgs/benchmark/recognition/* (see the sketch below for sweeping all of them).
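If you want to benchmark every recognition config in that folder, a simple shell loop along the following lines should work (run from tools/, assuming every .yaml there is a valid benchmark config):

# Sketch: benchmark each model config in the recognition folder
for cfg in cfgs/benchmark/recognition/*.yaml; do
    python launcher.py --cfg_file "$cfg"
done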

3.3.2 V-IRL Place VQA benchmark

  • The benchmarked methods of this benchmark are mainly implemented in virl/perception/mm_llm/.
  • Take BLIP2 as an example; run the following script to benchmark it:
python launcher.py --cfg_file cfgs/benchmark/vqa/place_centric_vqa.yaml
  • Modify the following keys in cfgs/benchmark/vqa/place_centric_vqa.yaml to switch between different models (see the example after this snippet):
VISION_MODELS:
  MiniGPT4:
    SERVER: http://127.0.0.1:xxxx # modify to your own address
    BEAM_SEARCH: 1
    TEMPERATURE: 1.0

  MiniGPT4Local:
    PATH: /xxx/MiniGPT-4  # modify to your path
    GPU_ID: 0
    CFG_PATH: /xxx/MiniGPT-4/eval_configs/minigpt4_eval.yaml  # modify to your path

  InstructBLIPLocal:
    MODEL_NAME: blip2_t5_instruct
    MODEL_TYPE: flant5xxl

    MIN_LENGTH: 1
    MAX_LENGTH: 250
    BEAM_SIZE: 5
    LENGTH_PENALTY: 1.0
    REPETITION_PENALTY: 1.0
    TOP_P: 0.9
    SAMPLING: "Beam search"

...
PIPELINE:
  ...
  VQA:
    MM_LLM: BLIP2 # modify the model name
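For example, to switch to the locally hosted InstructBLIP entry shown above (assuming MM_LLM must name one of the models listed under VISION_MODELS):

PIPELINE:
  ...
  VQA:
    MM_LLM: InstructBLIPLocal  # assumed to match the VISION_MODELS key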

3.3.3 V-IRL Place Localization benchmark

  • The benchmarked methods of this benchmark are mainly implemented in virl/perception/detector/.
  • To run a single detector, take GLIP as an example:
python launcher.py --cfg_file cfgs/benchmark/localization/place_loc.yaml
  • Modify the following keys in cfgs/benchmark/localization/place_loc.yaml to switch between different models (see the example after this snippet):
...
VISION_MODELS:
  ...
  GLIP_CLIP:
    GLIP:
      SERVER: http://xxx.xxx.xxx.xxx:xxxx  # modify to your address
      THRESH: 0.4
    CLIP:
      SERVER: http://xxx.xxx.xxx.xxx:xxxx  # modify to your address
      THRESH: 0.8
      TEMPERATURE: 100.
  
  GroundingDINO:
    CFG_FILE: /xxx/GroundingDINO/groundingdino/config/GroundingDINO_SwinB_cfg.py  # modify to your path
    CKPT_FILE: /xxx/GroundingDINO/weights/groundingdino_swinb_cogcoor.pth  # modify to your path
    BOX_THRESHOLD: 0.35
    TEXT_THRESHOLD: 0.25
...
PIPELINE:
  ...
  CHECK_SURROUNDING:
    ...
    DETECT:
      NAME: GLIP  # modify here to switch benchmarked methods
      PROPOSAL_SCORES: 0.55  # useless for GLIP_CLIP
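For example, to benchmark GroundingDINO instead of GLIP (assuming DETECT.NAME must name one of the entries under VISION_MODELS):

PIPELINE:
  ...
  CHECK_SURROUNDING:
    ...
    DETECT:
      NAME: GroundingDINO  # assumed to match the VISION_MODELS key
      PROPOSAL_SCORES: 0.55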
  • To run all models on all regions with a single command, please run:
python scripts/batch_benchmark_place_loc.py

3.3.4 V-IRL Vision-Language Navigation benchmark

  • The benchmarked methods of this benchmark are mainly implemented in virl/perception/recognizer/.

  • Take the oracle as an example:

python launcher.py --cfg_file cfgs/benchmark/vln/benchmark_vln_oracle.yaml
  • To evaluate with different vision models, please fill in the following keys in cfgs/benchmark/vln/benchmark_vln_oracle.yaml:
VISION_MODELS:
  ...
  PaddleOCR:
    DET_MODEL_DIR: /xxx/PaddleOCR/ckpt/ch_PP-OCRv4_det_server_infer  # modify to your path
    REC_MODEL_DIR: /xxx/PaddleOCR/ckpt/ch_PP-OCRv4_rec_server_infer  # modify to your path
    CLS_MODEL_DIR: /xxx/PaddleOCR/ckpt/ch_ppocr_mobile_v2.0_cls_slim_infer  # modify to your path
    USE_ANGLE_CLS: True
    PROMPT: ocr_result_to_recognition_template
    MODEL: gpt-3.5-turbo-0613
  
  EvaCLIP:
    MODEL_NAME: EVA02-CLIP-bigE-14-plus
    MODEL_PATH: /xxx/EVA/EVA-CLIP/rei  # modify to your path

  LLaVA:
    MODEL_PATH: /xxx/LLaVA/llava-v1.5-13b  # modify to your path
    LOAD_8BIT: False
    LOAD_4BIT: False
  • Configs for other models lie in tools/cfgs/benchmark/vln/*.

  • To run a single model (oracle as an example here) on all regions with a single command, please run:

# mini set
python scripts/batch_benchmark_vln.py \
--split_file ../data/benchmark/benchmark_polygon_area/split_list_9.txt \
--route_dir_base ../data/benchmark/collect_vln_routes_subset9 \
--cfg_file cfgs/benchmark/vln/benchmark_vln_oracle.yaml
# full set
python scripts/batch_benchmark_vln.py \
--split_file ../data/benchmark/benchmark_polygon_area/split_list_14.txt \
--route_dir_base ../data/benchmark/collect_vln_routes \
--cfg_file cfgs/benchmark/vln/benchmark_vln_oracle.yaml

3.4 Add custom models

Please stay tuned for the tutorials.