This repository contains the official PyTorch implementation for training and testing the MVS depth estimation method proposed in the paper:
RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo
Changjiang Cai * , Pan Ji, Qingan Yan, Yi Xu
OPPO US Research Center
* Corresponding author
- 11/10/2024: Official code initially released per institutional approval.
- 06/01/2023: RIAV-MVS paper released, see arXiv paper.
- Overview
- Setup
- Datasets
- Training
- Testing and Evaluation
- License
- Acknowledgements
- Citations
- Troubleshooting
We present a learning-based approach for multi-view stereo (MVS), i.e., estimating the depth map of a reference frame using posed multi-view images. Our core idea lies in leveraging a "learning-to-optimize" paradigm to iteratively index a plane-sweeping cost volume and regress the depth map via a convolutional Gated Recurrent Unit (GRU). In addition, a pose module is leveraged to improve the relative poses among multi-view frames, and a self-attention block is applied only to the reference frame to construct an asymmetric matching volume for improved prediction.
See the comparison between ours and other SOTA baselines.
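To make the paradigm concrete, below is a minimal PyTorch sketch of the idea of iteratively indexing a cost volume with a convolutional GRU. All module names, tensor shapes, the lookup radius, and the update rule are illustrative assumptions, not the exact implementation in this repository.

```python
# Minimal sketch of "iteratively index a plane-sweep cost volume with a conv-GRU".
# Shapes, module names, and the lookup rule are illustrative assumptions only.
import torch
import torch.nn as nn

class ConvGRU(nn.Module):
    """A standard convolutional GRU cell."""
    def __init__(self, hidden_dim, input_dim):
        super().__init__()
        self.convz = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
        self.convr = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
        self.convq = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))
        r = torch.sigmoid(self.convr(hx))
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))
        return (1 - z) * h + z * q

def lookup_cost_volume(volume, index, radius=4):
    """Sample matching costs around the current (fractional) depth index.
    volume: (B, D, H, W) plane-sweep cost volume; index: (B, 1, H, W)."""
    B, D, H, W = volume.shape
    offsets = torch.arange(-radius, radius + 1, device=volume.device).view(1, -1, 1, 1)
    idx = (index + offsets).clamp(0, D - 1)               # (B, 2r+1, H, W)
    lo, hi = idx.floor().long(), idx.ceil().long()
    w = idx - idx.floor()                                  # linear-interpolation weight
    return (1 - w) * volume.gather(1, lo) + w * volume.gather(1, hi)

class IterativeIndexer(nn.Module):
    def __init__(self, hidden_dim=64, radius=4):
        super().__init__()
        self.radius = radius
        self.gru = ConvGRU(hidden_dim, input_dim=(2 * radius + 1) + 1)
        self.head = nn.Conv2d(hidden_dim, 1, 3, padding=1)  # predicts an index residual

    def forward(self, volume, index, hidden, iters=8):
        for _ in range(iters):
            costs = lookup_cost_volume(volume, index, self.radius)
            hidden = self.gru(hidden, torch.cat([costs, index], dim=1))
            index = index + self.head(hidden)                # refine the depth index
        # The final fractional index can be mapped back to metric depth
        # through the plane-sweep depth hypotheses.
        return index, hidden
```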
- The code has been tested with Python 3.10 and PyTorch 2.2.0 with CUDA 12.1. We assume the project is located at `~/riav-mvs`. We provide the Dockerfile at `docker/Dockerfile_oppo` and the library requirements at `docker/dev_requirements.txt`, which will be installed when you build the Docker image (see below).
- Build and run the docker:
cd ~/riav-mvs/docker
sh build.sh # build the docker image;
# it will generate an image with tag "changjiang_cai/mvs-raft:1.0".
# You can change the tag name in this build.sh script.
sh run.sh # run a container from the image just built above;
## (Optional) [Useful Tips]: Exit the container without stopping it
# If you want to exit the container's interactive shell session,
# but do not want to interrupt the processes running in it,
# press Ctrl+P followed by Ctrl+Q.
# This operation detaches the container and
# allows you to return to your system's shell.

## (Optional) [Useful Tips]: Re-enter the container
# get the container id
docker ps
# Run as root, e.g., if you want to install some libraries via `pip`;
# Here `d89c34efb04a` is the container id;
docker exec -it -u 0 d89c34efb04a bash
# Run as regular user
docker exec -it d89c34efb04a bash
## (Optional) [Useful Tips]: Save the container to a new docker image
# After the pip installation, save the container to an image.
# You can run the following from outside of the docker container.
docker commit -m "some notes you specified" d89c34efb04a xyz/riavmvs:1.1
To train/evaluate RIAV-MVS, you will need to download the required datasets.
- ScanNet, for training and within-dataset evaluation. See data generation, and the dataloaders for train/val, evaluation, and benchmark. More details about how to prepare the datasets for training and evaluation can be found here.
The following datasets are only used for evaluation to show cross-domain generalization performance.
- 7-Scenes, for cross-dataset evaluation. No data reformatting or export code is needed; see the dataloader.
- RGB-D Scenes V2, for cross-dataset evaluation. See data generation, and the dataloader for evaluation.
- DTU, for training and evaluation. See the dataloaders for train/val, evaluation, and benchmark.
You can download the model checkpoints for our method and the baseline methods that we trained from scratch at this link.
Here we provide three variants of the pipeline for ablation:
- V1 (Base model): our proposed paradigm that iteratively indexes a plane-sweeping cost volume and regresses the depth map via a convolutional Gated Recurrent Unit (GRU).
- V2 (+Pose model): adds a residual pose module to correct the relative poses, helping the cost volume construction at the frame level.
- V3 (+Pose,Atten model): the full model. Besides the modules in V1 and V2, this variant includes a transformer block applied to the reference image (but not to the source images). It breaks the symmetry of the Siamese network (which is typically used in MVS to extract image features) to construct the so-called Asymmetric Volume in our paper. It embeds both pixel-wise local (high-frequency) features via high-pass CNNs and long-range global (low-frequency) context via self-attention, to store more accurate matching-similarity cues (see the conceptual sketch after this list).
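The asymmetry in V3 can be pictured with the following sketch: self-attention is applied only to the reference-frame features before the matching volume is built, while the source frames keep plain CNN features. The layer sizes, the single attention stage, and the module name are illustrative assumptions, not the repository's exact architecture.

```python
# Conceptual sketch: attention applied only to the reference view's features,
# breaking the symmetry of a Siamese matching backbone. Illustrative only.
import torch
import torch.nn as nn

class AsymmetricFeatureNet(nn.Module):
    def __init__(self, feat_dim=64, num_heads=4):
        super().__init__()
        self.cnn = nn.Sequential(                        # shared CNN feature extractor
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, ref_img, src_imgs):
        f_ref = self.cnn(ref_img)                        # (B, C, H, W)
        B, C, H, W = f_ref.shape
        tokens = f_ref.flatten(2).transpose(1, 2)        # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)  # global self-attention (reference only)
        f_ref = self.norm(tokens + attn_out).transpose(1, 2).view(B, C, H, W)
        f_srcs = [self.cnn(s) for s in src_imgs]         # source views: CNN features only
        return f_ref, f_srcs
```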
Our pretrained models, trained on the ScanNet training set, can be downloaded as listed below.
| Model Variants | V1 (Base) | V2 (+Pose) | V3 (+Pose,Atten) |
|---|---|---|---|
| YAML Config | [1] riavmvs_base_test.yaml | [2] riavmvs_pose_test.yaml | [3] riavmvs_full_test.yaml |
| Checkpoint trained on ScanNet | [4] riavmvs_base_epoch_002.pth.tar | [5] riavmvs_pose_epoch_003.pth.tar | [6] riavmvs_full_epoch_007.pth.tar |
Our ScanNet-pretrained models are further finetuned on the DTU training set and can be downloaded as listed below. We skip the V2 (+Pose) model on DTU since DTU provides accurate poses.
| Model Variants | V1 (Base) | V3 (+Pose,Atten) |
|---|---|---|
| YAML Config | [7] riavmvs_base_dtu_test.yaml | see [3] above |
| Checkpoint finetuned on DTU | [8] riavmvs_base_dtu_epoch_04.pth.tar | [9] riavmvs_full_dtu_epoch_03.pth.tar |
- Base model on ScanNet: Download our pretrained checkpoint shown in the table above and save it to a directory, e.g., model [4] at `checkpoints_nfs/saved/released/riavmvs_base_epoch_002.pth.tar`, trained on ScanNet. You can find the config YAML file at [1] `config/riavmvs_base_test.yaml`. Pay attention to these parameters:
raft_mvs_type: 'raft_mvs' # mvs depth module;
pose_net_type: "none"
raft_depth_init_type: 'none'
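As a quick sanity check after downloading, the checkpoint file can be inspected with plain PyTorch. The key name `state_dict` below is an assumption about the saved layout; print the keys first and adapt accordingly.

```python
# Minimal sketch for inspecting a released checkpoint.
# The key names (e.g., 'state_dict') are assumptions; print the keys to verify.
import torch

ckpt_path = "checkpoints_nfs/saved/released/riavmvs_base_epoch_002.pth.tar"
ckpt = torch.load(ckpt_path, map_location="cpu")
print(type(ckpt), list(ckpt.keys()) if isinstance(ckpt, dict) else None)

# If the file stores a plain state dict (or one nested under a key such as
# 'state_dict'), it can be loaded into the model built from the YAML config:
# model.load_state_dict(ckpt.get("state_dict", ckpt), strict=True)
```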
- Base model on DTU: Download our checkpoint shown in the table above and save it to a directory, e.g., model [8] at `checkpoints_nfs/saved/released/riavmvs_base_dtu_epoch_04.pth.tar`, finetuned on DTU. You can find the config YAML file at [7] `config/riavmvs_base_dtu_test.yaml`. Pay attention to these parameters:
fusion_pairnet_feats: False # no feature fusion layers;
# -- mvs plane sweeping setup -- #
num_depth_bins: 96 # here we use 96 depth hypotheses planes;
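For intuition, a common way to generate such plane-sweep depth hypotheses is uniform sampling in inverse depth between a near and a far bound. The bounds and the sampling scheme below are illustrative assumptions and may differ from the repository's exact setup.

```python
# Illustrative sketch: 96 plane-sweep depth hypotheses sampled uniformly in
# inverse depth between d_min and d_max (the repo's exact scheme may differ).
import torch

def make_depth_hypotheses(d_min=0.25, d_max=20.0, num_depth_bins=96):
    inv = torch.linspace(1.0 / d_max, 1.0 / d_min, num_depth_bins)
    return 1.0 / inv   # (num_depth_bins,) depths, ordered far -> near

planes = make_depth_hypotheses()
print(planes.shape, planes.min().item(), planes.max().item())
```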
- +Pose model on ScanNet: Download our pretrained checkpoint and save it to a directory, e.g., model [5] at `checkpoints_nfs/saved/released/riavmvs_pose_epoch_003.pth.tar`, trained on ScanNet. You can find the config YAML file at [2] `config/riavmvs_pose_test.yaml`. Pay attention to these parameters:
raft_mvs_type: 'raft_mvs' # mvs depth module;
pose_net_type: "resnet_pose"
raft_depth_init_type: 'none'
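Conceptually, the residual pose module predicts a small 6-DoF correction that is composed with the initial reference-to-source pose before the cost volume is built. The axis-angle parameterization and the composition order below are illustrative assumptions, not the repository's exact formulation.

```python
# Conceptual sketch of a residual pose correction: compose a predicted small
# delta pose with the given reference->source pose. Parameterization is illustrative.
import torch

def axis_angle_to_matrix(aa):
    """Rodrigues' formula: (B, 3) axis-angle -> (B, 3, 3) rotation matrices."""
    theta = aa.norm(dim=1, keepdim=True).clamp(min=1e-8)   # (B, 1)
    k = aa / theta                                          # unit rotation axis
    K = torch.zeros(aa.shape[0], 3, 3, device=aa.device)   # skew-symmetric matrix of k
    K[:, 0, 1], K[:, 0, 2] = -k[:, 2], k[:, 1]
    K[:, 1, 0], K[:, 1, 2] = k[:, 2], -k[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -k[:, 1], k[:, 0]
    I = torch.eye(3, device=aa.device).expand_as(K)
    s, c = theta.sin().view(-1, 1, 1), theta.cos().view(-1, 1, 1)
    return I + s * K + (1 - c) * (K @ K)

def apply_residual_pose(T_init, delta_aa, delta_t):
    """T_init: (B, 4, 4) initial relative pose; delta_*: (B, 3) predicted residuals."""
    B = T_init.shape[0]
    T_delta = torch.eye(4, device=T_init.device).repeat(B, 1, 1)
    T_delta[:, :3, :3] = axis_angle_to_matrix(delta_aa)
    T_delta[:, :3, 3] = delta_t
    return T_delta @ T_init   # corrected pose used to build the cost volume
```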
- Full model on ScanNet or DTU: Download our pretrained checkpoints and save them to a directory, e.g., model [6] at `checkpoints_nfs/saved/released/riavmvs_full_epoch_007.pth.tar`, trained on ScanNet, and model [9] at `checkpoints_nfs/saved/released/riavmvs_full_dtu_epoch_03.pth.tar`, finetuned on DTU. You can find the config YAML file at [3] `config/riavmvs_full_test.yaml`. Pay attention to these parameters:
raft_mvs_type: 'raft_mvs_asyatt_f1_att' # attention to frame f1;
pose_net_type: "resnet_pose"
raft_depth_init_type: 'soft-argmin'
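For reference, a soft-argmin initialization usually takes the expectation of the depth hypotheses under a per-pixel softmax over matching scores. The sketch below assumes the volume stores similarities (higher is better); flip the sign if it stores costs.

```python
# Sketch of a soft-argmin depth initialization over a plane-sweep volume.
# The sign convention (cost vs. similarity) is an assumption.
import torch
import torch.nn.functional as F

def soft_argmin_depth(similarity_volume, depth_hypotheses):
    """similarity_volume: (B, D, H, W); depth_hypotheses: (D,) -> depth (B, 1, H, W)."""
    prob = F.softmax(similarity_volume, dim=1)   # per-pixel distribution over planes
    depth = (prob * depth_hypotheses.view(1, -1, 1, 1)).sum(dim=1, keepdim=True)
    return depth
```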
We perform same-domain evaluation: models are trained on the training set of ScanNet (or DTU) and then evaluated on the test set of the same domain.
We evaluate two sampling strategies to generate the evaluation frames: 1) the simple view-selection strategy (i.e., sampling every 10 frames, resulting in 20,668 samples) as in ESTDepth (CVPR 2021), and 2) the heuristic-based keyframe selection as in DeepVideoMVS (CVPR 2021), resulting in 25,481 samples.
Set this hyperparameter in the YAML configuration files:
scannet_eval_sampling_type: 'e-s10n3' # use this for simple sampling
scannet_eval_sampling_type: 'd-kyn3' # use this for keyframe sampling
Evaluation on the ScanNet test set (unit: meter); the first three metric columns use simple sampling, the last three keyframe sampling:

| Model | Config | Checkpoint | Abs-rel ↓ (Simple) | Abs ↓ (Simple) | δ < 1.25 ↑ (Simple) | Abs-rel ↓ (Keyframe) | Abs ↓ (Keyframe) | δ < 1.25 ↑ (Keyframe) |
|---|---|---|---|---|---|---|---|---|
| Ours (base) | see [1] | see [4] | 0.0885 | 0.1605 | 0.9211 | 0.0843 | 0.1603 | 0.9280 |
| Ours (+pose) | see [2] | see [5] | 0.0827 | 0.1523 | 0.9277 | 0.0790 | 0.1525 | 0.9344 |
| Ours (+pose,atten) | see [3] | see [6] | 0.0734 | 0.1381 | 0.9395 | 0.0692 | 0.1362 | 0.9470 |
- Here we provide three variants of our model for ablation. From the results, we can see that both the proposed residual pose module (compare Ours (base) vs. Ours (+pose)) and the so-called asymmetric attention module, applied to the reference view only (compare Ours (+pose,atten) vs. Ours (+pose)), help boost the performance. Ours (+pose,atten) is the full model and the default one used in most of the experiments.
Evaluation on the DTU test set (unit: mm):

| Model | Config | Checkpoint | Abs-rel ↓ | Abs ↓ | RMSE ↓ |
|---|---|---|---|---|---|
| Ours (base) | see [7] | see [8] | 0.0102 | 7.3564 | 19.6125 |
| Ours (+pose,atten) | see [3] | see [9] | 0.0091 | 6.7214 | 18.5950 |
- Here we provide two variants of our model for ablation. From the results, we can see that the so-called asymmetric attention module, applied to the reference view only, helps boost the performance.
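For completeness, the metrics reported in the tables above (absolute relative error, mean absolute error, RMSE, and the δ < 1.25 inlier ratio) can be computed as in the sketch below; the valid-pixel masking threshold is an illustrative assumption.

```python
# Sketch of the depth metrics reported above; the validity mask is illustrative.
import torch

def depth_metrics(pred, gt, min_depth=1e-3):
    mask = gt > min_depth                        # ignore invalid ground-truth pixels
    pred, gt = pred[mask], gt[mask]
    abs_rel = ((pred - gt).abs() / gt).mean()    # absolute relative error
    abs_err = (pred - gt).abs().mean()           # "Abs" (meters on ScanNet, mm on DTU)
    rmse = ((pred - gt) ** 2).mean().sqrt()
    ratio = torch.maximum(pred / gt, gt / pred)
    delta_125 = (ratio < 1.25).float().mean()    # δ < 1.25 inlier ratio
    return abs_rel, abs_err, rmse, delta_125
```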
Running the following script with the checkpoints mentioned above will reproduce the results in Table 1 of our paper.
./run_test_exp.sh $MODEL_NAME $DATASET $GPU_ID
The default parameters are
MODEL_NAME='OUR_RIAV_MVS_FULL'
DATASET='scannet_n3_eval'
GPU_ID=0
- Baseline Models: We also provide checkpoints for the baseline methods MVSNet, PairNet, and IterMVS. We trained those baseline models from scratch using the same training dataset and data augmentation as our proposed method for a fair comparison. You can find the YAML configuration files at `config/*`. Checkpoints are specified for each baseline in the bash script.
See the bash script for more details; flexible arguments are provided to run experiments easily and portably.
For network training, run the following script:
./run_train_exp.sh $MODEL_NAME $DATASET $GPU_ID
The default parameters are
MODEL_NAME='OUR_RIAV_MVS_FULL'
DATASET='scannet_mea2_npz'
GPU_IDS=0,1
You can adjust other arguments as you wish. More details can be found in the bash script, which provides flexible arguments to run experiments easily and portably.
The code in this repository is licensed under the MIT License; see LICENSE.
Our work partially adopts code from RAFT (BSD 3-Clause License), RAFT-Stereo (MIT License), DeepVideoMVS (MIT License), and GMA (WTFPL License). We also compare our method with the baselines IterMVS (MIT License) and ESTDepth (MIT License) in most of the experiments. We sincerely thank those authors for making their repositories available.
If you find our work useful, please consider citing our paper:
@InProceedings{cai2023riavmvs,
author = {Cai, Changjiang and Ji, Pan and Yan, Qingan and Xu, Yi},
title = {RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo},
booktitle = {CVPR},
month = {June},
year = {2023},
pages = {919-928}
}
Please also consider citing our other MVS paper if you find it useful:
@InProceedings{liu2022planemvs,
author = {Liu, Jiachen and Ji, Pan and Bansal, Nitin and Cai, Changjiang and Yan, Qingan and Huang, Xiaolei and Xu, Yi},
title = {PlaneMVS: 3D Plane Reconstruction From Multi-View Stereo},
booktitle = {CVPR},
month = {June},
year = {2022},
pages = {8665-8675}
}
We will keep updating this section as issues arise.
- [1] torchvision/models/_utils.py:208 UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
- Solution: in the file `third_parties/ESTDepth/hybrid_models/resnet_encoder.py`, change line 26 from `self.encoder = resnets[num_layers](pretrained)` to `self.encoder = resnets[num_layers](weights="IMAGENET1K_V2" if pretrained else None)`.