This is a repository for the paper Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from Monocular Video.
Project Page | Paper | Poster
python valscene_inference.py
The key modes can be configured in configs/inference.yaml, where disabling MODEL.DEPTH_PREDICTION switches the model into the geometric-semantic inference mode. The geometric-semantic information is learned through MAP optimization with the help of 2D priors.
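As a rough illustration of the toggle described above, the sketch below flips a dotted key such as MODEL.DEPTH_PREDICTION inside a nested config. It assumes the YAML file parses to nested dictionaries with a top-level MODEL section, which may not match the actual file layout. The `set_dotted` helper and the sample `cfg` dictionary are hypothetical and not part of this repository.

```python
from copy import deepcopy


def set_dotted(cfg: dict, dotted_key: str, value) -> dict:
    """Return a copy of cfg with the nested entry named by dotted_key set to value."""
    out = deepcopy(cfg)
    node = out
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Walk down the nested sections, creating any that are missing.
        node = node.setdefault(name, {})
    node[leaf] = value
    return out


# Hypothetical stand-in for the parsed contents of configs/inference.yaml.
cfg = {"MODEL": {"DEPTH_PREDICTION": True}}

# Disabling depth prediction selects the geometric-semantic inference mode.
geo_sem_cfg = set_dotted(cfg, "MODEL.DEPTH_PREDICTION", False)
```

In practice the same change can be made by editing the flag directly in configs/inference.yaml before running the inference script.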
Please consider citing our paper and giving the repository a ⭐ if you find it useful.
@inproceedings{hong2023cross,
author = {Hong, Ziyang and Yue, C. Patrick},
title = {Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from Monocular Video},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {October},
year = {2023},
pages = {2169-2178}
}