This is the official code for the VISAPP 2022 paper, "Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics", and its extended version in Springer CCIS, "Transformers in Unsupervised Structure-from-Motion".
Authors: Hemang Chawla, Arnav Varma, Elahe Arani and Bahram Zonooz.
We propose MT-SfMLearner v1 and v2, which show how transformers can be more competitive and robust for monocular depth estimation.
Hardware details for the original training of MT-SfMLearner (v1 and v2) can be found in the respective papers.
```bash
git clone https://github.com/NeurAI-Lab/MT-SfMLearner.git
cd MT-SfMLearner
make docker-build
```
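If you want an interactive shell in the built image, a plain `docker run` along these lines should work; the image tag `mt-sfmlearner` is an assumption here, so check the Makefile for the tag that `make docker-build` actually produces.

```bash
# Open an interactive shell in the built image with GPU access.
# The image tag "mt-sfmlearner" is an assumption; check the Makefile.
docker run -it --rm --gpus all \
    -v "$(pwd)":/workspace/MT-SfMLearner \
    mt-sfmlearner bash
```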
MT-SfMLearner (v2) is trained in a self-supervised manner from videos.
For training, use a `.yaml` config file or a `.ckpt` model checkpoint file with `scripts/train.py`.
```bash
python scripts/train.py <config_file.yaml or model_checkpoint.ckpt>
```
An example config file for training MIMDepth can be found in the `configs` folder.
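For instance, a training run might look like the following; the config filename is a hypothetical placeholder, so substitute an actual file from the `configs` folder.

```bash
# Train MIMDepth from a config file (the filename below is a
# hypothetical example; use an actual config from the configs folder).
python scripts/train.py configs/train_mimdepth.yaml
```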
A trained model can be evaluated by providing a `.ckpt` model checkpoint.
```bash
python scripts/eval.py --checkpoint <model_checkpoint.ckpt>
```
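For instance, with a trained checkpoint on disk (the path below is a hypothetical example):

```bash
# Evaluate a trained model (the checkpoint path is a hypothetical example).
python scripts/eval.py --checkpoint checkpoints/mt_sfmlearner.ckpt
```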
For running inference on a single image or a folder of images:
```bash
python scripts/infer.py --checkpoint <checkpoint.ckpt> --input <image or folder> --output <image or folder> [--image_shape <input shape (h,w)>]
```
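As a concrete sketch, single-image inference might look like this; the paths are hypothetical, and the `192 640` (h, w) shape is only an assumption based on common KITTI-style monocular depth setups, so adjust it to the resolution your checkpoint was trained with.

```bash
# Run inference on a single image and save the predicted depth map.
# Paths and the (h, w) image shape below are hypothetical examples.
python scripts/infer.py \
    --checkpoint checkpoints/mt_sfmlearner.ckpt \
    --input assets/example_image.png \
    --output outputs/example_depth.png \
    --image_shape 192 640
```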
Pretrained models for MT-SfMLearner and MIMDepth: coming soon!
If you find the code useful in your research, please consider citing our papers:
```bibtex
@inproceedings{chawlavarma2022MTSfMLearnerv2,
  title={Transformers in Unsupervised Structure-from-Motion},
  author={Chawla, Hemang and Varma, Arnav and Arani, Elahe and Zonooz, Bahram},
  booktitle={International Joint Conference on Computer Vision, Imaging and Computer Graphics, Revised Selected Papers},
  pages={281--303},
  year={2022},
  doi={10.1007/978-3-031-45725-8_14},
  organization={Springer Nature}
}

@inproceedings{varmachawla2022MTSfMLearner,
  title={Transformers in Self-Supervised Monocular Depth Estimation with Unknown Camera Intrinsics},
  author={Varma, Arnav and Chawla, Hemang and Arani, Elahe and Zonooz, Bahram},
  booktitle={Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP},
  pages={758--769},
  year={2022},
  publisher={SciTePress},
  doi={10.5220/0010884000003124}
}
```
This project is licensed under the terms of the MIT license.