2406.04343.md

Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image

In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU, it outperforms competitors by a large margin. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input.
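
The layered placement of Gaussians described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, integer pixel coordinates, and that each additional layer sits further along the same camera ray by a non-negative, cumulatively summed depth offset. The function name `layered_gaussian_centres` and the cumulative-offset scheme are assumptions made for illustration. The other Gaussian parameters (opacity, covariance, colour) are omitted.

```python
import numpy as np

def layered_gaussian_centres(depth, offsets, fx, fy, cx, cy):
    """Return Gaussian centres of shape (K + 1, H, W, 3) in camera coordinates.

    depth:   (H, W) positive z-depth map, e.g. from a monocular depth network.
    offsets: (K, H, W) non-negative depth offsets, one per additional layer
             (assumption: layers are pushed behind the first visible surface).
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns, v indexes rows; both have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Ray directions with unit z-component for a pinhole camera.
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones((h, w))], axis=-1)
    # First layer sits at the predicted depth; later layers add cumulative offsets.
    layer_depths = np.concatenate(
        [depth[None], depth[None] + np.cumsum(offsets, axis=0)], axis=0
    )
    # Scale each ray by its layer depth to get 3D centres.
    return layer_depths[..., None] * rays[None]

# Toy example: a 2x3 image at constant depth 2.0 with one extra layer 0.5 behind it.
depth = np.full((2, 3), 2.0)
offsets = np.full((1, 2, 3), 0.5)
centres = layered_gaussian_centres(depth, offsets, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

In this sketch the second layer lies on the same rays as the first, only deeper, which is one simple way to give a model room to place content behind occluding surfaces.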
