We address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advances such as 3D Gaussian Splatting (3DGS) have achieved remarkable success in 3D reconstruction, these methods typically require hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. Sparse-view reconstruction, however, is inherently ill-posed and under-constrained, often yielding inferior and incomplete results due to failed initialization, overfitting to the input images, and a lack of detail. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. Specifically, we propose a robust initialization module that leverages stereo priors to aid the recovery of camera poses and reliable point clouds. Additionally, a diffusion-based refinement is iteratively applied to incorporate image diffusion priors into the Gaussian optimization process, preserving intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data acquisition requirements of previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction. Visual results are available on our website.