MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in under a second by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as the 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to a compromise of multi-view information propagation made in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement, while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with only about 0.1× the model size.
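The core idea can be illustrated with a minimal sketch (not the authors' code): flatten the tokens from all input views into one causal sequence, run an RNN-like state-space recurrence over it so that later tokens see the context of earlier views (linear in sequence length), and decode every token into the parameters of one 3D Gaussian. All module and variable names below are hypothetical simplifications; the paper uses a Mamba-style SSM, which this toy diagonal linear recurrence only approximates.

```python
# Illustrative sketch of an SSM-based multi-view Gaussian reconstructor.
# Assumptions: a diagonal linear recurrence stands in for the Mamba-style SSM,
# and each token is decoded into a 14-dim Gaussian (xyz, scale, quaternion,
# opacity, RGB). None of these names come from the MVGamba codebase.
import torch
import torch.nn as nn


class DiagonalSSMBlock(nn.Module):
    """Toy state-space layer: h_t = a * h_{t-1} + B x_t,  y_t = C h_t + D x_t."""

    def __init__(self, dim: int, state: int = 64):
        super().__init__()
        self.in_proj = nn.Linear(dim, state)
        self.out_proj = nn.Linear(state, dim)
        self.skip = nn.Linear(dim, dim)
        # Per-channel decay in (0, 1) keeps the recurrence stable.
        self.log_decay = nn.Parameter(torch.randn(state))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) -- tokens from all views, concatenated causally.
        a = torch.sigmoid(self.log_decay)          # (state,)
        u = self.in_proj(x)                        # (B, L, state)
        h = torch.zeros(x.size(0), u.size(-1), device=x.device)
        outs = []
        for t in range(x.size(1)):                 # sequential scan: O(L) time
            h = a * h + u[:, t]                    # causal context carries earlier views
            outs.append(h)
        y = torch.stack(outs, dim=1)
        return self.out_proj(y) + self.skip(x)


class ToyGaussianReconstructor(nn.Module):
    """Maps a causal multi-view token sequence to one 3D Gaussian per token."""

    def __init__(self, dim: int = 256, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(DiagonalSSMBlock(dim) for _ in range(depth))
        # 3 (xyz) + 3 (scale) + 4 (quaternion) + 1 (opacity) + 3 (rgb) = 14 params.
        self.head = nn.Linear(dim, 14)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = tokens
        for blk in self.blocks:
            x = blk(x)
        return self.head(x)                        # (B, L, 14) Gaussian parameters


if __name__ == "__main__":
    # e.g., 4 views x 256 patch tokens each, flattened into one causal sequence.
    tokens = torch.randn(2, 4 * 256, 256)
    gaussians = ToyGaussianReconstructor()(tokens)
    print(gaussians.shape)                         # torch.Size([2, 1024, 14])
```

Because the recurrence is causal and linear in the number of tokens, the sequence of Gaussians can be made long (fine-detail modeling) without the quadratic cost of full cross-view attention, which is the trade-off the abstract highlights against Transformer-based reconstructors.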
