Recently, dynamic scene reconstruction using Gaussians has garnered increasing interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene from the canonical space. However, the inherently low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scenes with varying resolutions and durations. To overcome these challenges, we introduce a novel approach utilizing discrete 3D control points. This method models local rays physically and establishes a motion-decoupling coordinate system, which effectively merges traditional graphics with learnable pipelines for a robust and efficient local 6-degrees-of-freedom (6-DoF) motion representation. Additionally, we develop a generalized framework that incorporates our control points with Gaussians. Starting from an initial 3D reconstruction, our workflow decomposes the streaming 4D real-world reconstruction into four independent submodules: 3D segmentation, 3D control-point generation, object-wise motion manipulation, and residual compensation. Our experiments demonstrate that this method outperforms existing state-of-the-art 4D Gaussian Splatting techniques on both the Neu3DV and CMU-Panoptic datasets. Our approach also significantly accelerates training, with the optimization of our 3D control points achievable within just 2 seconds per frame on a single NVIDIA 4070 GPU.
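To make the local 6-DoF motion representation concrete, the following is a minimal illustrative sketch (not the paper's implementation) of how sparse 3D control points can drive Gaussian centers: each control point carries its own rigid transform (rotation plus translation), and each Gaussian is warped by a distance-weighted blend of its k nearest control points. The function names, the Gaussian-kernel weighting, and the parameters `k` and `sigma` are assumptions for illustration only.

```python
import numpy as np

def knn_weights(points, ctrl_pos, k=4, sigma=0.1):
    """Gaussian-kernel weights over each point's k nearest control points (assumed scheme)."""
    # pairwise distances: (num_points, num_controls)
    d = np.linalg.norm(points[:, None, :] - ctrl_pos[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                     # indices of k nearest controls
    w = np.exp(-np.take_along_axis(d, idx, axis=1) ** 2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)                      # normalize blend weights
    return idx, w

def warp_gaussians(centers, ctrl_pos, ctrl_R, ctrl_t, k=4, sigma=0.1):
    """Warp Gaussian centers by blending per-control-point 6-DoF rigid transforms."""
    idx, w = knn_weights(centers, ctrl_pos, k, sigma)
    warped = np.zeros_like(centers)
    for j in range(k):
        ci = idx[:, j]
        local = centers - ctrl_pos[ci]                     # express in control-point frame
        # rotate locally, then re-anchor at the control point's new position
        moved = np.einsum('nij,nj->ni', ctrl_R[ci], local) + ctrl_pos[ci] + ctrl_t[ci]
        warped += w[:, j:j + 1] * moved
    return warped
```

A pure translation of a single control point, for example, translates every Gaussian it governs by the same offset, while distinct per-object control points allow object-wise motion to be manipulated independently, in the spirit of the decomposition described above.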