Gaussian Splatting (GS) has significantly improved scene reconstruction efficiency and novel view synthesis (NVS) accuracy compared to Neural Radiance Fields (NeRF), particularly for dynamic scenes. However, current 4D NVS methods, whether based on GS or NeRF, primarily rely on camera parameters provided by COLMAP, and often also use COLMAP-generated sparse point clouds for initialization; these inputs lack accuracy and are time-consuming to compute. This can result in poor dynamic scene representations, especially in scenes with large object motions or extreme camera conditions, e.g., small translations combined with large rotations. Some studies jointly optimize the estimation of camera parameters and the scene, supervised by additional cues such as depth and optical flow obtained from off-the-shelf models. Treating such unverified information as ground truth can reduce robustness and accuracy, a failure mode that frequently occurs for long monocular videos (e.g., hundreds of frames or more). We propose a novel approach that learns a high-fidelity 4D GS scene representation while self-calibrating the camera parameters. It extracts 2D point features that robustly represent 3D structure and uses them for subsequent joint optimization of camera parameters and 3D structure, driving the overall 4D scene optimization. We demonstrate the accuracy and time efficiency of our method through extensive quantitative and qualitative experiments on several standard benchmarks, showing significant improvements over state-of-the-art methods for 4D novel view synthesis.
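To make the joint-optimization idea concrete, the sketch below is a minimal PyTorch illustration, not the paper's actual implementation: it refines 3D point positions, a camera pose (axis-angle rotation plus translation), and a focal length together against a 2D reprojection loss on tracked point features. All function names, the toy data, and the pinhole camera model are assumptions for illustration; the actual method additionally optimizes a full 4D GS representation with photometric supervision.

```python
# Hypothetical sketch of jointly optimizing camera parameters and 3D structure.
import torch

def rodrigues(rvec):
    """Axis-angle vector (3,) -> rotation matrix (3, 3) via Rodrigues' formula."""
    theta = torch.sqrt(rvec.pow(2).sum() + 1e-12)  # safe norm, differentiable at zero
    k = rvec / theta
    K = torch.zeros(3, 3)                          # skew-symmetric cross-product matrix
    K[0, 1], K[0, 2] = -k[2], k[1]
    K[1, 0], K[1, 2] = k[2], -k[0]
    K[2, 0], K[2, 1] = -k[1], k[0]
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def project(points, rvec, tvec, focal):
    """Pinhole projection of (N, 3) world points to (N, 2) image coordinates."""
    cam = points @ rodrigues(rvec).T + tvec        # world frame -> camera frame
    return focal * cam[:, :2] / cam[:, 2:3].clamp(min=1e-6)

# Toy stand-in for tracked 2D point features in one frame of a monocular video.
torch.manual_seed(0)
obs_2d = torch.randn(100, 2) * 50.0

# Learnable state: 3D structure AND camera parameters, refined jointly so that
# errors in either can be corrected by gradients flowing through the other.
points_3d = torch.randn(100, 3, requires_grad=True)
rvec = torch.zeros(3, requires_grad=True)                  # camera rotation (axis-angle)
tvec = torch.tensor([0.0, 0.0, 5.0], requires_grad=True)   # camera translation
focal = torch.tensor(500.0, requires_grad=True)            # self-calibrated focal length

opt = torch.optim.Adam([points_3d, rvec, tvec, focal], lr=1e-2)
for step in range(500):
    opt.zero_grad()
    loss = (project(points_3d, rvec, tvec, focal) - obs_2d).abs().mean()  # reprojection error
    loss.backward()
    opt.step()
```

This is essentially a bundle-adjustment-style refinement by gradient descent; its appeal over fixed COLMAP inputs is that inaccurate camera estimates are corrected during scene optimization rather than baked in at initialization.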