Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach uses a two-stage 3D Gaussian optimization process tailored to editing dynamic monocular videos. In the first stage, Video-3DGS employs an improved version of COLMAP, referred to as MC-COLMAP, which processes the original video with a Masked and Clipped approach. For each video clip, MC-COLMAP generates point clouds for the dynamic foreground objects and for the complex background. These point clouds initialize two sets of 3D Gaussians (Frg-3DGS and Bkg-3DGS) that represent the foreground and background views, respectively. The two rendered views are then merged via a 2D learnable parameter map to reconstruct the full view. In the second stage, we leverage the reconstruction ability developed in the first stage to impose temporal constraints on the video diffusion model. To demonstrate the efficacy of Video-3DGS at both stages, we conduct extensive experiments on two related tasks: video reconstruction and video editing. Trained for 3k iterations, Video-3DGS significantly improves video reconstruction quality (+3 and +7 PSNR) and training efficiency (1.9× and 4.5× faster) over NeRF-based and 3DGS-based state-of-the-art methods on the DAVIS dataset, respectively. Moreover, it enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.
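The abstract does not specify how the foreground and background renders are combined with the 2D learnable parameter map. Below is a minimal PyTorch sketch of one plausible reading, in which the map holds a per-pixel blending logit optimized jointly with the two Gaussian sets; the `LearnableMerge` name, the sigmoid parameterization, and the stand-in renders are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: merging Frg-3DGS and Bkg-3DGS renders with a
# 2D learnable parameter map. Assumes per-pixel sigmoid blending weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableMerge(nn.Module):
    """Blend a foreground render and a background render with a learnable
    per-pixel map M: out = sigmoid(M) * frg + (1 - sigmoid(M)) * bkg."""
    def __init__(self, height: int, width: int):
        super().__init__()
        # One logit per pixel; zero init gives an equal 50/50 blend.
        self.logits = nn.Parameter(torch.zeros(height, width))

    def forward(self, frg: torch.Tensor, bkg: torch.Tensor) -> torch.Tensor:
        # frg, bkg: (3, H, W) RGB renders from the two Gaussian sets.
        m = torch.sigmoid(self.logits)      # (H, W), values in [0, 1]
        return m * frg + (1.0 - m) * bkg    # broadcast over the channel dim

# Toy usage: fit the map against a ground-truth frame, as the first-stage
# reconstruction objective might. Resolution matches typical DAVIS frames.
H, W = 480, 854
merge = LearnableMerge(H, W)
frg_render = torch.rand(3, H, W)            # stand-in for a Frg-3DGS render
bkg_render = torch.rand(3, H, W)            # stand-in for a Bkg-3DGS render
target = torch.rand(3, H, W)                # stand-in for the input video frame
optimizer = torch.optim.Adam(merge.parameters(), lr=1e-2)
loss = F.l1_loss(merge(frg_render, bkg_render), target)
loss.backward()
optimizer.step()
```

In the full method the blending map would be optimized together with both 3D Gaussian sets, so the reconstruction loss shapes the Gaussians and the foreground/background composition simultaneously.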
