Robust and realistic rendering of large-scale road scenes is essential for autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the fidelity of large-scale road scene renderings is often limited by the input imagery, which typically has a narrow field of view and focuses mainly on the street-level local area. Intuitively, data captured from a drone's perspective can complement data from the ground vehicle's perspective, improving the completeness of scene reconstruction and rendering. However, naively training on aerial and ground images together, which exhibit a large view disparity, poses a significant convergence challenge for 3D-GS and yields no notable improvement on road views. To enhance novel view synthesis of road views and exploit the aerial information effectively, we design an uncertainty-aware training method that lets aerial images assist in synthesizing areas where ground images produce poor learning outcomes, instead of weighting all pixels equally during 3D-GS training as prior work does. We are the first to introduce cross-view uncertainty into 3D-GS: we match car-view, ensemble-based rendering uncertainty to aerial images and use it to weight each pixel's contribution to the training process. Additionally, to support systematic evaluation, we assemble a high-quality synthesized dataset of both aerial and ground images for road scenes.
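To make the weighting scheme concrete, below is a minimal sketch, not the authors' implementation, of how per-pixel cross-view uncertainty could gate an aerial photometric loss. All names (`cross_view_uncertainty`, `weighted_photometric_loss`, `ensemble_renders`) are hypothetical; we only assume, as the abstract describes, that uncertainty is estimated from the per-pixel variance across renders of the same aerial view produced by an ensemble of car-view-trained 3D-GS models, so that aerial supervision concentrates on regions the ground views learned poorly.

```python
import torch

def cross_view_uncertainty(ensemble_renders: torch.Tensor) -> torch.Tensor:
    """ensemble_renders: (K, 3, H, W) renders of one aerial view from K
    car-view-trained 3D-GS models. Returns a (H, W) uncertainty map.

    NOTE: variance-across-ensemble is an assumption for illustration,
    not the paper's exact formulation.
    """
    var = ensemble_renders.var(dim=0).mean(dim=0)   # per-pixel ensemble variance, (H, W)
    return var / (var.max() + 1e-8)                 # normalize to [0, 1]

def weighted_photometric_loss(pred: torch.Tensor,
                              target: torch.Tensor,
                              weight: torch.Tensor) -> torch.Tensor:
    """pred, target: (3, H, W); weight: (H, W) cross-view uncertainty map.

    High-uncertainty pixels (poorly learned from ground views) contribute
    more; well-reconstructed pixels are down-weighted, rather than
    weighting every aerial pixel equally.
    """
    per_pixel = (pred - target).abs().mean(dim=0)   # per-pixel L1, (H, W)
    return (weight * per_pixel).mean()

# Usage sketch (weight map computed once per aerial view, then reused):
# w = cross_view_uncertainty(ensemble_renders)
# loss = weighted_photometric_loss(render, aerial_gt, w)
```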