Semantically interactive radiance fields have long been an appealing research direction because of their potential to enable user-friendly, automated applications for real-world 3D scene understanding. However, achieving high quality, efficiency, and zero-shot ability at the same time when embedding semantics in radiance fields remains challenging. In this work, we present FastLGS, an approach that supports real-time open-vocabulary queries within 3D Gaussian Splatting (3DGS) at high resolution. We propose a semantic feature grid that stores multi-view CLIP features extracted from Segment Anything Model (SAM) masks, and we map the grids to low-dimensional features for semantic field training through 3DGS. Once trained, pixel-aligned CLIP embeddings can be restored from the rendered features through the feature grids for open-vocabulary queries. Comparisons with other state-of-the-art methods show that FastLGS achieves the best performance in both speed and accuracy: it is 98x faster than LERF and 4x faster than LangSplat. Experiments also show that FastLGS is adaptive and compatible with many downstream tasks, such as 3D segmentation and 3D object inpainting, and that it can be easily applied to other 3D manipulation systems.
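
To make the query step concrete, the restoration of pixel-aligned CLIP embeddings and the subsequent open-vocabulary query can be sketched as follows. This is only an illustrative sketch, not the FastLGS implementation: it assumes the feature grid can be represented as a table of low-dimensional keys paired with CLIP embeddings, that restoration is a nearest-key lookup, and that relevancy is plain cosine similarity. The names `restore_clip_embeddings`, `relevancy_map`, `grid_keys`, and `grid_clip` are hypothetical.

```python
import numpy as np


def restore_clip_embeddings(rendered, grid_keys, grid_clip):
    """Map rendered low-dimensional features back to CLIP embeddings.

    rendered:  (H, W, d) low-dimensional features rendered per pixel
    grid_keys: (K, d)    low-dimensional key of each grid entry (assumed layout)
    grid_clip: (K, D)    CLIP embedding stored for each grid entry
    Returns an (H, W, D) array of pixel-aligned CLIP embeddings.
    """
    h, w, d = rendered.shape
    flat = rendered.reshape(-1, d)
    # Squared Euclidean distance from every pixel feature to every grid key.
    dists = ((flat[:, None, :] - grid_keys[None, :, :]) ** 2).sum(axis=-1)
    # Assumption: each pixel takes the embedding of its nearest key.
    nearest = dists.argmin(axis=1)
    return grid_clip[nearest].reshape(h, w, -1)


def relevancy_map(clip_map, text_embedding):
    """Cosine similarity between each pixel embedding and a text embedding."""
    clip_norm = clip_map / np.linalg.norm(clip_map, axis=-1, keepdims=True)
    text_norm = text_embedding / np.linalg.norm(text_embedding)
    return clip_norm @ text_norm
```

In this sketch the rendered features stay low-dimensional, and the high-dimensional CLIP vectors are touched only through a table lookup at query time. The text embedding is assumed to come from a CLIP text encoder, which is not shown here.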