This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) that supports 3D point-level open-vocabulary understanding. Our primary motivation is the observation that existing 3DGS-based open-vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks because of weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature representation and 3D point-level understanding, we first use SAM masks, without cross-frame associations, to train instance features with 3D consistency. These features exhibit both intra-object consistency and inter-object distinction. We then propose a two-stage codebook that discretizes these features from coarse to fine. At the coarse level, we take the positions of the 3D points into account to achieve location-based clustering, which is then refined at the fine level. Finally, we introduce an instance-level 3D-2D feature association method that links 3D points to 2D masks, which are in turn associated with 2D CLIP features. Extensive experiments, including open-vocabulary 3D object selection, 3D point cloud understanding, click-based 3D object selection, and ablation studies, demonstrate the effectiveness of the proposed method.
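To make the coarse-to-fine discretization concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain k-means as the clustering step, assumes that the coarse level clusters the concatenation of 3D position and instance feature, and assumes that the fine level re-clusters on features alone within each coarse cluster. The function names `kmeans` and `two_stage_codebook` and the parameters `k_coarse` and `k_fine` are hypothetical and chosen for illustration only.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(x))  # never ask for more clusters than points
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every point to every centroid.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    # Final assignment against the updated centroids.
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(1)


def two_stage_codebook(xyz, feats, k_coarse=4, k_fine=2):
    """Assumed two-stage scheme: position-aware coarse clustering,
    then feature-only fine clustering inside each coarse cluster."""
    # Coarse level: cluster on [position, feature] so that spatially
    # distant points are unlikely to share a cluster.
    _, coarse = kmeans(np.concatenate([xyz, feats], axis=1), k_coarse)

    codebook = []                                # fine-level feature centroids
    ids = np.zeros(len(feats), dtype=np.int64)   # codebook index per point
    for c in np.unique(coarse):
        idx = np.where(coarse == c)[0]
        # Fine level: refine using features only, within this coarse cluster.
        cents, fine = kmeans(feats[idx], k_fine)
        ids[idx] = len(codebook) + fine          # offset into the global codebook
        codebook.extend(cents)
    return np.stack(codebook), ids
```

Under these assumptions, `codebook[ids]` gives the discretized feature of every 3D point, and points that share an index can be treated as one candidate instance when linking 3D points to 2D masks.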