chenxi52/FrozenSeg

# FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation

This repository is the official implementation of FrozenSeg, introduced in the paper:

FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation

## Abstract

Open-vocabulary segmentation is challenging: it requires segmenting and recognizing objects from an open set of categories in unconstrained environments. Building on the success of powerful vision-language (ViL) foundation models such as CLIP, recent efforts have sought to harness their zero-shot capabilities to recognize unseen categories. Despite strong performance, these methods still face a fundamental challenge: generating precise mask proposals for unseen categories and scenarios, which ultimately limits segmentation quality. To address this, we introduce FrozenSeg, a novel approach designed to integrate spatial knowledge from a localization foundation model (e.g., SAM) and semantic knowledge from a ViL model (e.g., CLIP) in a synergistic framework. Taking the ViL model's visual encoder as the feature backbone, we inject space-aware features into the learnable queries and CLIP features within the transformer decoder. In addition, we devise a mask proposal ensemble strategy that further improves recall and mask quality. To fully exploit pre-trained knowledge while minimizing training overhead, we freeze both foundation models and focus optimization solely on a lightweight transformer decoder for mask proposal generation, the performance bottleneck. Extensive experiments show that FrozenSeg advances state-of-the-art results across various segmentation benchmarks, trained exclusively on COCO panoptic data and evaluated in a zero-shot manner.
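The core idea of injecting space-aware features into the learnable queries can be sketched as single-head cross-attention, where queries attend over flattened spatial features from the frozen SAM encoder. This is a minimal NumPy sketch of the mechanism, not the paper's actual implementation; the function name `inject_spatial_features` and the single-head, residual formulation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inject_spatial_features(queries, sam_features):
    """Refine learnable queries with space-aware SAM features via
    single-head cross-attention (queries attend to spatial tokens).

    queries:      (N, C) learnable query embeddings
    sam_features: (HW, C) flattened spatial features from the frozen SAM encoder
    Returns:      (N, C) space-aware queries (residual update)
    """
    C = queries.shape[-1]
    # Attention weights over the HW spatial locations, one row per query.
    attn = softmax(queries @ sam_features.T / np.sqrt(C))  # (N, HW)
    # Residual injection of the attended spatial features into the queries.
    return queries + attn @ sam_features                   # (N, C)
```

In the real model the projections, multi-head attention, and normalization live inside the lightweight transformer decoder, which is the only component that is trained; both foundation models stay frozen.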

*Figure: FrozenSeg design.*

## Dependencies and Installation

See installation instructions.

## Getting Started

See Preparing Datasets.

See Getting Started.

## Models

| Method | A-150 PQ | A-150 mAP | A-150 mIoU | A-150 FWIoU | Cityscapes PQ | Cityscapes mAP | Cityscapes mIoU | Mapillary Vistas PQ | Mapillary Vistas mIoU | BDD100K PQ | BDD100K mIoU | A-847 mIoU | A-847 FWIoU | PC-459 mIoU | PC-459 FWIoU | PAS-21 mIoU | PAS-21 FWIoU | LVIS APr | COCO PQ | COCO mAP | COCO mIoU | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FrozenSeg (ResNet50x64) | 23.1 | 13.5 | 30.7 | 56.6 | 45.2 | 28.9 | 56.0 | 18.1 | 27.7 | 12.9 | 46.2 | 11.8 | 52.8 | 18.7 | 60.1 | 82.3 | 92.1 | 23.5 | 55.7 | 47.4 | 65.4 | checkpoint |
| FrozenSeg (ConvNeXt-Large) | 25.9 | 16.4 | 34.4 | 59.9 | 45.8 | 28.4 | 56.8 | 18.5 | 27.3 | 19.3 | 52.3 | 14.8 | 51.4 | 19.7 | 60.2 | 82.5 | 92.1 | 25.6 | 56.2 | 47.3 | 65.5 | checkpoint |

Models are trained on COCO panoptic data only (A-150 denotes ADE20K with 150 classes); all other benchmarks are evaluated zero-shot.
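The mask proposal ensemble mentioned in the abstract can be sketched as merging the decoder's proposals with SAM's and keeping only SAM masks that are not near-duplicates of an existing proposal. This is an illustrative IoU-based deduplication sketch; the function names and the threshold are assumptions, not the paper's exact strategy.

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def ensemble_proposals(decoder_masks, sam_masks, iou_thresh=0.7):
    """Union the decoder's mask proposals with SAM's, dropping any SAM
    mask that near-duplicates an existing proposal (IoU >= threshold).
    Keeping the non-overlapping SAM masks is what raises recall."""
    merged = list(decoder_masks)
    for m in sam_masks:
        if all(mask_iou(m, d) < iou_thresh for d in merged):
            merged.append(m)
    return merged
```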

## Citing

If you use FrozenSeg in your research, please use the following BibTeX entry.

```bibtex
@misc{FrozenSeg,
  title={FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation},
  author={Xi Chen and Haosen Yang and Sheng Jin and Xiatian Zhu and Hongxun Yao},
  publisher={arXiv},
  year={2024}
}
```

## Acknowledgement

Detectron2, Mask2Former, Segment Anything, OpenCLIP and FC-CLIP.