【NeurIPS 2024】Dense Connector for MLLMs

Huanjin Yao¹,³*, Wenhao Wu²*✉️, Taojiannan Yang⁴, Yuxin Song³, Mengxi Zhang³, Haocheng Feng³, Yifan Sun³, Zhiheng Li¹, Wanli Ouyang⁵, Jingdong Wang³

¹Tsinghua University, ²The University of Sydney, ³Baidu, ⁴AWS AI Labs, ⁵CUHK

*Equal Contribution, ✉️Corresponding Author

News

[2024/09/26] 🎉 Our Dense Connector has been accepted to NeurIPS 2024.
[2024/06/17] 🔥 The Dense Connector with LLaVA-NeXT is coming! Combining the Dense Connector with dynamic high resolution (i.e., the AnyRes technique in LLaVA-NeXT) further improves model performance and shows that the Dense Connector applies to a broader range of settings. Using only the LLaVA-1.5 dataset, the Dense Connector surpasses LLaVA-NeXT on several benchmarks. Check out its performance in the Model Zoo! The evaluation code and model have been released!
[2024/05/30] We are grateful to the ZeroGPU org and Merve for providing ZeroGPUs, which enabled us to build an online demo!
[2024/05/24] Special thanks to @_akhaliq for promptly sharing our work on Twitter!
[2024/05/24] We release Dense Connector on arXiv! The code and models are now open source! Seamlessly integrate it with LLaVA-1.5 and Mini-Gemini to enhance them!

Overview

We introduce the Dense Connector, a simple, effective, plug-and-play vision-language connector that significantly enhances existing MLLMs by leveraging multi-layer visual features with minimal additional computational overhead. We hope this work provides useful experience and serves as a basic module for future MLLM development!

The Dense Connector uses multi-layer visual features to enrich the visual representation and strengthen the visual perception of Multimodal Large Language Models (MLLMs), and it can be easily integrated into current MLLMs. We provide three instantiations of the Dense Connector: Sparse Token Integration (STI), Sparse Channel Integration (SCI), and Dense Channel Integration (DCI). Of the three, Dense Channel Integration achieves the best results.
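To make the three instantiations more concrete, here is a minimal, dependency-free sketch of how multi-layer features might be fused. It is an illustration inferred from the method names, not the repository's implementation: the helper names, the layer choices, the grouping, and the 1-D token pooling are all assumptions, and the learned projector that would follow the fusion is omitted.

```python
from typing import List

# One layer of visual features: [num_tokens][channels].
Layer = List[List[float]]


def concat_channels(layers: List[Layer]) -> Layer:
    """Concatenate per-token features from several layers along the channel axis."""
    return [sum((layer[t] for layer in layers), []) for t in range(len(layers[0]))]


def mean_layers(layers: List[Layer]) -> Layer:
    """Element-wise average of several layers that share the same shape."""
    n = len(layers)
    return [[sum(vals) / n for vals in zip(*tokens)] for tokens in zip(*layers)]


def pool_tokens(layer: Layer, stride: int) -> Layer:
    """Average every `stride` consecutive tokens (a 1-D stand-in for spatial pooling)."""
    pooled = []
    for start in range(0, len(layer), stride):
        chunk = layer[start:start + stride]
        pooled.append([sum(vals) / len(chunk) for vals in zip(*chunk)])
    return pooled


def sti(hidden: List[Layer], picks: List[int], stride: int) -> Layer:
    """Sparse Token Integration (assumed): pooled earlier layers add extra tokens."""
    extra: Layer = []
    for i in picks:
        extra.extend(pool_tokens(hidden[i], stride))
    return extra + hidden[-1]


def sci(hidden: List[Layer], picks: List[int]) -> Layer:
    """Sparse Channel Integration (assumed): a few layers joined along channels."""
    return concat_channels([hidden[i] for i in picks] + [hidden[-1]])


def dci(hidden: List[Layer], num_groups: int) -> Layer:
    """Dense Channel Integration (assumed): average layer groups, then join channels."""
    size = len(hidden) // num_groups
    groups = [mean_layers(hidden[g * size:(g + 1) * size]) for g in range(num_groups)]
    return concat_channels(groups + [hidden[-1]])


# Toy input: 4 layers, 4 tokens, 2 channels; every value in layer l equals l.
hidden = [[[float(l)] * 2 for _ in range(4)] for l in range(4)]

sti_out = sti(hidden, picks=[0, 1], stride=2)  # 8 tokens x 2 channels
sci_out = sci(hidden, picks=[0, 1])            # 4 tokens x 6 channels
dci_out = dci(hidden, num_groups=2)            # 4 tokens x 6 channels
print(len(sti_out), len(sti_out[0]))
print(len(sci_out), len(sci_out[0]))
print(dci_out[0])  # [0.5, 0.5, 2.5, 2.5, 3.0, 3.0]
```

In this sketch the token-based variant lengthens the visual sequence, while the two channel-based variants keep the token count fixed and widen each token instead. The dense variant differs from the sparse one in that every layer contributes through a group average.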

Installation

Please follow the instructions below to install the required packages.

Clone this repository

git clone https://github.com/HJYao00/DenseConnector.git
cd DenseConnector

Install Package

conda create -n dc python=3.10 -y
conda activate dc
pip install --upgrade pip
pip install -e .

Install additional packages for training Dense Connector

pip install ninja
pip install flash-attn --no-build-isolation
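After the steps above, a quick sanity check of the environment can save a failed training launch. The sketch below only reports which top-level modules Python cannot find. The module names are assumptions: `dc` is guessed from the repository's `dc` folder, `torch` is assumed to be a dependency, and `ninja` and `flash_attn` are the import names of the two extra packages.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust them if the installed package differs.
    required = ["torch", "dc"]
    training_extras = ["ninja", "flash_attn"]

    for label, names in (("required", required), ("training", training_extras)):
        missing = missing_modules(names)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{label}: {status}")
```

Run it inside the activated `dc` conda environment. Modules listed as missing in the training group only matter if you plan to train.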

Dataset Preparation and Training

Please refer to the document for dataset preparation and training.

Evaluation

We evaluate the Dense Connector across 19 diverse benchmarks, including 11 image benchmarks and 8 video benchmarks. The testing procedures for both images and videos can be found here.

Model Zoo

Please visit our Model Zoo to access all publicly available Dense Connector checkpoints. We scale the LLM from 2.7B to 70B parameters, incorporating the latest open-source large language models, Llama3-8B-Instruct and Llama3-70B-Instruct.

Dialogue Example

We provide several dialogue examples, with additional results available in the paper.

Citation

If you find this repository useful, please consider starring 🌟 this repo and citing 🖇️ our paper.

@article{yao2024dense,
  title={Dense Connector for MLLMs},
  author={Yao, Huanjin and Wu, Wenhao and Yang, Taojiannan and Song, Yuxin and Zhang, Mengxi and Feng, Haocheng and Sun, Yifan and Li, Zhiheng and Ouyang, Wanli and Wang, Jingdong},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}

Acknowledgment

We extend our gratitude to the open-source efforts of LLaVA, Mini-Gemini and FreeVA.
