FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
¹Harbin Institute of Technology, Shenzhen
²Huawei Noah's Ark Lab
†Corresponding author
- [11/2024] 🔥 Details will be released. Stay tuned.
This is the GitHub repository for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers. In this work, we propose the FALCON model, which introduces a novel visual register technique to simultaneously address visual redundancy and fragmentation in the high-resolution visual encoding of multimodal large language models (MLLMs).
The framework of the proposed FALCON model:
If you find this work useful for your research, please kindly cite our paper: