- Emu1 (ICLR 2024, 2023/07) - Generative Pretraining in Multimodality
- Emu2 (CVPR 2024, 2023/12) - Generative Multimodal Models are In-Context Learners
- Emu3 (arXiv 2024, 2024/09) - Next-Token Prediction is All You Need 🔥🔥🔥
- 2024.9 We introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. 🔥🔥🔥
- 2024.2 Emu1 and Emu2 have been accepted to ICLR 2024 and CVPR 2024, respectively! 🎉
- 2023.12 Inference code, model and demo of Emu2 are available. Enjoy the demo.
- 2023.12 We have released Emu2, the largest open generative multimodal models, which achieve a new state of the art on multimodal understanding and generation tasks.
- 2023.7 Inference code and model of Emu are available.
- 2023.7 We have released Emu, a multimodal generalist that can seamlessly generate images and text in a multimodal context.
- State-of-the-art performance
- Next-generation capabilities
- A base model for diverse tasks
We hope to foster the growth of our community by open-sourcing our work and promoting collaboration 👬. Let's step towards multimodal intelligence together 🍻.
- We are hiring at all levels at the BAAI Vision Team, including full-time researchers, engineers and interns. If you are interested in working with us on foundation models, visual perception and multimodal learning, please contact Xinlong Wang (wangxinlong@baai.ac.cn).