- Summary of related single- and multi-modal pre-training surveys. SC and DC denote single-column and double-column page layouts.
Paper | Link | Year | Publication | Topic | Pages |
---|---|---|---|---|---|
[01] A short survey of pre-trained language models for conversational AI: a new age in NLP. | [Paper] | 2020 | ACSWM | NLP | DC, 4 |
[02] A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models. | [Paper] | 2022 | arXiv | NLP | SC, 34 |
[03] A Survey of Knowledge Enhanced Pre-trained Models. | [Paper] | 2021 | arXiv | KE | DC, 20 |
[04] A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models. | [Paper] | 2022 | arXiv | KE | DC, 8 |
[05] Commonsense Knowledge Reasoning and Generation with Pre-trained Language Models: A Survey. | [Paper] | 2022 | arXiv | KE | DC, 11 |
[06] A survey on contextual embeddings. | [Paper] | 2020 | arXiv | NLP | DC, 13 |
[07] Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. | [Paper] | 2021 | arXiv | NLP | SC, 46 |
[08] Pre-trained Language Models in Biomedical Domain: A Systematic Survey. | [Paper] | 2021 | arXiv | NLP | SC, 46 |
[09] Pre-trained models for natural language processing: A survey. | [Paper] | 2020 | SCTS | NLP | DC, 26 |
[10] Pre-Trained Models: Past, Present and Future. | [Paper] | 2021 | AI Open | NLP, CV, MM | DC, 45 |
[11] Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey. | [Paper] | 2021 | arXiv | NLP | DC, 49 |
[12] A Survey of Vision-Language Pre-Trained Models. | [Paper] | 2022 | arXiv | MM | DC, 9 |
[13] Survey: Transformer based video-language pre-training. | [Paper] | 2022 | AI Open | CV | DC, 13 |
[14] Vision-Language Intelligence: Tasks, Representation Learning, and Large Models. | [Paper] | 2022 | arXiv | MM | DC, 19 |
[15] A survey on vision transformer. | [Paper] | 2022 | TPAMI | CV | DC, 23 |
[16] Transformers in vision: A survey. | [Paper] | 2021 | CSUR | CV | SC, 38 |
[17] A Survey of Visual Transformers. | [Paper] | 2021 | arXiv | CV | DC, 21 |
[18] Video Transformers: A Survey. | [Paper] | 2022 | arXiv | CV | DC, 24 |
[19] Threats to Pre-trained Language Models: Survey and Taxonomy. | [Paper] | 2022 | arXiv | NLP | DC, 8 |
[20] A survey on bias in deep NLP. | [Paper] | 2021 | AS | NLP | SC, 26 |
[21] An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-Trained Language Models. | [Paper] | 2021 | arXiv | NLP | DC, 21 |
[22] A multi-layer bidirectional transformer encoder for pre-trained word embedding: A survey of BERT. | [Paper] | 2020 | CCDSE | NLP | DC, 5 |
[23] Survey of Pre-trained Models for Natural Language Processing. | [Paper] | 2021 | ICEIB | NLP | DC, 4 |
[24] A Roadmap for Big Model. | [Paper] | 2022 | arXiv | NLP, CV, MM | SC, 200 |
[25] Vision-and-Language Pretrained Models: A Survey. | [Paper] | 2022 | IJCAI | MM | DC, 8 |
[26] Multimodal Learning with Transformers: A Survey. | [Paper] | 2022 | arXiv | MM | DC, 23 |
[27] MM-LLMs: Recent Advances in MultiModal Large Language Models. | [Paper] | 2024 | arXiv | MM | DC, 22 |
- The (R)Evolution of Multimodal Large Language Models: A Survey, Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli, Sara Sarto, Lorenzo Baraldi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara [Paper]
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models, Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, Lifang He, Lichao Sun [Paper]
- On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models, Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie [Paper]
- [arXiv:2405.10739] Efficient Multimodal Large Language Models: A Survey, Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma [Paper] [Code]
- The Evolution of Multimodal Model Architectures, Shakti N. Wadekar, Abhishek Chaurasia, Aman Chadha, Eugenio Culurciello [Paper]
- [arXiv:2405.17247] An Introduction to Vision-Language Modeling, Florian Bordes et al. [Paper]
- [arXiv:2406.09385] Towards Vision-Language Geo-Foundation Model: A Survey, Yue Zhou, Litong Feng, Yiping Ke, Xue Jiang, Junchi Yan, Xue Yang, Wayne Zhang [Paper] [Github]
- [arXiv:2407.15017] Knowledge Mechanisms in Large Language Models: A Survey and Perspective, Mengru Wang, Yunzhi Yao, Ziwen Xu, Shuofei Qiao, Shumin Deng, Peng Wang, Xiang Chen, Jia-Chen Gu, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang [Paper]
- [arXiv:2408.01319] A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks, Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang [Paper]