Some vision-language models you should know if you are interested in multimodal AI. (To be continued.)
| Model | Variant | Params | Papers/Github | Demo | Last Released | Organization | Description |
|---|---|---|---|---|---|---|---|
| Idefics | idefics-9b | 9B | Github / Paper | Demo | | | |
| Idefics | idefics-80b | 80B | Github / Paper | Demo | | | |
| Idefics | idefics-2 | 8B | Github / Paper | Demo | | | |
| CogVLM | CogVLM | 17B | Github / Paper | Demo | 2023.12.26 | | |
| CogVLM | CogAgent | 17B | Github / Paper | Demo | 2024.04.05 | | |
| LLaVa | LLaVa-v1.5 | 7B | Github / Paper | Demo | 2023.10.05 | | |
| LLaVa | LLaVa-v1.6 | 34B | Github / Paper | Demo | 2024.01.30 | | |
| Qwen-VL | Qwen-VL-Plus | 7B | Github / Paper | Demo | 2023.11.28 | Alibaba Cloud | |
| Qwen-VL | Qwen-VL-MAX | 7B | Github / Paper | Demo | 2024.01.18 | Alibaba Cloud | |
| Qwen-VL | Qwen-VL-Chat | 7B | Github / Paper | Demo | 2024.01.18 | Alibaba Cloud | |
| PaliGemma | paligemma-3b-mix | 3B | Github / Paper | Demo | | | |