An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Updated Apr 12, 2024 - Python
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Research Code for Multimodal-Cognition Team in Ant Group
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
VTC: Improving Video-Text Retrieval with User Comments
Extracts text from a video and saves it as notes in a .docx file.