MultimodalRAGChatwithVideos

Dear learner,

Introducing Multimodal RAG: Chat with Videos, a short course made in collaboration with Intel!

Taught by Vasudev Lal, Principal AI Research Scientist at Intel Labs, this course teaches you to build an interactive system for querying video content using multimodal AI.

You'll learn to create a Q&A system that interacts with a collection of videos. You'll use multimodal transformer models, such as BridgeTower, to map visual and textual data into a unified semantic space. You'll generate embeddings from text and images and store them in a vector database. Then you'll build a RAG pipeline that retrieves relevant content and uses a large vision-language model (LVLM) to generate responses.
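To make the idea of retrieval in a shared semantic space concrete, here is a minimal sketch in plain Python. It is not the course's code: the three-dimensional vectors are hand-written stand-ins for the joint image-text embeddings a model such as BridgeTower would produce, the list `store` stands in for a real vector database, and the names `Entry`, `cosine_similarity`, and `retrieve` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    """One stored item: a video frame paired with its transcript text."""
    frame_path: str          # hypothetical path to an extracted frame
    transcript: str
    embedding: list[float]   # toy stand-in for a joint image-text embedding


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, store, k=1):
    """Return the k entries whose embeddings are closest to the query."""
    ranked = sorted(
        store,
        key=lambda e: cosine_similarity(query_embedding, e.embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy in-memory "vector store"; a real system would use a vector database.
store = [
    Entry("frames/0001.jpg", "An astronaut floats outside the station.", [0.9, 0.1, 0.0]),
    Entry("frames/0042.jpg", "A chef slices vegetables.", [0.0, 0.2, 0.9]),
]

# Assumed embedding of a text query such as "person in space".
query_embedding = [0.8, 0.2, 0.1]
top = retrieve(query_embedding, store, k=1)
print(top[0].frame_path)
```

Because the query and the stored frame-transcript pairs live in the same space, a text query can be ranked directly against them. The top entry (frame plus transcript) is what a RAG pipeline would then pass to an LVLM as context.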

In this course, you will make API calls to access multimodal models hosted by Prediction Guard on Intel’s cloud.

By the end, you'll have the expertise to create AI systems that can intelligently interact with video content.

Throughout the course, you'll work hands-on to build a complete multimodal RAG system that:

  • Processes and embeds video content (frames, transcripts, and captions)
  • Stores multimodal data in a vector database
  • Retrieves relevant video segments given text queries
  • Generates contextual responses using LVLMs
  • Maintains multi-turn conversations about video content
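As a rough illustration of the last two capabilities in the list above, the sketch below shows one way a retrieved video segment and the running conversation could be combined into a prompt for each new turn. Everything here is an assumption for illustration: the `VideoSegment` and `Conversation` classes, their fields, and the prompt layout are hypothetical, and the assistant answer is hard-coded where a real system would call an LVLM with the prompt and the segment's frame.

```python
from dataclasses import dataclass, field


@dataclass
class VideoSegment:
    """A retrieved slice of a video (hypothetical schema)."""
    video_id: str
    start_s: float
    end_s: float
    transcript: str
    frame_path: str


@dataclass
class Conversation:
    """Keeps earlier turns so follow-up questions stay in context."""
    segment: VideoSegment
    turns: list = field(default_factory=list)  # list of (role, text) tuples

    def build_prompt(self, question):
        """Combine segment context, prior turns, and the new question."""
        seg = self.segment
        lines = [f"Transcript ({seg.start_s:.0f}s-{seg.end_s:.0f}s): {seg.transcript}"]
        for role, text in self.turns:
            lines.append(f"{role}: {text}")
        lines.append(f"user: {question}")
        return "\n".join(lines)

    def record(self, question, answer):
        """Store a finished exchange for use in later prompts."""
        self.turns.append(("user", question))
        self.turns.append(("assistant", answer))


segment = VideoSegment(
    "demo", 12.0, 18.0,
    "The speaker introduces the robot arm.",
    "frames/demo_0012.jpg",
)
chat = Conversation(segment)

first_prompt = chat.build_prompt("What is shown?")
# Placeholder answer; a real system would obtain this from an LVLM.
chat.record("What is shown?", "A robot arm.")
second_prompt = chat.build_prompt("What color is it?")
print(second_prompt)
```

Keeping the retrieved segment fixed across turns lets a follow-up question like "What color is it?" resolve against the same frame and transcript as the first question.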

Whether you're looking to enhance content management systems, improve accessibility features, or push the boundaries of human-AI interaction, the techniques learned in this course will provide a solid foundation for innovation in multimodal AI applications.

Details

  • Create a sophisticated question-answering system that processes, understands, and interacts with complex multimodal data.

  • Explore the concept of multimodal semantic space and its importance in AI.

  • Learn the differences between traditional RAG and multimodal RAG systems, focusing on the complexities of integrating different models.

| Lesson | Video | Code |
|---|---|---|
| Introduction | video | |
| Interactive Demo and Multimodal RAG System Architecture | video | code |
| Multimodal Embeddings | video | code |
| Preprocessing Videos for Multimodal RAG | video | code |
| Multimodal Retrieval from Vector Stores | video | code |
| Large Vision-Language Models (LVLMs) | video | code |
| Multimodal RAG with Multimodal LangChain | video | code |
| Conclusion | video | |