HonzaCuhel/multimodal-rag-pipeline

Multi-modal RAG pipeline

This is a working example of a multi-modal RAG QA application that lets users upload a PDF file and ask questions about its content. The system returns a textual answer, together with relevant images if available.

Author: Jan Čuhel

Date: May 2024

Architecture

Pre-processing

Pre-processing phase of the multi-modal RAG pipeline

Inference

Inference phase of the multi-modal RAG pipeline
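
As a rough illustration of the behaviour described in the introduction (a textual answer, plus relevant images if available), the toy sketch below retrieves the best-matching text chunk for a question and returns it with any images linked to that chunk. Everything in it is an assumption made for illustration: the `Chunk` structure, the word-overlap scoring, and the function names are hypothetical and are not taken from this repository, which relies on deep learning models rather than keyword matching.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical unit of extracted PDF text with the images found near it."""
    text: str
    image_paths: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def score(question: str, chunk: Chunk) -> int:
    """Toy relevance score: number of words shared by question and chunk."""
    return len(tokenize(question) & tokenize(chunk.text))


def answer(question: str, chunks: list[Chunk]) -> dict:
    """Return the best-matching chunk text and its images, if any chunk matches."""
    best = max(chunks, key=lambda c: score(question, c), default=None)
    if best is None or score(question, best) == 0:
        return {"answer": "No relevant passage found.", "images": []}
    # A real pipeline would pass the retrieved text to a language model here
    # instead of returning the raw chunk.
    return {"answer": best.text, "images": best.image_paths}


chunks = [
    Chunk("Check the tyre pressure monthly.", ["page12_fig1.png"]),
    Chunk("The fuel cap is on the left side."),
]
result = answer("How often should I check tyre pressure?", chunks)
print(result["answer"], result["images"])
```

Keeping the image paths attached to the text chunk they came from is one simple way to return images alongside an answer; an actual system may index images separately.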

Installation

# Activate a python environment of your choice (e.g. venv, Conda)
# ...
# Install the dependencies
pip install -r requirements.txt

Execution

python multimodal_rag_pipeline.py --source_file manual.pdf
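
The command above passes the PDF path through a `--source_file` option. A minimal sketch of how such an option could be parsed with the standard `argparse` module is shown below; the function names, the help text, and the choice to make the option required are assumptions, not the script's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the --source_file option used above."""
    parser = argparse.ArgumentParser(
        description="Multi-modal RAG QA over a PDF file (illustrative sketch)."
    )
    # Assumption: the option is mandatory and holds a path to a PDF file.
    parser.add_argument(
        "--source_file",
        required=True,
        help="Path to the PDF file to ask questions about.",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the given argument list, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)


# Equivalent to: python multimodal_rag_pipeline.py --source_file manual.pdf
args = parse_args(["--source_file", "manual.pdf"])
print(args.source_file)
```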

Hardware requirements

We recommend running this application on a device with a powerful GPU, as it uses several deep learning models.
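
To check whether a GPU is visible before starting, a small helper like the one below can be used. It assumes the models run on PyTorch with CUDA, which this README does not state, and it falls back to the CPU when PyTorch is not installed. The name `pick_device` is hypothetical.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    try:
        import torch  # assumption: the pipeline's models are PyTorch-based
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```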

Intended use

The system is intended for vehicle manuals, but it can be used with other kinds of manuals as well.
