Multi-modal RAG pipeline

This is a working example of a multi-modal RAG QA application that lets users upload a PDF file and asks about the content of the file. The system return a textual answer with relevant images if available.

Author: Jan Čuhel

Date:May 2024

Architecture

Pre-processing

Inference

Installation

# Activate a python environment of your choice (e.g. venv, Conda)
# ...
# Install the dependencies
pip intall -r requirements.txt

Execution

python multimodal_rag_pipeline.py --source_file manual.pdf

Hardware requirements

We recommend to run this application on a device with a strong GPU as it utilizes several Deep Learning Models.

Intended use

The intended use of the system is for vehicle manuals, but it can be used for different manuals as well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Multi-modal RAG pipeline

Architecture

Pre-processing

Inference

Installation

Execution

Hardware requirements

Intended use

Files

README.md

Latest commit

History

README.md

File metadata and controls

Multi-modal RAG pipeline

Architecture

Pre-processing

Inference

Installation

Execution

Hardware requirements

Intended use