Skip to content

Latest commit

 

History

History
40 lines (24 loc) · 1.01 KB

README.md

File metadata and controls

40 lines (24 loc) · 1.01 KB

Multi-modal RAG pipeline

This is a working example of a multi-modal RAG QA application that lets users upload a PDF file and asks about the content of the file. The system return a textual answer with relevant images if available.

Author: Jan Čuhel

Date:May 2024

Architecture

Pre-processing

Pre-processing phase of the multi-modal RAG pipeline

Inference

Inference phase of the multi-modal RAG pipeline

Installation

# Activate a python environment of your choice (e.g. venv, Conda)
# ...
# Install the dependencies
pip intall -r requirements.txt

Execution

python multimodal_rag_pipeline.py --source_file manual.pdf

Hardware requirements

We recommend to run this application on a device with a strong GPU as it utilizes several Deep Learning Models.

Intended use

The intended use of the system is for vehicle manuals, but it can be used for different manuals as well.