This repository contains the open-source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".
The goal of this project is to advance video understanding by leveraging the capabilities of GPT-4V(ision). The implementation follows the methods and experiments described in the paper and provides a framework for scene detection, video clipping, speech recognition, and the generation of coherent video descriptions.
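To illustrate how these stages can fit together, here is a minimal sketch in plain Python. It assumes scene detection has already produced a list of cut timestamps and speech recognition has produced timed text segments. The names `Clip`, `clips_from_boundaries`, `attach_transcript`, and `build_prompt`, as well as the prompt wording, are hypothetical and are not taken from this repository's code.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Clip:
    """One video clip between two scene cuts (hypothetical structure)."""
    start: float  # clip start time in seconds
    end: float    # clip end time in seconds
    transcript: List[str] = field(default_factory=list)


def clips_from_boundaries(boundaries: Sequence[float], duration: float) -> List[Clip]:
    """Turn scene-cut timestamps into consecutive clips covering the whole video."""
    # Keep only cuts strictly inside the video, in ascending order.
    inner = sorted(b for b in boundaries if 0.0 < b < duration)
    cuts = [0.0] + inner + [duration]
    return [Clip(start, end) for start, end in zip(cuts, cuts[1:])]


def attach_transcript(clips: List[Clip],
                      asr_segments: Sequence[Tuple[float, float, str]]) -> List[Clip]:
    """Assign each (start, end, text) speech segment to the clip containing its midpoint."""
    for seg_start, seg_end, text in asr_segments:
        midpoint = (seg_start + seg_end) / 2
        for clip in clips:
            # Clips are half-open intervals [start, end), so a midpoint
            # lying exactly on a cut belongs to the later clip.
            if clip.start <= midpoint < clip.end:
                clip.transcript.append(text)
                break
    return clips


def build_prompt(clip: Clip) -> str:
    """Compose the text that would accompany a clip's frames (assumed wording)."""
    speech = " ".join(clip.transcript) if clip.transcript else "(no speech)"
    return (f"Clip {clip.start:.1f}s to {clip.end:.1f}s. "
            f"Transcript: {speech}. Describe what happens in this clip.")
```

Under these assumptions, calling `clips_from_boundaries` with the detected cuts, then `attach_transcript` with the recognized speech, gives each clip the text needed to build a per-clip prompt. Sending frames and prompts to GPT-4V(ision) is left out because it depends on the API setup used by `main.py`.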
To use this repository, first clone it and install the required dependencies:
git clone https://github.com/yongliang-wu/MM-VID.git
cd MM-VID
pip install -r requirements.txt
Then run the main script:
python main.py
Supplying external information as input is not yet supported.