This project is a video analysis tool that generates video synopses: condensed summaries of long footage for surveillance, content analysis, and object tracking scenarios. It uses OWL-ViT or Florence-2 for object detection, SAM for segmentation, and a custom video synopsis algorithm to produce the optimized output.
- Colab Notebook
- Installation
- Usage
  - Running Streamlit App
  - Running Main Script
  - Parameters and Examples
- Features
- To-Do
- Related Work
Run the project using this Google Colab Notebook.
To install all dependencies, run:
pip install -r requirements.txt
To interactively run the project on a Streamlit-based web UI:
streamlit run ./app.py & npx localtunnel --port 8501
This will expose the Streamlit app through a localtunnel link.
Run main.py with the following example:
python main.py \
--input_model "OWL-ViT" \
--video "/content/text2video_synopsis/all_rush_video.mp4" \
--classes "people,person" \
--epoch 100
- --input_model: Detection model to use (OWL-ViT or Florence-2-large).
- --video: Path to the input video file.
- --classes: Object classes to detect.
  - For Florence: Provide a prompt sentence, e.g.,
    - Simple ones: "People in the video", "Car on the road"
    - Complex ones: "People with black t-shirt", "People with suitcase"
  - For OWL-ViT: Provide comma-separated classes for open-vocabulary detection (OPEN_VOCABULARY_DETECTION), e.g., "car,person,dog"
- --epoch: Number of iterations for video synopsis optimization.
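As a rough illustration of how the flags above fit together, here is a minimal sketch of an argument parser and of the model-dependent handling of --classes. It is not taken from main.py: the function names `build_parser` and `classes_to_queries`, the default values, and the required/optional split are all assumptions made for this example.

```python
import argparse
from typing import List

# Model names as listed in the parameter description above.
SUPPORTED_MODELS = ("OWL-ViT", "Florence-2-large")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Video synopsis (illustrative)")
    parser.add_argument("--input_model", choices=SUPPORTED_MODELS, default="OWL-ViT",
                        help="Detection model to use")
    parser.add_argument("--video", required=True,
                        help="Path to the input video file")
    parser.add_argument("--classes", required=True,
                        help="Comma-separated classes (OWL-ViT) or a prompt sentence (Florence)")
    parser.add_argument("--epoch", type=int, default=100,
                        help="Number of iterations for synopsis optimization")
    return parser


def classes_to_queries(input_model: str, classes: str) -> List[str]:
    """Turn the --classes string into a list of text queries.

    OWL-ViT: split on commas, one query per class.
    Florence: keep the whole string as a single prompt sentence.
    """
    if input_model == "OWL-ViT":
        return [c.strip() for c in classes.split(",") if c.strip()]
    return [classes.strip()]


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--input_model", "OWL-ViT", "--video", "input.mp4",
     "--classes", "car,person,dog", "--epoch", "100"]
)
print(classes_to_queries(args.input_model, args.classes))
```

The point of the split is that a Florence prompt such as "People with suitcase" stays intact as one sentence, while an OWL-ViT value such as "car,person,dog" becomes three separate class queries.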
- Motion Detection: Focuses processing on video segments with significant motion.
- Object and Action Detection: Uses state-of-the-art models like Florence and OWL-ViT for object detection, and SAM for segmentation.
- Flexible Synopsis Generation: Creates optimized video summaries based on user-defined object criteria.
- Versatile Use Cases:
  - Surveillance video summarization
  - Targeted object tracking
  - Intelligent video content analysis
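To make the motion detection feature more concrete, the sketch below shows one common way to find segments with significant motion: frame differencing followed by grouping of consecutive "moving" frames. This is an assumption about the general technique, not the project's actual implementation. The names `motion_score` and `motion_segments`, the threshold value, and the representation of frames as flat sequences of grayscale pixel values are all hypothetical; real code would more likely operate on OpenCV or NumPy arrays.

```python
from typing import List, Sequence, Tuple


def motion_score(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Mean absolute pixel difference between two same-sized grayscale frames."""
    if not curr or len(prev) != len(curr):
        raise ValueError("frames must be non-empty and have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def motion_segments(frames: Sequence[Sequence[int]],
                    threshold: float = 10.0) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame-index ranges with significant motion.

    A frame counts as "moving" when its difference from the previous frame
    exceeds `threshold`; consecutive moving frames are merged into one segment.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(1, len(frames)):
        moving = motion_score(frames[i - 1], frames[i]) > threshold
        if moving and start is None:
            start = i                        # a new segment begins here
        elif not moving and start is not None:
            segments.append((start, i - 1))  # segment ended on the previous frame
            start = None
    if start is not None:                    # video ended while still moving
        segments.append((start, len(frames) - 1))
    return segments


# Toy 4-pixel frames: a flash at frame 2 and another at frame 6.
still, bright = [0, 0, 0, 0], [100, 100, 100, 100]
frames = [still, still, bright, still, still, still, bright, bright]
print(motion_segments(frames))  # → [(2, 3), (6, 6)]
```

Only the returned segments would then be passed on to the heavier detection and segmentation models, which is what makes this kind of pre-filter useful for long, mostly static footage.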
- Web UI (Streamlit App)
- Robust Video Synopsis
- Add a diagram explaining the project: input (multiple frames from a 24-hour CCTV video) and output (5 frames of the output video, plus a GIF of the same)