Text2Video Synopsis

This project is a video analysis tool that generates video synopses based on user-specified object classes or text prompts. It targets intelligent video summarization for surveillance, content analysis, and object tracking scenarios. It uses OWL-ViT or Florence-2 for object detection, SAM for segmentation, and a custom video synopsis algorithm to produce the optimized output video.

Colab Notebook

Run the project using this Google Colab Notebook.

Installation

To install all dependencies, run:

pip install -r requirements.txt

Usage

1. Streamlit App

To run the project interactively in a Streamlit-based web UI:

streamlit run ./app.py & npx localtunnel --port 8501

This starts the Streamlit app on port 8501 and exposes it through a localtunnel link.

2. Running the Main Script

Run main.py from the command line, for example:

python main.py \
  --input_model "OWL-ViT" \
  --video "/content/text2video_synopsis/all_rush_video.mp4" \
  --classes "people,person" \
  --epoch 100
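
As a rough illustration of how the flags in the command above could be handled, here is a minimal sketch using argparse. It is not taken from main.py: the defaults, the `build_parser` and `prepare_queries` helper names, and the placeholder path `input.mp4` are assumptions made for this example. The class-splitting rule follows the Parameters and Examples section (comma-separated classes for OWL-ViT, a single prompt sentence for Florence).

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the project's code)."""
    parser = argparse.ArgumentParser(description="Sketch of the main.py command line")
    parser.add_argument("--input_model", choices=["OWL-ViT", "Florence-2-large"],
                        default="OWL-ViT", help="Detection model to use")
    parser.add_argument("--video", required=True, help="Path to the input video file")
    parser.add_argument("--classes", required=True,
                        help="Comma-separated classes (OWL-ViT) or a prompt sentence (Florence)")
    # The default of 100 is an assumption copied from the example command.
    parser.add_argument("--epoch", type=int, default=100,
                        help="Number of optimization iterations")
    return parser


def prepare_queries(input_model, classes):
    """Turn the --classes string into a list of text queries for the detector."""
    if input_model == "OWL-ViT":
        # OWL-ViT: one query per comma-separated class name.
        return [name.strip() for name in classes.split(",") if name.strip()]
    # Florence: the whole string is treated as a single prompt sentence.
    return [classes.strip()]


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args([
    "--input_model", "OWL-ViT",
    "--video", "input.mp4",
    "--classes", "people,person",
    "--epoch", "100",
])
queries = prepare_queries(args.input_model, args.classes)
print(queries)
```

Restricting `--input_model` with `choices` makes a misspelled model name fail at parse time rather than later during detection.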

Parameters and Examples

--input_model: Detection model to use (OWL-ViT or Florence-2-large).
--video: Path to the input video file.
--classes: Object classes to detect. The expected format depends on the model:
- For Florence: Provide a prompt sentence, e.g.,
  - Simple: "People in the video", "Car on the road"
  - Complex: "People with black t-shirt", "People with suitcase"
- For OWL-ViT: Provide comma-separated classes for OPEN_VOCABULARY_DETECTION, e.g.,
  - "car,person,dog"
--epoch: Number of iterations for video synopsis optimization.
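
To give a feel for what an iteration count such as `--epoch` controls, the sketch below runs a simple random local search. It assumes, as many video synopsis formulations do, that each detected object forms a "tube" with a fixed duration and that the optimizer shifts tube start times to reduce collisions. This is not the project's algorithm: the cost here only counts temporal overlap (a real collision cost would also consider spatial overlap), and the names `overlap`, `energy`, and `optimize` are hypothetical.

```python
import random


def overlap(a_start, a_len, b_start, b_len):
    """Number of frames shared by two intervals [start, start + len)."""
    return max(0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def energy(starts, lengths):
    """Total pairwise temporal overlap, used as a stand-in for a collision cost."""
    total = 0
    for i in range(len(starts)):
        for j in range(i + 1, len(starts)):
            total += overlap(starts[i], lengths[i], starts[j], lengths[j])
    return total


def optimize(lengths, synopsis_len, epochs, seed=0):
    """Random local search: each epoch moves one tube and keeps the move if energy does not rise."""
    rng = random.Random(seed)
    starts = [0] * len(lengths)  # begin with every tube at frame 0
    best = energy(starts, lengths)
    for _ in range(epochs):
        i = rng.randrange(len(lengths))
        candidate = starts[:]
        # Keep the tube fully inside the synopsis (requires lengths[i] <= synopsis_len).
        candidate[i] = rng.randint(0, synopsis_len - lengths[i])
        cost = energy(candidate, lengths)
        if cost <= best:
            starts, best = candidate, cost
    return starts, best


tube_lengths = [30, 30, 30]
starts, cost = optimize(tube_lengths, synopsis_len=120, epochs=100)
print(starts, cost)
```

More epochs give the search more chances to find a lower-cost arrangement, at the price of longer run time.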

Features

Motion Detection: Focuses processing on video segments with significant motion.
Object and Action Detection: Uses Florence or OWL-ViT for object detection, and SAM for segmentation.
Flexible Synopsis Generation: Creates optimized video summaries based on user-defined object criteria.
Versatile Use Cases:
- Surveillance video summarization
- Targeted object tracking
- Intelligent video content analysis
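
The motion detection feature listed above can be illustrated with plain frame differencing. The following is a dependency-free sketch, not the project's implementation: frames are modeled as 2D lists of grayscale values, and the function names and both thresholds are assumptions chosen for the example. A practical version would more likely operate on arrays read with a video library.

```python
def motion_score(prev, curr, pixel_threshold=25):
    """Fraction of pixels whose intensity changed by more than pixel_threshold."""
    changed = 0
    total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(p - c) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def motion_frames(frames, min_fraction=0.01, pixel_threshold=25):
    """Indices of frames that differ noticeably from the frame before them."""
    return [
        i for i in range(1, len(frames))
        if motion_score(frames[i - 1], frames[i], pixel_threshold) >= min_fraction
    ]


# Tiny synthetic clip: a static 4x4 scene, then one pixel lights up at frame 2.
still = [[0] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[1][2] = 255
clip = [still, still, moved, moved]
print(motion_frames(clip))
```

Only the frames flagged this way would need to be passed to the heavier detection and segmentation models.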

To-Do

Web UI (Streamlit App)
Robust Video Synopsis
Add a diagram explaining the project: input (multiple images from a 24-hour CCTV video) and output (5 frames of the output video, plus a GIF of the same).
