Video Synopsis: Intelligent Video Object Summarization using Florence/OWL-ViT and SAM. It uses OWL-ViT or Florence 2 for object detection, SAM for segmentation, and a custom video synopsis algorithm to produce optimized outputs.

PranayLendave/text2video_synopsis

Text2Video Synopsis

Text2Video Synopsis is a video analysis tool that generates video synopses of the objects you specify. It uses OWL-ViT or Florence 2 for object detection, SAM for segmentation, and a custom video synopsis algorithm to produce optimized outputs. It is intended for video summarization in surveillance, content analysis, and object tracking scenarios.

Table of Contents

  1. Colab Notebook
  2. Installation
  3. Usage
    • Running Streamlit App
    • Running Main Script
    • Parameters and Examples
  4. Features
  5. To-Do
  6. Related Work

Colab Notebook

Run the project using this Google Colab Notebook.

Installation

To install all dependencies, run:

pip install -r requirements.txt

Usage

1. Streamlit App

To interactively run the project on a Streamlit-based web UI:

streamlit run ./app.py & npx localtunnel --port 8501

This starts the Streamlit app in the background and exposes it on port 8501 through a localtunnel link.

2. Running the Main Script

Run main.py with the following example:

python main.py \
  --input_model "OWL-ViT" \
  --video "/content/text2video_synopsis/all_rush_video.mp4" \
  --classes "people,person" \
  --epoch 100
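The command above implies a small command-line interface. The following is a minimal sketch of how such flags could be parsed, not the contents of main.py: the function names build_parser and parse_classes, the default of 100 for --epoch, and the rule for splitting --classes per model are all assumptions made for illustration.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Text2Video Synopsis (illustrative CLI sketch)")
    parser.add_argument("--input_model", choices=["OWL-ViT", "Florence-2-large"], required=True,
                        help="Detection model to use")
    parser.add_argument("--video", required=True, help="Path to the input video file")
    parser.add_argument("--classes", required=True,
                        help="Comma-separated classes (OWL-ViT) or a prompt sentence (Florence)")
    # The default value here is an assumption, taken from the example command.
    parser.add_argument("--epoch", type=int, default=100,
                        help="Number of iterations for video synopsis optimization")
    return parser


def parse_classes(input_model: str, classes: str) -> List[str]:
    """Split comma-separated classes for OWL-ViT; keep a Florence prompt whole (assumed rule)."""
    if input_model == "OWL-ViT":
        return [name.strip() for name in classes.split(",") if name.strip()]
    return [classes.strip()]


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--input_model", "OWL-ViT",
    "--video", "/content/text2video_synopsis/all_rush_video.mp4",
    "--classes", "people,person",
    "--epoch", "100",
])
queries = parse_classes(args.input_model, args.classes)
print(args.input_model, args.video, queries, args.epoch)
```

Restricting --input_model with choices makes a mistyped model name fail at parse time rather than later during detection.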

Parameters and Examples

  • --input_model: Detection model to use (OWL-ViT or Florence-2-large).
  • --video: Path to the input video file.
  • --classes: Object classes to detect.
    • For Florence: Provide a prompt sentence, e.g.,
      • Simple prompts: "People in the video", "Car on the road"
      • Complex prompts: "People with black t-shirt", "People with suitcase"
    • For OWL-ViT: Provide comma-separated open-vocabulary detection classes, e.g.,
      • "car,person,dog"
  • --epoch: Number of iterations for video synopsis optimization.
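To give a feel for what an iteration count like --epoch can control, here is a toy sketch of synopsis optimization as a random local search. It is not this project's algorithm, which the README does not describe. It assumes each detected object is a "tube" with a known length in frames, and it shifts tube start times to trade off collisions against synopsis length. The names overlap, cost, and optimize, and the parameters length_weight and max_start, are hypothetical.

```python
import random
from typing import List, Tuple


def overlap(a_start: int, a_len: int, b_start: int, b_len: int) -> int:
    """Number of frames during which two tubes are both visible."""
    return max(0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def cost(starts: List[int], lengths: List[int], length_weight: float = 0.5) -> float:
    """Collision frames over all tube pairs plus a penalty on synopsis length."""
    collisions = sum(
        overlap(starts[i], lengths[i], starts[j], lengths[j])
        for i in range(len(starts))
        for j in range(i + 1, len(starts))
    )
    synopsis_length = max(s + n for s, n in zip(starts, lengths))
    return collisions + length_weight * synopsis_length


def optimize(lengths: List[int], epoch: int, max_start: int, seed: int = 0) -> Tuple[List[int], float]:
    """Run `epoch` random proposals, keeping each one that does not raise the cost."""
    if not lengths:
        return [], 0.0
    rng = random.Random(seed)
    starts = [0] * len(lengths)  # every tube initially starts at frame 0
    best = cost(starts, lengths)
    for _ in range(epoch):
        i = rng.randrange(len(lengths))           # pick one tube
        candidate = starts.copy()
        candidate[i] = rng.randint(0, max_start)  # propose a new start frame
        candidate_cost = cost(candidate, lengths)
        if candidate_cost <= best:
            starts, best = candidate, candidate_cost
    return starts, best


tube_lengths = [30, 30, 30]
starts, final_cost = optimize(tube_lengths, epoch=100, max_start=90)
print(starts, final_cost)
```

In this toy model, more iterations give the search more chances to find a lower-cost arrangement, at the price of longer runtime. A real synopsis cost would also account for spatial overlap between objects, which this sketch ignores.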

Features

  1. Motion Detection: Focuses processing on video segments with significant motion.

  2. Object and Action Detection: Uses state-of-the-art models like Florence and OWL-ViT for object detection, and SAM for segmentation.

  3. Flexible Synopsis Generation: Creates optimized video summaries based on user-defined object criteria.

  4. Versatile Use Cases:

    • Surveillance video summarization
    • Targeted object tracking
    • Intelligent video content analysis
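The motion detection feature listed above can be illustrated with simple frame differencing. This is a minimal sketch under stated assumptions, not the project's implementation: it assumes grayscale frames stacked in a NumPy array, and the function name motion_mask and both thresholds are hypothetical values chosen for the example.

```python
import numpy as np


def motion_mask(frames: np.ndarray, pixel_threshold: int = 25,
                area_fraction: float = 0.01) -> np.ndarray:
    """Mark frames that differ noticeably from the previous frame.

    frames: uint8 grayscale video of shape (num_frames, height, width).
    Returns a boolean array of length num_frames.
    """
    # Cast to a signed type so the subtraction cannot wrap around.
    diffs = np.abs(np.diff(frames.astype(np.int16), axis=0))
    # Fraction of pixels that changed by more than pixel_threshold, per frame pair.
    changed = (diffs > pixel_threshold).mean(axis=(1, 2))
    # The first frame has no predecessor, so it is marked as static.
    return np.concatenate([[False], changed > area_fraction])


# Synthetic clip: four black frames, with a bright square appearing in frame 2 only.
clip = np.zeros((4, 10, 10), dtype=np.uint8)
clip[2, :5, :5] = 255
print(motion_mask(clip))
```

Frames where the mask is False could be skipped before running the heavier detection and segmentation models, which is the kind of saving a motion filter is meant to provide.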

To-Do

  • Web UI (Streamlit App)
  • Robust Video Synopsis
  • Add a diagram explaining the project: input (multiple images from a 24-hour CCTV video) and output (5 frames of the output video, plus a GIF of the same)

Related Work
