Multimedia Processing Application. This project integrates real-time video capture, background segmentation, audio recording, and machine learning for transcription and translation, creating a comprehensive multimodal system.

bniladridas/video-recorder

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Video Recorder with Background Replacement and Audio Transcription

Python OpenCV MediaPipe PyAudio Whisper Transformers Torch NumPy PIL

This project is a multimedia processing application that records video with background replacement, captures audio, and provides transcription and translation using machine learning models.

Features

  • Real-time video recording with background replacement.
  • Audio recording and transcription using OpenAI's Whisper.
  • Translation of transcribed text using Hugging Face's translation pipeline.
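The README does not show how the background is replaced. A minimal sketch of one common approach, assuming a per-pixel person mask of the kind a segmentation model such as MediaPipe's selfie segmentation produces, is to keep the frame where the mask is high and use the background image elsewhere. The function name `replace_background` and the `threshold` parameter are illustrative, not taken from `main.py`.

```python
import numpy as np


def replace_background(frame, mask, background, threshold=0.5):
    """Keep `frame` pixels where mask > threshold, else use `background`.

    frame, background: uint8 arrays of shape (H, W, 3)
    mask: float array of shape (H, W); higher values mean "person"
    (the 0.5 threshold is an assumption, not a value from the project)
    """
    if frame.shape != background.shape:
        raise ValueError("background must be resized to the frame's shape first")
    # Add a trailing axis so the (H, W) mask broadcasts over the colour channels.
    keep = (mask > threshold)[..., np.newaxis]
    return np.where(keep, frame, background)
```

The background image has to be resized to the camera frame size once, before the loop, so the per-frame work is a single array selection.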

Setup

  1. Clone the repository:

    git clone https://github.com/bniladridas/video-recorder.git
    cd video-recorder
  2. Install dependencies:

    pip install -r requirements.txt
  3. Place your background image in the backgrounds/ folder and update the path in main.py if necessary.

  4. Run the project:

    python main.py

Usage

  • Press q to stop recording.
  • The output video will be saved as output.avi.
  • The transcription and translation will be saved in transcription.txt.
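How the transcription and translation end up in `transcription.txt` is not shown here, so the following is only a sketch. It assumes the `base` Whisper model, an English to French translation pipeline, and a two-section file layout; none of these are confirmed by the project. `load_default_models` and `write_transcript` are hypothetical helper names.

```python
from pathlib import Path


def load_default_models(whisper_size="base", task="translation_en_to_fr"):
    """Build transcribe/translate callables (model size and language pair are assumed)."""
    import whisper  # provided by the openai-whisper package
    from transformers import pipeline

    model = whisper.load_model(whisper_size)
    translator = pipeline(task)

    def transcribe(audio_path):
        # Whisper returns a dict; the full transcript is under "text".
        return model.transcribe(str(audio_path))["text"].strip()

    def translate(text):
        # Translation pipelines return a list of dicts, one per input.
        return translator(text)[0]["translation_text"]

    return transcribe, translate


def write_transcript(audio_path, out_path, transcribe, translate):
    """Transcribe the audio, translate the text, and save both to out_path."""
    text = transcribe(audio_path)
    translated = translate(text)
    # This two-section layout is an assumption, not the project's actual format.
    body = f"Transcription:\n{text}\n\nTranslation:\n{translated}\n"
    Path(out_path).write_text(body, encoding="utf-8")
    return text, translated
```

Passing the two callables in keeps the file-writing step independent of the models, so it can be tried with stand-in functions before downloading any weights.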

Project Structure

graph TD;
    A[video-recorder/] --> B[README.md]
    A --> C[requirements.txt]
    A --> D[main.py]
    A --> E[backgrounds/]
    E --> F[unsplash.jpg]
    A --> G[.gitignore]

Control Flow

graph TD;
    A[Start] --> B[Initialize VideoRecorder]
    B --> C[Set Background Image]
    C --> D[Start Recording]
    D --> E[Start Audio Recording Thread]
    E --> F[Capture Video Frame]
    F --> G[Apply Background Replacement]
    G --> H[Write Frame to Output]
    H --> I{Press 'q' to Stop?}
    I -->|No| F
    I -->|Yes| J[Stop Recording]
    J --> K[Join Audio Thread]
    K --> L[Release Video Capture]
    L --> M[Destroy All Windows]
    M --> N[Generate Transcription]
    N --> O[Save Transcription to File]
    O --> P[End]
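The control flow above can be sketched as a loop that runs audio capture in a background thread and joins it when recording stops. This is an illustration of the diagram, not the code in `main.py`; all five callables are hypothetical stand-ins for the camera, the background replacement step, the video writer, the `q` key check, and the microphone reader.

```python
import threading


def run_recording(read_frame, process_frame, write_frame, stop_requested, record_audio):
    """Capture, process, and write frames until stop_requested() returns True.

    record_audio is called in a separate thread with a threading.Event that is
    set when recording stops, so it knows when to finish.
    """
    stop_event = threading.Event()
    audio_thread = threading.Thread(target=record_audio, args=(stop_event,))
    audio_thread.start()
    frames_written = 0
    try:
        while True:
            frame = read_frame()
            if frame is None:  # the camera returned nothing; end the loop
                break
            write_frame(process_frame(frame))
            frames_written += 1
            if stop_requested():  # e.g. the user pressed 'q'
                break
    finally:
        # Always signal and join the audio thread, even if a frame step raised.
        stop_event.set()
        audio_thread.join()
    return frames_written
```

Releasing the capture device, closing windows, and generating the transcription would follow the call to `run_recording`, in the order the diagram shows.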

Performance Considerations

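  • Hardware: Background replacement runs on every frame and Whisper transcription is compute-intensive, so a multi-core CPU is recommended; a CUDA-capable GPU can speed up transcription.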
  • Frame Rate: The default frame rate is 20 FPS; adjust if necessary based on your hardware.
  • Audio Quality: Audio recording quality might vary based on your microphone.
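To decide whether the default 20 FPS suits your hardware, one option is to time the per-frame processing step before recording. The sketch below assumes you can call that step as a plain function; `measure_fps` and `choose_frame_rate` are hypothetical helpers, not part of the project.

```python
import time


def measure_fps(process_frame, frame, n_frames=50):
    """Return how many frames per second process_frame sustains on this machine."""
    start = time.perf_counter()
    for _ in range(n_frames):
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return n_frames / elapsed if elapsed > 0 else float("inf")


def choose_frame_rate(measured_fps, default_fps=20.0):
    """Use the default rate when the hardware keeps up, else the measured rate."""
    return min(default_fps, measured_fps)
```

If the measured rate is below 20, writing the output at the lower rate keeps playback speed close to real time.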

Error Handling

The script suppresses specific warnings but does not recover from errors such as a camera or microphone that cannot be accessed. Check device availability and permissions yourself before recording.
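A camera check could look like the sketch below, which fails with a clear message instead of recording empty frames. `open_camera` is a hypothetical helper; it relies only on OpenCV's `cv2.VideoCapture`, `isOpened()`, and `release()`, and the `capture_factory` parameter is an assumption added so the check can be exercised without a real device.

```python
def open_camera(index=0, capture_factory=None):
    """Open a camera and raise RuntimeError if it is unavailable."""
    if capture_factory is None:
        import cv2  # imported lazily so the helper can be tested without OpenCV
        capture_factory = cv2.VideoCapture
    capture = capture_factory(index)
    if not capture.isOpened():
        capture.release()
        raise RuntimeError(
            f"Could not open camera {index}; check that it is connected, "
            "not in use by another application, and permitted by the OS"
        )
    return capture
```

A similar guard around the microphone setup would cover the audio side.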

Dependencies

  • Python 3.8+
  • OpenCV
  • MediaPipe
  • PyAudio
  • Whisper
  • Transformers
  • Torch
  • NumPy
  • Pillow (PIL)
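Several of these packages are imported under a different name than the one listed (OpenCV as `cv2`, Whisper as `whisper` from the `openai-whisper` package, PIL from Pillow). A small check like the following, offered as a sketch rather than part of the project, reports which modules are missing before you run `main.py`. `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names for the dependencies listed above.
REQUIRED_MODULES = [
    "cv2", "mediapipe", "pyaudio", "whisper",
    "transformers", "torch", "numpy", "PIL",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` with no arguments checks the full list; an empty result means every import name was found.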

License

MIT License

Rating

⭐️⭐️⭐️⭐️⭐️⭐️⭐️⭐️⭐️⭐️ 10/10
