Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Updated Nov 8, 2024 - Jupyter Notebook
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Repository for our Interspeech 2020 general-purpose voice activity detection (GPVAD) paper
The codebase for Data-driven general-purpose voice activity detection.
Speaker change detection using SincNet and an LSTM/Transformer
Fork of the official Kaldi repository.
A curated list of awesome voice activity detection resources
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering
Detecting depressed patients based on speech activity and pauses in speech, using a deep learning approach
PyAnnote Voice Activity Detection (ONNX version)
The Voxseg implementation in PyTorch. Voxseg is a Python library for voice activity detection (VAD), i.e. speech/non-speech segmentation.
Voice activity detection and speaker gender segmentation audiovisual corpus
Scoring Toolkit for the Fearless Steps Challenge Phase-02 Tasks
Lightweight speech-to-speech web-based chat app combining speech recognition, LLM completion and text-to-speech. Implemented with Python (Flask) and vanilla JavaScript only.