An application that collects and preprocesses audio clips of single-word utterances found in WAV files

c-jg/keyword-collector

Audio Keyword Collector

This tool extracts spoken-word utterances from audio and preprocesses the clips for use with machine learning algorithms. It can collect utterances from YouTube videos or local WAV files.

It uses Mozilla DeepSpeech speech-to-text (STT) models to identify spoken words. DeepSpeech was chosen over other popular STT engines because it is open source, so any amount of audio can be analyzed at no cost.
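As a rough illustration of how spoken words can be located in a transcript, the sketch below groups per-character tokens into words with start and end times. DeepSpeech's `sttWithMetadata` returns tokens that carry a `text` and a `start_time`; the `Token` class here is a hypothetical stand-in for those objects, and `words_with_times` and `find_keyword` are illustrative names, not functions from this repository.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Token:
    """Hypothetical stand-in for a DeepSpeech token: one character and its start time."""
    text: str
    start_time: float  # seconds from the start of the audio


def words_with_times(tokens: Iterable[Token]) -> List[Tuple[str, float, float]]:
    """Group character tokens into (word, start, end) tuples.

    A space token closes the current word. The end time is the start time of
    the word's last character, so it slightly underestimates the true end.
    """
    words: List[Tuple[str, float, float]] = []
    chars: List[str] = []
    start = end = 0.0
    for tok in tokens:
        if tok.text == " ":
            if chars:
                words.append(("".join(chars), start, end))
                chars = []
            continue
        if not chars:
            start = tok.start_time  # first character of a new word
        chars.append(tok.text)
        end = tok.start_time
    if chars:  # flush the final word if the transcript does not end in a space
        words.append(("".join(chars), start, end))
    return words


def find_keyword(tokens: Iterable[Token], keyword: str) -> List[Tuple[float, float]]:
    """Return the (start, end) times of every occurrence of keyword."""
    return [(s, e) for w, s, e in words_with_times(tokens) if w == keyword]
```

In practice a clip would probably be cut with some padding around each returned interval, since the end time above marks the start of the last character rather than the end of the word.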

Features

  • Extracts and exports keyword utterances from long-form audio
  • Automatically resamples downloaded audio and converts it to mono (1 channel)
  • Allows for custom search queries to find relevant audio on YouTube
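The resampling and mono conversion mentioned above can be sketched in plain Python as follows. This is only an illustration of the idea, not the tool's actual implementation, which may well rely on an audio library. The function names are hypothetical, and linear interpolation is used for simplicity even though it applies no anti-aliasing filter.

```python
from typing import List, Sequence


def to_mono(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the channels of each frame into a single sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation between neighbouring source samples."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # fractional index into the source
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

For example, `resample_linear(to_mono(frames), 44100, 16000)` would bring stereo 44.1 kHz frames down to mono 16 kHz, the sample rate the released DeepSpeech English models expect.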

Setup

Clone repository:

git clone https://github.com/c-jg/keyword-collector.git

Create virtual environment and install dependencies:

python -m venv venv3
source venv3/bin/activate
pip install -r requirements.txt

To extract keywords from YouTube videos, create an environment variable named YT_API_KEY containing your API key for the YouTube Data API.
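A minimal sketch of how a script might read that variable and fail early when it is missing. The helper name `get_youtube_api_key` is hypothetical, and the repository may read the key differently.

```python
import os
from typing import Mapping


def get_youtube_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the YouTube Data API key from YT_API_KEY, or raise if it is unset."""
    key = env.get("YT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "YT_API_KEY is not set; it is required to collect audio from YouTube."
        )
    return key
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.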

You may also need to install the following packages on Linux:

sudo apt install libsndfile1
sudo apt install ffmpeg

Download the trained DeepSpeech models (~1.1 GB):

curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
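Because the downloads are large, it can help to confirm that both files are in place before running the application. The sketch below assumes the files were saved under their original names in a single directory; `missing_model_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import List

# File names taken from the download URLs above.
MODEL_FILES = (
    "deepspeech-0.9.3-models.pbmm",
    "deepspeech-0.9.3-models.scorer",
)


def missing_model_files(directory: str = ".") -> List[str]:
    """Return the names of expected model files that are absent from directory."""
    base = Path(directory)
    return [name for name in MODEL_FILES if not (base / name).is_file()]
```

An empty list means both files were found; this does not verify that the downloads are complete or uncorrupted.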

Run application:

python main.py
