This is a repository demonstrating transcribing podcast audio with a few different services: IBM Watson Speech to Text, Google Cloud Speech, and CMU Sphinx.
First run fetch_podcast.sh
to pull some content down for
converting. The example from my blog post is used, though you can
replace the URL if you like to try with other content.
Ensure that sphinx is installed. On Ubuntu this is done with
sudo apt-get install pocketsphinx pocketsphinx-en-us
Then run transcribe_with_sphinx.sh
. It will produce
sphinx-transcription.log
as output.
Sign up for IBM Bluemix and create a Watson Speech to Text instance.
Copy user/password into auth.cfg in the [watson] section.
Install python prereqs
pip install watson-developer-cloud
Then run it with
watson-transcribe.py podcast.flac
Optionally specify a pretrained customized language id
watson-transcribe.py --customization a13780b0-52b7-1fe7-fbb7-77471c70c949 podcast.flac
Note
It will take 30 - 45 minutes to run, it processes slightly faster than realtime.
Sign up for Google Cloud and create a project and a service key. Go through all the setup around authentication - https://googlecloudplatform.github.io/google-cloud-python/stable/google-cloud-auth.html
Create a key for the service, put that json in auth.json
in this
directory.
Install python prereqs
pip install google-cloud-storage google-cloud-speech
Then run it as
google-transcribe.py
Note
It will take 20 - 30 minutes to run, it processes slightly faster than realtime.