Speech To Text Enhancement Engine
CMUSphinx is a speaker-independent large vocabulary continuous speech recognizer released under BSD style license .
This Enhancement engine uses Sphinx4 library to convert the captured audio. Media (audio/video) data file is parsed with the ContentItem. Audio speech is than extracted by Sphinx to 'plain/text' with the annotation of temporal position of the extracted text. Sphinix uses acoustic model, dictionary model and language model to map the utterances with the text, so the engine will also provide support of uploading acoustic model and language model.
Audio file accepted by Sphinix libraries, accepts sound in following format:
Frequency: 16 kHz
Depth: 16 bit
Type: mono
little-endian byte order
FFmpeg can be used to convert sound file in the above format ffmpeg -i input_file -acodec pcm_s16le -ar 16000 -ac 1 output.wav
- Provide the extracted text
- Enhancement Results keep track of the temporal position of the extracted text within the processed media file.
-
Install Sphinx4 OSGi bundle.
-
Install Sphinx4 Model files
-
Install Sphinx4 Model Provider Service
-
Install Speech To Text Engine Bundle
mvn install -DskipTests -PinstallBundle -Dsling.url=http://localhost:8080/system/console
Default Enhancer usage:
Acoustic Model: [EN-US Generic](http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download)
Language Model: [en-us.lm.dmp](https://svn.code.sf.net/p/cmusphinx/code/trunk/sphinx4/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/language/en-us.lm.dmp)
Dictionary Model: [cmudict.0.6d](https://svn.code.sf.net/p/cmusphinx/code/trunk/sphinx4/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/acoustic/wsj/dict/cmudict.0.6d)
Default enhancer uses the above model to extract text from parsed sound file
Acoustic Model: Bundle name is provided as Acoustic Model files have same name for all types of bundle, stanbol.engines.speechtotext.acoustic.bundlename
Language Model: stanbol.engines.speechtotext.language.model
Dictionary Model: stanbol.engines.speechtotext.dictionary.model
curl -v -X POST -H "Accept: application/rdf+xml" -H "Content-type: audio/wav" -T temp.wav "http://localhost:8090/enhancer/engine/sphinx"
- Sound file: temp.wav in 'test/resources'
- Spoken Text: 1001-90210-01803
- Predicted Text: one zero zero zero one, nine oh two one oh, cyril one eight zero three
Test Cases are deactivated for the engine, as Sphinx4 uses lot of memory to predict results. This might hamper installation of Stanbol bundle.