Speech To Text Enhancement Engine

CMUSphinx is a speaker-independent, large-vocabulary continuous speech recognizer released under a BSD-style license.

This enhancement engine uses the Sphinx4 library to convert captured audio to text. The media (audio/video) data file is parsed with the ContentItem. Speech is then extracted by Sphinx to 'text/plain', annotated with the temporal position of the extracted text. Sphinx uses an acoustic model, a dictionary model, and a language model to map utterances to text, so the engine will also support uploading acoustic and language models.

The Sphinx libraries accept audio in the following format:

Frequency: 16 kHz 
Depth: 16 bit
Type: mono
Byte order: little-endian

FFmpeg can be used to convert a sound file to the above format:

    ffmpeg -i input_file -acodec pcm_s16le -ar 16000 -ac 1 output.wav
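
Before sending a file to the engine, it can help to confirm that it matches the format listed above. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is hypothetical and not part of this engine. It checks sample rate, sample width, and channel count. Byte order is not checked separately, because PCM samples in a RIFF WAV file are little-endian by definition.

```python
import sys
import wave

# Expected format, as listed above: 16 kHz, 16 bit, mono.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches against the expected format (empty if none).

    Hypothetical helper for illustration only.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8} bit, expected 16 bit")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"channel count is {channels}, expected 1 (mono)")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = check_wav_format(sys.argv[1])
    print("OK" if not issues else "\n".join(issues))
```

If the function reports any mismatch, the file can be converted with the FFmpeg command above before it is uploaded.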

Features

  1. Provides the text extracted from the audio.
  2. Enhancement results keep track of the temporal position of the extracted text within the processed media file.

Installation

  1. Install the Sphinx4 OSGi bundle.

  2. Install the Sphinx4 Model files.

  3. Install the Sphinx4 Model Provider Service.

  4. Install the Speech To Text Engine Bundle.

    mvn install -DskipTests -PinstallBundle -Dsling.url=http://localhost:8080/system/console

Usage

Default Enhancer usage:

Acoustic Model: [EN-US Generic](http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download)
Language Model: [en-us.lm.dmp](https://svn.code.sf.net/p/cmusphinx/code/trunk/sphinx4/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/language/en-us.lm.dmp)
Dictionary Model: [cmudict.0.6d](https://svn.code.sf.net/p/cmusphinx/code/trunk/sphinx4/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/acoustic/wsj/dict/cmudict.0.6d)

The default enhancer uses the above models to extract text from the parsed sound file.

Custom Enhancer usage:

Acoustic Model: stanbol.engines.speechtotext.acoustic.bundlename (set to the bundle name, because acoustic model files have the same names in every bundle)
Language Model: stanbol.engines.speechtotext.language.model
Dictionary Model: stanbol.engines.speechtotext.dictionary.model

Run enhancer

    curl -v -X POST -H "Accept: application/rdf+xml" -H "Content-type: audio/wav" -T temp.wav "http://localhost:8090/enhancer/engine/sphinx"
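
The same request can be issued from a script. The sketch below uses only Python's standard `urllib.request` and is meant to be roughly equivalent to the curl command above, not a client shipped with this engine. The function names `build_enhancer_request` and `enhance` are hypothetical; the URL and headers are copied from the curl command and may need adjusting for your setup.

```python
import urllib.request

# Endpoint copied from the curl command above; host and port are deployment-specific.
ENGINE_URL = "http://localhost:8090/enhancer/engine/sphinx"


def build_enhancer_request(audio_bytes, url=ENGINE_URL):
    """Build (but do not send) a POST request carrying WAV audio as the body.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url,
        data=audio_bytes,
        method="POST",
        headers={
            "Accept": "application/rdf+xml",
            "Content-type": "audio/wav",
        },
    )


def enhance(path, url=ENGINE_URL):
    """Send the WAV file at `path` and return the response body as text."""
    with open(path, "rb") as audio_file:
        request = build_enhancer_request(audio_file.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `enhance("temp.wav")` requires a running server with the engine installed; `build_enhancer_request` can be inspected without any network access.
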

Test Cases Result

  1. Sound file: temp.wav in 'test/resources'
  2. Spoken Text: 1001-90210-01803
  3. Predicted Text: one zero zero zero one, nine oh two one oh, cyril one eight zero three

Note:

Test cases are deactivated for this engine because Sphinx4 uses a lot of memory to predict results, which might hamper installation of the Stanbol bundle.