Speech To Text Enhancement Engine

CMUSphinx is a speaker-independent, large-vocabulary, continuous speech recognizer released under a BSD-style license.

This enhancement engine uses the Sphinx4 library to convert captured audio to text. The media (audio/video) file is parsed from the ContentItem, and the speech is then extracted by Sphinx4 as 'text/plain' content annotated with the temporal position of the extracted text. Sphinx4 uses an acoustic model, a dictionary model and a language model to map utterances to text, so the engine also supports uploading custom acoustic and language models.
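For orientation, the sketch below shows how text and word-level time frames can be obtained through the Sphinx4 high-level API (edu.cmu.sphinx.api). This is not the engine's own code; the model paths are placeholders for the models resolved from the installed model bundles.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import edu.cmu.sphinx.api.Configuration;
    import edu.cmu.sphinx.api.SpeechResult;
    import edu.cmu.sphinx.api.StreamSpeechRecognizer;
    import edu.cmu.sphinx.result.WordResult;

    public class SphinxSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder model locations; the engine resolves these from the
            // installed model bundles instead of hard-coded paths.
            Configuration configuration = new Configuration();
            configuration.setAcousticModelPath("file:models/en-us");
            configuration.setDictionaryPath("file:models/cmudict.0.6d");
            configuration.setLanguageModelPath("file:models/en-us.lm.dmp");

            StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
            InputStream audio = new FileInputStream(new File("temp.wav"));
            recognizer.startRecognition(audio);

            SpeechResult result;
            while ((result = recognizer.getResult()) != null) {
                // Recognised text for one utterance...
                System.out.println(result.getHypothesis());
                // ...and the per-word time frames used for the temporal annotations.
                for (WordResult word : result.getWords()) {
                    System.out.println(word.getWord() + " " + word.getTimeFrame());
                }
            }
            recognizer.stopRecognition();
        }
    }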

Audio files passed to the Sphinx4 library must be in the following format:

Frequency: 16 kHz
Depth: 16 bit
Channels: mono
Byte order: little-endian

FFmpeg can be used to convert a sound file into the above format:

    ffmpeg -i input_file -acodec pcm_s16le -ar 16000 -ac 1 output.wav
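If the conversion needs to happen inside the JVM rather than through FFmpeg, the standard javax.sound.sampled API can perform the same resampling in many cases. A minimal sketch (not part of the engine; the default Java sound providers do not support every source-to-target rate conversion, so FFmpeg remains the more robust option):

    import java.io.File;

    import javax.sound.sampled.AudioFileFormat;
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;

    public class ConvertForSphinx {
        public static void main(String[] args) throws Exception {
            AudioInputStream source = AudioSystem.getAudioInputStream(new File(args[0]));
            // Target format expected by Sphinx4: 16 kHz, 16 bit, mono, signed PCM, little-endian.
            AudioFormat target = new AudioFormat(16000f, 16, 1, true, false);
            AudioInputStream converted = AudioSystem.getAudioInputStream(target, source);
            AudioSystem.write(converted, AudioFileFormat.Type.WAVE, new File(args[1]));
        }
    }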

Features

  1. Provides the extracted text.
  2. Enhancement Results keep track of the temporal position of the extracted text within the processed media file.

Installation

  1. Install the Sphinx4 OSGi bundle.

  2. Install the Sphinx4 model files.

  3. Install the Sphinx4 Model Provider Service.

  4. Install the Speech To Text Engine bundle:

    mvn install -DskipTests -PinstallBundle -Dsling.url=http://localhost:8080/system/console

Usage

Default Enhancer usage:

Acoustic Model: [EN-US Generic](http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download)
Language Model: [en-us.lm.dmp](https://svn.code.sf.net/p/cmusphinx/code/trunk/sphinx4/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/language/en-us.lm.dmp)
Dictionary Model: [cmudict.0.6d](https://svn.code.sf.net/p/cmusphinx/code/trunk/sphinx4/sphinx4-data/src/main/resources/edu/cmu/sphinx/models/acoustic/wsj/dict/cmudict.0.6d)

The default enhancer uses the above models to extract text from the parsed sound file.

Custom Enhancer usage:

The models can be overridden through the following engine properties (a configuration sketch follows the curl example below):

Acoustic Model: set stanbol.engines.speechtotext.acoustic.bundlename to the bundle name, since the acoustic model files themselves have the same names in every bundle
Language Model: stanbol.engines.speechtotext.language.model
Dictionary Model: stanbol.engines.speechtotext.dictionary.model
Run the enhancer:

    curl -v -X POST -H "Accept: application/rdf+xml" -H "Content-type: audio/wav" -T temp.wav "http://localhost:8090/enhancer/engine/sphinx"
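For illustration only, the sketch below shows how the custom model properties above could be pushed to the engine through the OSGi ConfigurationAdmin service. The service PID and the property values here are assumptions; in practice the same properties can also be edited by hand in the Apache Felix web console.

    import java.io.IOException;
    import java.util.Dictionary;
    import java.util.Hashtable;

    import org.osgi.service.cm.Configuration;
    import org.osgi.service.cm.ConfigurationAdmin;

    public class SpeechToTextEngineConfigurer {

        /** Pushes custom model settings to the engine via ConfigurationAdmin. */
        public void configure(ConfigurationAdmin configAdmin) throws IOException {
            // Assumed service PID; look up the engine's real PID in the Felix console.
            Configuration config = configAdmin.getConfiguration(
                    "org.apache.stanbol.enhancer.engines.speechtotext");

            Dictionary<String, Object> props = new Hashtable<String, Object>();
            props.put("stanbol.engines.speechtotext.acoustic.bundlename", "en-us");     // example value
            props.put("stanbol.engines.speechtotext.language.model", "en-us.lm.dmp");   // example value
            props.put("stanbol.engines.speechtotext.dictionary.model", "cmudict.0.6d"); // example value
            config.update(props);
        }
    }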
Test Case Results
  1. Sound file: temp.wav in 'test/resources'
  2. Spoken Text: 1001-90210-01803
  3. Predicted Text: one zero zero zero one, nine oh two one oh, cyril one eight zero three
Note:

Test cases are deactivated for this engine because Sphinx4 uses a lot of memory to predict results, which might hamper installation of the Stanbol bundle.
