Automatic Speech Recognition (speech-to-text)

Implementation based on Listen, Attend and Spell

The Listener (encoder) is a pyramidal recurrent network that takes filter bank spectra as input and shortens the time dimension from one layer to the next. The Speller (decoder) is an attention-based recurrent network that emits one character per step. Each output character is conditioned on the audio and on all previously emitted characters, so the network makes no independence assumptions between characters.
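
The two core operations of this architecture can be sketched in plain Python. This is a minimal illustration under stated assumptions, not this repository's code. The hypothetical `pyramid_reduce` mimics how each pyramidal layer halves the time resolution by concatenating consecutive frames; the recurrent cell that would run over the concatenated frames is omitted. The hypothetical `dot_attention` is a simplified dot-product attention in which the encoder states act as both keys and values, with no learned projections.

```python
import math
from typing import List

Frame = List[float]


def pyramid_reduce(frames: List[Frame]) -> List[Frame]:
    """Halve the number of time steps by concatenating each pair of consecutive frames.

    A trailing odd frame is dropped here (an assumption; padding is another option).
    In a real pyramidal listener, a recurrent layer would then run over the result.
    """
    usable = len(frames) - len(frames) % 2
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]


def dot_attention(query: Frame, states: List[Frame]) -> Frame:
    """Softmax-weighted sum of encoder `states`, scored by dot product with `query`.

    Simplification: the same states act as keys and values, with no learned projections.
    """
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * state[d] for w, state in zip(weights, states))
            for d in range(len(states[0]))]


# Toy input: 10 frames with 2 features each (stand-ins for filter bank energies).
frames = [[float(t), 0.0] for t in range(10)]
h = frames
for _ in range(3):  # three pyramidal layers: 10 -> 5 -> 2 -> 1 time steps
    h = pyramid_reduce(h)
print(len(h), len(h[0]))  # 1 16

# The query matches the first state, so that state receives the larger weight.
context = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(context)
```

Shortening the sequence before attention means the Speller attends over far fewer encoder states than there are input frames, which makes each attention step cheaper.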

Training objective: predict the next character of the transcript, given the utterance (voice recording) and the preceding characters of its transcript.
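
A common way to train this objective is teacher forcing: the decoder is fed the ground-truth transcript shifted right by one position and is scored with the negative log-likelihood of each next character. The sketch below is a minimal illustration under assumptions. It uses hypothetical `<sos>`/`<eos>` tokens and represents the decoder's per-step output as a plain dict of character probabilities. The repository's actual tokens and loss code may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

SOS, EOS = "<sos>", "<eos>"  # hypothetical start/end-of-sequence tokens


def teacher_forcing_pairs(transcript: str) -> Tuple[List[str], List[str]]:
    """Return (decoder inputs, targets); each target is the character after its input."""
    chars = list(transcript)
    return [SOS] + chars, chars + [EOS]


def sequence_nll(step_probs: Sequence[Dict[str, float]], targets: Sequence[str]) -> float:
    """Negative log-likelihood: sum over steps i of -log P(y_i | x, y_<i)."""
    if len(step_probs) != len(targets):
        raise ValueError("need exactly one output distribution per target character")
    return -sum(math.log(probs[target]) for probs, target in zip(step_probs, targets))


inputs, targets = teacher_forcing_pairs("hi")
print(inputs)   # ['<sos>', 'h', 'i']
print(targets)  # ['h', 'i', '<eos>']

# Toy decoder output: a uniform distribution over {'h', 'i', '<eos>'} at every step.
uniform = {"h": 1 / 3, "i": 1 / 3, EOS: 1 / 3}
loss = sequence_nll([uniform] * len(targets), targets)
print(round(loss, 4))  # 3.2958 (= 3 * ln 3)
```

At inference time there is no ground-truth transcript to feed back, so the decoder conditions on its own previous predictions instead, for example with greedy or beam search.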

Trained on the WSJ0 dataset

catapulta/attention-speech-recognition