For Speech-to-Text problems, our training data consists of:
The purpose of this project is to fine-tune an automatic speech recognition (ASR) model using transfer learning, so that it can convert atypical speech (the voice of people with speech impairments) into text.
We start with a state-of-the-art end-to-end speech recognition model. This high-quality ASR model is trained on hundreds of hours of typical (standard) speech with no impairments. Once the end-to-end model reaches high accuracy, we fine-tune parts of it for an individual with a speech impairment.
So our main approach is to train a base model on a large dataset of typical speech and then train a personalised model on a much smaller dataset of slurred speech. We can use transfer learning to fine-tune parts of the base model.
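As a rough illustration of fine-tuning only parts of a base model, here is a minimal sketch in PyTorch. The framework, the `TinyASR` class, its layer sizes, and the choice to freeze the encoder and retrain only the output layer are all assumptions made for the example; they are not taken from this project's actual model.

```python
import torch
from torch import nn


class TinyASR(nn.Module):
    """Hypothetical stand-in for the base ASR model: an encoder plus an output layer."""

    def __init__(self, n_feats=40, hidden=64, n_classes=29):
        super().__init__()
        self.encoder = nn.GRU(n_feats, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):
        out, _ = self.encoder(x)      # (batch, time, hidden)
        return self.classifier(out)   # (batch, time, n_classes)


def freeze_for_finetuning(model):
    """Freeze the encoder and return the parameters that remain trainable."""
    for param in model.encoder.parameters():
        param.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


model = TinyASR()
# In a real run, the trained base-model weights would be loaded here
# before freezing (checkpoint handling is not shown in this sketch).
trainable = freeze_for_finetuning(model)

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

Which layers to unfreeze is a design choice: with a small impaired-speech dataset, training fewer parameters reduces the risk of overfitting, at the cost of less room to adapt.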
The base ASR model was trained on 100 hours of the LibriSpeech dataset, with these results:
- Final Epoch Average Loss: 0.46
- Final Epoch Average CER: 0.10
- Final Epoch Average WER: 0.11
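For reference, character error rate (CER) and word error rate (WER) are usually computed as the edit distance between the reference transcript and the model output, divided by the length of the reference. The sketch below shows that standard definition in plain Python; the function names are ours, and the project's own evaluation code may differ in details such as text normalisation or how the averages are taken.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `cer("hello", "hallo")` is one substitution over five characters, i.e. 0.2.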
After training our ASR model on hundreds of hours of typical speech, we are ready to fine-tune it on impaired speech. For that we need to collect an impaired-speech dataset, so we built a web app using the Django framework to collect the recordings.