Project: Utterance to word Splitting

This project develops a new database (containing audio files with a single word in it) from a standard speech corpus. So, the python scripts (in this repository) take a speech corpus (e.g. LibriSpeech) and split the audio files (containing utterances) into audio files (each containing a single word). Hence, the output of the scripts will audio files corresponding to one word per file.

General steps to be followed:

Run Google cloud speech-to-text or any standard ASR API on the speech corpus
Calculate the utterance level WER by comparing the obtained hypothesis with the available ground-truth transcript.

If WER==0:
1. Perform utterance-to-word trimming based on time-tags
2. Store the meaningful meta-Info from the original dataset, such as speaker ID, speaker gender, etc.

Possible challenges

For most spoken languages, the boundaries between lexical units are difficult to identify. There is hardly any pauses in between two words.
Inter-word spaces used by many written languages, would rarely correspond to pauses in their spoken version. Only in case of very slow speech, the speaker deliberately inserts those pauses.
In normal speech, one typically finds many consecutive words being said with no pauses between them, and often the final sounds of one word blend smoothly or fuse with the initial sounds of the next word (Co-articulation effect).

Instruction to the LibriSpeech users:

The script is by default written for LibriSpeech corpus. If you want to use some other corpus, please check config.py and instructions given below
Set the LibriSpeech corpus path in config.py
Set the path of Google-API key which is required to run Google Cloud ASR.
Use the conda environment: environment_utt_to_words.yaml and then run the top-level script main_utt_to_words.py

Start your shell (terminal):

$ conda env create -f environment_utt_to_words.yaml

$ source activate utterance_to_word

$ python3 main_utt_to_words.py

Instruction to the users who want to use corpus other than LibriSpeech:

First of all, set the path_to_corpus in config.py
Set the speaker_meta_info_file_name and path of the same in config.py.
Set the path of Google-API key (in config.py) which is required to run Google Cloud ASR.
There are two customized functions in corpus_functions.py, namely extract_reference_transcript and find_speaker_gender. You need to modify them as per corpus requirement.
If the file-formats and sampling rate of the audio files in your corpus are different from LibriSpeech, then please change the following portion in run_ASR method in transcribe.py: config = types.RecognitionConfig( encoding=enums.RecognitionConfig.AudioEncoding.FLAC, sample_rate_hertz=16000, language_code='en-US', enable_word_time_offsets=True)

For queries and further information:

please contact: samuisuman@gmail.com

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
output		output
LICENSE		LICENSE
README.md		README.md
Utterance_to_Words_Splitting.pdf		Utterance_to_Words_Splitting.pdf
audio_splitting.py		audio_splitting.py
audio_transcribe.py		audio_transcribe.py
config.py		config.py
corpus_functions.py		corpus_functions.py
environment_utt_to_words.yaml		environment_utt_to_words.yaml
main_utt_to_words.py		main_utt_to_words.py
utility_functions.py		utility_functions.py
wer_calculator.py		wer_calculator.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project: Utterance to word Splitting

General steps to be followed:

Possible challenges

Instruction to the LibriSpeech users:

Instruction to the users who want to use corpus other than LibriSpeech:

For queries and further information:

© 2020 Suman Samui All Rights Reserved

About

Releases

Packages

Languages

License

sumansamui/Utterance_to_Words_Splitting

Folders and files

Latest commit

History

Repository files navigation

Project: Utterance to word Splitting

General steps to be followed:

Possible challenges

Instruction to the LibriSpeech users:

Instruction to the users who want to use corpus other than LibriSpeech:

For queries and further information:

© 2020 Suman Samui All Rights Reserved

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages