Aligning short audio using long text #344

MusiCode1 · 2024-04-14T00:27:07Z


I have long audio files (readings of the Bible) and their exact transcript (the Bible, of course). I want to split them into short files using VAD and get the aligned transcript for each file. The files are intended to be published as a dataset for training new models.

When I put in the full transcript, and ran the alignment on the full audio file, I got good timestamps.

But when I passed in the full transcript (the text of an entire Bible passage) and ran the alignment on a short audio file of a few seconds, the result contained all the text in the full transcript, not just the words spoken in the short file.

How can I pass a short audio file and a long text to the alignment and get back only the text that was spoken in the file, rather than all of the text supplied for alignment?

jianfch · 2024-04-14T04:46:45Z

Maintainer

The alignment approach used by stable-ts is not designed to align text that includes parts not contained in the audio. However, you can try remove_instant_words=True. It will only remove the remaining words in the text that failed to align (i.e. words with zero duration), and it will not remove any words before the first spoken word.
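
For reference, a minimal sketch of how that option might be passed. The helper names `align_clip` and `keep_timed_words`, the file name, and the word timings are hypothetical and only for illustration; `keep_timed_words` mimics the described effect on plain dictionaries and is not how stable-ts implements it.

```python
from typing import Any, Dict, List


def align_clip(model: Any, audio_path: str, text: str, language: str) -> Any:
    """Align `text` to a short clip and drop words that end up with zero duration.

    `model` is assumed to be a stable-ts model, for example the object
    returned by stable_whisper.load_model("base").
    """
    return model.align(audio_path, text, language=language, remove_instant_words=True)


def keep_timed_words(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Illustration only: keep words whose end time is later than their start time."""
    return [w for w in words if w["end"] > w["start"]]


# Hypothetical timings for a clip in which only the first two words are spoken;
# the third word failed to align and therefore has zero duration.
example = [
    {"word": "In", "start": 0.0, "end": 0.2},
    {"word": "the", "start": 0.2, "end": 0.4},
    {"word": "beginning", "start": 0.4, "end": 0.4},
]
kept = [w["word"] for w in keep_timed_words(example)]

# Usage with stable-ts (assumes the package is installed and "clip.wav" exists):
#   import stable_whisper
#   model = stable_whisper.load_model("base")
#   result = align_clip(model, "clip.wav", passage_text, language="en")
```

Because words before the first spoken word are not removed, this is only likely to help when the supplied text starts at or near the point where the clip begins.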
