This workflow assumes you want good audio quality and want to avoid using YouTube to generate the subtitles for some reason (e.g. time).
- Make a presentation. Avoid:
  - Animations; make incremental slides instead.
    - If really necessary, extract the animated slides to a separate presentation later and render that out as a video.
- Type the audio track in the slide notes. You can check the overall length by running `notes-to-wav.ps1` and loading the audio into your favorite music player, like foobar2000.
- Export all slides as PNGs
- Use some service, like Play.ht, to generate high-quality speech from `allnotes.txt`
- Use your favorite video editor to arrange the slides according to the audio track, fine-tune, etc.
- Render the video
- Extract the final audio track, e.g. using `ffmpeg -i <video> -vn output-audio.wav`
- Feed it back into Picovoice using `audio-to-srt.py`
- Import the srt into the video editor
- Fix subtitles and timing
- Re-export the subtitles
- Profit.
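If you would rather script the audio extraction step than type it by hand, the following is a minimal sketch that wraps the `ffmpeg` command from the list above. The function names `build_extract_command` and `extract_audio` are made up for this example and are not part of this repository; it also assumes `ffmpeg` is on your `PATH`.

```python
import subprocess
from typing import List


def build_extract_command(video_path: str, audio_path: str = "output-audio.wav") -> List[str]:
    """Build the ffmpeg call from the steps above as an argument list."""
    # -i selects the input file, -vn drops the video stream so only audio is written.
    return ["ffmpeg", "-i", video_path, "-vn", audio_path]


def extract_audio(video_path: str, audio_path: str = "output-audio.wav") -> None:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with video file names that contain spaces.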
This script (`notes-to-wav.ps1`) generates several things for you:
- `Slide[\n+].txt` containing the notes text, one file per slide
- `Slide[\n+].wav` containing the text synthesized by `System.Speech.Synthesis.SpeechSynthesizer`; this is mostly helpful for keeping track of the total talking time
- `allnotes.txt` containing the whole text in a single file
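As an alternative to loading the files into a music player, the total talking time can be summed up directly from the generated WAV files. This is only a sketch: it assumes the files are plain PCM WAVs named like `Slide1.wav`, `Slide2.wav`, ... in one directory, and the helper names are invented for this example.

```python
import glob
import os
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_talking_time(directory: str = ".") -> float:
    """Sum the durations of all Slide*.wav files in a directory (assumed naming)."""
    paths = sorted(glob.glob(os.path.join(directory, "Slide*.wav")))
    return sum(wav_duration_seconds(p) for p in paths)


if __name__ == "__main__":
    seconds = total_talking_time(".")
    print(f"Total talking time: {int(seconds // 60)} min {seconds % 60:.1f} s")
```

Keep in mind that the synthesized voice used by the service in the later step may speak at a different pace, so treat the result as a rough estimate.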
This script (`audio-to-srt.py`) generates a `.srt` file from an audio track using the strategy described here
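To illustrate the general idea of turning a transcript into subtitles, here is a minimal sketch that groups word-level timestamps into SRT cues. It is not necessarily what `audio-to-srt.py` does: the input shape (a list of `(text, start_sec, end_sec)` tuples), the function names, and the thresholds `max_gap_sec` and `max_words` are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical input shape: (text, start_sec, end_sec) for each recognized word.
Word = Tuple[str, float, float]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_gap_sec: float = 1.0, max_words: int = 12) -> str:
    """Group words into cues and render them as SRT text."""
    cues: List[List[Word]] = []
    for word in words:
        # Start a new cue at the beginning, when the current cue is full,
        # or when the pause since the previous word is long enough.
        start_new = (
            not cues
            or len(cues[-1]) >= max_words
            or word[1] - cues[-1][-1][2] >= max_gap_sec
        )
        if start_new:
            cues.append([word])
        else:
            cues[-1].append(word)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w[0] for w in cue)
        start, end = srt_timestamp(cue[0][1]), srt_timestamp(cue[-1][2])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9), ("Next", 2.5, 2.9)]
    print(words_to_srt(demo))
```

Splitting on pauses tends to line the cues up with sentence boundaries, which leaves less to fix by hand in the video editor afterwards.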