Takes a song and its lyrics, extracts the vocals, splits the lyrics into syllables and computes a forced alignment to generate a karaoke as an Aegisub subtitle file (.ass).
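To make the last step more concrete, here is a minimal sketch of how aligned syllables can be turned into a karaoke line in the ASS format, where each syllable is preceded by a `{\k<duration>}` tag in centiseconds. It is not Yohane's actual code: the `Syllable` class and both function names are hypothetical, and the timings are assumed to come from the forced alignment.

```python
from dataclasses import dataclass


@dataclass
class Syllable:  # hypothetical container, not a Yohane type
    text: str     # syllable text, including any trailing space
    start: float  # start time in seconds
    end: float    # end time in seconds


def ass_timestamp(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc)."""
    cs = round(seconds * 100)
    h, cs = divmod(cs, 360_000)
    m, cs = divmod(cs, 6_000)
    s, cs = divmod(cs, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def karaoke_line(syllables: list[Syllable], style: str = "Default") -> str:
    """Build one ASS Dialogue event with a \\k tag per syllable."""
    # Round the boundaries once so the \k durations add up to the line length.
    bounds = [round(s.start * 100) for s in syllables]
    bounds.append(round(syllables[-1].end * 100))
    # Each syllable lasts until the next one starts (or until the line ends).
    text = "".join(
        f"{{\\k{bounds[i + 1] - bounds[i]}}}{syl.text}"
        for i, syl in enumerate(syllables)
    )
    start = ass_timestamp(syllables[0].start)
    end = ass_timestamp(syllables[-1].end)
    # Fields: Layer,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,,{text}"


line = karaoke_line([
    Syllable("yo", 1.0, 1.25),
    Syllable("ha", 1.25, 1.6),
    Syllable("ne", 1.6, 2.0),
])
print(line)
```

Because a `\k` duration runs until the next syllable starts, any silence between two syllables is folded into the earlier one, which is one reason the timings still need a manual pass.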
Open the notebook in Google Colab to use its GPU resources.
The full pipeline completes in less than a minute in that environment.
Requirements:
uvx --from git+https://github.com/Japan7/yohane.git[cli] --python 3.11 yohane --help
Requirement: pixi
git clone https://github.com/Japan7/yohane.git
cd yohane/
pixi run yohane --help
- Yohane's syllable splitting is currently optimized only for Japanese lyrics
- Syllables at the end of lines are often shortened
- Forced alignment can't deal with overlapping vocals
- The result is not fully accurate: you should still check and edit it!
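To illustrate why a splitter tuned for Japanese struggles with other languages, here is a deliberately naive sketch that cuts romanized lyrics into "consonants + vowel" chunks, with an optional syllable-final "n". This is an assumption made for illustration only, not Yohane's actual splitting logic; `split_romaji` is a hypothetical name.

```python
import re

# A (possibly empty) run of consonants, one vowel, and optionally a
# syllable-final "n" that is not followed by a vowel or "y".
_SYLLABLE = re.compile(r"[^aeiou\s]*[aeiou](?:n(?![aeiouy]))?", re.IGNORECASE)


def split_romaji(word: str) -> list[str]:
    """Naively split a single romanized word into syllable-like chunks."""
    syllables = _SYLLABLE.findall(word)
    if not syllables:
        # No vowel at all: keep the word whole (or nothing for an empty string).
        return [word] if word else []
    # Consonants left after the last vowel are attached to the last chunk.
    tail = word[sum(len(s) for s in syllables):]
    syllables[-1] += tail
    return syllables


print(split_romaji("konnichiwa"))  # Japanese: chunks match the sung syllables
print(split_romaji("love"))        # English: one sung syllable is cut in two
```

A rule like this works for most romanized Japanese, but it over-splits English words such as "love", so the resulting syllables have to be merged by hand afterwards.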
- Get the song and its lyrics
- Use the yohane notebook or the CLI locally to generate the karaoke file
In Aegisub:
- Load the .ass and the video
- Replace the Default style with your own
- Normalization during the process lowercases the lines and removes special characters: use the original lines in comments to fix the timed lines
- Once the timed lines are fixed, remove the comments: Subtitle > Select Lines… > check Comments and Set selection > OK, then delete the selected lines
- Listen to each line and fix its End time
- Add a 1s karaoke lead-in to every line
- Iterate over each line in karaoke mode and merge/fix syllable timings
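The lead-in step above can also be scripted. The sketch below shows one common way to do it, assumed here rather than taken from Yohane or Aegisub: move the line's start time earlier and prepend an empty `{\k}` tag of the same length, so the existing syllable timings stay in sync with the audio. All function names are hypothetical.

```python
import re

# ASS timestamps look like H:MM:SS.cc (centisecond precision).
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{2})")


def to_centiseconds(timestamp: str) -> int:
    """Parse an ASS timestamp into centiseconds."""
    match = _TIMESTAMP.fullmatch(timestamp)
    if match is None:
        raise ValueError(f"not an ASS timestamp: {timestamp!r}")
    h, m, s, cs = map(int, match.groups())
    return ((h * 60 + m) * 60 + s) * 100 + cs


def from_centiseconds(total: int) -> str:
    """Format centiseconds as an ASS timestamp."""
    h, rest = divmod(total, 360_000)
    m, rest = divmod(rest, 6_000)
    s, cs = divmod(rest, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def add_lead_in(start: str, text: str, lead_in_cs: int = 100) -> tuple[str, str]:
    """Return (new_start, new_text) with a lead-in of up to lead_in_cs."""
    start_cs = to_centiseconds(start)
    # A line cannot start before the beginning of the video.
    lead = min(lead_in_cs, start_cs)
    if lead == 0:
        return start, text
    # The empty \k tag consumes the lead-in before the first syllable.
    return from_centiseconds(start_cs - lead), f"{{\\k{lead}}}{text}"


print(add_lead_in("0:00:12.34", r"{\k25}yo{\k35}ha{\k40}ne"))
```

The End time is left untouched, so this can be applied after the End times have been fixed by ear.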