User-friendly Colab notebooks for various text-prompt-steered synthetic audio generators.
Available notebooks:
- AudioLDM – text-to-audio
- TorToiSe TTS – text-to-speech w/ voice-cloning
- MubertAI Text-to-Music – text-to-music
- TTS Voice Cloning – text-to-speech w/ voice-cloning
Paper: Text-to-Audio Generation with Latent Diffusion Models
Colab for AudioLDM. Generates audio from a text description. This is probably the beginning of the "Stable Diffusion of audio". Currently produces 16 kHz audio only.
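For orientation, here is a minimal sketch of text-to-audio generation outside the notebook. It is not necessarily how the notebook itself runs AudioLDM: it assumes the Hugging Face `diffusers` package with its `AudioLDMPipeline` and the `cvssp/audioldm-s-full-v2` checkpoint, and the helper names `save_wav` and `generate` are made up for illustration. The WAV helper uses only the standard library and writes at the 16 kHz rate mentioned above.

```python
import array
import sys
import wave

SAMPLE_RATE = 16_000  # AudioLDM output rate, as noted above


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] to `path` as 16-bit PCM."""
    # Clamp each sample, then scale to the signed 16-bit range.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())


def generate(prompt, seconds=5.0, steps=10):
    """Hypothetical helper: return a 1-D float waveform for `prompt`."""
    # Heavy imports are deferred so save_wav works without them installed.
    # Assumption: diffusers' AudioLDMPipeline and this checkpoint name.
    import torch
    from diffusers import AudioLDMPipeline

    pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2")
    if torch.cuda.is_available():
        pipe = pipe.to("cuda")
    result = pipe(prompt, num_inference_steps=steps, audio_length_in_s=seconds)
    return result.audios[0]
```

Calling `save_wav(generate("a dog barking in the distance"), "output.wav")` would then write a short clip; more inference steps generally trade speed for quality.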
Paper: TorToiSe - Spending Compute for High Quality TTS
Colab for TorToiSe text-to-speech voice-cloning. This notebook takes a text string and an audio file (or files) of a speaker's voice, and attempts to synthesize the text using the given voice. Currently works with English text only.
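The text-plus-reference-clips flow described above can be sketched roughly as follows. This assumes the `tortoise-tts` package is installed and follows the usage shown in its repository (`TextToSpeech`, `load_audio`, `tts_with_preset`); the helper names `find_voice_clips` and `clone_voice` and the folder layout are hypothetical, and the notebook may do things differently.

```python
from pathlib import Path


def find_voice_clips(folder):
    """Return the sorted .wav files in `folder` to use as reference clips."""
    return sorted(str(p) for p in Path(folder).glob("*.wav"))


def clone_voice(text, clip_paths, out_path="generated.wav", preset="fast"):
    """Hypothetical helper: speak `text` in the voice heard in `clip_paths`."""
    # Deferred imports: these need tortoise-tts, torch and torchaudio installed.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    tts = TextToSpeech()  # fetches model weights on first use
    # Reference clips are loaded at 22.05 kHz, as in the TorToiSe examples.
    voice_samples = [load_audio(p, 22050) for p in clip_paths]
    speech = tts.tts_with_preset(text, voice_samples=voice_samples, preset=preset)
    # TorToiSe produces 24 kHz audio.
    torchaudio.save(out_path, speech.squeeze(0).cpu(), 24000)
    return out_path
```

For example, `clone_voice("Hello there.", find_voice_clips("my_voice"))` would read every `.wav` file in a `my_voice` folder as a reference. Slower presets than `"fast"` tend to give better results at the cost of more compute.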
UPDATE: it seems the Mubert API now requires a (paid) API key.
Colab for MubertAI Text-to-Music. Generates music from a text description, using predefined blocks created by the community (afaik). See the source repository for details such as licensing.
Paper: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Colab for Real-Time-Voice-Cloning text-to-speech voice-cloning. This notebook takes a text string and an audio file of a speaker's voice, and attempts to synthesize the text using the given voice. Fair warning: results are not great.