Colab notebooks for text-to-audio generators

User-friendly Colab notebooks for various text-prompt-steered synthetic audio generators.

Available notebooks:

AudioLDM – text-to-audio
TorToiSe TTS – text-to-speech w/ voice-cloning
MubertAI Text-to-Music – text-to-music
TTS Voice Cloning – text-to-speech w/ voice-cloning

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

Paper: Text-to-Audio Generation with Latent Diffusion Models

Colab for AudioLDM. Generates audio from a text description. This is probably the beginning of a "Stable Diffusion of audio". Currently capable of producing 16 kHz audio only.
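
The 16 kHz limit means a generated clip cannot contain frequencies above 8 kHz (half the sample rate). The following is a minimal sketch, not part of the notebook, for checking the sample rate of a WAV file you have downloaded from it. It uses only the Python standard library; the helper names `describe_wav` and `write_test_tone` are hypothetical, and the sine tone merely stands in for a generated clip.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def describe_wav(path):
    """Return sample rate, Nyquist frequency, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": rate,
        "nyquist_hz": rate / 2,  # highest frequency the file can represent
        "duration_s": frames / rate,
    }


def write_test_tone(path, rate=16000, seconds=1.0, freq=440.0):
    """Write a mono 16-bit sine tone (hypothetical stand-in for a generated clip)."""
    n = int(rate * seconds)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clip = Path(tmp) / "clip.wav"
        write_test_tone(clip)
        print(describe_wav(clip))
```

To inspect a real output, call `describe_wav` on the file saved from the notebook instead of the test tone.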

TorToiSe: Text-to-speech

Paper: TorToiSe - Spending Compute for High Quality TTS

Colab for TorToiSe text-to-speech voice-cloning. This notebook takes a text string and an audio file (or files) of a speaker's voice, and attempts to synthesize the text using the given voice. Currently works with English text only.

MubertAI Text-to-Music

UPDATE: it seems the Mubert API now requires a (paid) API key.

Colab for MubertAI Text-to-Music. Generates music from a text description using predefined blocks created by the community (afaik). See the source repository for details such as licensing.

TTS Voice Cloning

Paper: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Colab for Real-Time-Voice-Cloning text-to-speech voice-cloning. This notebook takes a text string and an audio file of a speaker's voice, and attempts to synthesize the text using the given voice. Fair warning: results are not great.
