The Voice Cloner is a Python project that uses Tacotron 2 and WaveGlow models for text-to-speech (TTS) synthesis and basic voice cloning. It targets multilingual text input in English and the 22 official Indian languages, including Sanskrit. The languages currently supported are listed below.
The project is optimized to run on a CPU and relies on pre-trained models, so developers and enthusiasts can synthesize speech quickly.
Key features:
- Generates speech audio from the provided text.
- Supports English and Indian languages such as Hindi, Bengali, Tamil, Telugu, and Sanskrit.
- Mimics the voice patterns of a sample clip and generates audio with similar speech characteristics.
- Provides a user-friendly command-line interface for generating and saving audio files.
The project is organized as follows:
```text
VoiceCloner/
├── data/
│   ├── samples/               # Sample audio clips for cloning
│   └── synthesized_audio/     # Directory for storing generated audio
├── models/
│   ├── tacotron2/             # Pre-trained Tacotron 2 model for text-to-mel
│   └── waveglow/              # Pre-trained WaveGlow model for audio synthesis
├── utils/
│   ├── __init__.py            # Initializes the utils module
│   ├── text_processing.py     # Cleans and preprocesses input text
│   ├── voice_cloning.py       # Core logic for voice synthesis
│   └── language_support.py    # Provides support for multiple languages
├── main.py                    # Entry point to run the project
├── requirements.txt           # Project dependencies
└── README.md                  # Detailed documentation (this file)
```
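The repository describes `utils/text_processing.py` only as cleaning and preprocessing input text. The sketch below shows one plausible form of that step; the function name `clean_text` and its specific normalization rules are assumptions for illustration, not the project's actual code.

```python
import re
import unicodedata

# Hypothetical helper; the real utils/text_processing.py may differ.
_WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalize raw input text before it is handed to the TTS model."""
    # NFC composes base characters and combining marks into one canonical
    # form, so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Drop control characters (category Cc) except ordinary whitespace.
    # Format characters such as ZWJ/ZWNJ are kept on purpose.
    text = "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of whitespace into single spaces and trim the ends.
    text = _WHITESPACE_RE.sub(" ", text).strip()
    # Lowercasing only affects cased scripts such as Latin; Indic text is unchanged.
    return text.lower()
```

Zero-width joiners (U+200D) and non-joiners (U+200C) are deliberately preserved, because removing them can change how conjuncts in Indic scripts are rendered and read.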
Follow these steps to set up and run the project on your local machine.

Clone the repository:

```bash
git clone https://github.com/thekartikeyamishra/VoiceCloner.git
cd VoiceCloner
```
Create and activate a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

Install the required libraries:

```bash
pip install -r requirements.txt
```

Run the CLI application:

```bash
python main.py
```
- Text Input: Enter any text and choose the desired language from the supported languages listed below.
- Speech Synthesis: The text is preprocessed, and Tacotron 2 converts it into a mel spectrogram.
- Audio Generation: The WaveGlow vocoder synthesizes the audio waveform from the mel spectrogram.
- Save Output: The generated audio is saved as a `.wav` file in the `data/synthesized_audio/` directory.
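The save step writes the synthesized waveform to a `.wav` file. Below is a minimal sketch using only Python's standard `wave` module, assuming the vocoder output has already been converted to a sequence of float samples in [-1.0, 1.0] at 22,050 Hz, a rate commonly used with Tacotron 2 and WaveGlow checkpoints (confirm it against the models actually in `models/`). The function name `save_wav` is hypothetical.

```python
import struct
import wave
from pathlib import Path
from typing import Sequence, Union

# Assumed sample rate; match it to the checkpoints you actually use.
SAMPLE_RATE = 22050


def save_wav(samples: Sequence[float], path: Union[str, Path],
             sample_rate: int = SAMPLE_RATE) -> Path:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM .wav file."""
    path = Path(path)
    # Create e.g. data/synthesized_audio/ if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return path
```

For example, `save_wav(samples, "data/synthesized_audio/output.wav")` creates the directory if needed and writes a mono 16-bit file. Clipping before scaling prevents a `struct.error` when a sample slightly exceeds the [-1.0, 1.0] range.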
The project requires the following Python libraries:

- torch
- numpy
- librosa

Install these dependencies using the provided `requirements.txt` file.
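After installation, one quick way to confirm that these libraries are importable is `importlib.util.find_spec`, which locates a module without importing it. The helper below (`missing_modules`) is hypothetical and not part of the repository.

```python
import importlib.util
from typing import Iterable, List

# Libraries listed in requirements.txt (pip names match the import names here).
REQUIRED = ("torch", "numpy", "librosa")


def missing_modules(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
        print("Run: pip install -r requirements.txt")
    else:
        print("All dependencies are installed.")
```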
The following languages are currently supported:
- English
- Hindi
- Bengali
- Tamil
- Telugu
- Gujarati
- Malayalam
- Marathi
- Kannada
- Punjabi
- Odia
- Assamese
- Urdu
- Sindhi
- Sanskrit
Additional languages may be added in the future as phonetic support for them is implemented.
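Because each supported Indian language is usually written in a specific Unicode script, a simple sanity check before synthesis is to measure how much of the input falls in the expected script block. This is a hypothetical sketch of what `utils/language_support.py` could do; the function `script_coverage` and the language-to-block mapping are assumptions, and the mapping simplifies languages written in more than one script, such as Sindhi and Punjabi.

```python
from typing import Dict, Tuple

# Assumed mapping from language name to the Unicode block of its usual script.
SCRIPT_RANGES: Dict[str, Tuple[int, int]] = {
    "hindi": (0x0900, 0x097F),      # Devanagari
    "marathi": (0x0900, 0x097F),    # Devanagari
    "sanskrit": (0x0900, 0x097F),   # Devanagari
    "bengali": (0x0980, 0x09FF),    # Bengali
    "assamese": (0x0980, 0x09FF),   # Bengali-Assamese script
    "punjabi": (0x0A00, 0x0A7F),    # Gurmukhi (Shahmukhi not covered)
    "gujarati": (0x0A80, 0x0AFF),
    "odia": (0x0B00, 0x0B7F),
    "tamil": (0x0B80, 0x0BFF),
    "telugu": (0x0C00, 0x0C7F),
    "kannada": (0x0C80, 0x0CFF),
    "malayalam": (0x0D00, 0x0D7F),
    "urdu": (0x0600, 0x06FF),       # Arabic
    "sindhi": (0x0600, 0x06FF),     # Arabic (Sindhi is also written in Devanagari)
}


def script_coverage(text: str, language: str) -> float:
    """Return the fraction of letters in `text` that belong to the language's script."""
    lang = language.lower()
    # str.isalpha() counts base letters only; vowel signs and viramas are skipped.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    if lang == "english":
        hits = sum(1 for ch in letters if ch.isascii())
    else:
        if lang not in SCRIPT_RANGES:
            raise ValueError(f"Unsupported language: {language!r}")
        lo, hi = SCRIPT_RANGES[lang]
        hits = sum(1 for ch in letters if lo <= ord(ch) <= hi)
    return hits / len(letters)
```

For example, `script_coverage("नमस्ते", "hindi")` returns `1.0`, while Latin text checked against `"tamil"` returns `0.0`. A caller could warn the user when the ratio falls below some threshold chosen for the application.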
This is the basic version of the Voice Cloner. Future plans include:
- GUI Support: A graphical interface for ease of use.
- Advanced Voice Cloning: Speaker embedding for personalized voice synthesis.
- Support for Additional Models: Integration with FastSpeech and other synthesis models.
- Multi-Language Extensions: Support for more global languages.
Contributions are welcome! Feel free to:
- Fork the repository.
- Create a feature branch.
- Submit a pull request with your improvements.
If you have any questions, feedback, or suggestions, feel free to reach out!
Let’s bring multilingual speech synthesis to the next level. 🚀
Star ⭐ the project if you find it useful!