gemini-2-tts

AI-Powered Podcast Generator: A Python-based tool that converts text scripts into realistic audio podcasts using Google's Generative AI API. It uses the API's text-to-speech capabilities to voice multi-speaker conversations, with a configurable voice for each speaker.

Features:

Text-to-speech conversion using Google's Generative AI
Support for multiple speakers with distinct voices
Automatic audio file generation and combination
Customizable voice selection
Robust error handling and retry mechanisms

Prerequisites:

Python 3.8 or higher
FFmpeg installed and accessible in system PATH
Google API key for Generative AI services
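Before installing, a quick check can confirm the first two prerequisites. This is a minimal sketch using only the standard library; check_prerequisites is a hypothetical helper, not part of app.py. It does not check the API key, since that normally lives in the .env file described under Installation.

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 8)):
    """Hypothetical helper: return a list of problems (an empty list means all checks passed)."""
    problems = []
    # Prerequisite: Python 3.8 or higher.
    if sys.version_info[:2] < min_version:
        problems.append(
            "Python %d.%d+ required, found %d.%d"
            % (min_version + sys.version_info[:2])
        )
    # Prerequisite: FFmpeg installed and accessible in the system PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found.")
```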

System Dependencies:

Windows:

Microsoft Visual C++ 14.0 or greater
FFmpeg

Linux:

sudo apt-get install portaudio19-dev python3-dev ffmpeg

macOS:

brew install portaudio ffmpeg

Installation:

Clone the repository:

git clone https://github.com/agituts/gemini-2-tts.git
cd gemini-2-tts

Create and activate a virtual environment:

For Windows:

python -m venv venv
.\venv\Scripts\activate

For Linux/macOS:

python3 -m venv venv
source venv/bin/activate

Install the required Python packages:

pip install -r requirements.txt

Create a .env file in the project root:

GOOGLE_API_KEY=your_google_api_key_here
VOICE_A=Puck    # Optional: Default is Puck; Current options are Puck, Charon, Kore, Fenrir, Aoede
VOICE_B=Kore    # Optional: Default is Kore; Current options are Puck, Charon, Kore, Fenrir, Aoede
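The defaults and voice names above can be enforced when the configuration is read. This is a minimal sketch, assuming that Speaker A lines use VOICE_A and Speaker B lines use VOICE_B (an assumption based on the variable names). load_voice_config is a hypothetical helper, and the .env file is assumed to be loaded into the environment beforehand, for example with python-dotenv's load_dotenv(); this README does not confirm which loader the project uses.

```python
import os

# Voice names listed in this README as the current options.
SUPPORTED_VOICES = {"Puck", "Charon", "Kore", "Fenrir", "Aoede"}


def load_voice_config(env=None):
    """Hypothetical helper: read VOICE_A / VOICE_B with the documented defaults and validate them."""
    env = os.environ if env is None else env
    voices = {
        # Assumption: "Speaker A" maps to VOICE_A and "Speaker B" maps to VOICE_B.
        # Empty or unset values fall back to the documented defaults.
        "Speaker A": (env.get("VOICE_A") or "Puck").strip(),
        "Speaker B": (env.get("VOICE_B") or "Kore").strip(),
    }
    for speaker, voice in voices.items():
        if voice not in SUPPORTED_VOICES:
            raise ValueError(f"{speaker}: unsupported voice {voice!r}")
    return voices
```

Failing early on an unknown voice name gives a clear error before any API call is made.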

Note: To deactivate the virtual environment when you're done, simply run:

deactivate

Project Structure:

podcast_script.txt: Contains the conversation script in the format:

Speaker A: Welcome to our podcast! Today we'll be discussing...
Speaker B: Thanks for having me! I'm excited to...
Speaker A: Let's start with...
Speaker B: That's an interesting point...

system_instructions.txt: Contains the system instructions that guide voice generation, for example:

You are a real-time energetic and enthusiastic narrator for a podcast.
The entire podcast script is provided below this instruction.
Your job is to narrate only the specific dialogue line provided to you in subsequent messages, responding immediately as if in real-time, using a natural, friendly, and engaging tone.
When narrating, use the context of the entire podcast script to inform your delivery.
Speak smoothly and conversationally, not like you are reading off a script.
Pause naturally at commas, periods, and question marks.
Vary your pacing slightly as a person would in real conversation.
Do not narrate anything assigned to other speakers or identify which speaker is talking.
Only narrate the specific dialogues provided to you.
Do not introduce yourself or any other speaker; simply speak the dialogues as you receive them, as if they were being spoken in that moment.
The script is designed for a podcast and contains conversational exchanges between speakers.
Do not add any additional information unless asked.
Remember, you must receive and acknowledge the full script first before you begin receiving and narrating individual dialogue lines.

.env: Environment variable configuration
requirements.txt: Python package dependencies
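The two text files above could be read as follows. This is a minimal sketch, not the parsing logic in app.py: it assumes each dialogue line starts with "Speaker <letter>:" as in the sample script, and that the full script is appended after the system instructions, as the sample instructions suggest ("The entire podcast script is provided below this instruction"). The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as "Speaker A: Welcome to our podcast!"
LINE_PATTERN = re.compile(r"^\s*(Speaker [A-Z])\s*:\s*(.+?)\s*$")


def parse_script(text):
    """Return a list of (speaker, dialogue) tuples; lines that don't match are skipped."""
    turns = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
    return turns


def build_system_prompt(instructions, script):
    # Assumption: the script goes directly after the instructions, separated by a blank line.
    return f"{instructions.strip()}\n\n{script.strip()}"


def load_inputs(folder="."):
    """Read both files from the project root and return (turns, system_prompt)."""
    base = Path(folder)
    script = base.joinpath("podcast_script.txt").read_text(encoding="utf-8")
    instructions = base.joinpath("system_instructions.txt").read_text(encoding="utf-8")
    return parse_script(script), build_system_prompt(instructions, script)
```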

Usage:

Prepare your conversation script in podcast_script.txt
Run the generator:

python app.py

The generated podcast is saved as final_podcast.wav

Error Handling:

The system automatically retries on connection failures
Maximum retry attempts: 3
Temporary files are automatically cleaned up
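The retry behavior can be sketched as a small wrapper. This is an illustration under assumptions, not app.py's implementation: it treats "Maximum retry attempts: 3" as three tries in total, retries only on Python's built-in ConnectionError and TimeoutError (the real client library may raise its own exception types, which could be passed via retry_on), and uses exponential backoff, which this README does not specify. with_retries is a hypothetical name.

```python
import time


def with_retries(operation, max_attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Hypothetical helper: call operation(), retrying on connection errors up to max_attempts tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff between tries: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A call would look like with_retries(lambda: generate_line(text)), where generate_line is a placeholder for whatever function performs the API request. The sleep parameter exists so tests can skip the real delay.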

Output:

Individual speaker audio files are generated as temporary files
The final output is combined into final_podcast.wav
All temporary files are automatically cleaned up
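The combining step can be illustrated with the standard-library wave module. The project lists FFmpeg as a prerequisite and may use it for this step instead; this sketch is only an alternative that works when every part shares the same channel count, sample width, and sample rate. combine_wavs is a hypothetical name, and the part paths stand for whatever temporary files the generator produced.

```python
import os
import wave


def combine_wavs(part_paths, output_path="final_podcast.wav", cleanup=True):
    """Hypothetical helper: concatenate same-format WAV files, then optionally delete the parts."""
    if not part_paths:
        raise ValueError("no audio parts to combine")
    with wave.open(part_paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(output_path, "wb") as out:
        out.setparams(params)  # the frame count in the header is corrected as frames are written
        for path in part_paths:
            with wave.open(path, "rb") as part:
                # Plain concatenation is only valid if channels, sample width, and rate all match.
                if part.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(part.readframes(part.getnframes()))
    if cleanup:
        # Mirror the README's behavior of removing temporary files afterwards.
        for path in part_paths:
            os.remove(path)
    return output_path
```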

License:

MIT License

Contributing:

Fork the repository
Create your feature branch
Commit your changes
Push to the branch
Create a new Pull Request
