Voice Agent with Open Source Models

This repository demonstrates how to build a voice agent using open-source Large Language Models (LLMs), text-to-speech (TTS), and speech-to-text (STT) models. It utilizes Pipecat voice pipeline and is deployed with BentoML. The voice agent is accessible via a phone number, leveraging Twilio as the communication transport. This example can be easily extended to incorporate additional voice agent features and functionality.

This voice agent the following models:

Llama 3.1
XTTS text-to-speech model
Whisper speech-to-text model

The LLM and XTTS models are deployed as separate API endpoints, as outlined in the instructions below. These API endpoints are provided to the voice agent through environment variables.

See here for a full list of BentoML example projects.

Prerequisites

This repository has been verified with Python 3.11 and BentoML 1.3.9.

pip install -U bentoml

Dependent models

Deploy the LLM and XTTS models by following the instructions provided in their respective repositories.

Deploy LLM with BentoVLLM
Deploy XTTS with BentoXTTSStreaming

Once the models are deployed, you can obtain their API endpoints from BentoCloud. These endpoints should then be set as environment variables for the voice agent deployment.

XTTS_SERVICE_URL
OPENAI_SERVICE_URL

Run the voice agent

Install the following system packages to run the voice agent locally.

ffmpeg

Install the required Python packages.

pip install -U -r requirements.txt

Start the server with endpoint URLs environment variables. Update the values as the endpoint URLs of your deployments.

XTTS_SERVICE_URL=https://xtts-streaming-rvpg-d3767914.mt-guc1.bentoml.ai OPENAI_SERVICE_URL=https://llama-3-1-zwu6-d3767914.mt-guc1.bentoml.ai/v1 bentoml serve

The server exposes two key endpoints:

/voice/start_call: An HTTP endpoint that serves as a Twilio webhook to initiate calls.
/voice/ws: A WebSocket endpoint that processes voice data in real-time.

On Twilio's voice configuration page, set the voice agent endpoint (including the /voice/start_call path) as a webhook URL.

Deploy to BentoCloud

After the Service is ready, you can deploy the application to BentoCloud for better management and scalability. Sign up if you haven't got a BentoCloud account.

Make sure you have logged in to BentoCloud, then run the following command to deploy it.

bentoml deploy . --env XTTS_SERVICE_URL=https://xtts-streaming-rvpg-d3767914.mt-guc1.bentoml.ai --env OPENAI_SERVICE_URL=https://llama-3-1-zwu6-d3767914.mt-guc1.bentoml.ai/v1

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.bentoignore		.bentoignore
.gitignore		.gitignore
README.md		README.md
bentofile.yaml		bentofile.yaml
bot.py		bot.py
flow_diagram.png		flow_diagram.png
requirements.txt		requirements.txt
service.py		service.py
service_arch.png		service_arch.png
simple_xtts.py		simple_xtts.py
twilio_setup.png		twilio_setup.png
whisper_bento.py		whisper_bento.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice Agent with Open Source Models

Prerequisites

Dependent models

Run the voice agent

Deploy to BentoCloud

About

Releases

Packages

Contributors 4

Languages

bentoml/BentoVoiceAgent

Folders and files

Latest commit

History

Repository files navigation

Voice Agent with Open Source Models

Prerequisites

Dependent models

Run the voice agent

Deploy to BentoCloud

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages