LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Updated Sep 24, 2024 - Python
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
The official code for the SALMon🍣 benchmark
A collection of audio codecs with a standardized API
A Survey of Spoken Dialogue Models (60 pages)