A FastAPI server for OpenAI's Whisper large-v3-turbo model, optimized with MLX for low-latency audio transcription on macOS. It supports both synchronous and asynchronous clients.
- 🎙️ Fast Audio Transcription: Leverage the turbocharged, MLX-optimized Whisper large-v3-turbo model for quick and accurate transcriptions.
- 🌐 RESTful API Access: Easily integrate with any environment that supports HTTP requests.
- ⚡ Async/Sync Support: Seamlessly handle both asynchronous and synchronous transcription requests.
- 🔄 Low Latency: Minimizes delay through model preloading and efficient processing.
- 🔧 High Throughput: Capable of handling multiple concurrent transcription requests.
- 🚀 Easy Setup: A single setup.sh script checks the Python version, creates a virtual environment, and installs dependencies.
- Operating System: macOS with Apple Silicon (recommended for optimal performance due to MLX optimizations)
- Python Version: 3.12.3 (required for MLX optimization)
- Python Version Manager: pyenv (recommended but optional)
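The requirements above can be checked before installing anything. The following is a minimal sketch, not part of the project: the function name `check_environment` is hypothetical, and treating 3.12.3 as an exact match (rather than a minimum) is an assumption based on the wording of the list.

```python
import platform
import sys

REQUIRED_PYTHON = (3, 12, 3)  # version named in the requirements list


def check_environment(system, machine, python_version, required=REQUIRED_PYTHON):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: exact-version matching is an assumption.
    """
    problems = []
    if system != "Darwin":
        problems.append(f"macOS is expected, found {system!r}")
    elif machine != "arm64":
        problems.append(f"Apple Silicon (arm64) is recommended, found {machine!r}")
    if tuple(python_version[:3]) != required:
        found = ".".join(str(part) for part in python_version[:3])
        wanted = ".".join(str(part) for part in required)
        problems.append(f"Python {wanted} is required, found {found}")
    return problems


if __name__ == "__main__":
    issues = check_environment(platform.system(), platform.machine(), sys.version_info)
    for issue in issues:
        print(f"warning: {issue}")
    if not issues:
        print("Environment looks compatible.")
```

Passing the platform values in as arguments keeps the check easy to test on any machine.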
git clone https://github.com/kristofferv98/whisper_turboapi.git
cd whisper_turboapi
The setup.sh script handles the environment setup: it checks for the required Python version, creates a virtual environment, installs dependencies, and verifies the installation.
./setup.sh
Optional Flags:
- -y or --yes: Run in headless mode (assume 'yes' to all prompts)
- -p or --python-version: Specify the required Python version (default: 3.12.3)
- -f or --force: Force actions like recreating the virtual environment
- -h or --help: Display help message
Example:
./setup.sh -y
This will run the setup in headless mode, assuming 'yes' to all prompts.
After the setup completes, you can start the server using:
./start_server.sh
The server supports several command-line options:
./start_server.sh --host=0.0.0.0 # Custom host (default: 0.0.0.0)
./start_server.sh --port=8080 # Custom port (default: 8000)
./start_server.sh --help # Show help message
You can combine options:
./start_server.sh --host=127.0.0.1 --port=8080
The server will automatically:
- Activate the virtual environment if not already active
- Verify all required packages are installed
- Start the FastAPI server with the specified host and port
Note: The start_server.sh script ensures that the virtual environment is activated and that all required packages are installed before starting the server.
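When the server is started from another script, it can help to wait until it is reachable before sending requests. The sketch below polls the TCP port with the standard library only. The helper name `wait_for_port` and its defaults are assumptions, and an open port only shows that the process is listening, not necessarily that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8000, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout.

    Hypothetical helper; 8000 is the default port documented for start_server.sh.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_port():
        print("Server is accepting connections.")
    else:
        print("Server did not come up in time.")
```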
Here’s a basic synchronous client example using requests.
import requests
import os


def transcribe_audio(file_path, quick=True, any_lang=True, server_url="http://localhost:8000"):
    """
    Simplified function to transcribe an audio file using the WhisperTurboAPI server.

    Args:
        file_path (str): Path to the audio file.
        quick (bool, optional): Whether to use quick mode. Default is True.
        any_lang (bool, optional): Whether to allow any language detection. Default is True.
        server_url (str, optional): URL of the transcription server.

    Returns:
        str: The transcribed text.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    if not file_path.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
        raise ValueError("Unsupported audio format. Use WAV, MP3, M4A, or FLAC")

    with open(file_path, 'rb') as f:
        files = {'file': (os.path.basename(file_path), f, 'audio/wav')}
        params = {
            'quick': str(quick).lower(),
            'any_lang': str(any_lang).lower()
        }
        response = requests.post(f"{server_url}/transcribe", files=files, params=params)

    if response.status_code == 200:
        return response.json()['text']
    else:
        raise Exception(f"Request failed: {response.status_code}, {response.text}")


# Example usage
if __name__ == "__main__":
    # Replace 'path/to/audio.wav' with the path to your audio file
    audio_file = 'path/to/audio.wav'
    try:
        transcription = transcribe_audio(audio_file, quick=True, any_lang=False)
        print("Transcription:")
        print(transcription)
    except Exception as e:
        print(f"An error occurred: {e}")
An asynchronous client example using aiohttp.
import aiohttp
import asyncio
import os


async def transcribe_async(file_path, quick=True, any_lang=True, server_url="http://localhost:8000"):
    """
    Asynchronous function to transcribe an audio file using the WhisperTurboAPI server.

    Args:
        file_path (str): Path to the audio file.
        quick (bool, optional): Whether to use quick mode. Default is True.
        any_lang (bool, optional): Whether to allow any language detection. Default is True.
        server_url (str, optional): URL of the transcription server.

    Returns:
        str: The transcribed text.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    if not file_path.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
        raise ValueError("Unsupported audio format. Use WAV, MP3, M4A, or FLAC")

    async with aiohttp.ClientSession() as session:
        with open(file_path, 'rb') as f:
            data = aiohttp.FormData()
            data.add_field('file', f, filename=os.path.basename(file_path))
            params = {
                'quick': str(quick).lower(),
                'any_lang': str(any_lang).lower()
            }
            async with session.post(f"{server_url}/transcribe", data=data, params=params) as response:
                if response.status == 200:
                    result = await response.json()
                    return result['text']
                else:
                    error_detail = await response.text()
                    raise Exception(f"Error: {response.status} - {error_detail}")


# Example usage
if __name__ == "__main__":
    async def main():
        audio_file = 'path/to/audio.wav'
        try:
            transcription = await transcribe_async(audio_file, quick=False, any_lang=True)
            print("Transcription:")
            print(transcription)
        except Exception as e:
            print(f"An error occurred: {e}")

    asyncio.run(main())
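Since the server is described as handling multiple concurrent requests, an asynchronous client can submit several files at once. The following is a sketch under assumptions: `transcribe_many` is a hypothetical helper, the concurrency limit of 4 is arbitrary, and the `transcribe` argument stands for any coroutine function that takes a path and returns text, such as the `transcribe_async` function in the client example.

```python
import asyncio


async def transcribe_many(paths, transcribe, max_concurrency=4):
    """Run `transcribe(path)` for every path, at most `max_concurrency` at once.

    Returns a dict mapping each path to its transcription, or to the
    exception raised for that path. Hypothetical helper, not project code.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path):
        # The semaphore caps how many requests are in flight at a time.
        async with semaphore:
            return await transcribe(path)

    # return_exceptions=True keeps one failed file from cancelling the rest.
    results = await asyncio.gather(
        *(worker(path) for path in paths), return_exceptions=True
    )
    return dict(zip(paths, results))
```

A call might look like `asyncio.run(transcribe_many(["a.wav", "b.wav"], transcribe_async))`. Capping concurrency on the client side avoids queuing far more uploads than the server can process at once.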
You can use curl to test the API. Here's a working example:
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "file=@path/to/your/audio.wav" \
http://localhost:8000/transcribe
For example with a sample audio file:
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "file=@sample_audio.wav" \
http://localhost:8000/transcribe
The response will be JSON formatted:
{
"text": "Transcribed text here...",
"elapsed_time": 1.44,
"quick_mode": true,
"any_lang": true
}
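On the client side, the JSON response can be turned into a typed object so that missing fields fail early. This is a minimal sketch: the class name `TranscriptionResult` is hypothetical, and the field types are inferred from the sample response shown here rather than from a published schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionResult:
    """Typed view of the /transcribe response (field types are assumptions)."""

    text: str
    elapsed_time: float
    quick_mode: bool
    any_lang: bool

    @classmethod
    def from_json(cls, body):
        """Parse a JSON string; raises KeyError if an expected field is missing."""
        payload = json.loads(body)
        return cls(
            text=payload["text"],
            elapsed_time=float(payload["elapsed_time"]),
            quick_mode=bool(payload["quick_mode"]),
            any_lang=bool(payload["any_lang"]),
        )


if __name__ == "__main__":
    sample = '{"text": "Transcribed text here...", "elapsed_time": 1.44, "quick_mode": true, "any_lang": true}'
    print(TranscriptionResult.from_json(sample))
```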
POST /transcribe: Transcribe an audio file.
Headers:
- Content-Type: multipart/form-data (required)
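Parameters:
- file (form data, required): Audio file (WAV, MP3, M4A, FLAC)
- quick (query string, optional): "true" or "false"; whether to use quick mode, as sent by the client examples
- any_lang (query string, optional): "true" or "false"; whether to allow any language detection, as sent by the client examples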
Example Request:
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "file=@audio.wav" \
http://localhost:8000/transcribe
Response:
{
"text": "The transcribed text will appear here...",
"elapsed_time": 1.44,
"quick_mode": true,
"any_lang": true
}
Status Codes:
- 200 OK: Successful transcription
- 400 Bad Request: Invalid file format or missing file
- 500 Internal Server Error: Server processing error
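A client can map these status codes to distinct exceptions so callers can tell a bad upload from a server fault. The sketch below is one possible arrangement: all class and function names are hypothetical, and only the three documented codes are given specific handling.

```python
class TranscriptionError(Exception):
    """Base error for a failed /transcribe request (hypothetical name)."""

    def __init__(self, status_code, detail=""):
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


class BadRequestError(TranscriptionError):
    """400: invalid file format or missing file."""


class ServerError(TranscriptionError):
    """500: server processing error."""


def check_transcribe_status(status_code, detail=""):
    """Return None on 200; otherwise raise the matching exception."""
    if status_code == 200:
        return None
    if status_code == 400:
        raise BadRequestError(status_code, detail)
    if status_code == 500:
        raise ServerError(status_code, detail)
    # Any code not listed in the documentation falls back to the base error.
    raise TranscriptionError(status_code, detail)
```

With the synchronous client, this could be called as `check_transcribe_status(response.status_code, response.text)` before reading the JSON body.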
Check server status.
Response:
{
"status": "healthy",
"version": "1.0.0"
}
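The status response can be checked programmatically, for example in a deployment script. This is a hedged sketch using only the standard library: the helper names are hypothetical, the caller must supply the full URL because the path of the status endpoint is not stated here, and treating only "healthy" as a passing status is an assumption based on the sample response.

```python
import json
from urllib.request import urlopen


def parse_health(body):
    """Return (is_healthy, version) from a status response body.

    Assumes the JSON shape shown in the sample response.
    """
    payload = json.loads(body)
    return payload.get("status") == "healthy", payload.get("version")


def fetch_health(health_url, timeout=5.0):
    """Fetch and parse the status response from `health_url`.

    `health_url` is the full URL of the status endpoint, supplied by the caller.
    """
    with urlopen(health_url, timeout=timeout) as response:
        return parse_health(response.read().decode("utf-8"))
```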
For testing, example clients are provided in:
- examples/demo.py: Demonstrates both synchronous and asynchronous clients
- examples/simple_demo.py: Basic synchronous client
MIT License
Based on whisper-turbo-mlx by JosefAlbers, which provides a fast and lightweight implementation of the Whisper model using MLX.