WhisperTurboAPI

A FastAPI server implementation of OpenAI's Whisper large-v3-turbo model, accelerated with MLX and designed for low-latency asynchronous and synchronous audio transcription on macOS.

Features

  • 🎙️ Fast Audio Transcription: Leverage the turbocharged, MLX-optimized Whisper large-v3-turbo model for quick and accurate transcriptions.
  • 🌐 RESTful API Access: Easily integrate with any environment that supports HTTP requests.
  • ⚡ Async/Sync Support: Seamlessly handle both asynchronous and synchronous transcription requests.
  • 🔄 Low Latency: Optimized for minimal delay by preloading models and efficient processing.
  • 🔧 High Throughput: Capable of handling multiple concurrent transcription requests.
  • 🚀 Easy Setup: A single setup script handles environment creation, dependency installation, and verification.

System Requirements

  • Operating System: macOS with Apple Silicon (recommended for optimal performance due to MLX optimizations)
  • Python Version: 3.12.3 (required for MLX optimization)
  • Python Version Manager: pyenv (recommended but optional)
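
For the optional pyenv route, providing the required interpreter typically looks like this (a minimal sketch; adjust to your own toolchain):

pyenv install 3.12.3   # install the Python version required for MLX optimization
pyenv local 3.12.3     # pin it for this repository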

Installation

1. Clone the Repository

git clone https://github.com/kristofferv98/whisper_turboapi.git
cd whisper_turboapi

2. Run the Setup Script

The setup.sh script will handle the environment setup, including checking for the required Python version, creating a virtual environment, installing dependencies, and verifying the installation.

./setup.sh

Optional Flags:

  • -y or --yes: Run in headless mode (assume 'yes' to all prompts)
  • -p or --python-version: Specify the required Python version (default: 3.12.3)
  • -f or --force: Force actions like recreating the virtual environment
  • -h or --help: Display help message

Example:

./setup.sh -y

This will run the setup in headless mode, assuming 'yes' to all prompts.
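
Flags can also be combined. For instance, a non-interactive run that forces recreation of the virtual environment against the default Python version would look like this (a sketch based on the flag list above):

./setup.sh -y -p 3.12.3 -f   # headless, pin Python 3.12.3, recreate the virtual environment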

3. Start the Server

After the setup completes, you can start the server using:

./start_server.sh

The server supports several command-line options:

./start_server.sh --host=0.0.0.0  # Custom host (default: 0.0.0.0)
./start_server.sh --port=8080     # Custom port (default: 8000)
./start_server.sh --help          # Show help message

You can combine options:

./start_server.sh --host=127.0.0.1 --port=8080

The server will automatically:

  • Activate the virtual environment if not already active
  • Verify all required packages are installed
  • Start the FastAPI server with the specified host and port

Note: The start_server.sh script ensures that the virtual environment is activated and that all required packages are installed before starting the server.
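
Once the server is up, you can sanity-check it from another terminal using the /health endpoint described in the API Reference below (sketch):

curl http://localhost:8000/health   # expected response: {"status": "healthy", "version": "1.0.0"}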

Usage Examples

Simple Python Client

Here’s a basic synchronous client example using requests.

import requests
import os

def transcribe_audio(file_path, quick=True, any_lang=True, server_url="http://localhost:8000"):
    """
    Simplified function to transcribe an audio file using the WhisperTurboAPI server.

    Args:
        file_path (str): Path to the audio file.
        quick (bool, optional): Whether to use quick mode. Default is True.
        any_lang (bool, optional): Whether to allow any language detection. Default is True.
        server_url (str, optional): URL of the transcription server.

    Returns:
        str: The transcribed text.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found: {file_path}")
        
    if not file_path.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
        raise ValueError("Unsupported audio format. Use WAV, MP3, M4A, or FLAC")
        
    with open(file_path, 'rb') as f:
        files = {'file': (os.path.basename(file_path), f, 'audio/wav')}
        params = {
            'quick': str(quick).lower(),
            'any_lang': str(any_lang).lower()
        }
        response = requests.post(f"{server_url}/transcribe", files=files, params=params)
    if response.status_code == 200:
        return response.json()['text']
    else:
        raise Exception(f"Request failed: {response.status_code}, {response.text}")

# Example usage
if __name__ == "__main__":
    # Replace 'path/to/audio.wav' with the path to your audio file
    audio_file = 'path/to/audio.wav'
    try:
        transcription = transcribe_audio(audio_file, quick=True, any_lang=False)
        print("Transcription:")
        print(transcription)
    except Exception as e:
        print(f"An error occurred: {e}")

Asynchronous Client

An asynchronous client example using aiohttp.

import aiohttp
import asyncio
import os

async def transcribe_async(file_path, quick=True, any_lang=True, server_url="http://localhost:8000"):
    """
    Asynchronous function to transcribe an audio file using the WhisperTurboAPI server.

    Args:
        file_path (str): Path to the audio file.
        quick (bool, optional): Whether to use quick mode. Default is True.
        any_lang (bool, optional): Whether to allow any language detection. Default is True.
        server_url (str, optional): URL of the transcription server.

    Returns:
        str: The transcribed text.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    if not file_path.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
        raise ValueError("Unsupported audio format. Use WAV, MP3, M4A, or FLAC")

    async with aiohttp.ClientSession() as session:
        with open(file_path, 'rb') as f:
            data = aiohttp.FormData()
            data.add_field('file', f, filename=os.path.basename(file_path))
            params = {
                'quick': str(quick).lower(),
                'any_lang': str(any_lang).lower()
            }

            async with session.post(f"{server_url}/transcribe", data=data, params=params) as response:
                if response.status == 200:
                    result = await response.json()
                    return result['text']
                else:
                    error_detail = await response.text()
                    raise Exception(f"Error: {response.status} - {error_detail}")

# Example usage
if __name__ == "__main__":
    async def main():
        audio_file = 'path/to/audio.wav'
        try:
            transcription = await transcribe_async(audio_file, quick=False, any_lang=True)
            print("Transcription:")
            print(transcription)
        except Exception as e:
            print(f"An error occurred: {e}")

    asyncio.run(main())

CURL

You can use curl to test the API. Here's a working example:

curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "file=@path/to/your/audio.wav" \
  http://localhost:8000/transcribe

For example with a sample audio file:

curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "file=@sample_audio.wav" \
  http://localhost:8000/transcribe
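
The quick and any_lang options used by the Python clients above are sent as query parameters, so they should also work from curl (a sketch mirroring the clients' parameters):

curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "file=@sample_audio.wav" \
  "http://localhost:8000/transcribe?quick=true&any_lang=false"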

The response will be JSON formatted:

{
    "text": "Transcribed text here...",
    "elapsed_time": 1.44,
    "quick_mode": true,
    "any_lang": true
}

API Reference

POST /transcribe

Transcribe an audio file.

Headers:

  • Content-Type: multipart/form-data (required)

Parameters:

  • file (form data, required): Audio file (WAV, MP3, M4A, FLAC)
  • quick (query, optional): Enable quick mode (echoed as quick_mode in the response)
  • any_lang (query, optional): Allow automatic language detection (echoed as any_lang in the response)

Example Request:

curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  http://localhost:8000/transcribe

Response:

{
    "text": "The transcribed text will appear here...",
    "elapsed_time": 1.44,
    "quick_mode": true,
    "any_lang": true
}

Status Codes:

  • 200 OK: Successful transcription
  • 400 Bad Request: Invalid file format or missing file
  • 500 Internal Server Error: Server processing error

GET /health

Check server status.

Response:

{
    "status": "healthy",
    "version": "1.0.0"
}

Demo

For testing, example clients are provided in the following files (a sketch of how to run them follows this list):

  • examples/demo.py: Demonstrates both synchronous and asynchronous client
  • examples/simple_demo.py: Basic synchronous client
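
With the virtual environment active and the server running, they can presumably be launched directly (exact invocation may vary):

python examples/demo.py          # synchronous and asynchronous client demo
python examples/simple_demo.py   # basic synchronous client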

License

MIT License

Acknowledgements

Based on whisper-turbo-mlx by JosefAlbers, which provides a fast and lightweight implementation of the Whisper model using MLX.