Whisper Inference Server is an OpenAI-compatible, Elixir-based HTTP server for running inference on audio files with OpenAI’s Whisper model. The server supports batching for efficient inference, CPU/GPU execution via the EXLA backend, and dynamic configuration at runtime through command-line parameters.
- Batching: Process multiple audio files in a single inference pass to optimize throughput (see the sketch after this list).
- CPU/GPU support: Choose between the host (CPU) and cuda (GPU) backends for inference.
- Dynamic configuration: Configure model, batch size, timeout, and other parameters at runtime.
- Modular design: Clean architecture for easy extension and maintenance.
- OpenAI-compatible API: Works with the official OpenAI client libraries (see the Python example below).
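For context, here is a minimal sketch of how such a batched Whisper pipeline is typically assembled in Elixir with Bumblebee, Nx.Serving, and EXLA. This illustrates the general approach, not the repository's exact code; module and process names here are assumptions.

```elixir
# Illustrative wiring only — the repo's internals may differ.
{:ok, model_info} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text_whisper(
    model_info,
    featurizer,
    tokenizer,
    generation_config,
    compile: [batch_size: 3],
    defn_options: [compiler: EXLA]
  )

# Requests queue up and are flushed as one batch when the batch is full
# or when batch_timeout (ms) expires — this is what enables batching.
{:ok, _pid} = Nx.Serving.start_link(serving: serving, name: WhisperServing, batch_timeout: 3000)

# Each HTTP request then submits its audio file (decoded via FFmpeg) to the shared serving:
Nx.Serving.batched_run(WhisperServing, {:file, "some_audio.wav"})
```

Conceptually, this is what the `--batch_size` and `--batch_timeout` flags described below control: requests wait until a batch fills up or the timeout elapses, then run through the model together.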
- Elixir installation: If Elixir is not installed, follow the official installation guide.
- FFmpeg installation: The server requires FFmpeg for audio preprocessing. Install it with:
sudo apt update
sudo apt install ffmpeg
- Clone the repository:
git clone git@github.com:dailydaniel/cool-whisper-server.git
cd cool-whisper-server
- Edit .env if needed. It currently contains:
DEFAULT_DEVICE_ID=0
MEMORY_FRACTION=0.9
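These two variables look like GPU settings: which CUDA device to use and how much of its memory to allocate. A plausible way they could be forwarded into EXLA's client configuration (an assumption about the wiring, not necessarily what this repo does):

```elixir
# config/runtime.exs — hypothetical sketch; assumes the .env values are exported
# into the environment before the server starts.
import Config

config :exla, :clients,
  cuda: [
    platform: :cuda,
    # which GPU to use by default
    default_device_id: String.to_integer(System.get_env("DEFAULT_DEVICE_ID", "0")),
    # fraction of GPU memory EXLA may allocate
    memory_fraction: String.to_float(System.get_env("MEMORY_FRACTION", "0.9"))
  ]
```

These values are presumably only relevant when running with `--client cuda`.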
- Install dependencies:
mix deps.get
mix deps.compile
- Run the server:
mix run --no-halt -- \
--batch_size 3 \
--batch_timeout 3000 \
--client host \
--model openai/whisper-tiny \
--port 4000
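The flags after `--` reach the application through System.argv/0. A minimal sketch of how they could be parsed with OptionParser, using the documented defaults (assumed wiring; the repository's actual option handling may differ):

```elixir
# Hypothetical startup parsing — not necessarily how this repo does it.
{opts, _argv, _invalid} =
  OptionParser.parse(System.argv(),
    strict: [
      batch_size: :integer,
      batch_timeout: :integer,
      client: :string,
      model: :string,
      port: :integer
    ]
  )

config = %{
  batch_size: Keyword.get(opts, :batch_size, 3),
  batch_timeout: Keyword.get(opts, :batch_timeout, 3000),
  client: Keyword.get(opts, :client, "host"),
  model: Keyword.get(opts, :model, "openai/whisper-tiny"),
  port: Keyword.get(opts, :port, 4000)
}
```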
- With curl:
curl -X POST -F "file=@some_audio.wav" http://localhost:4000/infer
- With Python:
from openai import OpenAI

HOST = "localhost"
PORT = 4000

# Point the official OpenAI client at the local server; any non-empty api_key works
client = OpenAI(api_key="None", base_url=f"http://{HOST}:{PORT}/v1/")

file_path = "some_audio.wav"
with open(file_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",  # response format: text | json
    )
print(transcription)
- --batch_size (default: 3): Number of audio files to process in a batch.
- --batch_timeout (default: 3000): Maximum wait time (in ms) for batch formation.
- --client (default: host): Backend type for inference (host or cuda).
- --model (default: openai/whisper-tiny): Name of the Whisper model from Hugging Face Hub.
- --port (default: 4000): HTTP port the server listens on.
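With these defaults, requests that arrive within the same 3000 ms window are grouped into one inference pass of up to 3 files. A small sketch that exercises batching from Elixir by sending several uploads concurrently (file names are hypothetical; it shells out to curl to keep the example dependency-free):

```elixir
# Fire three uploads at once so the server can batch them together.
files = ["first.wav", "second.wav", "third.wav"]

files
|> Task.async_stream(
  fn file ->
    {body, 0} =
      System.cmd("curl", ["-s", "-X", "POST", "-F", "file=@#{file}", "http://localhost:4000/infer"])

    body
  end,
  timeout: 60_000
)
|> Enum.each(fn {:ok, body} -> IO.puts(body) end)
```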
Contributions, issues, and feature requests are welcome. Feel free to submit a pull request or open an issue in the repository.