Whisper Inference Server

Whisper Inference Server is an OpenAI-compatible, Elixir-based HTTP server for running inference on audio files with OpenAI's Whisper model. The server supports batching for efficient inference, CPU/GPU execution via the EXLA backend, and dynamic configuration at runtime through command-line parameters.

Features

  • Batching: Groups multiple audio files into a single inference pass for better throughput (see the sketch after this list).
  • CPU/GPU support: Choose between host (CPU) or cuda (GPU) backends for inference.
  • Dynamic configuration: Set the model, batch size, timeout, and other parameters at runtime via command-line flags.
  • Modular design: Clean architecture for easy extension and maintenance.
  • OpenAI-compatible: Works with the official OpenAI client libraries.
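
As a rough illustration of the batching behavior, the sketch below sends several requests concurrently so the server can group them into one batch. This is an illustrative client-side snippet, not part of the repository: it assumes the /infer endpoint documented under "How to use", the third-party requests library, and placeholder audio file names.

import concurrent.futures
import requests

URL = "http://localhost:4000/infer"
FILES = ["a.wav", "b.wav", "c.wav"]  # placeholders: replace with real audio files

def transcribe(path):
    # Post one audio file as multipart form data, same as the curl example below.
    with open(path, "rb") as f:
        resp = requests.post(URL, files={"file": f})
    resp.raise_for_status()
    return resp.text

# Requests arriving within --batch_timeout of each other can be served as a single batch.
with concurrent.futures.ThreadPoolExecutor(max_workers=len(FILES)) as pool:
    for result in pool.map(transcribe, FILES):
        print(result)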

Installation

  1. Install Elixir: If Elixir is not installed, follow the official guide.
  2. Install FFmpeg: The server requires FFmpeg for audio preprocessing. On Debian/Ubuntu:
sudo apt update
sudo apt install ffmpeg
  3. Clone the repository:
git clone git@github.com:dailydaniel/cool-whisper-server.git
cd cool-whisper-server
  4. Adjust .env if needed; the current .env contains:
DEFAULT_DEVICE_ID=0
MEMORY_FRACTION=0.9
  5. Install dependencies:
mix deps.get
mix deps.compile
  6. Run the server:
mix run --no-halt -- \
    --batch_size 3 \
    --batch_timeout 3000 \
    --client host \
    --model openai/whisper-tiny \
    --port 4000

How to use

  1. With curl:
curl -X POST -F "file=@some_audio.wav" http://localhost:4000/infer
  2. With Python:
from openai import OpenAI

HOST = "localhost"
PORT = 4000
client = OpenAI(api_key="None", base_url=f"http://{HOST}:{PORT}/v1/")

file_path = "some_audio.wav"
with open(file_path, "rb") as audio_file:  # close the file when done
    transcription = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file, response_format="text"  # response_format: text | json
    )
print(transcription)
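
The Python client above works because the server exposes the OpenAI-style /v1/audio/transcriptions route; assuming that path behaves like the official API, the equivalent raw curl call would be:

curl -X POST http://localhost:4000/v1/audio/transcriptions \
    -F "file=@some_audio.wav" \
    -F "model=whisper-1"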

Configuration Parameters

  • --batch_size (default: 3): Number of audio files to process in a batch.
  • --batch_timeout (default: 3000): Maximum wait time (in ms) for batch formation.
  • --client (default: host): Backend type for inference (host or cuda).
  • --model (default: openai/whisper-tiny): Name of the Whisper model from Hugging Face Hub.
  • --port (default: 4000): HTTP port the server listens on.
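
For example, to run on a GPU with a larger batch (the flag names are those listed above; the specific values and the whisper-base model are only illustrative):

mix run --no-halt -- \
    --batch_size 8 \
    --batch_timeout 2000 \
    --client cuda \
    --model openai/whisper-base \
    --port 4000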

Contributing

Contributions, issues, and feature requests are welcome. Feel free to submit a pull request or open an issue in the repository.
