WhisperKit Android (Alpha)

WhisperKit Android brings on-device foundation models for automatic speech recognition (ASR) to Android. It extends the performance and feature set of WhisperKit from Apple platforms to Android and (soon) Linux.

[Example App (Coming with Beta)] [Blog Post] [Python Tools Repo]

Installation

The following setup was tested on macOS 15.1.

Ensure you have the required build tools using:

make setup

Download Whisper models (<1.5GB) and auxiliary files:

make download-models

Build the Docker development environment with all development tools (~12GB):

make env

The first run of make env takes several minutes while the Docker image builds.

Once the image has been built, subsequent runs of make env open a shell inside the Docker container right away.

You can use the following to rebuild the Docker image, if needed:

make rebuild-env
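
The installation steps above can be chained in a small helper. The following is a minimal sketch, not part of the repository: the function name wk_install and the DRY_RUN variable are hypothetical, and it assumes you run it from the repository root, where the Makefile lives. Only the make targets themselves come from this README.

```shell
# Hypothetical helper (not part of the repo): run the README's install targets in order.
# Set DRY_RUN=1 to print the commands instead of executing them.
wk_install() {
  for target in setup download-models env; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "make $target"
    else
      # Stop at the first failing step so later targets do not run on a broken setup.
      make "$target" || return 1
    fi
  done
}
```

Running DRY_RUN=1 wk_install prints the three commands without executing them. Because make env opens the Docker environment, it is placed last.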

Getting Started

ArgmaX Inference Engine (axie) orchestration for TFLite is provided as the axie_tflite CLI.

Enter the Docker build environment:

make env

Inside the Docker environment, build the axie_tflite CLI using:

make build

On the host machine (outside Docker shell), push dependencies to the Android device:

make adb-push

You can rerun this target to push axie_tflite again after you rebuild it.

If you want to include audio files, place them in the /path/to/WhisperKitAndroid/inputs folder and they will be copied to /sdcard/argmax/tflite/inputs/.
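
As an illustration of the staging step, the sketch below copies audio files into the repository's inputs folder so that a later make adb-push can pick them up. The function name stage_audio and the WHISPERKIT_ROOT variable are hypothetical and not part of the repository; set WHISPERKIT_ROOT to wherever your local checkout lives.

```shell
# Hypothetical helper: copy audio files into <checkout>/inputs before `make adb-push`.
# WHISPERKIT_ROOT is an assumed variable pointing at your local WhisperKitAndroid checkout.
stage_audio() {
  if [ -z "${WHISPERKIT_ROOT:-}" ]; then
    echo "error: set WHISPERKIT_ROOT to your WhisperKitAndroid checkout" >&2
    return 1
  fi
  # Create the inputs folder if it does not exist yet.
  mkdir -p "$WHISPERKIT_ROOT/inputs" || return 1
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "error: not a file: $f" >&2
      return 1
    fi
    cp "$f" "$WHISPERKIT_ROOT/inputs/" || return 1
  done
}
```

For example, stage_audio sample.wav followed by make adb-push from the repository root (sample.wav is a placeholder file name).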

Connect to the Android device using:

make adb-shell

Run axie_tflite:

Usage: axie_tflite <audio input> <tiny | base | small>
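
The usage line accepts exactly one audio input and one of three model sizes. A minimal sketch of a wrapper that checks both before calling the CLI is shown below. The function name run_axie and the AXIE_DRY_RUN variable are hypothetical, and the sketch assumes axie_tflite can be found on the PATH of the device shell, which this README does not state.

```shell
# Hypothetical wrapper: validate arguments, then call axie_tflite.
# Set AXIE_DRY_RUN=1 to print the command instead of running it.
run_axie() {
  if [ "$#" -ne 2 ]; then
    echo "Usage: run_axie <audio input> <tiny | base | small>" >&2
    return 2
  fi
  case "$2" in
    tiny|base|small) ;;
    *)
      echo "error: model must be tiny, base, or small (got '$2')" >&2
      return 2
      ;;
  esac
  if [ "${AXIE_DRY_RUN:-0}" = "1" ]; then
    echo "axie_tflite $1 $2"
  else
    # Assumes axie_tflite is on PATH in the current shell.
    axie_tflite "$1" "$2"
  fi
}
```

Rejecting an unsupported model name up front gives a clearer error than passing it through to the binary. The wrapper uses only POSIX shell syntax so that it is not tied to a particular shell.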

Contributing & Roadmap

WhisperKit Android is currently in the v0.1 Alpha stage. Contributions from the community will be encouraged after the project reaches the v0.1 Beta milestone.

v0.1 Beta (November 2024)

Temperature fallbacks for decoding guardrails
Input audio file format coverage for wav, flac, mp4, m4a, mp3
Output file format coverage for SRT, VTT, and OpenAI-compatible JSON
WhisperKit Benchmarks performance and quality data publication

v0.2 (Q1 2025)

Whisper Large v3 Turbo (v20240930) support
Streaming real-time inference
Model compression

License

We release WhisperKit Android under the MIT License.
OpenAI Whisper model open-source checkpoints were released under the MIT License.
Qualcomm AI Hub .tflite models and QNN libraries for NPU deployment are released under the Qualcomm AI Model & Software License.

Citation

If you use WhisperKit for something cool or just find it useful, please drop us a note at info@argmaxinc.com!

If you are looking for managed enterprise deployment with Argmax, please drop us a note at info+sales@argmaxinc.com.

If you use WhisperKit for academic work, here is the BibTeX:

@misc{whisperkit-argmax,
   title = {WhisperKit},
   author = {Argmax, Inc.},
   year = {2024},
   URL = {https://github.com/argmaxinc/WhisperKit}
}
