add whisper quadlet, move whisper model-service, update docs #83

Merged 6 commits on Mar 22, 2024
104 changes: 104 additions & 0 deletions audio-to-text/README.md
@@ -0,0 +1,104 @@
# Audio to Text Application

This sample application is a simple recipe for transcribing an audio file, intended to help developers start building out their own custom LLM-enabled
audio-to-text applications. It consists of two main components: the Model Service and the AI Application.

There are a few options today for local Model Serving, but this recipe uses [`whisper-cpp`](https://github.com/ggerganov/whisper.cpp.git)
and its included Model Service. A Containerfile for building this Model Service is provided in the repo at
[`model_servers/whispercpp/Containerfile`](/model_servers/whispercpp/Containerfile).

Our AI Application will connect to our Model Service via its API endpoint.

<p align="center">
<img src="../assets/whisper.png" width="70%">
</p>

# Build the Application

To build this application we will need a model, a Model Service, and an AI Application.

* [Download a model](#download-a-model)
* [Build the Model Service](#build-the-model-service)
* [Deploy the Model Service](#deploy-the-model-service)
* [Build the AI Application](#build-the-ai-application)
* [Deploy the AI Application](#deploy-the-ai-application)
* [Interact with the AI Application](#interact-with-the-ai-application)
* [Input audio files](#input-audio-files)

### Download a model

If you are just getting started, we recommend using [ggerganov/whisper.cpp](https://huggingface.co/ggerganov/whisper.cpp).
These models perform well and are MIT-licensed.
Pre-converted whisper models are simple to download from [huggingface.co](https://huggingface.co)
here: https://huggingface.co/ggerganov/whisper.cpp. There are a number of options, but we recommend starting with `ggml-small.bin`.

The recommended model can be downloaded using the code snippet below:

```bash
cd models
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
cd ../
```

_A full list of supported open models is forthcoming._


### Build the Model Service

The Model Service can be built from the root directory with the following code snippet:

```bash
cd model_servers/whispercpp
podman build -t whispercppserver .
```

### Deploy the Model Service

The local Model Service relies on a volume mount to the localhost to access the model files. You can start your local Model Service using the following podman command:
```bash
podman run --rm -it \
-p 8001:8001 \
-v /local/path/to/locallm/models:/locallm/models \
-e MODEL_PATH=models/<model-filename> \
-e HOST=0.0.0.0 \
-e PORT=8001 \
whispercppserver
```

### Build the AI Application

Now that the Model Service is running, we want to build and deploy our AI Application. Use the provided Containerfile to build the AI Application
image from the `audio-to-text/` directory.

```bash
cd audio-to-text
podman build -t audio-to-text . -f builds/Containerfile
```

### Deploy the AI Application

Make sure the Model Service is up and running before starting this container image.
When starting the AI Application container image we need to direct it to the correct `MODEL_SERVICE_ENDPOINT`.
This could be any appropriately hosted Model Service (running locally or in the cloud) using a compatible API.
The following podman command can be used to run your AI Application:

```bash
podman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://0.0.0.0:8001/inference audio-to-text
```
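
Note that `http://0.0.0.0:8001` assumes the AI Application container can reach the Model Service through the same network namespace. If the two containers are isolated in your setup, one possible workaround (on recent podman versions, where `host.containers.internal` resolves to the host from inside a container) is:

```bash
# Alternative sketch: reach a Model Service published on the host's port 8001
# via podman's host.containers.internal name (availability depends on your setup).
podman run --rm -it -p 8501:8501 \
  -e MODEL_SERVICE_ENDPOINT=http://host.containers.internal:8001/inference \
  audio-to-text
```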

### Interact with the AI Application

Once the Streamlit application is up and running, you should be able to access it at `http://localhost:8501`.
From here, you can upload audio files from your local machine and transcribe them.

With this recipe as a starting point, users should have an easier time customizing and building their own LLM-enabled applications.

#### Input audio files

Whisper.cpp requires 16-bit WAV audio files as input.
To convert your input audio files to 16-bit WAV format, you can use `ffmpeg` like this:

```bash
ffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>
```
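
If you have many files to convert, a small shell loop over `ffmpeg` does the job. A minimal sketch, assuming MP3 inputs in the current directory:

```bash
# Convert every MP3 in the current directory to 16 kHz mono 16-bit WAV.
for f in *.mp3; do
  ffmpeg -i "$f" -ar 16000 -ac 1 -c:a pcm_s16le "${f%.mp3}.wav"
done
```
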
30 changes: 30 additions & 0 deletions audio-to-text/quadlet/README.md
@@ -0,0 +1,30 @@
### Run audio-text locally as a podman pod

There are pre-built images and a pod definition to run this audio-to-text example application.
This sample converts an audio waveform (.wav) file to text.

To run locally:

```bash
podman kube play ./audio-text.yaml
```
To monitor locally:

```bash
podman pod list
podman ps
podman logs <name of container from the above>
```

The application should be accessible at `http://localhost:8501`. It will take a few minutes for the model to load.
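
Rather than refreshing the browser while the model loads, you can poll the endpoint until it responds. A minimal sketch:

```bash
# Poll until the application answers on port 8501.
until curl --silent --fail --output /dev/null http://localhost:8501; do
  echo "waiting for audio-to-text to come up..."
  sleep 10
done
echo "application is up"
```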

### Run audio-text as a systemd service

```bash
cp audio-text.yaml /etc/containers/systemd/audio-text.yaml
cp audio-text.kube.example /etc/containers/systemd/audio-text.kube
cp audio-text.image /etc/containers/systemd/audio-text.image
/usr/libexec/podman/quadlet --dryrun   # optional: verify the generated units
systemctl daemon-reload
systemctl start audio-text
```
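
Once started, the usual systemd tooling can be used to inspect the service, for example:

```bash
# Check the unit's state and follow its logs.
systemctl status audio-text
journalctl -u audio-text -f
```
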
7 changes: 7 additions & 0 deletions audio-to-text/quadlet/audio-text.image
@@ -0,0 +1,7 @@
[Install]
WantedBy=audio-text.service

[Image]
Image=quay.io/redhat-et/locallm-whisper-ggml-small:latest
Image=quay.io/redhat-et/locallm-whisper-service:latest
Image=quay.io/redhat-et/locallm-audio-to-text:latest
16 changes: 16 additions & 0 deletions audio-to-text/quadlet/audio-text.kube.example
@@ -0,0 +1,16 @@
[Unit]
Description=Python script to run against downloaded LLM
Documentation=man:podman-generate-systemd(1)
Wants=network-online.target
After=network-online.target
RequiresMountsFor=%t/containers

[Kube]
# Point to the yaml file in the same directory
Yaml=audio-text.yaml

[Service]
Restart=always

[Install]
WantedBy=default.target
45 changes: 45 additions & 0 deletions audio-to-text/quadlet/audio-text.yaml
@@ -0,0 +1,45 @@
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: audio-to-text
  name: audio-to-text
spec:
  initContainers:
  - name: model-file
    image: quay.io/redhat-et/locallm-whisper-ggml-small:latest
    command: ['/usr/bin/install', "/model/ggml-small.bin", "/shared/"]
    volumeMounts:
    - name: model-file
      mountPath: /shared
  containers:
  - env:
    - name: MODEL_SERVICE_ENDPOINT
      value: http://0.0.0.0:8001/inference
    image: quay.io/redhat-et/locallm-audio-to-text:latest
    name: audio-to-text
    ports:
    - containerPort: 8501
      hostPort: 8501
    securityContext:
      runAsNonRoot: true
  - env:
    - name: HOST
      value: 0.0.0.0
    - name: PORT
      value: 8001
    - name: MODEL_PATH
      value: /model/ggml-small.bin
    image: quay.io/redhat-et/locallm-whisper-service:latest
    name: whisper-model-service
    ports:
    - containerPort: 8001
      hostPort: 8001
    securityContext:
      runAsNonRoot: true
    volumeMounts:
    - name: model-file
      mountPath: /model
  volumes:
  - name: model-file
    emptyDir: {}
46 changes: 46 additions & 0 deletions model_servers/whispercpp/README.md
@@ -0,0 +1,46 @@
## Whisper

Whisper models are useful for converting audio files to text. The sample application [audio-to-text](../audio-to-text/README.md)
describes how to run an inference application. This document describes how to build a service for a Whisper model.

### Build model service

To build a Whisper model service container image from this directory,

```bash
podman build -t whisper:image .
```

### Download Whisper model

You can download the model from HuggingFace. Various Whisper models, which vary in size, are available
[here](https://huggingface.co/ggerganov/whisper.cpp). We will be using the `small` model, which is about 466 MB.

- **small**
- Download URL: [https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin)

```bash
cd ../../models
wget --no-config --quiet --show-progress -O ggml-small.bin <Download URL>
cd -
```

### Deploy Model Service

Deploy the model service and volume mount the model of choice.
Here, we are mounting the `ggml-small.bin` model as downloaded from above.

```bash
# Note: the :Z may need to be omitted from the model volume mount if not running on Linux

podman run --rm -it \
-p 8001:8001 \
-v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
-e HOST=0.0.0.0 \
-e MODEL_PATH=/models/ggml-small.bin \
-e PORT=8001 \
whisper:image
```

By default, a sample `jfk.wav` file is included in the whisper image and can be used for testing.
The environment variable `AUDIO_FILE` can be set to the path of your own audio file to override the default `/app/jfk.wav` within the whisper image.
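
For example, to transcribe your own recording instead of the bundled sample, you could mount it into the container and point `AUDIO_FILE` at it. A sketch, assuming the image reads `AUDIO_FILE` at startup as described above (`my-audio.wav` is a placeholder for your own 16-bit WAV file):

```bash
# Mount a local WAV file and override the default /app/jfk.wav sample.
podman run --rm -it \
-p 8001:8001 \
-v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
-v /local/path/to/my-audio.wav:/app/my-audio.wav:Z,ro \
-e HOST=0.0.0.0 \
-e MODEL_PATH=/models/ggml-small.bin \
-e PORT=8001 \
-e AUDIO_FILE=/app/my-audio.wav \
whisper:image
```
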
4 changes: 4 additions & 0 deletions model_servers/whispercpp/run.sh
@@ -0,0 +1,4 @@
#!/bin/bash

# Start the whisper.cpp server: -tr enables translation to English,
# HOST and PORT default to 0.0.0.0 and 8001 when unset.
./server -tr --model "${MODEL_PATH}" --host "${HOST:=0.0.0.0}" --port "${PORT:=8001}"

1 change: 1 addition & 0 deletions models/Containerfile
@@ -1,6 +1,7 @@
#https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf
#https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_S.gguf
#https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf
#https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
# podman build --build-arg MODEL_URL=https://... -t quay.io/yourimage .
FROM registry.access.redhat.com/ubi9/ubi-micro:9.3-13
ARG MODEL_URL
2 changes: 1 addition & 1 deletion playground/README.md
@@ -69,4 +69,4 @@ podman run --rm -it -d \
-v Local/path/to/locallm/models:/locallm/models:ro,Z \
-e CONFIG_PATH=models/<config-filename> \
playground:image
```
```
77 changes: 0 additions & 77 deletions whisper-playground/README.md

This file was deleted.

3 changes: 0 additions & 3 deletions whisper-playground/run.sh

This file was deleted.