From dd1fc7edbfa302d4827a2c90683773a23b49f7a7 Mon Sep 17 00:00:00 2001 From: sallyom Date: Mon, 18 Mar 2024 01:48:18 -0400 Subject: [PATCH 1/6] add whisper quadlet & update docs --- audio-to-text/README.md | 105 ++++++++++++++++++ .../client/Containerfile | 0 .../client/requirements.txt | 0 .../client/whisper_client.py | 0 audio-to-text/quadlet/README.md | 30 +++++ audio-to-text/quadlet/audio-text.image | 7 ++ audio-to-text/quadlet/audio-text.kube.example | 16 +++ audio-to-text/quadlet/audio-text.yaml | 45 ++++++++ .../whispercpp}/Containerfile | 0 model_servers/whispercpp/README.md | 46 ++++++++ model_servers/whispercpp/run.sh | 4 + models/Containerfile | 1 + playground/README.md | 2 +- whisper-playground/README.md | 77 ------------- whisper-playground/run.sh | 3 - 15 files changed, 255 insertions(+), 81 deletions(-) create mode 100644 audio-to-text/README.md rename {whisper-playground => audio-to-text}/client/Containerfile (100%) rename {whisper-playground => audio-to-text}/client/requirements.txt (100%) rename {whisper-playground => audio-to-text}/client/whisper_client.py (100%) create mode 100644 audio-to-text/quadlet/README.md create mode 100644 audio-to-text/quadlet/audio-text.image create mode 100644 audio-to-text/quadlet/audio-text.kube.example create mode 100644 audio-to-text/quadlet/audio-text.yaml rename {whisper-playground => model_servers/whispercpp}/Containerfile (100%) create mode 100644 model_servers/whispercpp/README.md create mode 100644 model_servers/whispercpp/run.sh delete mode 100644 whisper-playground/README.md delete mode 100644 whisper-playground/run.sh diff --git a/audio-to-text/README.md b/audio-to-text/README.md new file mode 100644 index 00000000..5fc43b34 --- /dev/null +++ b/audio-to-text/README.md @@ -0,0 +1,105 @@ +# Audio to Text Application + + This sample application is a simple recipe to transcribe an audio file. 
+ This provides a simple recipe to help developers start building out their own custom LLM enabled
+ audio-to-text applications. It consists of two main components: the Model Service and the AI Application.
+
+ There are a few options today for local Model Serving, but this recipe will use [`whisper-cpp`](https://github.com/ggerganov/whisper.cpp.git)
+ and their OpenAI compatible Model Service. There is a Containerfile provided that can be used to build this Model Service within the repo,
+ [`model_servers/whispercpp/Containerfile`](/model_servers/whispercpp/Containerfile).
+
+ Our AI Application will connect to our Model Service via its OpenAI compatible API.
+
+
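To illustrate what that OpenAI-style interaction can look like, here is a minimal client sketch. It is an illustration only: the multipart `file` field and the JSON `text` response key mirror the whisper.cpp server example, and the `transcribe` helper is a name invented for this sketch, not this recipe's actual client code.

```python
import json
import urllib.request

def transcribe(endpoint: str, wav_bytes: bytes) -> str:
    """POST a WAV file to the model service and return the transcribed text.

    Assumes the service accepts a multipart "file" field and replies with
    JSON like {"text": "..."} -- verify against your own model server.
    """
    boundary = "whisper-example-boundary"
    body = (
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        ).encode()
        + wav_bytes
        + f"\r\n--{boundary}--\r\n".encode()
    )
    req = urllib.request.Request(
        endpoint,
        data=body,  # data= makes this a POST request
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())["text"]
```

Once the Model Service below is running, a call like `transcribe("http://localhost:8001/inference", open("audio.wav", "rb").read())` would exercise the same path the AI Application uses.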

+ +

+
+# Build the Application
+
+In order to build this application we will need a model, a Model Service and an AI Application.
+
+* [Download a model](#download-a-model)
+* [Build the Model Service](#build-the-model-service)
+* [Deploy the Model Service](#deploy-the-model-service)
+* [Build the AI Application](#build-the-ai-application)
+* [Deploy the AI Application](#deploy-the-ai-application)
+* [Interact with the AI Application](#interact-with-the-ai-application)
+    * [Input audio files](#input-audio-files)
+
+### Download a model
+
+If you are just getting started, we recommend using [ggerganov/whisper.cpp](https://huggingface.co/ggerganov/whisper.cpp).
+This is a well performant mid-sized model with an apache-2.0 license.
+It's simple to download a pre-converted whisper model from [huggingface.co](https://huggingface.co)
+here: https://huggingface.co/ggerganov/whisper.cpp. There are a number of options, but we recommend starting with `ggml-small.bin`.
+
+The recommended model can be downloaded using the code snippet below:
+
+```bash
+cd models
+wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
+cd ../
+```
+
+_A full list of supported open models is forthcoming._
+
+
+### Build the Model Service
+
+The Model Service can be built from the root directory with the following code snippet:
+
+```bash
+cd model_servers/whispercpp
+podman build -t whispercppserver .
+```
+
+### Deploy the Model Service
+
+The local Model Service relies on a volume mount from the host machine to access the model files. You can start your local Model Service using the following podman command:
+```bash
+podman run --rm -it \
+    -p 8001:8001 \
+    -v /local/path/to/locallm/models:/locallm/models \
+    -e MODEL_PATH=models/ \
+    -e HOST=0.0.0.0 \
+    -e PORT=8001 \
+    whispercppserver
+```
+
+### Build the AI Application
+
+Now that the Model Service is running we want to build and deploy our AI Application. Use the provided Containerfile to build the AI Application
+image from the `audio-to-text/` directory.
+
+```bash
+cd audio-to-text
+podman build -t audio-to-text . -f builds/Containerfile
+```
+### Deploy the AI Application
+
+Make sure the Model Service is up and running before starting this container image.
+When starting the AI Application container image we need to direct it to the correct `MODEL_SERVICE_ENDPOINT`.
+This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI compatible API.
+The following podman command can be used to run your AI Application:
+
+```bash
+podman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://0.0.0.0:8001/inference audio-to-text
+```
+
+### Interact with the AI Application
+
+Once the streamlit application is up and running, you should be able to access it at `http://localhost:8501`.
+From here, you can upload audio files from your local machine and translate the audio files as shown below.
+
+Everything should now be up an running with the chat application available at [`http://localhost:8501`](http://localhost:8501).
+By using this recipe and getting this starting point established,
+users should now have an easier time customizing and building their own LLM enabled chatbot applications.
+
+#### Input audio files
+
+Whisper.cpp requires 16-bit WAV audio files as input.
+To convert your input audio files to 16-bit WAV format you can use `ffmpeg` like this:
+
+```bash
+ffmpeg -i <input-audio-file> -ar 16000 -ac 1 -c:a pcm_s16le <output-file>.wav
+```
diff --git a/whisper-playground/client/Containerfile b/audio-to-text/client/Containerfile
similarity index 100%
rename from whisper-playground/client/Containerfile
rename to audio-to-text/client/Containerfile
diff --git a/whisper-playground/client/requirements.txt b/audio-to-text/client/requirements.txt
similarity index 100%
rename from whisper-playground/client/requirements.txt
rename to audio-to-text/client/requirements.txt
diff --git a/whisper-playground/client/whisper_client.py b/audio-to-text/client/whisper_client.py
similarity index 100%
rename from whisper-playground/client/whisper_client.py
rename to audio-to-text/client/whisper_client.py
diff --git a/audio-to-text/quadlet/README.md b/audio-to-text/quadlet/README.md
new file mode 100644
index 00000000..2bebaef4
--- /dev/null
+++ b/audio-to-text/quadlet/README.md
@@ -0,0 +1,30 @@
+### Run audio-text locally as a podman pod
+
+There are pre-built images and a pod definition to run this audio-to-text example application.
+This sample converts an audio waveform (.wav) file to text.
+
+To run locally,
+
+```bash
+podman kube play ./quadlet/audio-text.yaml
+```
+To monitor locally,
+
+```bash
+podman pod list
+podman ps
+podman logs <container-name>
+```
+
+The application should be accessible at `http://localhost:8501`. It will take a few minutes for the model to load.
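Because the model can take a few minutes to load, scripts that drive this pod may want a readiness probe before opening the browser. The sketch below only polls the published port with the Python standard library; the `wait_for_port` name and the two-second retry interval are illustrative choices, not part of the recipe.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 300.0) -> bool:
    """Poll until host:port accepts TCP connections, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(2)
    return False

# Example usage after `podman kube play` (commented so the snippet runs standalone):
# assert wait_for_port("localhost", 8501)
```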
+
+### Run audio-text as a systemd service
+
+```bash
+cp audio-text.yaml /etc/containers/systemd/audio-text.yaml
+cp audio-text.kube.example /etc/containers/systemd/audio-text.kube
+cp audio-text.image /etc/containers/systemd/audio-text.image
+# optional: preview the generated units
+/usr/libexec/podman/quadlet --dryrun
+systemctl daemon-reload
+systemctl start audio-text
+```
diff --git a/audio-to-text/quadlet/audio-text.image b/audio-to-text/quadlet/audio-text.image
new file mode 100644
index 00000000..19d5fcc3
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.image
@@ -0,0 +1,7 @@
+[Install]
+WantedBy=audio-text.service
+
+[Image]
+Image=quay.io/redhat-et/locallm-whisper-ggml-small:latest
+Image=quay.io/redhat-et/locallm-whisper-service:latest
+Image=quay.io/redhat-et/locallm-audio-to-text:latest
diff --git a/audio-to-text/quadlet/audio-text.kube.example b/audio-to-text/quadlet/audio-text.kube.example
new file mode 100644
index 00000000..391408f3
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.kube.example
@@ -0,0 +1,16 @@
+[Unit]
+Description=Python script to run against downloaded LLM
+Documentation=man:podman-generate-systemd(1)
+Wants=network-online.target
+After=network-online.target
+RequiresMountsFor=%t/containers
+
+[Kube]
+# Point to the yaml file in the same directory
+Yaml=audio-text.yaml
+
+[Service]
+Restart=always
+
+[Install]
+WantedBy=default.target
diff --git a/audio-to-text/quadlet/audio-text.yaml b/audio-to-text/quadlet/audio-text.yaml
new file mode 100644
index 00000000..2307c478
--- /dev/null
+++ b/audio-to-text/quadlet/audio-text.yaml
@@ -0,0 +1,45 @@
+apiVersion: v1
+kind: Pod
+metadata:
+  labels:
+    app: audio-to-text
+  name: audio-to-text
+spec:
+  initContainers:
+  - name: model-file
+    image: quay.io/redhat-et/locallm-whisper-ggml-small:latest
+    command: ['/usr/bin/install', "/model/ggml-small.bin", "/shared/"]
+    volumeMounts:
+    - name: model-file
+      mountPath: /shared
+  containers:
+  - env:
+    - name: MODEL_SERVICE_ENDPOINT
+      value: http://0.0.0.0:8001/inference
+    image: quay.io/redhat-et/locallm-audio-to-text:latest
+    name: audio-to-text
+    ports:
+    - containerPort: 8501
+      hostPort: 8501
+    securityContext:
+      runAsNonRoot: true
+  - env:
+    - name: HOST
+      value: 0.0.0.0
+    - name: PORT
+      value: "8001"
+    - name: MODEL_PATH
+      value: /model/ggml-small.bin
+    image: quay.io/redhat-et/locallm-whisper-service:latest
+    name: whisper-model-service
+    ports:
+    - containerPort: 8001
+      hostPort: 8001
+    securityContext:
+      runAsNonRoot: true
+    volumeMounts:
+    - name: model-file
+      mountPath: /model
+  volumes:
+  - name: model-file
+    emptyDir: {}
diff --git a/whisper-playground/Containerfile b/model_servers/whispercpp/Containerfile
similarity index 100%
rename from whisper-playground/Containerfile
rename to model_servers/whispercpp/Containerfile
diff --git a/model_servers/whispercpp/README.md b/model_servers/whispercpp/README.md
new file mode 100644
index 00000000..f38c6839
--- /dev/null
+++ b/model_servers/whispercpp/README.md
@@ -0,0 +1,46 @@
+## Whisper
+
+Whisper models are useful for converting audio files to text. The sample application [audio-to-text](../audio-to-text/README.md)
+describes how to run an inference application. This document describes how to build a service for a Whisper model.
+
+### Build model service
+
+To build a Whisper model service container image from this directory,
+
+```bash
+podman build -t whisper:image .
+```
+
+### Download Whisper model
+
+You can download the model from HuggingFace. There are various Whisper models available which vary in size and can be found
+[here](https://huggingface.co/ggerganov/whisper.cpp). We will be using the `small` model which is about 466 MB.
+
+- **small**
+  - Download URL: [https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin)
+
+```bash
+cd ../models
+wget --no-config --quiet --show-progress -O ggml-small.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
+cd ../
+```
+
+### Deploy Model Service
+
+Deploy the model service and volume mount the model of choice.
+Here, we are mounting the `ggml-small.bin` model as downloaded from above.
+
+```bash
+# Note: the :Z may need to be omitted from the model volume mount if not running on Linux
+
+podman run --rm -it \
+    -p 8001:8001 \
+    -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \
+    -e HOST=0.0.0.0 \
+    -e MODEL_PATH=/models/ggml-small.bin \
+    -e PORT=8001 \
+    whisper:image
+```
+
+By default, a sample `jfk.wav` file is included in the whisper image. This can be used to test with.
+The environment variable `AUDIO_FILE` can be passed with your own audio file to override the default `/app/jfk.wav` file within the whisper image.
diff --git a/model_servers/whispercpp/run.sh b/model_servers/whispercpp/run.sh
new file mode 100644
index 00000000..7e640b76
--- /dev/null
+++ b/model_servers/whispercpp/run.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+./server -tr --model ${MODEL_PATH} --host ${HOST:=0.0.0.0} --port ${PORT:=8001}
+
diff --git a/models/Containerfile b/models/Containerfile
index 981c85f4..e359bf7c 100644
--- a/models/Containerfile
+++ b/models/Containerfile
@@ -1,6 +1,7 @@
 #https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf
 #https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_S.gguf
 #https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf
+#https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
 # podman build --build-arg MODEL_URL=https://... -t quay.io/yourimage .
FROM registry.access.redhat.com/ubi9/ubi-micro:9.3-13 ARG MODEL_URL diff --git a/playground/README.md b/playground/README.md index 556c98a6..cdc8caae 100644 --- a/playground/README.md +++ b/playground/README.md @@ -69,4 +69,4 @@ podman run --rm -it -d \ -v Local/path/to/locallm/models:/locallm/models:ro,Z \ -e CONFIG_PATH=models/ \ playground:image -``` \ No newline at end of file +``` diff --git a/whisper-playground/README.md b/whisper-playground/README.md deleted file mode 100644 index 519b6d6b..00000000 --- a/whisper-playground/README.md +++ /dev/null @@ -1,77 +0,0 @@ -### Pre-Requisites - -If you are using an Apple MacBook M-series laptop, you will probably need to do the following configurations: - -* `brew tap cfergeau/crc` -* `brew install vfkit` -* `export CONTAINERS_MACHINE_PROVIDER=applehv` -* Edit your `/Users//.config/containers/containers.conf` file to include: -```bash -[machine] -provider = "applehv" -``` -* Ensure you have enough resources on your Podman machine. Recommended to have atleast `CPU: 8, Memory: 10 GB` - -### Build Model Service - -From this directory, - -```bash -podman build -t whisper:image . -``` - -### Download Model - -We need to download the model from HuggingFace. There are various Whisper models available which vary in size and can be found [here](https://huggingface.co/ggerganov/whisper.cpp). We will be using the `small` model which is about 466 MB. - -- **small** - - Download URL: [https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin](https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin) - -```bash -cd ../models -wget --no-config --quiet --show-progress -O ggml-small.bin -cd ../ -``` - -### Download audio files - -Whisper.cpp requires as an input 16-bit WAV audio files. -By default, a sample `jfk.wav` file is included in the whisper image. This can be used to test with. 
-To convert your input audio files to 16-bit WAV format you can use `ffmpeg` like this: - -```bash -ffmpeg -i -ar 16000 -ac 1 -c:a pcm_s16le -``` - -The environment variable `AUDIO_FILE`, can be passed with your own audio file to override the default `/app/jfk.wav` file within the whisper image. - -### Deploy Model Service - -Deploy the LLM and volume mount the model of choice. -Here, we are mounting the `ggml-small.bin` model as downloaded from above. - -```bash -podman run --rm -it \ - -p 8001:8001 \ - -v /local/path/to/locallm/models/ggml-small.bin:/models/ggml-small.bin:Z,ro \ - -e HOST=0.0.0.0 \ - -e PORT=8001 \ - whisper:image -``` - -### Build and run the client application - -We will use Streamlit to create a front end application with which you can interact with the Whisper model through a simple UI. - -```bash -podman build -t whisper_client whisper-playground/client -``` - -```bash -podman run -p 8501:8501 -e MODEL_ENDPOINT=http://0.0.0.0:8000/inference whisper_client -``` -Once the streamlit application is up and running, you should be able to access it at `http://localhost:8501`. From here, you can upload audio files from your local machine and translate the audio files as shown below. - -

- -

diff --git a/whisper-playground/run.sh b/whisper-playground/run.sh deleted file mode 100644 index 7fcd0a91..00000000 --- a/whisper-playground/run.sh +++ /dev/null @@ -1,3 +0,0 @@ -#! bin/bash - -./server -tr -m /models/ggml-small.bin --host ${HOST:=0.0.0.0} --port ${PORT:=8001} \ No newline at end of file From 205c9d3ff6ab2e7e004da71a0f7137c2a848861e Mon Sep 17 00:00:00 2001 From: Sally O'Malley Date: Fri, 22 Mar 2024 14:49:38 -0400 Subject: [PATCH 2/6] Update audio-to-text/README.md Co-authored-by: Michael Clifford --- audio-to-text/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/audio-to-text/README.md b/audio-to-text/README.md index 5fc43b34..fc73ed9b 100644 --- a/audio-to-text/README.md +++ b/audio-to-text/README.md @@ -29,7 +29,7 @@ In order to build this application we will need a model, a Model Service and an ### Download a model If you are just getting started, we recommend using [ggerganov/whisper.cpp](https://huggingface.co/ggerganov/whisper.cpp). -This is a well performant mid-sized model with an apache-2.0 license. +This is a well performant model with an MIT license. It's simple to download a pre-converted whisper model from [huggingface.co](https://huggingface.co) here: https://huggingface.co/ggerganov/whisper.cpp. There are a number of options, but we recommend to start with `ggml-small.bin`. From 4369d01a1dbf31922399142b94b7803d3f4b1407 Mon Sep 17 00:00:00 2001 From: Sally O'Malley Date: Fri, 22 Mar 2024 14:50:33 -0400 Subject: [PATCH 3/6] Update audio-to-text/README.md Co-authored-by: Michael Clifford --- audio-to-text/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/audio-to-text/README.md b/audio-to-text/README.md index fc73ed9b..12d0ae95 100644 --- a/audio-to-text/README.md +++ b/audio-to-text/README.md @@ -79,7 +79,7 @@ podman build -t audio-to-text . -f builds/Containerfile Make sure the Model Service is up and running before starting this container image. 
When starting the AI Application container image we need to direct it to the correct `MODEL_SERVICE_ENDPOINT`. -This could be any appropriately hosted Model Service (running locally or in the cloud) using an OpenAI compatible API. +This could be any appropriately hosted Model Service (running locally or in the cloud) using a compatible API. The following podman command can be used to run your AI Application: ```bash From da9dc79738cd1f03318f9f3b7360c0d871848e06 Mon Sep 17 00:00:00 2001 From: Sally O'Malley Date: Fri, 22 Mar 2024 14:50:53 -0400 Subject: [PATCH 4/6] Update audio-to-text/README.md Co-authored-by: Michael Clifford --- audio-to-text/README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/audio-to-text/README.md b/audio-to-text/README.md index 12d0ae95..920e9c0d 100644 --- a/audio-to-text/README.md +++ b/audio-to-text/README.md @@ -91,7 +91,6 @@ podman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://0.0.0.0:8001/i Once the streamlit application is up and running, you should be able to access it at `http://localhost:8501`. From here, you can upload audio files from your local machine and translate the audio files as shown below. -Everything should now be up an running with the chat application available at [`http://localhost:8501`](http://localhost:8501). By using this recipe and getting this starting point established, users should now have an easier time customizing and building their own LLM enabled chatbot applications. 
From d0a85fd2a1966448caaacbb530cbe764b713e783 Mon Sep 17 00:00:00 2001 From: Sally O'Malley Date: Fri, 22 Mar 2024 14:51:07 -0400 Subject: [PATCH 5/6] Update audio-to-text/README.md Co-authored-by: Michael Clifford --- audio-to-text/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/audio-to-text/README.md b/audio-to-text/README.md index 920e9c0d..43a1d301 100644 --- a/audio-to-text/README.md +++ b/audio-to-text/README.md @@ -92,7 +92,7 @@ Once the streamlit application is up and running, you should be able to access i From here, you can upload audio files from your local machine and translate the audio files as shown below. By using this recipe and getting this starting point established, -users should now have an easier time customizing and building their own LLM enabled chatbot applications. +users should now have an easier time customizing and building their own LLM enabled applications. #### Input audio files From 664d32358082f9665e483e9e3fa0a8b3fc75facb Mon Sep 17 00:00:00 2001 From: Sally O'Malley Date: Fri, 22 Mar 2024 14:55:02 -0400 Subject: [PATCH 6/6] Update README.md --- audio-to-text/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/audio-to-text/README.md b/audio-to-text/README.md index 43a1d301..4e677f95 100644 --- a/audio-to-text/README.md +++ b/audio-to-text/README.md @@ -5,10 +5,10 @@ audio-to-text applications. It consists of two main components; the Model Service and the AI Application. There are a few options today for local Model Serving, but this recipe will use [`whisper-cpp`](https://github.com/ggerganov/whisper.cpp.git) - and their OpenAI compatible Model Service. There is a Containerfile provided that can be used to build this Model Service within the repo, + and its included Model Service. There is a Containerfile provided that can be used to build this Model Service within the repo, [`model_servers/whispercpp/Containerfile`](/model_servers/whispercpp/Containerfile). 
- Our AI Application will connect to our Model Service via its OpenAI compatible API.
+ Our AI Application will connect to our Model Service via its API endpoint.
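The recipe repeatedly notes that whisper.cpp expects 16-bit WAV input, and the README's `ffmpeg` command produces 16 kHz mono `pcm_s16le`. A small standard-library check like the sketch below can validate a file before uploading it; the `is_whisper_ready` name and the exact parameter set are assumptions drawn from those `ffmpeg` flags, not code from this repository.

```python
import os
import tempfile
import wave

def is_whisper_ready(path: str) -> bool:
    """Return True if `path` is a 16-bit, mono, 16 kHz PCM WAV file."""
    with wave.open(path, "rb") as w:
        return (
            w.getsampwidth() == 2          # pcm_s16le -> 2 bytes per sample
            and w.getnchannels() == 1      # -ac 1
            and w.getframerate() == 16000  # -ar 16000
        )

# Write one second of 16 kHz mono silence and confirm it passes the check.
demo = os.path.join(tempfile.gettempdir(), "whisper_demo.wav")
with wave.open(demo, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

print(is_whisper_ready(demo))  # → True
```

A file that fails this check can be converted first with the `ffmpeg` command shown in the README.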