Add genai notebook for whisper scenario
Showing 5 changed files with 1,302 additions and 0 deletions.
@@ -975,6 +975,7 @@ wikitext
 WIKITQ
 Wofk
 WTQ
+WhisperPipeline
 wuerstchen
 WuerstchenDiffNeXt
 Würstchen
@@ -0,0 +1,25 @@
# Automatic speech recognition using Whisper and OpenVINO with Generate API

[Whisper](https://openai.com/index/whisper/) is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

In this tutorial, we consider how to run Whisper using OpenVINO with the Generate API. We will use the pre-trained model from the [Hugging Face Transformers](https://github.com/huggingface/transformers) library. The [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library converts the models to OpenVINO™ IR format. To simplify the user experience, we will use the [OpenVINO Generate API](https://github.com/openvinotoolkit/openvino.genai) for [Whisper automatic speech recognition scenarios](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/whisper_speech_recognition/README.md).

## Notebook Contents

This notebook demonstrates how to perform automatic speech recognition (ASR) using the Whisper model and OpenVINO.

The tutorial consists of the following steps:
1. Download the PyTorch model.
2. Run PyTorch model inference.
3. Convert the model using the OpenVINO integration with Hugging Face Optimum.
4. Run the model using the Generate API.
5. Compare the performance of the PyTorch and OpenVINO models.
6. Launch an interactive demo for speech recognition.
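Steps 3 and 4 above can be sketched as follows. This is a hedged sketch, not notebook code: `openai/whisper-base` and the `whisper-base` output directory are illustrative choices, the export flags follow the openvino.genai whisper sample, and the model-touching lines are left commented out because they require a model download.

```python
# Sketch (assumption): convert a Whisper checkpoint to OpenVINO IR and
# run it with the Generate API. Heavy calls are commented out.
import subprocess  # used only if you uncomment the export below

# Step 3: convert the PyTorch checkpoint with Optimum Intel.
export_cmd = [
    "optimum-cli", "export", "openvino",
    "--model", "openai/whisper-base",  # illustrative checkpoint
    "whisper-base",                    # output IR directory
]
# subprocess.run(export_cmd, check=True)
print(" ".join(export_cmd))

# Step 4: load the exported IR folder and transcribe 16 kHz mono audio.
# import openvino_genai
# pipe = openvino_genai.WhisperPipeline("whisper-base", device="CPU")
# raw_speech: float samples at 16 kHz, e.g. librosa.load(wav, sr=16000)[0]
# print(pipe.generate(raw_speech).texts[0])
```

The export only needs to run once; afterwards the IR directory can be loaded directly without the original PyTorch weights.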

## Installation Instructions

This is a self-contained example that relies solely on its own code.<br/>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/whisper-asr-genai/README.md" />
@@ -0,0 +1,63 @@
from pathlib import Path
from typing import Callable
import gradio as gr
import requests

audio_example_path = Path("example_1.wav")

# Download a sample audio clip once for the Examples section below.
if not audio_example_path.exists():
    r = requests.get(
        "https://huggingface.co/spaces/distil-whisper/whisper-vs-distil-whisper/resolve/main/assets/example_1.wav",
        timeout=60,
    )
    with open(audio_example_path, "wb") as f:
        f.write(r.content)


def make_demo(fn: Callable, multilingual=True):
    with gr.Blocks() as demo:
        gr.HTML(
            """
            <div style="text-align: center; max-width: 700px; margin: 0 auto;">
              <div
                style="
                  display: inline-flex; align-items: center; gap: 0.8rem; font-size: 1.75rem;
                "
              >
                <h1 style="font-weight: 900; margin-bottom: 7px; line-height: normal;">
                  OpenVINO Generate API Whisper demo
                </h1>
              </div>
            </div>
            """
        )
        audio = gr.components.Audio(type="filepath", label="Audio input")
        language = gr.components.Textbox(
            label="Language",
            info="The list of available languages can be found in the generation_config.lang_to_id dictionary, e.g. <|en|>. 'auto' or an empty string means language autodetection.",
            value="auto",
        )
        with gr.Row():
            button_transcribe = gr.Button("Transcribe")
            button_translate = gr.Button("Translate", visible=multilingual)
        with gr.Row():
            infer_time = gr.components.Textbox(label="OpenVINO Whisper Generation Time (s)")
        with gr.Row():
            result = gr.components.Textbox(label="OpenVINO Whisper Result", show_copy_button=True)
        # The clicked button is passed as an input so that `fn` can tell
        # transcription requests apart from translation requests.
        button_transcribe.click(
            fn=fn,
            inputs=[audio, button_transcribe, language],
            outputs=[result, infer_time],
        )
        button_translate.click(
            fn=fn,
            inputs=[audio, button_translate, language],
            outputs=[result, infer_time],
        )
        gr.Markdown("## Examples")
        gr.Examples(
            [[str(audio_example_path), "<|en|>"]],
            inputs=[audio, language],
            outputs=[result, infer_time],
            fn=fn,
            cache_examples=False,
        )

    return demo
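`make_demo` wires each button to `fn(audio_path, button_label, language)` and expects the recognized text plus the inference time back. A minimal sketch of such a callback follows; `build_fn` and the keyword names are our assumptions for illustration, and the lambda stands in for a real pipeline (in the notebook this would be an `openvino_genai.WhisperPipeline` call):

```python
import time


def build_fn(pipeline):
    """Adapt an ASR callable to the (audio, button_label, language) ->
    (text, seconds) contract that make_demo's click handlers expect."""
    def fn(audio_path, task_label, language):
        kwargs = {}
        # "auto" or an empty string means: let the model detect the language.
        if language and language != "auto":
            kwargs["language"] = language
        # The clicked button's label distinguishes the two tasks.
        if task_label == "Translate":
            kwargs["task"] = "translate"
        start = time.perf_counter()
        text = pipeline(audio_path, **kwargs)
        return text, f"{time.perf_counter() - start:.2f}"
    return fn


# Stand-in pipeline for illustration only; it just echoes its arguments.
fake_asr = lambda path, **kw: f"{path}:{sorted(kw)}"
fn = build_fn(fake_asr)
print(fn("clip.wav", "Translate", "<|de|>")[0])  # → clip.wav:['language', 'task']
```

A real callback built this way can be passed straight to `make_demo(fn)` and launched with `make_demo(fn).launch()`.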
notebooks/whisper-asr-genai/whisper-asr-genai.ipynb
1,206 additions & 0 deletions. Large diffs are not rendered by default.