Add genai notebook for whisper scenario
sbalandi committed Sep 25, 2024
1 parent 9d0ef18 commit a905cb0
Showing 5 changed files with 1,302 additions and 0 deletions.
7 changes: 7 additions & 0 deletions .ci/skipped_notebooks.yml
@@ -585,3 +585,10 @@
    - os:
        - macos-12
        - windows-2019
- notebook: notebooks/whisper-asr-genai/whisper-asr-genai.ipynb
  skips:
    - python:
        - '3.8'
        - '3.9'
    - os:
        - macos-12
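
The new entry can be read as: skip this notebook in CI on Python 3.8 and 3.9, and on macos-12. The sketch below assumes that a notebook is skipped when any one of its `skips` rules matches the current environment, and that every key inside a rule must match. These semantics, and the `ENTRY` and `is_skipped` names, are illustrative assumptions rather than the repository's actual CI code.

```python
# Hypothetical mirror of the new skipped_notebooks.yml entry as plain Python data.
ENTRY = {
    "notebook": "notebooks/whisper-asr-genai/whisper-asr-genai.ipynb",
    "skips": [
        {"python": ["3.8", "3.9"]},
        {"os": ["macos-12"]},
    ],
}


def is_skipped(entry: dict, python: str, os_name: str) -> bool:
    """Return True if any skip rule matches the environment (assumed semantics)."""
    env = {"python": python, "os": os_name}
    for rule in entry["skips"]:
        # A rule matches when every key it names lists the current value.
        if all(env.get(key) in values for key, values in rule.items()):
            return True
    return False
```

Under these assumptions, `is_skipped(ENTRY, "3.10", "ubuntu-20.04")` is the only kind of call that returns `False`: the Python version is not listed and the OS is not macos-12.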
1 change: 1 addition & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -975,6 +975,7 @@ wikitext
WIKITQ
Wofk
WTQ
WhisperPipeline
wuerstchen
WuerstchenDiffNeXt
Würstchen
25 changes: 25 additions & 0 deletions notebooks/whisper-asr-genai/README.md
@@ -0,0 +1,25 @@
# Automatic speech recognition using Whisper and OpenVINO with Generate API

[Whisper](https://openai.com/index/whisper/) is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

In this tutorial, we show how to run Whisper using OpenVINO with the Generate API. We will use the pre-trained model from the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library. The [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library converts the model to OpenVINO™ IR format. To simplify the user experience, we will use the [OpenVINO Generate API](https://github.com/openvinotoolkit/openvino.genai) for [Whisper automatic speech recognition scenarios](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/whisper_speech_recognition/README.md).

## Notebook Contents

This notebook demonstrates how to perform automatic speech recognition (ASR) using the Whisper model and OpenVINO.

The tutorial consists of the following steps:
1. Download the PyTorch model.
2. Run PyTorch model inference.
3. Convert the model using OpenVINO Integration with Hugging Face Optimum.
4. Run the model using the Generate API.
5. Compare the performance of the PyTorch and OpenVINO models.
6. Launch an interactive demo for speech recognition.
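
As a rough illustration of step 4, the sketch below loads a WAV file and passes it to `openvino_genai.WhisperPipeline`. It assumes the model has already been converted to OpenVINO IR in a local directory (the directory name is up to you), that `librosa` is installed for audio loading, and that the result object exposes a `texts` list. The helper names `read_wav` and `transcribe_file` are hypothetical and are not taken from the notebook.

```python
from pathlib import Path


def read_wav(filepath: str) -> list:
    """Load audio as 16 kHz mono float samples, the input format Whisper expects."""
    import librosa  # imported lazily so this module loads without audio dependencies

    raw_speech, _ = librosa.load(filepath, sr=16000)
    return raw_speech.tolist()


def transcribe_file(model_dir: str, wav_path: str, device: str = "CPU") -> str:
    """Run a converted Whisper model through the OpenVINO Generate API."""
    # Fail early with a clear message if the converted model is missing.
    if not Path(model_dir).is_dir():
        raise FileNotFoundError(f"Converted model directory not found: {model_dir}")

    import openvino_genai  # imported lazily; requires the openvino-genai package

    pipe = openvino_genai.WhisperPipeline(model_dir, device)
    result = pipe.generate(read_wav(wav_path))
    # Assumption: the decoded result exposes the recognized text via `texts`.
    return result.texts[0]
```

A call such as `transcribe_file("whisper-base-ov", "example_1.wav")` would then return the recognized text, where `whisper-base-ov` stands in for whatever directory the conversion step produced.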


## Installation Instructions

This is a self-contained example that relies solely on its code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to [Installation Guide](../../README.md).
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/whisper-asr-genai/README.md" />
63 changes: 63 additions & 0 deletions notebooks/whisper-asr-genai/gradio_helper.py
@@ -0,0 +1,63 @@
from pathlib import Path
from typing import Callable
import gradio as gr
import requests

audio_example_path = Path("example_1.wav")

# Download the example clip once; later runs reuse the local copy.
if not audio_example_path.exists():
    r = requests.get("https://huggingface.co/spaces/distil-whisper/whisper-vs-distil-whisper/resolve/main/assets/example_1.wav")
    r.raise_for_status()  # avoid saving an HTTP error page as a .wav file
    with open(audio_example_path, "wb") as f:
        f.write(r.content)


def make_demo(fn: Callable, multilingual=True):
    # fn receives (audio file path, clicked button, language) and returns (result text, generation time).
    with gr.Blocks() as demo:
        gr.HTML(
            """
            <div style="text-align: center; max-width: 700px; margin: 0 auto;">
              <div
                style="
                  display: inline-flex; align-items: center; gap: 0.8rem; font-size: 1.75rem;
                "
              >
                <h1 style="font-weight: 900; margin-bottom: 7px; line-height: normal;">
                  OpenVINO Generate API Whisper demo
                </h1>
              </div>
            </div>
            """
        )
        audio = gr.components.Audio(type="filepath", label="Audio input")
        language = gr.components.Textbox(
            label="Language.",
            info="The list of available languages can be found in the generation_config.lang_to_id dictionary. Example: <|en|>. 'auto' or an empty string means autodetection.",
            value="auto",
        )
        with gr.Row():
            button_transcribe = gr.Button("Transcribe")
            # The Translate button is shown only for multilingual models.
            button_translate = gr.Button("Translate", visible=multilingual)
        with gr.Row():
            infer_time = gr.components.Textbox(label="OpenVINO Whisper Generation Time (s)")
        with gr.Row():
            result = gr.components.Textbox(label="OpenVINO Whisper Result", show_copy_button=True)
        # Both buttons share one callback; the clicked button is passed as the second input.
        button_transcribe.click(
            fn=fn,
            inputs=[audio, button_transcribe, language],
            outputs=[result, infer_time],
        )
        button_translate.click(
            fn=fn,
            inputs=[audio, button_translate, language],
            outputs=[result, infer_time],
        )
        gr.Markdown("## Examples")
        gr.Examples(
            [[str(audio_example_path), "<|en|>"]],
            inputs=[audio, language],
            outputs=[result, infer_time],
            fn=fn,
            cache_examples=False,
        )

    return demo
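
To show the callback shape that `make_demo` expects, here is a minimal stand-in that performs no real inference. It assumes Gradio passes the clicked button's label (`"Transcribe"` or `"Translate"`) as the second argument, and it treats `"auto"` or an empty string as autodetection, as the Textbox hint describes. `fake_transcribe` and `launch_demo` are hypothetical names used only for illustration; the notebook's real callback would call the Whisper pipeline instead.

```python
import time


def fake_transcribe(audio_path, task_label, language):
    """Stand-in callback: returns (result text, elapsed seconds as text)."""
    start = time.perf_counter()
    # Assumption: the clicked button's label arrives as the second input.
    task = task_label.lower()
    # "auto" or an empty string means language autodetection.
    lang = None if language in ("", "auto") else language
    text = f"[{task}] {audio_path} (language: {lang or 'autodetect'})"
    elapsed = time.perf_counter() - start
    return text, f"{elapsed:.2f}"


def launch_demo():
    """Build and launch the demo; needs gradio and gradio_helper.py on the import path."""
    from gradio_helper import make_demo  # importing also downloads the example clip

    make_demo(fake_transcribe, multilingual=True).launch()
```

The two return values map onto the `result` and `infer_time` Textboxes, in that order, so a real callback should keep the same tuple layout.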
1,206 changes: 1,206 additions & 0 deletions notebooks/whisper-asr-genai/whisper-asr-genai.ipynb

Large diffs are not rendered by default.
