Add genai notebook for whisper scenario
Showing 5 changed files with 1,302 additions and 0 deletions.
@@ -975,6 +975,7 @@ wikitext
 WIKITQ
 Wofk
 WTQ
+WhisperPipeline
 wuerstchen
 WuerstchenDiffNeXt
 Würstchen
@@ -0,0 +1,25 @@
# Automatic speech recognition using Whisper and OpenVINO with Generate API

[Whisper](https://openai.com/index/whisper/) is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

In this tutorial, we consider how to run Whisper using OpenVINO with the Generate API. We will use the pre-trained model from the [Hugging Face Transformers](https://github.com/huggingface/transformers) library. The [Hugging Face Optimum Intel](https://huggingface.co/docs/optimum/intel/index) library converts the models to OpenVINO™ IR format. To simplify the user experience, we will use the [OpenVINO Generate API](https://github.com/openvinotoolkit/openvino.genai) for [Whisper automatic speech recognition scenarios](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/whisper_speech_recognition/README.md).

## Notebook Contents

This notebook demonstrates how to perform automatic speech recognition (ASR) using the Whisper model and OpenVINO.

The tutorial consists of the following steps:
1. Download the PyTorch model.
2. Run PyTorch model inference.
3. Convert the model using the OpenVINO integration with Hugging Face Optimum.
4. Run the model using the Generate API.
5. Compare the performance of the PyTorch and OpenVINO models.
6. Launch an interactive demo for speech recognition.
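Steps 3 and 4 above can be sketched as follows. This is a hedged sketch, not notebook code: `openai/whisper-base` and the `whisper-base` output directory are illustrative choices, the export flags follow the openvino.genai whisper sample, and the model-touching lines are left commented out because they require a model download.

```python
# Sketch (assumption): convert a Whisper checkpoint to OpenVINO IR and
# run it with the Generate API. Heavy calls are commented out.
import subprocess  # used only if you uncomment the export below

# Step 3: convert the PyTorch checkpoint with Optimum Intel.
export_cmd = [
    "optimum-cli", "export", "openvino",
    "--model", "openai/whisper-base",  # illustrative checkpoint
    "whisper-base",                    # output IR directory
]
# subprocess.run(export_cmd, check=True)
print(" ".join(export_cmd))

# Step 4: load the exported IR folder and transcribe 16 kHz mono audio.
# import openvino_genai
# pipe = openvino_genai.WhisperPipeline("whisper-base", device="CPU")
# raw_speech: float samples at 16 kHz, e.g. librosa.load(wav, sr=16000)[0]
# print(pipe.generate(raw_speech).texts[0])
```

The export only needs to run once; afterwards the IR directory can be loaded directly without the original PyTorch weights.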

## Installation Instructions

This is a self-contained example that relies solely on its own code.<br/>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/whisper-asr-genai/README.md" />
@@ -0,0 +1,63 @@
from pathlib import Path
from typing import Callable
import gradio as gr
import requests

audio_example_path = Path("example_1.wav")

# Download a sample audio clip once for the Examples section below.
if not audio_example_path.exists():
    r = requests.get(
        "https://huggingface.co/spaces/distil-whisper/whisper-vs-distil-whisper/resolve/main/assets/example_1.wav",
        timeout=60,
    )
    with open(audio_example_path, "wb") as f:
        f.write(r.content)


def make_demo(fn: Callable, multilingual=True):
    with gr.Blocks() as demo:
        gr.HTML(
            """
            <div style="text-align: center; max-width: 700px; margin: 0 auto;">
              <div
                style="
                  display: inline-flex; align-items: center; gap: 0.8rem; font-size: 1.75rem;
                "
              >
                <h1 style="font-weight: 900; margin-bottom: 7px; line-height: normal;">
                  OpenVINO Generate API Whisper demo
                </h1>
              </div>
            </div>
            """
        )
        audio = gr.components.Audio(type="filepath", label="Audio input")
        language = gr.components.Textbox(
            label="Language",
            info="The list of available languages can be found in the generation_config.lang_to_id dictionary, e.g. <|en|>. 'auto' or an empty string means language autodetection.",
            value="auto",
        )
        with gr.Row():
            button_transcribe = gr.Button("Transcribe")
            button_translate = gr.Button("Translate", visible=multilingual)
        with gr.Row():
            infer_time = gr.components.Textbox(label="OpenVINO Whisper Generation Time (s)")
        with gr.Row():
            result = gr.components.Textbox(label="OpenVINO Whisper Result", show_copy_button=True)
        # The clicked button is passed as an input so that `fn` can tell
        # transcription requests apart from translation requests.
        button_transcribe.click(
            fn=fn,
            inputs=[audio, button_transcribe, language],
            outputs=[result, infer_time],
        )
        button_translate.click(
            fn=fn,
            inputs=[audio, button_translate, language],
            outputs=[result, infer_time],
        )
        gr.Markdown("## Examples")
        gr.Examples(
            [[str(audio_example_path), "<|en|>"]],
            inputs=[audio, language],
            outputs=[result, infer_time],
            fn=fn,
            cache_examples=False,
        )

    return demo
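`make_demo` wires each button to `fn(audio_path, button_label, language)` and expects the recognized text plus the inference time back. A minimal sketch of such a callback follows; `build_fn` and the keyword names are our assumptions for illustration, and the lambda stands in for a real pipeline (in the notebook this would be an `openvino_genai.WhisperPipeline` call):

```python
import time


def build_fn(pipeline):
    """Adapt an ASR callable to the (audio, button_label, language) ->
    (text, seconds) contract that make_demo's click handlers expect."""
    def fn(audio_path, task_label, language):
        kwargs = {}
        # "auto" or an empty string means: let the model detect the language.
        if language and language != "auto":
            kwargs["language"] = language
        # The clicked button's label distinguishes the two tasks.
        if task_label == "Translate":
            kwargs["task"] = "translate"
        start = time.perf_counter()
        text = pipeline(audio_path, **kwargs)
        return text, f"{time.perf_counter() - start:.2f}"
    return fn


# Stand-in pipeline for illustration only; it just echoes its arguments.
fake_asr = lambda path, **kw: f"{path}:{sorted(kw)}"
fn = build_fn(fake_asr)
print(fn("clip.wav", "Translate", "<|de|>")[0])  # → clip.wav:['language', 'task']
```

A real callback built this way can be passed straight to `make_demo(fn)` and launched with `make_demo(fn).launch()`.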
notebooks/whisper-asr-genai/whisper-asr-genai.ipynb
1,206 additions & 0 deletions. Large diffs are not rendered by default.