Change whisper-subtitles-generation.ipynb to genai pipeline
sbalandi committed Oct 3, 2024
1 parent af144be commit 1e270f1
Showing 3 changed files with 224 additions and 603 deletions.
11 changes: 1 addition & 10 deletions notebooks/whisper-asr-genai/whisper-asr-genai.ipynb
@@ -942,8 +942,7 @@
"+model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)\n",
"```\n",
"\n",
"Like the original PyTorch model, the OpenVINO model is also compatible with HuggingFace [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) interface for `automatic-speech-recognition`. \n",
"Pipeline can be used for long audio transcription. Distil-Whisper uses a chunked algorithm to transcribe long-form audio files. In practice, this chunked long-form algorithm is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper. To enable chunking, pass the chunk_length_s parameter to the pipeline. For Distil-Whisper, a chunk length of 15 seconds is optimal. To activate batching, pass the argument batch_size."
"Like the original PyTorch model, the OpenVINO model is also compatible with HuggingFace [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) interface for `automatic-speech-recognition`. "
]
},
{
@@ -1049,14 +1048,6 @@
"from datasets import load_dataset\n",
"from tqdm.notebook import tqdm\n",
"\n",
"def extract_input_features(sample):\n",
" input_features = processor(\n",
" sample[\"audio\"][\"array\"],\n",
" sampling_rate=sample[\"audio\"][\"sampling_rate\"],\n",
" return_tensors=\"pt\",\n",
" ).input_features\n",
" return input_features\n",
"\n",
"\n",
"\n",
"CALIBRATION_DATASET_SIZE = 30\n",
8 changes: 4 additions & 4 deletions notebooks/whisper-subtitles-generation/README.md
@@ -1,20 +1,20 @@
# Video Subtitle Generation with OpenAI Whisper
# Video Subtitle Generation with OpenAI Whisper and OpenVINO Generate API
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/whisper-subtitles-generation/whisper-subtitles-generation.ipynb)
[Whisper](https://openai.com/index/whisper/) is a general-purpose speech recognition model from [OpenAI](https://openai.com). The model is able to almost flawlessly transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise.
This notebook will run the model with OpenVINO to generate transcription of a video.
This notebook will run the model with OpenVINO Generate API to generate transcription of a video.

## Notebook Contents

This notebook demonstrates how to generate video subtitles using the open-source Whisper model. Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It is a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
You can find more information about this model in the [research paper](https://cdn.openai.com/papers/whisper.pdf), [OpenAI blog](https://openai.com/index/whisper/), [model card](https://github.com/openai/whisper/blob/main/model-card.md) and GitHub [repository](https://github.com/openai/whisper).

This folder contains notebook that show how to convert and quantize model with OpenVINO. We will use [NNCF](https://github.com/openvinotoolkit/nncf) improving model performance by INT8 quantization.
This folder contains notebook that show how to convert and quantize model with OpenVINO and run pipeline with [Generate API](https://github.com/openvinotoolkit/openvino.genai). We will use [NNCF](https://github.com/openvinotoolkit/nncf) improving model performance by INT8 quantization.

The notebook contains the following steps:
1. Download the model.
2. Instantiate original PyTorch model pipeline.
3. Convert model to OpenVINO IR, using model conversion API.
4. Run the Whisper pipeline with OpenVINO.
4. Run the Whisper pipeline with OpenVINO Generate API.
5. Quantize the OpenVINO model with NNCF.
6. Check quantized model result for the demo video.
7. Compare model size, performance and accuracy of FP32 and quantized INT8 models.
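
The README steps end with checking the model's result on the demo video. As a rough illustration of the subtitle-generation part, the sketch below turns timestamped transcription chunks into SubRip (SRT) text. It is not taken from the notebook: the `(start, end, text)` tuple layout and the helper names `format_timestamp` and `chunks_to_srt` are assumptions made for this example, standing in for whatever timestamped output the pipeline actually returns.

```python
from typing import Iterable, Tuple

# Hypothetical chunk layout for this sketch: (start_seconds, end_seconds, text).
Chunk = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks: Iterable[Chunk]) -> str:
    """Build SRT text: a 1-based index, a time range, the text, then a blank line."""
    lines = []
    for index, (start, end, text) in enumerate(chunks, start=1):
        lines.append(str(index))
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # blank line separates subtitle entries
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [(0.0, 2.5, " Hello "), (2.5, 4.0, "world")]
    print(chunks_to_srt(demo))
```

Keeping the timestamp formatting separate from the pipeline call means the same helper can be applied to the FP32 and the INT8 model outputs when comparing them.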