Change whisper-subtitles-generation.ipynb to genai pipeline

openvinotoolkit · Oct 3, 2024 · 1e270f1
1 parent af144be

Showing 3 changed files with 224 additions and 603 deletions.
diff --git a/notebooks/whisper-asr-genai/whisper-asr-genai.ipynb b/notebooks/whisper-asr-genai/whisper-asr-genai.ipynb
@@ -942,8 +942,7 @@
     "+model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)\n",
     "```\n",
     "\n",
-    "Like the original PyTorch model, the OpenVINO model is also compatible with HuggingFace [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) interface for `automatic-speech-recognition`. \n",
-    "Pipeline can be used for long audio transcription. Distil-Whisper uses a chunked algorithm to transcribe long-form audio files. In practice, this chunked long-form algorithm is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper. To enable chunking, pass the chunk_length_s parameter to the pipeline. For Distil-Whisper, a chunk length of 15 seconds is optimal. To activate batching, pass the argument batch_size."
+    "Like the original PyTorch model, the OpenVINO model is also compatible with HuggingFace [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) interface for `automatic-speech-recognition`. "
    ]
   },
   {
@@ -1049,14 +1048,6 @@
     "from datasets import load_dataset\n",
     "from tqdm.notebook import tqdm\n",
     "\n",
-    "def extract_input_features(sample):\n",
-    "    input_features = processor(\n",
-    "        sample[\"audio\"][\"array\"],\n",
-    "        sampling_rate=sample[\"audio\"][\"sampling_rate\"],\n",
-    "        return_tensors=\"pt\",\n",
-    "    ).input_features\n",
-    "    return input_features\n",
-    "\n",
     "\n",
     "\n",
     "CALIBRATION_DATASET_SIZE = 30\n",

diff --git a/notebooks/whisper-subtitles-generation/README.md b/notebooks/whisper-subtitles-generation/README.md
@@ -1,20 +1,20 @@
-# Video Subtitle Generation with OpenAI Whisper
+# Video Subtitle Generation with OpenAI Whisper and OpenVINO Generate API
 [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/whisper-subtitles-generation/whisper-subtitles-generation.ipynb)
 [Whisper](https://openai.com/index/whisper/) is a general-purpose speech recognition model from [OpenAI](https://openai.com). The model is able to almost flawlessly transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise.
-This notebook will run the model with OpenVINO to generate transcription of a video.
+This notebook will run the model with OpenVINO Generate API to generate transcription of a video.
 
 ## Notebook Contents
 
 This notebook demonstrates how to generate video subtitles using the open-source Whisper model. Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It is a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
 You can find more information about this model in the [research paper](https://cdn.openai.com/papers/whisper.pdf), [OpenAI blog](https://openai.com/index/whisper/), [model card](https://github.com/openai/whisper/blob/main/model-card.md) and GitHub [repository](https://github.com/openai/whisper).
 
-This folder contains notebook that show how to convert and quantize model with OpenVINO. We will use [NNCF](https://github.com/openvinotoolkit/nncf) improving model performance by INT8 quantization.
+This folder contains notebook that show how to convert and quantize model with OpenVINO and run pipeline with [Generate API](https://github.com/openvinotoolkit/openvino.genai). We will use [NNCF](https://github.com/openvinotoolkit/nncf) improving model performance by INT8 quantization.
 
 The notebook contains the following steps:
 1. Download the model.
 2. Instantiate original PyTorch model pipeline.
 3. Convert model to OpenVINO IR, using model conversion API.
-4. Run the Whisper pipeline with OpenVINO.
+4. Run the Whisper pipeline with OpenVINO Generate API.
 5. Quantize the OpenVINO model with NNCF.
 6. Check quantized model result for the demo video.
 7. Compare model size, performance and accuracy of FP32 and quantized INT8 models.
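
Note on step 4 of the README list: a minimal sketch of how a Generate API Whisper run could feed subtitle generation. It is not taken from the notebook in this commit. The `Chunk` dataclass and the helpers `format_srt_time`, `chunks_to_srt`, and `transcribe_to_srt` are hypothetical names introduced for illustration. The sketch assumes that `openvino_genai.WhisperPipeline` accepts a model directory and a device name, that `generate(..., return_timestamps=True)` returns a result whose `chunks` carry `start_ts`, `end_ts` (seconds), and `text`, and that `raw_speech` is a list of float samples at 16 kHz. `openvino_genai` is imported lazily so the formatting helpers can be tried without it installed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Chunk:
    """Hypothetical stand-in for a timestamped chunk (times in seconds)."""

    start_ts: float
    end_ts: float
    text: str


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: Iterable) -> str:
    """Render objects with start_ts, end_ts, and text as SRT entries."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        entries.append(
            f"{index}\n"
            f"{format_srt_time(chunk.start_ts)} --> {format_srt_time(chunk.end_ts)}\n"
            f"{chunk.text.strip()}\n"
        )
    # A blank line separates consecutive SRT entries.
    return "\n".join(entries)


def transcribe_to_srt(model_dir: str, raw_speech, device: str = "CPU") -> str:
    """Assumed usage of the Generate API; requires the openvino_genai package."""
    import openvino_genai  # lazy import: only needed for actual inference

    pipe = openvino_genai.WhisperPipeline(model_dir, device)
    result = pipe.generate(raw_speech, return_timestamps=True)
    return chunks_to_srt(result.chunks)
```

The formatting helpers can be checked on their own, for example with `chunks_to_srt([Chunk(0.0, 2.5, "Hello")])`; `transcribe_to_srt` additionally needs a converted Whisper model directory and decoded audio samples.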