Commit

Change whisper-subtitles-generation.ipynb to genai pipeline
sbalandi committed Oct 3, 2024
1 parent af144be commit dc2c6bc
Showing 2 changed files with 221 additions and 591 deletions.
11 changes: 1 addition & 10 deletions notebooks/whisper-asr-genai/whisper-asr-genai.ipynb
@@ -942,8 +942,7 @@
"+model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)\n",
"```\n",
"\n",
"Like the original PyTorch model, the OpenVINO model is also compatible with HuggingFace [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) interface for `automatic-speech-recognition`. \n",
"Pipeline can be used for long audio transcription. Distil-Whisper uses a chunked algorithm to transcribe long-form audio files. In practice, this chunked long-form algorithm is 9x faster than the sequential algorithm proposed by OpenAI in the Whisper paper. To enable chunking, pass the chunk_length_s parameter to the pipeline. For Distil-Whisper, a chunk length of 15 seconds is optimal. To activate batching, pass the argument batch_size."
"Like the original PyTorch model, the OpenVINO model is also compatible with HuggingFace [pipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) interface for `automatic-speech-recognition`. "
]
},
{
@@ -1049,14 +1048,6 @@
"from datasets import load_dataset\n",
"from tqdm.notebook import tqdm\n",
"\n",
"def extract_input_features(sample):\n",
" input_features = processor(\n",
" sample[\"audio\"][\"array\"],\n",
" sampling_rate=sample[\"audio\"][\"sampling_rate\"],\n",
" return_tensors=\"pt\",\n",
" ).input_features\n",
" return input_features\n",
"\n",
"\n",
"\n",
"CALIBRATION_DATASET_SIZE = 30\n",