Support the Whisper model in onnxruntime-genai #699

Merged: 46 commits into main from baijumeswani/whisper on Sep 17, 2024

Conversation

@baijumeswani (Contributor) commented on Jul 15, 2024:

Many changes in this PR are from @RyanUnderhill's whisper branch.

This pull request introduces support for running the openai/whisper model. In particular, it adds the following:

  • Whisper preprocessing: Given an audio file, the preprocessing stage creates the log-mel spectrogram and the decoder input ids needed for model execution (see the front-end sketch after this list).

  • Whisper model execution: Given the log-mel spectrogram and the decoder input ids, the Whisper ONNX models can be executed. Model execution is split into three phases:

    • EncoderDecoderInit: The corresponding ONNX model contains the Encoder and the Decoder with the Attention operator. This model is run only for the first token generation.
    • DecoderInit: This phase manages the transition from the first token generation to the second, i.e., from managing the inputs/outputs of the EncoderDecoderInit ONNX model to managing the inputs/outputs of the Decoder ONNX model.
    • Decoder: This phase manages all remaining token-generation steps. The corresponding ONNX model contains only the decoder logic with the DecoderMaskedMultiHeadAttention operator.

    The model execution also manages the model outputs: the logits and, optionally, the cross_qk buffers required for computing word-level timestamps.

  • The Python, C, and C++ APIs for loading audio, preprocessing the audio files, and executing the Whisper model (see the usage sketch below).
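For context, the mel front-end in the first bullet follows the parameters openai/whisper uses (16 kHz audio, n_fft=400, hop length 160, 80 mel bins, log10 compression with clamping and rescaling), and the decoder input ids are the usual Whisper prompt tokens (<|startoftranscript|>, a language token, a task token, and optionally <|notimestamps|>). The PR implements this preprocessing natively inside onnxruntime-genai; the snippet below is only an illustrative sketch of that kind of front-end using librosa, not the code added here.

```python
import numpy as np
import librosa

def log_mel_spectrogram(audio_path: str) -> np.ndarray:
    """Whisper-style log-mel features: 16 kHz, n_fft=400, hop=160, 80 mel bins."""
    audio, _ = librosa.load(audio_path, sr=16000)
    mel = librosa.feature.melspectrogram(
        y=audio, sr=16000, n_fft=400, hop_length=160, n_mels=80, power=2.0
    )
    log_spec = np.log10(np.maximum(mel, 1e-10))
    # Clamp values more than 8 log10 units below the peak, then rescale,
    # mirroring the normalization in openai/whisper.
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)
    return (log_spec + 4.0) / 4.0
```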

With this pull request, onnxruntime-genai can execute the openai/whisper model on the CPU (fp32) and CUDA (fp16 and fp32) EPs with batch_size >= 1 and beam_size >= 1.
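For reference, the end-to-end Python flow looks roughly like the sketch below. It is modeled on the general onnxruntime-genai generation loop and the examples/python/whisper.py example touched in this PR; the processor/audio names (create_multimodal_processor, og.Audios.open, set_inputs, decode) are assumptions about the API surface at the time of this PR, not its authoritative form.

```python
import onnxruntime_genai as og

# Placeholder path; the model directory holds the exported Whisper ONNX models.
model = og.Model("path/to/whisper-onnx-model")
processor = model.create_multimodal_processor()  # name assumed
audios = og.Audios.open("speech.wav")            # name assumed

# Preprocessing: builds the log-mel spectrogram and decoder input ids.
inputs = processor("<|startoftranscript|>", audios=audios)

params = og.GeneratorParams(model)
params.set_search_options(max_length=448, num_beams=1)
params.set_inputs(inputs)  # name assumed

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(processor.decode(generator.get_sequence(0)))  # decode() assumed
```

Setting num_beams > 1 or loading multiple audio files would exercise the batch_size >= 1 / beam_size >= 1 support mentioned above.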

Changes still required that are not part of this pull request:

  • Making the cross_qk buffers available as outputs for the CPU EP.
  • Refining the user API.
  • Adding a C# API to load multiple audio files for batch size > 1.
  • Adding a C# example.
  • Splitting the phases into two (Encoder and Decoder) to avoid duplicating the decoder weights in EncoderDecoderInit.
  • Adding an end-to-end example with word-level timestamps.

@baijumeswani baijumeswani force-pushed the baijumeswani/whisper branch 3 times, most recently from 37c5310 to aea7898 on July 24, 2024 23:49
examples/python/whisper.py — flagged issue fixed
@baijumeswani baijumeswani requested a review from a team as a code owner September 4, 2024 19:41
src/models/model.h — review comment resolved (outdated)
@yufenglee (Member) commented:

We need to add some tests in a following PR.

@yufenglee (Member) commented:

And add a description of how to create the model.

(In reply to: 2347714980)

baijumeswani and others added 2 commits September 16, 2024 15:05
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
@baijumeswani baijumeswani merged commit 01f259f into main Sep 17, 2024
13 checks passed
@baijumeswani baijumeswani deleted the baijumeswani/whisper branch September 17, 2024 00:01