Support the Whisper model in onnxruntime-genai #699
Merged
baijumeswani force-pushed the baijumeswani/whisper branch 3 times, most recently from 37c5310 to aea7898 (July 24, 2024 23:49)
baijumeswani force-pushed the baijumeswani/whisper branch from a8f62e9 to 82b7148 (August 1, 2024 20:20)
baijumeswani force-pushed the baijumeswani/whisper branch from 17a70ae to 8fef96a (August 8, 2024 20:47)
…into baijumeswani/whisper
yufenglee reviewed Sep 6, 2024
yufenglee reviewed Sep 6, 2024
RyanUnderhill approved these changes Sep 9, 2024
yufenglee reviewed Sep 11, 2024
yufenglee reviewed Sep 13, 2024
We need to add some tests in a follow-up PR.
yufenglee reviewed Sep 13, 2024
Also, add a description of how to create the model. (In reply to: 2347714980)
…into baijumeswani/whisper
yufenglee approved these changes Sep 16, 2024
Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
kunal-vaishnavi approved these changes Sep 16, 2024
Many changes in this PR are from @RyanUnderhill's whisper branch.
This pull request introduces support for running the openai/whisper model. In particular, it introduces the following:
- Whisper preprocessing: Given an audio file, the preprocessing stage creates the log mel spectrogram and the decoder input ids that are needed for model execution.
- Whisper model execution: Given the log mel spectrogram and the decoder input ids, the Whisper ONNX models can be executed. The model execution is split into three phases:
  - EncoderDecoderInit: The corresponding ONNX model for this phase contains the Encoder and the Decoder with the Attention operator. This model is run on the first token generation only.
  - DecoderInit: This phase manages the transition from the first token generation to the second token generation, i.e. from managing the inputs/outputs of the EncoderDecoderInit ONNX model to managing the inputs/outputs of the Decoder ONNX model.
  - Decoder: This phase manages all the remaining token generation steps. The corresponding ONNX model contains only the decoder logic with the DecoderMaskedMultiHeadAttention operator.
  The model execution also manages the outputs of the model: logits and, optionally, the cross_qk buffers required for computing the word-level timestamps.
- The Python, C, and C++ APIs for loading audio, preprocessing the audio files, and executing the Whisper model.
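The three execution phases described above can be sketched as a small step-to-phase mapping. This is only an illustrative sketch, not the code in this PR: the names `Phase`, `phase_for_step`, and `schedule` are hypothetical, and the mapping assumes 0-based generation steps, with step 0 being the first token generation and step 1 the second.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    """Hypothetical labels mirroring the three phases named in the description."""
    ENCODER_DECODER_INIT = "EncoderDecoderInit"
    DECODER_INIT = "DecoderInit"
    DECODER = "Decoder"


def phase_for_step(step: int) -> Phase:
    """Return the phase that handles a given 0-based token generation step.

    Assumption: step 0 is the first token generation, step 1 is the
    transition to the second token, and every later step is plain decoding.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step == 0:
        # First token: encoder plus decoder (with the Attention operator).
        return Phase.ENCODER_DECODER_INIT
    if step == 1:
        # Second token: hand over from the init model's inputs/outputs
        # to the decoder model's inputs/outputs.
        return Phase.DECODER_INIT
    # All remaining tokens: decoder-only model.
    return Phase.DECODER


def schedule(num_steps: int) -> List[Phase]:
    """List the phase used at each of the first `num_steps` generation steps."""
    return [phase_for_step(step) for step in range(num_steps)]


if __name__ == "__main__":
    for step, phase in enumerate(schedule(4)):
        print(step, phase.value)
```

Keeping the transition as its own phase, as the description does, isolates the one-time rewiring of inputs and outputs from the steady-state decoding loop.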
With this pull request, onnxruntime-genai can execute the openai/whisper model on the CPU (fp32) and CUDA (fp16 and fp32) EPs with batch_size >= 1 and beam_size >= 1.
Changes required that are not part of this pull request:
- cross_qk buffers available as outputs for the CPU EP.