Fast/low compute speaker diarization #1494
Thank you for your issue. Feel free to close this issue if you found an answer in the FAQ. If your issue is a feature request, please read this first and update your request accordingly, if needed. If your issue is a bug report, please provide a minimum reproducible example (MRE) as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug. Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users). Companies relying on
If you are looking for streaming diarization, you might want to have a look at @juanmc2005's diart toolkit, which is (in part) based on pyannote.audio.
Will take a look, thanks.
I'm trying to add diarization to https://github.com/collabora/WhisperLive, which does transcription and also runs a VAD model before passing audio data to the transcriber. I have it working; however, the VAD model and the diarization model both run on the CPU, so they slow each other down. This degrades the quality of the transcription results and also slows the transcriptions down so they are no longer realtime. I was wondering if there is some way to speed things up. I was thinking of storing speaker embeddings and only processing the last n seconds, for example (see the sketch below). Right now I am processing the whole audio stream every time, so it will only get slower as time goes on. Any suggestions are appreciated.
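Something like this rough sketch is what I had in mind. It is not pyannote's API; `embed_fn`, the class, the similarity threshold, and the running-mean centroid update are all placeholders standing in for whatever speaker-embedding model and clustering rule end up being used:

```python
import numpy as np
from typing import Callable, List

class IncrementalDiarizer:
    """Caches one running centroid per speaker so that only the newest
    chunk of audio has to be embedded, instead of the whole stream."""

    def __init__(self, embed_fn: Callable[[np.ndarray], np.ndarray],
                 threshold: float = 0.7):
        self.embed_fn = embed_fn      # wraps the speaker-embedding model
        self.threshold = threshold    # min cosine similarity to reuse a speaker
        self.centroids: List[np.ndarray] = []  # L2-normalised centroid per speaker
        self.counts: List[int] = []             # chunks averaged into each centroid

    def assign(self, chunk: np.ndarray) -> int:
        """Embed only the latest chunk (e.g. the last n seconds) and return
        a speaker index, creating a new speaker if nothing matches."""
        e = self.embed_fn(chunk)
        e = e / np.linalg.norm(e)
        if self.centroids:
            sims = np.array([c @ e for c in self.centroids])
            best = int(sims.argmax())
            if sims[best] >= self.threshold:
                # Fold the new embedding into that speaker's running mean.
                n = self.counts[best]
                c = (self.centroids[best] * n + e) / (n + 1)
                self.centroids[best] = c / np.linalg.norm(c)
                self.counts[best] += 1
                return best
        # No sufficiently similar cached speaker: register a new one.
        self.centroids.append(e)
        self.counts.append(1)
        return len(self.centroids) - 1
```

That way the per-update cost stays bounded by the chunk length plus a handful of dot products against cached embeddings, rather than growing with the total length of the stream.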