
Increasing embedding batch size in pyannote/speaker-diarization-3.0 #1486

Closed
landEpita opened this issue Oct 4, 2023 · 6 comments

Comments

@landEpita

Hello,
I noticed that in version 3.0, the batch size for embeddings is set to 1. Is it possible to increase this batch size to speed up the inference? What other measures can be taken to accelerate the inference process? Thank you.

@github-actions

github-actions bot commented Oct 4, 2023

Thank you for your issue.
We found the following entries in the FAQ which you may find helpful:

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

  • paid scientific consulting around speaker diarization and speech processing in general;
  • custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory.

@hbredin
Member

hbredin commented Oct 4, 2023

The way pyannote/speaker-diarization-3.0 uses hbredin/wespeaker-voxceleb-resnet34-LM speaker embedding does not allow batching this part of the inference for now. So, setting embedding_batch_size to a higher value will not speed things up...

... unless you switch to pyannote/embedding or speechbrain/spkrec-ecapa-voxceleb

... in which case you would get slightly lower accuracy.
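To make the suggested switch concrete, here is a hedged sketch of how an alternative, batchable embedding model might be plugged into the diarization pipeline. It assumes the `SpeakerDiarization` pipeline class in pyannote.audio 3.x accepts `segmentation`, `embedding`, and `embedding_batch_size` arguments (model names on the Hugging Face Hub); it requires model downloads and an access token, so treat it as illustrative rather than copy-paste ready.

```python
# Sketch, assuming pyannote.audio 3.x. The SpeakerDiarization pipeline
# is built directly so the embedding model can be swapped for one that
# supports batching (at the cost of slightly lower accuracy, per above).
from pyannote.audio.pipelines import SpeakerDiarization

pipeline = SpeakerDiarization(
    segmentation="pyannote/segmentation-3.0",
    embedding="speechbrain/spkrec-ecapa-voxceleb",  # batchable alternative
    embedding_batch_size=32,                        # now actually used
)

# The pipeline's hyper-parameters (clustering threshold, etc.) still
# need to be instantiated before running it on audio, e.g.:
# pipeline.instantiate({...})
```

The key point from the comment above: with the default `hbredin/wespeaker-voxceleb-resnet34-LM` embedding in 3.0, raising `embedding_batch_size` has no effect, so swapping the embedding model is what unlocks the batching speed-up.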

@hbredin hbredin changed the title Pyannote diarization 3.0 Increasing embedding batch size in pyannote/speaker-diarization-3.0 Oct 4, 2023
@landEpita
Author

Thank you for your response. Do you think we can deploy the model with TensorRT? I also noticed in the code that we can use Nvidia's embedding with Nemo. What are your thoughts on Nvidia's model?

@hbredin
Member

hbredin commented Oct 5, 2023

Do you think we can deploy the model with TensorRT?

I see no reason why you could not. I'd love to hear back from you when/if you try!

I also noticed in the code that we can use Nvidia's embedding with Nemo.
What are your thoughts on Nvidia's model?

I only tested NeMo speaker embedding models in the past, and have not played with its diarization capabilities. My conclusion at the time was that NeMo TitaNet was almost on par with speechbrain's embedding. I did not try NeMo recently so there might be a better speaker embedding now...

@hbredin
Member

hbredin commented Nov 16, 2023

Closing as it reads like the original question has been answered.

That being said, I recommend switching to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 that do support increasing embedding_batch_size.
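For readers landing here, a minimal sketch of the recommended 3.1 setup follows. It assumes a valid Hugging Face access token (the `"HF_TOKEN"` placeholder is not a real token) and a local `audio.wav`; it downloads models on first run, so it is not runnable offline.

```python
# Sketch: pyannote.audio 3.1 with pyannote/speaker-diarization-3.1,
# which supports batched embedding extraction.
import torch
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # replace with your Hugging Face token
)
pipeline.to(torch.device("cuda"))  # GPU strongly recommended for speed

# Raise the embedding batch size to trade GPU memory for throughput.
# (Assumption: the instantiated pipeline exposes this as an attribute.)
pipeline.embedding_batch_size = 32

diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```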

@hbredin hbredin closed this as completed Nov 16, 2023
@v-nhandt21

Dear @hbredin, how can I run inference on multiple audio files in parallel batches with pyannote?

`model.inference({"waveform": audio, "sample_rate": self.sample_rate})`

And to confirm: does pyannote support `embedding_batch_size` in 3.1?
